Re: [PATCH] prune_icache_sb

Andrew Morton wrote:

On Mon, 27 Nov 2006 18:52:58 -0500
Wendy Cheng <[email protected]> wrote:

Not sure about walking thru sb->s_inodes for several reasons....
1. First, the changes made are mostly for file server setups with large fs sizes - the entry count in sb->s_inodes may not be shorter than the inode_unused list.


umm, that's the best-case.  We also care about worst-case.  Think:
1,000,000 inodes on inode_unused, of which a randomly-sprinkled 10,000 are
from the being-unmounted filesystem.  The code as-proposed will do 100x more
work than it needs to do.  All under a global spinlock.
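
To make the 100x figure concrete, here is a minimal userspace sketch. It is not kernel code: `toy_inode`, `walk_stats` and `walk_for_sb` are hypothetical names used only for illustration. It counts how many list entries a walk has to touch in order to find every inode that belongs to one filesystem. On a global list of 1,000,000 entries the walk touches all 1,000,000 to find the 10,000 matches; on a per-superblock list that (by assumption) holds only those 10,000 inodes it touches 10,000, which is where 1,000,000 / 10,000 = 100 comes from.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an in-core inode on a list. */
struct toy_inode {
    int sb_id;                /* which mounted filesystem owns this inode */
    struct toy_inode *next;   /* next entry on the list being walked */
};

struct walk_stats {
    size_t visited;           /* entries touched during the walk */
    size_t matched;           /* entries that belong to the target sb */
};

/*
 * Walk a list looking for inodes owned by sb_id. Matches can sit anywhere
 * on the list, so every entry has to be visited.
 */
struct walk_stats walk_for_sb(const struct toy_inode *head, int sb_id)
{
    struct walk_stats s = { 0, 0 };
    const struct toy_inode *p;

    for (p = head; p != NULL; p = p->next) {
        s.visited++;
        if (p->sb_id == sb_id)
            s.matched++;
    }
    return s;
}
```

The ratio visited / matched is the extra-work factor; in the kernel the cost is magnified because the walk runs with a global lock held.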

By walking thru sb->s_inodes, we also need to take inode_lock and iprune_mutex (?), since we're purging the inodes from the system - or specifically, removing them from the inode_unused list. There is really not much difference from the current prune_icache() logic. What's been proposed here is simply *exporting* the prune_icache() kernel code to allow a filesystem to trim (purge a small percentage of) its unused (or potentially soon-to-be unused) per-mount inodes for *latency* considerations.
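
The idea of trimming only a bounded number of one mount's unused inodes can be sketched in userspace as below. This is an illustration under stated assumptions, not the patch itself: `toy_inode` and `toy_prune_sb` are hypothetical names, the list is a plain singly linked list rather than the kernel's list_head, and the caller is assumed to already hold whatever lock protects the list (inode_lock in the discussion above).

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for an inode on the unused list. */
struct toy_inode {
    int sb_id;                /* which mounted filesystem owns this inode */
    struct toy_inode *next;
};

/*
 * Unlink up to `goal` entries owned by sb_id from the list at *headp and
 * push them onto *pruned so the caller can dispose of them after dropping
 * the lock. Returns the number of entries unlinked.
 *
 * Assumption: the caller holds the lock that protects the list.
 */
size_t toy_prune_sb(struct toy_inode **headp, int sb_id, size_t goal,
                    struct toy_inode **pruned)
{
    struct toy_inode **pp = headp;
    size_t n = 0;

    while (*pp != NULL && n < goal) {
        struct toy_inode *cur = *pp;

        if (cur->sb_id == sb_id) {
            *pp = cur->next;       /* unlink from the unused list */
            cur->next = *pruned;   /* hand over to the caller */
            *pruned = cur;
            n++;
        } else {
            pp = &cur->next;       /* not ours: leave it in place */
        }
    }
    return n;
}
```

Capping the work with `goal` keeps each call short; the walk still starts from the head of the shared list, which is the cost the worst-case objection above is about.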

I made a mistake by using the "page dirty ratio" to explain the problem (sorry! I was not thinking well in the previous write-up), which could mislead you into thinking this is a VM issue. This is not so much a low-on-free-pages (and/or memory fragmentation) issue (though fragmentation is normally part of the symptoms). What the (external) kernel module does is tie its cluster-wide file lock to the in-memory inode that is obtained at file look-up time. The lock is removed from the machine when


1. the lock is granted to another (cluster) machine; or
2. the in-memory inode is purged from the system.

One of the clusters that has this latency issue runs an IP/TV application that "rsync"s with the main station server (over a long geographical distance) every 15 minutes. It subsequently (and constantly) generates a large number of inodes (and locks) that hang around. When other nodes within the same cluster, serving as FTP servers, are serving the files, DLM has to wade through a huge number of lock entries to know whether the lock requests can be granted. That's where this latency issue pops up. Our profiling data shows that when the cluster performance drops into unacceptable ranges, DLM can hog 40% of the CPU cycles in its lock-searching logic. From the VM point of view, the system does not have a memory shortage, so it has no need to kick off a prune_icache() call.
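
Why stale lock entries turn into search latency can be shown with a toy chained hash table. This is only a model: it makes no claim about how DLM actually stores or hashes its locks, and every name in it (`toy_lock`, `toy_lock_table`, `toy_insert`, `toy_lookup_cost`, `TOY_BUCKETS`) is hypothetical. With a fixed number of buckets, each lookup walks one chain, and the chain length grows in proportion to the number of entries left in the table.

```c
#include <stddef.h>

#define TOY_BUCKETS 8         /* fixed bucket count, chosen arbitrarily */

struct toy_lock {
    unsigned long resource_id;
    struct toy_lock *next;    /* next entry in the same bucket */
};

struct toy_lock_table {
    struct toy_lock *bucket[TOY_BUCKETS];
};

static size_t toy_hash(unsigned long id)
{
    return id % TOY_BUCKETS;
}

/* Insert at the head of the bucket chain. */
void toy_insert(struct toy_lock_table *t, struct toy_lock *l)
{
    size_t b = toy_hash(l->resource_id);

    l->next = t->bucket[b];
    t->bucket[b] = l;
}

/*
 * Look up a resource and return how many entries were compared.
 * *found is set to 1 on a hit and 0 on a miss; a miss walks the
 * whole chain.
 */
size_t toy_lookup_cost(const struct toy_lock_table *t, unsigned long id,
                       int *found)
{
    const struct toy_lock *p;
    size_t compared = 0;

    *found = 0;
    for (p = t->bucket[toy_hash(id)]; p != NULL; p = p->next) {
        compared++;
        if (p->resource_id == id) {
            *found = 1;
            break;
        }
    }
    return compared;
}
```

In this model, removing entries that nobody is using (here, locks tied to unused inodes) shortens every chain they sit on, which is the effect the proposed pruning is after; a better hash or more buckets attacks the same cost from the other side.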

This issue could also be fixed in several different ways - maybe by a better DLM hash function, maybe by asking IT people to umount the filesystem after each rsync, where *all* per-mount inodes are unconditionally purged (but that defeats the purpose of caching inodes and, in our case, the locks), ...., etc. But I do think the proposed patch is the most sensible way to fix this issue, and I believe it will be one of those functions that, once exported, people will find good uses for. It helps with memory fragmentation and/or shortage *before* it becomes a problem as well. I certainly understand and respect a maintainer's daunting job of deciding whether to take or reject a patch - let me know what you think so I can start to work on other solutions if required.


-- Wendy

