Re: mmap over nfs leads to excessive system load

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Kenny Simpson <[email protected]> wrote:
>
> I ran the same test again against 2.6.15-rc, and got pretty much the same thing.  It starts nice
> and fast (30+MB/s, but drops down to under 10MB/s with the system time pegging one CPU).
> 
> Here is the oprofile result:
> 
> CPU: P4 / Xeon with 2 hyper-threads, speed 2658.47 MHz (estimated)
> Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask
> of 0x01 (mandatory) count 100000
> samples  %        symbol name
> 412585   14.6687  find_get_pages_tag
> 343898   12.2267  mpage_writepages
> 290144   10.3155  release_pages
> 288631   10.2617  unlock_page
> 286181   10.1746  pci_conf1_write
> 267619    9.5147  clear_page_dirty_for_io
> 128128    4.5554  __lookup_tag
> 120895    4.2982  page_waitqueue
> 52739     1.8750  _spin_lock_irqsave
> 43623     1.5509  skb_copy_bits
> 30157     1.0722  __wake_up_bit
> 29973     1.0656  _read_lock_irqsave
> 

Your application walks the file in 2MB hunks, doing ftruncate() each time
to expand the file by another 2MB.

nfs_setattr() implements the truncate.  It syncs the whole file, using
filemap_write_and_wait() (that seems a bit suboptimal.  All we're doing is
increasing i_size??)

So filemap_write_and_wait() has to write 2MB's worth of pages.  Problem is,
_all_ the pages, even the 99% which are clean are tagged as dirty in the
pagecache radix tree.  So find_get_pages_tag() ends up visiting each page
in the file, and blows much CPU doing so.

The writeout happens in mpage_writepages(), which uses
clear_page_dirty_for_io() to clear PG_dirty.  But it doesn't clear the
dirty tag in the radix tree.  It relies upon the filesystem to do the right
thing later on.  Which is all very unpleasant, sorry.  See the explanatory
comment over clear_page_dirty_for_io().

nfs_writepage() doesn't do any of the things which that comment says it
should, hence the radix tree tags are getting out of sync, hence this
problem.

NFS does strange, incomprehensible-to-little-akpms things in its writeout
path.  Ideally, it should run set_page_writeback() prior to unlocking the
page and end_page_writeback() when I/O completes.  That'll keep the VM
happier while fixing this performance glitch.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux