Re: mmap over nfs leads to excessive system load

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



--- Trond Myklebust <[email protected]> wrote:
> Chuck, can you take a look at this?
> 
> Kenny is seeing what a hang when using pwrite64() on an O_DIRECT file
> and the file size exceeds 4Gb. Server is a NetApp filer w/ NFSv3.
> 
> I had a quick look at nfs_file_direct_write(), and among other things,
> it would appear that it is not doing any of the usual overflow checks on
> *pos and the count size (see generic_write_checks()). In particular,
> checks are missing against overflow vs. MAX_NON_LFS if O_LARGEFILE is
> not set (and also against overflow vs. s_maxbytes, but that is less
> relevant here).
> 
> Cheers,
>   Trond

I tried the same test, but starting closer to 4GB... here is the final lines from strace:
remap_file_pages(0xb7b55000, 2097152, PROT_NONE, 1047544, MAP_SHARED) = 0
pwrite(3, "\0", 1, 8564768768)          = 1
remap_file_pages(0xb7b55000, 2097152, PROT_NONE, 1048056, MAP_SHARED) = 0
pwrite(3, "\0", 1, 8566865920)          = 1
remap_file_pages(0xb7b55000, 2097152, PROT_NONE, 1048568, MAP_SHARED) = 0
pwrite(3, "\0", 1, 8568963072

The pwrite never returns.
So it seems to be a problem NOT with an absolute 4GB, but with a total of 4GB having been written.

Here are the first few lines from the strace to show all the options being used:
open("/mnt/bar", O_RDWR|O_CREAT|O_DIRECT|O_LARGEFILE, 0644) = 3
pwrite(3, "\0", 1, 4280287232)          = 1
mmap2(NULL, 2097152, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0xff000) = 0xb7b8e000
pwrite(3, "\0", 1, 4282384384)          = 1
remap_file_pages(0xb7b8e000, 2097152, PROT_NONE, 2552, MAP_SHARED) = 0
pwrite(3, "\0", 1, 4284481536)          = 1
remap_file_pages(0xb7b8e000, 2097152, PROT_NONE, 3064, MAP_SHARED) = 0

/mnt is an nfs mount over GbE w/ jumbo frames (8160 mtu) cross-over directly to a netapp filer.

The mount options are: (from /proc/mounts)
 /mnt nfs rw,v3,rsize=32768,wsize=32768,hard,intr,lock,proto=tcp,addr=x.x.x.x 0 0

The card is an Intel e1000 - default module options (NAPI-enabled)
on a 64-bit PCIX 100MHz.
Kernel is 2.6.15-rc w/ Trond's nfs patch.
Machine is a 2x Pentium 4 Xeon 2.66GHz (HT enabled), w/ 2GB ram and 4GB swap.

vmstat shows:
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 2  0      0 1336864 123212 203936    0    0     0    20 1111  1129  1 26 73  0
 1  0      0 1336608 123212 203936    0    0     0     0 1078  1076  1 25 74  0
 1  0      0 1336864 123212 203936    0    0     0     0 1077  1087  1 26 73  0

the sy of 25 is one virtual CPU with 100% system.

Oprofile shows time being spent:

samples  %        symbol name
303102   42.4732  zap_pte_range
133702   18.7355  _spin_lock
61145     8.5682  __bitmap_weight
43169     6.0492  page_address
42196     5.9129  unmap_vmas
30132     4.2224  unmap_page_range

-Kenny



	
		
__________________________________ 
Yahoo! Mail - PC Magazine Editors' Choice 2005 
http://mail.yahoo.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux