Re: XFS lstat() _very_ slow on SMP

On Wed, May 18, 2005 at 07:45:30PM +0200, Jan Kasprzak wrote:
> Christoph Hellwig wrote:
> : On Mon, May 16, 2005 at 06:25:06PM +0200, Jan Kasprzak wrote:
> : > 	Hi all,
> : > 
> : > 	I have a big XFS volume on my fileserver, and I have noticed that
> : > making an incremental backup of this volume is _very_ slow. The incremental
> : > backup essentially checks mtime of all files on this volume, and it
> : > takes ~4ms of _system_ time (i.e. not iowait) per lstat() call.
> : 
> : Thanks a lot for the report, I'll investigate what's going on once I get
> : a little time.  (Early next week I hope)
> 
> 	Hmm, I feel like I am hunting ghosts - after a fresh reboot
> of the 4-CPU server I did four runs of 128*128*128 files with various
> sizes of the underlying filesystem (in order to eliminate the volume
> size as a problematic factor). I've got the following numbers:
> 
> Volume size   create time            find -mtime +1000       cost of lstat()
>   5GB         55m77 real 52m51 sys    1m1  real  0m53 sys      19 usecs
>  25GB         58m15 real 55m27 sys   83m47 real 82m15 sys    2171 usecs (!!!!!!)
> 125GB         67m0  real 61m35 sys    0m55 real  0m48 sys      18 usecs
> 625GB         68m30 real 62m38 sys    0m57 real  0m49 sys      18 usecs
> 
> 	So the results probably do not depend on the volume size,
> but on something totally random (such as which CPU the command
> ends up running on), or on the system uptime
> (and the resulting memory fragmentation or the like).
> 
> 	I tried to re-run the same test the next day (i.e. on a
> server with longer uptime), but the server crashed - my test script
> ended up locked up somewhere in the kernel (probably holding some locks),
> and then other processes started to lock up after accessing the file
> system (my top(1) was running OK, but when I tried to "touch newfile"
> in another shell, it locked up as well).  So I had to reset this server
> again.
> 
> 	I am not really sure where exactly the problem is. I think
> it is related to XFS, the large memory of this server (26 GB), four CPUs,
> and maybe even the x86_64 architecture. I was not able to reproduce
> the problem on the same HW using ext3fs, and the problem is also
> an order of magnitude smaller on a 2-way system with 4GB of RAM. Maybe I
> should try to reproduce this on our Altix box to rule out x86_64 as a
> possible source of problems.
> 
> 	I use the attached "bigtree.pl" to create the directory structure
> ("time ./bigtree.pl /new-volume 3 128" for 128*128*128 files), and then
> "strace -c find /new-volume -type f -mtime +1000 -print" (the numbers
> without strace are almost the same, so strace is not a problem here).

I couldn't reproduce the odd case here.  Could you try to get some profiling
data with oprofile for the odd and one of the normal cases?
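
For anyone who wants to try the test without the attachment, here is a
minimal sketch of the same idea: build a tree of a given depth and fan-out,
then measure the mean cost of lstat() over every entry. This is not the
original bigtree.pl, which is not reproduced here. The names make_tree and
time_lstat, and the d<N>/f<N> naming of entries, are assumptions made for
illustration. It also measures wall-clock time per call, not the system
time that "strace -c" reports, so the numbers are only roughly comparable.

```python
import os
import tempfile
import time


def make_tree(root, depth, fanout):
    """Create `depth` levels below root and return the number of files.

    Every level but the last holds `fanout` directories; the last level
    holds `fanout` empty files. (Hypothetical layout, not bigtree.pl.)
    """
    if depth == 1:
        for i in range(fanout):
            with open(os.path.join(root, f"f{i}"), "w"):
                pass
        return fanout
    created = 0
    for i in range(fanout):
        sub = os.path.join(root, f"d{i}")
        os.mkdir(sub)
        created += make_tree(sub, depth - 1, fanout)
    return created


def time_lstat(root):
    """lstat() every entry below root.

    Returns (number of calls, mean wall-clock microseconds per call).
    """
    calls = 0
    elapsed = 0.0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            start = time.perf_counter()
            os.lstat(path)
            elapsed += time.perf_counter() - start
            calls += 1
    mean_usecs = elapsed / calls * 1e6 if calls else 0.0
    return calls, mean_usecs


if __name__ == "__main__":
    # Small tree in a temporary directory; point make_tree() at a
    # directory on the XFS volume to test a real filesystem.
    with tempfile.TemporaryDirectory() as root:
        files = make_tree(root, depth=3, fanout=8)
        calls, usecs = time_lstat(root)
        print(f"{files} files, {calls} lstat() calls, {usecs:.1f} usecs/call")
```

With depth=3 and fanout=128 this would create the same 128*128*128 files as
in the report, so start with a much smaller fan-out when trying it out.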