On Wed, Apr 26, 2006 at 01:31:14PM -0700, Christoph Lameter wrote:
> Dave: Can you tell us more about the tree_lock contentions on I/O that you
> have seen?
Sorry to be slow responding - I've been sick the last couple of days.
Take a large file - say Size = 5x RAM or so - and then start
N threads, with thread n reading from offset n * (Size / N). They
each read (Size / N) bytes and so typically don't overlap.
Throughput with increasing numbers of threads on a 24p Altix
with an XFS filesystem on 2.6.15-rc5 looks like:
++++ Local I/O Block size 262144 ++++ Thu Dec 22 03:41:42 PST 2005
Loads Type blksize   count  av_time     tput  usr%    sys% intr%
----- ---- ------- ------- -------- -------- ----- ------- -----
    1 read 256.00K 256.00K    82.92   789.59  1.80  215.40 18.40
    2 read 256.00K 256.00K    53.97  1191.56  2.10  389.40 22.60
    4 read 256.00K 256.00K    37.83  1724.63  2.20  776.00 29.30
    8 read 256.00K 256.00K    52.57  1213.63  2.20 1423.60 24.30
   16 read 256.00K 256.00K    60.05  1057.03  1.90 1951.10 24.30
   32 read 256.00K 256.00K    82.13   744.73  2.00 2277.50 18.60
                           ^^^^^^^^ ^^^^^^^^
Basically, we hit a scaling limitation between 4 and 8 threads. This was
consistent across I/O sizes from 4KB to 4MB. I took a simple 30s PC sampling
profile:
user ticks: 0 0 %
kernel ticks: 2982 99.97 %
idle ticks: 4 0.13 %
Using /proc/kallsyms as the kernel map file.
====================================================================
Kernel
 Ticks  Percent  Cumulative  Routine
                  Percent
--------------------------------------------------------------------
  1897    63.62     63.62    _write_lock_irqsave
   467    15.66     79.28    _read_unlock_irq
    91     3.05     82.33    established_get_next
    74     2.48     84.81    generic__raw_read_trylock
    59     1.98     86.79    xfs_iunlock
    47     1.58     88.36    _write_unlock_irq
    46     1.54     89.91    xfs_bmapi
    40     1.34     91.25    do_generic_mapping_read
    35     1.17     92.42    xfs_ilock_map_shared
    26     0.87     93.29    __copy_user
    23     0.77     94.06    __do_page_cache_readahead
    16     0.54     94.60    unlock_page
    15     0.50     95.10    xfs_ilock
    15     0.50     95.61    shrink_cache
    15     0.50     96.11    _spin_unlock_irqrestore
    13     0.44     96.55    sub_preempt_count
    11     0.37     96.91    mpage_end_io_read
    10     0.34     97.25    add_preempt_count
    10     0.34     97.59    xfs_iomap
     9     0.30     97.89    _read_unlock
So the _read_unlock_irq time looks to be coming from the
mapping->tree_lock. I think the write_lock_irqsave() contention is
from memory reclaim (shrink_list() -> try_to_release_page() ->
releasepage() -> xfs_vm_releasepage() -> try_to_free_buffers() ->
clear_page_dirty() -> test_clear_page_dirty() ->
write_lock_irqsave(&mapping->tree_lock, ...)), because the page cache
was full of pages from this one file and demand is causing them to be
constantly recycled.
Cheers,
Dave.
--
Dave Chinner
R&D Software Engineer
SGI Australian Software Group