On Nov 9, 2007 7:14 AM, Justin Piszcz <[email protected]> wrote:
>
> On Thu, 8 Nov 2007, Carlos Carvalho wrote:
>
> > Jeff Lessem ([email protected]) wrote on 6 November 2007 22:00:
> > >Dan Williams wrote:
> > > > The following patch, also attached, cleans up cases where the code
> > > > looks at sh->ops.pending when it should be looking at the
> > > > consistent stack-based snapshot of the operations flags.
> > >
> > >I tried this patch (against a stock 2.6.23), and it did not work for
> > >me. Not only did I/O to the affected RAID5 & XFS partition stop, but
> > >also I/O to all other disks. I was not able to capture any debugging
> > >information, but I should be able to do that tomorrow when I can hook
> > >a serial console to the machine.
> > >
> > >I'm not sure if my problem is identical to these others, as mine only
> > >seems to manifest with RAID5+XFS. The RAID rebuilds with no problem,
> > >and I've not had any problems with RAID5+ext3.
> >
> > Us too! We're stuck trying to build a disk server with several disks
> > in a raid5 array, and the rsync from the old machine stops writing to
> > the new filesystem. It only happens under heavy IO. We can make it
> > lock without rsync, using 8 simultaneous dd's to the array. All IO
> > stops, including the resync after a newly created raid or after an
> > unclean reboot.
> >
> > We could not trigger the problem with ext3 or reiser3; it only
> > happens with xfs.

In our case all processes using md4, including md4_resync, stay in D
state.

Call Trace:
 [<ffffffff803615ac>] __generic_unplug_device+0x13/0x24
 [<ffffffff803622cf>] generic_unplug_device+0x18/0x28
 [<ffffffff803f2cf7>] get_active_stripe+0x22b/0x472
 ...

See the dmesg output (sysrq t) attached.

We can reproduce this problem on two machines with the same
configuration:

- 2 x Dual-Core Opteron 2.8GHz
- 8GB memory
- 3ware 9000 with 10 x 750GB SATA disks
- Debian Etch x86_64
- raid5 + xfs (/dev/md4)

on all of these stock kernels:

- 2.6.22.11, 2.6.22.12, 2.6.23.1, 2.6.24-rc2

running:

- for i in f{0..7}; do (dd bs=1M count=100000 if=/dev/zero of=$i &); done

If we increase /sys/block/md4/md/stripe_cache_size, the device and the
processes go back to work.
Attachment:
dmesg_sysrq_t.txt.gz
Description: GNU Zip compressed data
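For anyone trying to confirm the same symptom, a minimal sketch of the
diagnosis steps the reporters used; the device name md4 comes from the
report above, and the rest is standard ps/sysrq usage (assumes root and
a kernel built with CONFIG_MAGIC_SYSRQ, with kernel.sysrq enabled):

    # list tasks stuck in uninterruptible sleep (D state)
    ps axo stat,pid,comm | awk '$1 ~ /^D/'

    # dump every task's stack trace to the kernel log, the same as
    # pressing SysRq-T on a console (how the attached trace was taken)
    echo t > /proc/sysrq-trigger
    dmesg

A hung array shows md4_resync and the dd writers in D state, with
get_active_stripe() in their stacks as in the trace quoted above.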
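And the workaround mentioned at the end of the report, sketched out.
The sysfs path is taken verbatim from the report; 4096 is an arbitrary
example value (the md raid5 default is 256), and a larger cache costs
roughly stripe_cache_size * PAGE_SIZE bytes of RAM per member disk:

    # current number of stripe cache entries
    cat /sys/block/md4/md/stripe_cache_size

    # enlarge the cache; 4096 entries is an example, not a tuning
    # recommendation
    echo 4096 > /sys/block/md4/md/stripe_cache_size

As the report notes, raising the value got the device and the stuck
processes moving again; it is a workaround, not a fix for the md bug
the patch in this thread is trying to address.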
References:
- 2.6.23.1: mdadm/raid5 hung/d-state
  From: Justin Piszcz <[email protected]>
- Re: 2.6.23.1: mdadm/raid5 hung/d-state
  From: Justin Piszcz <[email protected]>
- Re: 2.6.23.1: mdadm/raid5 hung/d-state
  From: Dan Williams <[email protected]>
- Re: 2.6.23.1: mdadm/raid5 hung/d-state
  From: Justin Piszcz <[email protected]>
- Re: 2.6.23.1: mdadm/raid5 hung/d-state
  From: Dan Williams <[email protected]>
- Re: 2.6.23.1: mdadm/raid5 hung/d-state
  From: BERTRAND Joël <[email protected]>
- Re: 2.6.23.1: mdadm/raid5 hung/d-state
  From: Dan Williams <[email protected]>
- Re: 2.6.23.1: mdadm/raid5 hung/d-state
  From: Jeff Lessem <[email protected]>
- Re: 2.6.23.1: mdadm/raid5 hung/d-state
  From: Carlos Carvalho <[email protected]>
- Re: 2.6.23.1: mdadm/raid5 hung/d-state
  From: Justin Piszcz <[email protected]>