MegaRaid 8408E goes out to lunch with nr_requests > 8

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This happens both on 2.6.17 and 2.6.18rc1 using the megaraid, mptsas and
mptscsih drivers supplied with the kernel.

While writing data to raid0 devs on a LSI MegaRaid 8408E controller, the
devices will hang after somewhere between 4-7gb of data written.  If I
dial the nr_requests back from the default down to 8, the hang will not
occur.  The hang does occur at 16.  I haven't tested values between the
two, but I'm not too optimistic.  From what I can see, it looks like 8
should be a magic number to make the queue look congested more often
than not.

Here are the messages I get when the devices go out to lunch:
Jul 11 14:13:34 systemname kernel: sd 4:2:0:0: megasas: RESET -40213 cmd=2a
Jul 11 14:13:34 systemname kernel: megasas: [ 0]waiting for 256 commands
to complete
Jul 11 14:13:39 systemname kernel: megasas: [ 5]waiting for 256 commands
to complete
Jul 11 14:13:44 systemname kernel: megasas: [10]waiting for 256 commands
to complete
Jul 11 14:13:49 systemname kernel: megasas: [15]waiting for 256 commands
to complete

[...]

Jul 11 14:16:35 systemname kernel: megasas: [175]waiting for 256
commands to complete
Jul 11 14:16:35 systemname kernel: megasas: failed to do reset
Jul 11 14:16:35 systemname kernel: sd 4:2:1:0: megasas: RESET -40216 cmd=2a
Jul 11 14:16:35 systemname kernel: megasas: cannot recover from previous
reset failures
Jul 11 14:16:35 systemname kernel: sd 4:2:0:0: megasas: RESET -40213 cmd=2a
Jul 11 14:16:35 systemname kernel: megasas: cannot recover from previous
reset failures
Jul 11 14:16:35 systemname kernel: sd 4:2:0:0: megasas: RESET -40213 cmd=2a
Jul 11 14:16:35 systemname kernel: megasas: cannot recover from previous
reset failures
Jul 11 14:16:35 systemname kernel: sd 4:2:0:0: scsi: Device offlined -
not ready after error recovery
Jul 11 14:16:36 systemname last message repeated 13 times

Interestingly, the machine will hang on shutdown and requires a hard
reset to reboot.  Bummer!

My next step is to try and reproduce and dig into this some in KDB.

Has anyone else seen this and/or does anyone have some suggestions for
further debugging info?

--
Dave Lloyd
Test Engineer, Exegy, Inc.
314.450.5342
[email protected]


--
Dave Lloyd
Test Engineer, Exegy, Inc.
314.450.5342
[email protected]
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux