Re: sata_sil24 broken since 2.6.23-rc4-mm1

On 10/4/07, Matt Mackall <[email protected]> wrote:
> On Thu, Oct 04, 2007 at 07:32:52AM +0200, Torsten Kaiser wrote:
> > So now I'm rather out of ideas what to test... :(
>
> I'd give your previous bisect step another try.

Yes, I thought about that too. But I never seemed to need more than
two tries to make it fail.
So I would only suspect the last good step as wrong positive.
That would then point to the first of your maps2-patches, the moving
of the pagewalker code.
Would you thing that this is a plausible cause?

> Looking back at the thread a bit, anything that requires the machine
> to be off for more than a couple seconds to manifest stops looking
> like software and firmware and starts looking like a heat-related
> electrical or mechanical issue. Make sure your backups are current.

What backups? :-)

Yes, I also thought about hardware trouble, but the bisect result
seemed to consistent.
Also that its not always the same drive that fails, only every time
one of the sil-drives.

I now have activated ATA_DEBUG to see if the good and the bad boots differ.
It looks the same until the RAID5 starts.

Good boot:
[   40.160000] ata_scsi_dump_cdb: CDB (2:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.160000] ata_scsi_dump_cdb: CDB (1:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.160000] ata_scsi_dump_cdb: CDB (2:0,0,0) 2a 00 25 42 d6 09 00 00 08
[   40.160000] ata_sg_setup: 1 sg elements mapped
[   40.160000] ata_scsi_dump_cdb: CDB (1:0,0,0) 2a 00 25 42 d6 09 00 00 08
[   40.160000] ata_sg_setup: 1 sg elements mapped
[   40.160000] ata_scsi_dump_cdb: CDB (2:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.160000] ata_scsi_dump_cdb: CDB (1:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.320000] nv_swncq_host_interrupt: id 0x3 SWNCQ: qc_active 0x1
dhfis 0x1 dmafis 0x1 sactive 0x0
[   40.320000] nv_swncq_sdbfis: over
[   40.320000] ata_scsi_dump_cdb: CDB (3:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.320000] ata_exec_command: ata3: cmd 0xEA
[   40.390000] ata_hsm_move: ata3: protocol 1 task_state 3 (dev_stat 0x40)
[   40.390000] ata_hsm_move: ata3: dev 0 command complete, drv_stat 0x40
[   40.420000] md: considering sdb1 ...
[   40.440000] md:  adding sdb1 ...
[   40.440000] md:  adding sda1 ...
[   40.450000] md: created md0
[   40.460000] md: bind<sda1>
[   40.470000] md: bind<sdb1>
[   40.480000] md: running: <sdb1><sda1>
[   40.500000] raid1: raid set md0 active with 2 out of 2 mirrors

Bad boot:
[   40.060000] ata_scsi_dump_cdb: CDB (2:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.060000] ata_scsi_dump_cdb: CDB (1:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.060000] ata_scsi_dump_cdb: CDB (2:0,0,0) 2a 00 25 42 d6 09 00 00 08
[   40.060000] ata_sg_setup: 1 sg elements mapped
[   40.060000] ata_scsi_dump_cdb: CDB (1:0,0,0) 2a 00 25 42 d6 09 00 00 08
[   40.060000] ata_sg_setup: 1 sg elements mapped
[   40.060000] ata_scsi_dump_cdb: CDB (2:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.200000] nv_swncq_host_interrupt: id 0x3 SWNCQ: qc_active 0x1
dhfis 0x1 dmafis 0x1 sactive 0x0
[   40.200000] nv_swncq_sdbfis: over
[   40.200000] ata_scsi_dump_cdb: CDB (3:0,0,0) 35 00 00 00 00 00 00 00 00
[   40.200000] ata_exec_command: ata3: cmd 0xEA
[   40.270000] ata_hsm_move: ata3: protocol 1 task_state 3 (dev_stat 0x40)
[   40.270000] ata_hsm_move: ata3: dev 0 command complete, drv_stat 0x40
[   70.060000] ata_scsi_timed_out: ENTER
[   70.060000] ata_scsi_timed_out: EXIT, ret=0
[   70.080000] ata_scsi_error: ENTER
[   70.080000] ata_port_flush_task: ENTER
[   70.100000] ata1: ata_port_flush_task: EXIT
[   70.110000] __ata_port_freeze: ata1 port frozen
[   70.220000] __ata_port_freeze: ata1 port frozen
[   70.230000] ata_eh_link_autopsy: ENTER
[   70.240000] ata_eh_link_autopsy: EXIT
[   70.250000] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
[   70.270000] ata1.00: cmd 61/08:00:09:d6:42/00:00:25:00:00/40 tag 0
cdb 0x0 data 4096 out
[   70.270000]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask
0x4 (timeout)

After [   40.060000] ata_scsi_dump_cdb: CDB (1:0,0,0) 2a 00 25 42 d6 09 00 00 08
the drive sda falls of the earth and can't be recovered through soft-
or hard-resetting the port by the error handler.

So I will use the weekend to see if I can find out who issues this
command and add more debug to that place...

Torsten
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
  - From: "Torsten Kaiser" <[email protected]>

References:
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
  - From: "Torsten Kaiser" <[email protected]>
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
  - From: "Torsten Kaiser" <[email protected]>
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
  - From: "Torsten Kaiser" <[email protected]>
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
  - From: Matt Mackall <[email protected]>
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
  - From: "Torsten Kaiser" <[email protected]>
- Re: sata_sil24 broken since 2.6.23-rc4-mm1
  - From: Matt Mackall <[email protected]>

Prev by Date: Re: [PATCH] update checkpatch.pl to version 0.10
Next by Date: Re: Linux 2.6.23-rc9 and a heads-up for the 2.6.24 series..
Previous by thread: Re: sata_sil24 broken since 2.6.23-rc4-mm1
Next by thread: Re: sata_sil24 broken since 2.6.23-rc4-mm1
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]