Re: SATA exceptions with 2.6.20-rc5

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Björn Steinbrink wrote:
Running a kernel with the return statement replace by a line that prints
the irq_stat instead.

Currently I'm seeing lots of 0x10 on ata1 and 0x0 on ata2.
40 minutes stress test now and no exception yet. What's interesting is
that ata1 saw exactly one interrupt with irq_stat 0x0, all others that
might have get dropped are as above.
I'll keep it running for some time and will then re-enable the return
statement to see if there's a relation between the irq_stat 0x0 and the
exception.
No, doesn't seem to be related, did get 2 exceptions, but no irq_stat
0x0 for ata1. Syslog/dmesg has nothing new either, still the same
pattern of dismissed irq_stats.
I've finally managed to reproduce this problem on my box, by doing:

watch --interval=0.1 /sbin/hdparm -I /dev/sda

on one drive and then running bonnie++ on /dev/sdb connected to the other port on the same controller device. Usually within a few minutes one of the IDENTIFY commands would time out in the same way you guys have been seeing.
Through some various trials and tribulations, the only conclusion I can 
come to is that this controller really doesn't like that 
NV_INT_STATUS_CK804 register being looked at in ADMA mode. I tried 
adding some debug code to the qc_issue function that would check to see 
if the BUSY flag in altstatus went high or that register showed an 
interrupt within a certain time afterwards, however that really seemed 
to hose things, the system wouldn't even boot.
Try out this patch, it just calls the ata_host_intr function where 
appropriate without using nv_host_intr which looks at the 
NV_INT_STATUS_CK804 register. This is what the original ADMA patch from 
Mr. Mysterious NVIDIA Person did, I'm guessing there may be a reason for 
that. With this patch I can get through a whole bonnie++ run with the 
repeated IDENTIFY requests running without seeing the error.
As an aside, there seems to be some dubious code in nv_host_intr, if 
ata_host_intr returns 0 for handled when a command is outstanding, it 
goes and calls ata_check_status anyway. This is rather dangerous since 
if an interrupt showed up right after ata_host_intr but before 
ata_check_status, the ata_check_status would clear it and we would 
forget about it. I tried fixing just that issue and still had this 
problem however. I suspect that code is truly broken and needs further 
thought, but this patch avoids calling it in the ADMA case, at any rate.
As a final aside, this is another case where the hardware docs for this 
controller would really be useful, in order to know whether we are 
actually supposed to be reading that register in ADMA mode or not. I 
sent a query to Allen Martin at NVIDIA asking if there's a way I could 
get access to the documents, but I haven't heard anything yet.
--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/

--- linux-2.6.20-rc5/drivers/ata/sata_nv.c	2007-01-19 19:18:53.000000000 -0600
+++ linux-2.6.20-rc5debug/drivers/ata/sata_nv.c	2007-01-22 18:35:09.000000000 -0600
@@ -750,9 +750,9 @@ static irqreturn_t nv_adma_interrupt(int
 
 			/* if in ATA register mode, use standard ata interrupt handler */
 			if (pp->flags & NV_ADMA_PORT_REGISTER_MODE) {
-				u8 irq_stat = readb(host->mmio_base + NV_INT_STATUS_CK804)
-					>> (NV_INT_PORT_SHIFT * i);
-				handled += nv_host_intr(ap, irq_stat);
+				struct ata_queued_cmd *qc = ata_qc_from_tag(ap, ap->active_tag);
+				if(qc && !(qc->tf.flags & ATA_TFLAG_POLLING))
+					handled += ata_host_intr(ap, qc);
 				continue;
 			}
 

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux