Re: Detecting I/O error and Halting System

First of all, thank you for your analysis.

I don't think that it's a HDD problem nor a cable
problem because the servers are new. We have tried
different HDD (seagate, maxtor) but it has not help
anyway.
It's perhaps a temperature problem but we make a lot
tests in hard condition (high temperature) 
successfuly...    

One thinks that the problem comes from the VIA chipset
VT82c686 (it's also the opinion of Dick Johnson
(linux-os) whom advised me to try UDMA33 instead of
UDMA66).

How can I determine the problem?

I want to add that the HDD seems to be disconnected
(the BIOS can't find any drive for boot) after a
simple reset. We must switch off the servers to get
them work again.
However, it takes a long time (4 mounths and more)
before the HDD fell down. I want to work around by
write a module which will supervise the HDD. I know
how to write a module (I used the lkmpg guide
(http://www.tldp.org/LDP/lkmpg/) but how can I
shutdown Linux from inside a module...? 

best regards.

Zine.

--- Alan Cox <alan@lxorguk.ukuu.org.uk> a écrit :

> On Llu, 2006-03-27 at 16:55 +0200, zine el abidine
> Hamid wrote:
> > hda: status timeout: status=0xd0 { Busy }     
> adapter
> > disque annonce un status busy du DMA
> 
> If I'm reading the translation right then your hard
> disk decided
> it was busy and then never came back
> 
> > Feb 12 04:46:23 porte_de_clignancourt_nds_b
> kernel:
> > ide0: reset: success             
> 
> So the IDE layer tried to reset it
> 
> > Feb 12 10:22:38 porte_de_clignancourt_nds_b
> kernel:
> > hda: timeout waiting for DMA
> 
> Which didnt help
> 
> > Feb 12 10:24:47 porte_de_clignancourt_nds_b
> kernel:
> > ide0: reset: success
> 
> Still trying
>       
> > Feb 12 10:24:47 porte_de_clignancourt_nds_b
> kernel:
> > hda: irq timeout: status=0xd0 { Busy }            
>    
> > 
> > Feb 12 10:24:47 porte_de_clignancourt_nds_b
> kernel:
> > hda: DMA disabled       
> 
> We gave up on DMA to see if PIO would help
> >                               
> >    
> > Feb 12 10:24:47 porte_de_clignancourt_nds_b
> kernel:
> > ide0: reset timed-out, status=0x80
> > Feb 12 10:24:47 porte_de_clignancourt_nds_b
> kernel:
> > hda: status timeout: status=0x80 { Busy }        
> > nouvel échec de reset
> > Feb 12 10:24:47 porte_de_clignancourt_nds_b
> kernel:
> > hda: drive not ready for command
> > Feb 12 10:24:47 porte_de_clignancourt_nds_b
> kernel:
> > ide0: reset: success                  
> 
> And reset.. 
> 
> 
> > Feb 12 13:45:38 porte_de_clignancourt_nds_b
> kernel:
> > hda: status timeout: status=0x80 { Busy }
> > Feb 12 13:45:38 porte_de_clignancourt_nds_b
> kernel:
> > hda: drive not ready for command
> > Feb 12 13:45:38 porte_de_clignancourt_nds_b
> kernel:
> > ide0: reset timed-out, status=0x80
> > Feb 12 13:45:38 porte_de_clignancourt_nds_b
> kernel:
> > end_request: I/O error, dev 03:02 (hda), sector
> 102263
> > Feb 12 13:45:38 porte_de_clignancourt_nds_b
> syslogd:
> > /var/log/maillog: Input/output error
> > Feb 12 13:45:38 porte_de_clignancourt_nds_b
> kernel:
> > end_request: I/O error, dev 03:02 (hda), sector
> 110720
> > Feb 12 13:45:38 porte_de_clignancourt_nds_b
> kernel:
> > end_request: I/O error, dev 03:02 (hda), sector
> 110728 
> 
> Eventually we give up.
> 
> 
> First thing to check would be the disk and the
> temperature, then the
> cabling. In particular make sure the *long* part of
> the cable is between
> the drive and the controller.
> 
> 

___________________________________________________________________________ 
Nouveau : téléphonez moins cher avec Yahoo! Messenger ! Découvez les tarifs exceptionnels pour appeler la France et l'international.
Téléchargez sur http://fr.messenger.yahoo.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: Detecting I/O error and Halting System
  - From: Gene Heskett <gene.heskett@verizon.net>
- Re: Detecting I/O error and Halting System
  - From: Alan Cox <alan@lxorguk.ukuu.org.uk>

References:
- Re: Detecting I/O error and Halting System
  - From: Alan Cox <alan@lxorguk.ukuu.org.uk>

Prev by Date: Re: [RFC] Virtualization steps
Next by Date: Need help reporting bug, no 3D accel with Matrox g400
Previous by thread: Re: Detecting I/O error and Halting System
Next by thread: Re: Detecting I/O error and Halting System
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]