Re: 2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))

Hi,

Emmeran Seehuber wrote:

We've got a database server machine running a vanilla 2.6.18.2 kernel on Debian Etch. The database is MySQL 5. Everything works fine, but sometimes the server "lags", i.e. it doesn't respond for 30 seconds. We've now investigated the problem and found these messages in syslog (and dmesg):
15:55:44 omega11 kernel: ata1: port is slow to respond, please be patient
15:55:44 omega11 kernel: ata1: soft resetting port
15:55:44 omega11 kernel: ata1: port is slow to respond, please be patient
15:55:44 omega11 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
15:55:44 omega11 kernel: ATA: abnormal status 0xD0 on port 0xFFFFC2000000401C
15:55:44 omega11 last message repeated 5 times
15:55:44 omega11 kernel: ata1.00: qc timeout (cmd 0xec)
15:55:44 omega11 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
15:55:44 omega11 kernel: ata1: failed to recover some devices, retrying in 5 secs
15:55:44 omega11 kernel: ata1: hard resetting port
15:55:44 omega11 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
15:55:44 omega11 kernel: ata1.00: configured for UDMA/133
15:55:44 omega11 kernel: ata1: EH complete
15:55:44 omega11 kernel: SCSI device sda: 293046768 512-byte hdwr sectors (150040 MB)
15:55:44 omega11 kernel: sda: Write Protect is off
15:55:44 omega11 kernel: SCSI device sda: drive cache: write back

This is just the recovery part. Need more log. If possible, please give 2.6.20 a shot. It might have fixed your problem, or at least allow better diagnosis.
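For anyone wanting to gather that extra context, here is a rough sketch in Python that prints the lines leading up to each port reset in a syslog file. The function name `reset_contexts`, the 20-line window, and the regular expression are illustrative assumptions based only on the message format quoted above. This is not an official libata tool.

```python
import re
import sys
from collections import deque

# Matches the "soft resetting port" / "hard resetting port" lines quoted above.
# The pattern is an assumption derived from that excerpt only.
RESET_RE = re.compile(r"kernel: (ata\d+): (soft|hard) resetting port")


def reset_contexts(lines, before=20):
    """Yield (port, kind, preceding_lines) for every reset message.

    preceding_lines holds up to `before` log lines seen just before the
    reset, which is the part missing from the excerpt.
    """
    history = deque(maxlen=before)
    for line in lines:
        line = line.rstrip("\n")
        match = RESET_RE.search(line)
        if match:
            yield match.group(1), match.group(2), list(history)
        history.append(line)


# Only runs when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        for port, kind, context in reset_contexts(fh):
            print(f"=== {port}: {kind} reset, preceded by:")
            for prev in context:
                print("    " + prev)
```

Run it with the path to the syslog file as the only argument. Whatever the kernel logged right before the first reset of each incident is the interesting part.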

We've been getting these messages up to 5 times a day for as far back as our syslogs reach.
It seems no kind of queuing is used:
# cat /sys/block/sda/device/queue_type
none
# cat /sys/block/sda/device/queue_depth
1
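The same two sysfs attributes can be read from a script, which is handy when comparing several machines. The following is a minimal sketch. The helper names `queue_settings` and `queuing_in_use` are made up for illustration, and treating `none` with a depth of 1 as "no queuing" simply mirrors the reading given above.

```python
from pathlib import Path


def queue_settings(device="sda", sysfs_root="/sys"):
    """Read queue_type and queue_depth for a block device from sysfs.

    Returns a dict; an attribute that cannot be read is reported as None.
    """
    dev_dir = Path(sysfs_root) / "block" / device / "device"
    settings = {}
    for attr in ("queue_type", "queue_depth"):
        try:
            settings[attr] = (dev_dir / attr).read_text().strip()
        except OSError:
            settings[attr] = None
    return settings


def queuing_in_use(settings):
    """Heuristic (assumption): queuing is on if the type is not 'none' and depth > 1."""
    qtype = settings.get("queue_type")
    depth = settings.get("queue_depth")
    return qtype not in (None, "none") and depth is not None and int(depth) > 1


if __name__ == "__main__":
    current = queue_settings("sda")
    print(current, "queuing in use:", queuing_in_use(current))
```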
The server has been up for 91 days now and has low to medium load (depending on the time of day). Since it's a production server located in a datacenter, we can't just test some random kernel on it :(


I see.

Does somebody have a clue what's going on here? Could it be a hardware failure?

It might be. Quite a few SATA bug reports turn out to be hardware problems, most commonly PSU issues.

We have an identical machine using the same kernel. It's used as a web server. These messages show up there too, but not as often (10 times in 91 days of uptime). If it is a hardware failure, then both machines would have to be affected by the same hardware problem.


Hmmm...

What can we do to fix this problem? Is it known? I've found many posts related to SATA problems, but none seemed to be about this problem.
Do you need additional information?

Yeah, please post the content of /var/log/boot.msg if available and the result of dmesg and lspci -nn.
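One way to gather all three pieces into a single text blob for posting is sketched below. The helper names `collect` and `read_if_present` and the report layout are illustrative assumptions. Only `dmesg`, `lspci -nn` and `/var/log/boot.msg` come from the request above. On some systems `dmesg` may need root.

```python
import subprocess
from pathlib import Path

# The two commands requested above, keyed by a label used in the report.
COMMANDS = {"dmesg": ["dmesg"], "lspci -nn": ["lspci", "-nn"]}


def collect(commands):
    """Run each command and return {label: output or error text}."""
    report = {}
    for label, argv in commands.items():
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, check=False)
            report[label] = proc.stdout if proc.returncode == 0 else proc.stderr
        except OSError as exc:
            # Covers a missing binary or a permission problem.
            report[label] = f"(could not run {argv[0]}: {exc})"
    return report


def read_if_present(path):
    """Return the file's text, or None if it does not exist or is unreadable."""
    try:
        return Path(path).read_text(errors="replace")
    except OSError:
        return None


if __name__ == "__main__":
    sections = collect(COMMANDS)
    boot_msg = read_if_present("/var/log/boot.msg")  # only exists on some distributions
    if boot_msg is not None:
        sections["/var/log/boot.msg"] = boot_msg
    for label, text in sections.items():
        print(f"===== {label} =====")
        print(text)
```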


--
tejun