2.6.18.2: sporadic SATA port resets (Broadcom BCM5785 (HT1000))

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi there,

we`ve got a database server machine running a 2.6.18.2 vanilla kernel on 
Debian Etch. The database is MySQL 5. Everything works fine, but sometimes 
the server "lags", i.e. it doesn`t respond for 30 seconds. We`ve now 
investigated the problem and found this messages in syslog (and dmesg):

15:55:44 omega11 kernel: ata1: port is slow to respond, please be patient
15:55:44 omega11 kernel: ata1: soft resetting port
15:55:44 omega11 kernel: ata1: port is slow to respond, please be patient
15:55:44 omega11 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 
300)
15:55:44 omega11 kernel: ATA: abnormal status 0xD0 on port 0xFFFFC2000000401C
15:55:44 omega11 last message repeated 5 times
15:55:44 omega11 kernel: ata1.00: qc timeout (cmd 0xec)
15:55:44 omega11 kernel: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
15:55:44 omega11 kernel: ata1: failed to recover some devices, retrying in 5 
secs
15:55:44 omega11 kernel: ata1: hard resetting port
15:55:44 omega11 kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 
300)
15:55:44 omega11 kernel: ata1.00: configured for UDMA/133
15:55:44 omega11 kernel: ata1: EH complete
15:55:44 omega11 kernel: SCSI device sda: 293046768 512-byte hdwr sectors 
(150040 MB)
15:55:44 omega11 kernel: sda: Write Protect is off
15:55:44 omega11 kernel: SCSI device sda: drive cache: write back

We`ve got this messages up to 5 times a day since as far as our syslogs reach. 

It seems no kind of queuing is used:
# cat /sys/block/sda/device/queue_type
none
# cat /sys/block/sda/device/queue_depth
1

The server is up for 91 days now and has low to medium load (depending on 
daytime). Since it`s a production server located in a datacenter, we can`t 
just test some random kernel on it :(

Does somebody have a glue whats going on here? Could it be a hardware failure? 
We have an identical machine using the same kernel. It`s used as a webserver. 
There also this messages shows up, but not that often (10 times in 91 days 
uptime). If it is a hardware failure, then both machines would been affected 
by the same hardware problem.

What can we do to fix this problem? Is it known? 

I`ve found many posts related to SATA problems, but none seemed to be about 
this problem.

Do you need additional information?

Thanks

cu,
  Emmy

P.S.: Please CC me, since i`m not subscribed.
00:01.0 PCI bridge: Broadcom HT1000 PCI/PCI-X bridge
00:02.0 Host bridge: Broadcom HT1000 Legacy South Bridge
00:02.1 IDE interface: Broadcom HT1000 Legacy IDE controller
00:02.2 ISA bridge: Broadcom HT1000 LPC Bridge
00:03.0 USB Controller: Broadcom HT1000 USB Controller (rev 01)
00:03.1 USB Controller: Broadcom HT1000 USB Controller (rev 01)
00:03.2 USB Controller: Broadcom HT1000 USB Controller (rev 01)
00:04.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller (rev 05)
00:05.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller (rev 05)
00:06.0 VGA compatible controller: XGI - Xabre Graphics Inc Volari Z7
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:0d.0 PCI bridge: Broadcom HT1000 PCI/PCI-X bridge (rev c0)
01:0e.0 RAID bus controller: Broadcom BCM5785 (HT1000) SATA Native SATA Mode

Attachment: config.gz
Description: GNU Zip compressed data

processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 33
model name      : Dual Core AMD Opteron(tm) Processor 275
stepping        : 2
cpu MHz         : 2194.616
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips        : 4390.69
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 33
model name      : Dual Core AMD Opteron(tm) Processor 275
stepping        : 2
cpu MHz         : 2194.616
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips        : 4390.11
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor       : 2
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 33
model name      : Dual Core AMD Opteron(tm) Processor 275
stepping        : 2
cpu MHz         : 2194.616
cache size      : 1024 KB
physical id     : 1
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips        : 4393.11
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor       : 3
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 33
model name      : Dual Core AMD Opteron(tm) Processor 275
stepping        : 2
cpu MHz         : 2194.616
cache size      : 1024 KB
physical id     : 1
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips        : 4393.51
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp


[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux