Tejun Heo wrote:
> Pablo Sebastian Greco wrote:
>> First of all, thanks for everything, and my apologies if I'm doing anything wrong. This is my first LKML mail, but I've read the whole FAQ, so it should be OK.
>>
>> This is the machine with the problem:
>>
>> Intel ServerBoard S5000VSA
>> Dual Core Xeon 2.66 (Intel(R) Xeon(TM) CPU 2.66GHz stepping 04)
>> 4G Kingston
>> 1 Seagate 80G SATA (ST380211AS) (sda)
>> 3 Samsung 250G SATA (SAMSUNG SP2504C) (sdb,c,d)
>>
>> Installed distribution is FC6 x86_64.
>>
>> I've been getting these messages with both distribution and vanilla kernels:
>>
>> Jan 1 16:29:08 squid kernel: ata4.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x2 frozen
>> Jan 1 16:29:08 squid kernel: ata4.00: cmd 61/60:00:c9:6d:8e/00:00:0e:00:00/40 tag 0 cdb 0x0 data 49152 out
>> Jan 1 16:29:08 squid kernel: res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
>> Jan 1 16:29:08 squid kernel: ata4.00: cmd 60/08:08:f7:7d:56/00:00:0e:00:00/40 tag 1 cdb 0x0 data 4096 in
>> Jan 1 16:29:08 squid kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
>> <snip>
>> Jan 1 16:29:08 squid kernel: ata4: soft resetting port
>> Jan 1 16:29:08 squid kernel: ata4: softreset failed (port busy but CLO unavailable)
>> Jan 1 16:29:08 squid kernel: ata4: softreset failed, retrying in 5 secs
>> Jan 1 16:29:13 squid kernel: ata4: hard resetting port
>> Jan 1 16:29:21 squid kernel: ata4: port is slow to respond, please be patient (Status 0x80)
>> Jan 1 16:29:43 squid kernel: ata4: port failed to respond (30 secs, Status 0x80)
>> Jan 1 16:29:43 squid kernel: ata4: COMRESET failed (device not ready)
>> Jan 1 16:29:43 squid kernel: ata4: hardreset failed, retrying in 5 secs
>> Jan 1 16:29:48 squid kernel: ata4: hard resetting port
>> Jan 1 16:29:49 squid kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> Jan 1 16:29:49 squid kernel: ata4.00: configured for UDMA/133
>> Jan 1 16:29:49 squid kernel: ata4: EH complete
>> Jan 1 16:29:49 squid kernel: SCSI device sdd: 488397168 512-byte hdwr sectors (250059 MB)
>> Jan 1 16:29:49 squid kernel: sdd: Write Protect is off
>> Jan 1 16:29:49 squid kernel: SCSI device sdd: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
>>
>> There are lots of them, and eventually the system crashes. I've tested everything from the FC6 2.6.18 kernel to vanilla 2.6.20-rc2-mm1. Old kernels just crash; newer ones log these messages and then crash.
>>
>> I don't want to flood this mail with useless info, so please tell me what to send and I'll do it (dmesg, smartctl... you name it).
>>
>> BTW, memtest ran for about 2 days without errors, and badblocks on all 4 drives returned nothing. Reallocated_Sector_Ct raw_value was 0 on all 4 drives.
>
> Please post full dmesg and the result of 'lspci -nnvvv'. And what do you mean by 'crash'?

By crash I mean the whole system going down, having to reset the entire machine.
I'm sending you 4 files:

- dmesg: the dmesg from the current boot. It covers only a boot, because no errors have appeared since the last crash; the server is out of production right now (errors usually appear under heavy load, and this is primarily a transparent proxy for about 1000 simultaneous users).
- lspci: the output of 'lspci -nnvvv', the way you asked for it.
- messages and messages.1: logs where you can see old boots and crashes (even a soft lockup).

If there is anything else I can do, let me know. If you need direct access to the server, I can arrange that too.
Thanks. Pablo.
Attachment:
messages.tar.bz2
Description: Binary data
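For reading the libata error lines quoted above, a small decoder can help. This is a minimal sketch under stated assumptions: the `cmd` field order is assumed to follow the libata EH report format of that era (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device); for NCQ commands (READ/WRITE FPDMA QUEUED, opcodes 0x60/0x61) the sector count is carried in FEATURE and the tag in NSECT bits 7:3; and SStatus is split into DET (bits 3:0), SPD (bits 7:4) and IPM (bits 11:8) as in the SATA specification. The helper names (`parse_taskfile`, `decode_fpdma`, `decode_sstatus`, `active_tags`) are hypothetical and not part of any kernel tool.

```python
import re

# Hypothetical helpers. Assumed field order (libata EH report format):
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
TASKFILE_RE = re.compile(
    r"([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/"
    r"((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})",
    re.IGNORECASE,
)

FPDMA_NAMES = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}


def parse_taskfile(field):
    """Split a 'cmd' taskfile string into named register values."""
    m = TASKFILE_RE.fullmatch(field)
    if m is None:
        raise ValueError(f"unrecognized taskfile: {field!r}")
    low = [int(x, 16) for x in m.group(2).split(":")]
    high = [int(x, 16) for x in m.group(3).split(":")]
    tf = {"command": int(m.group(1), 16), "device": int(m.group(4), 16)}
    tf.update(zip(("feature", "nsect", "lbal", "lbam", "lbah"), low))
    tf.update(zip(("hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah"), high))
    return tf


def decode_fpdma(tf):
    """Decode an NCQ command: sector count in FEATURE, tag in NSECT bits 7:3.
    Fields are only meaningful when the opcode really is 0x60 or 0x61."""
    sectors = (tf["hob_feature"] << 8) | tf["feature"]
    lba = ((tf["hob_lbah"] << 40) | (tf["hob_lbam"] << 32) | (tf["hob_lbal"] << 24)
           | (tf["lbah"] << 16) | (tf["lbam"] << 8) | tf["lbal"])
    return {
        "name": FPDMA_NAMES.get(tf["command"], "not an FPDMA command"),
        "tag": tf["nsect"] >> 3,
        "bytes": sectors * 512,  # assumes 512-byte sectors, as sdd reports
        "lba": lba,
    }


def decode_sstatus(value):
    """SStatus fields: DET = bits 3:0, SPD = bits 7:4, IPM = bits 11:8."""
    return {"det": value & 0xF, "spd": (value >> 4) & 0xF, "ipm": (value >> 8) & 0xF}


def active_tags(sact):
    """NCQ tags whose bit is set in the 32-bit SAct register."""
    return [tag for tag in range(32) if (sact >> tag) & 1]


if __name__ == "__main__":
    print(decode_fpdma(parse_taskfile("61/60:00:c9:6d:8e/00:00:0e:00:00/40")))
    print(decode_fpdma(parse_taskfile("60/08:08:f7:7d:56/00:00:0e:00:00/40")))
    print(decode_sstatus(0x123))
    print(len(active_tags(0x7FFFFFFF)), "NCQ tags outstanding")
```

Under these assumptions the decoding agrees with the log itself: the first command is a WRITE FPDMA QUEUED with tag 0 and 0x60 sectors (49152 bytes, matching "data 49152 out"), the second a READ FPDMA QUEUED with tag 1 and 4096 bytes, SStatus 0x123 gives SPD 2 (3.0 Gbps) with DET 3 (device present, Phy communication established), and SAct 0x7fffffff means 31 queued commands were outstanding when the port froze.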