On Thu, 7 Sep 2006, Albert Graham wrote:
Hello all (especially the very technical),
I have been experiencing hardware lockups and crashes under Linux (Fedora
Core 5, latest kernel version 2.6.17-1.2174_FC5smp). The crashes occur under
what appears to be very heavy disk access, possibly with multiple concurrent
accesses (i.e. multiple threads).
I experience crashes using MySQL (MySQL-server-4.1.21-0.glibc23), the latest
4.1 stable. In this case we also have multiple threads generating a database
of approx 13-30G in size over a period of about 18 hours.
I have also experienced crashes using rsync for local_disk to local_disk
copies. This creates multiple threads (unlike a simple cp command, which is a
single thread).
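For anyone who wants to try the same kind of load without MySQL or rsync, here is a minimal sketch of a concurrent disk writer. It only assumes that several threads writing large files at once resembles the load described above; it is not a confirmed reproducer of the crash. The function names, file names and sizes are made up for illustration.

```python
import os
import tempfile
import threading


def write_worker(path, total_bytes, chunk_size=1 << 20):
    """Write total_bytes to path in chunk_size pieces, then force it to disk."""
    chunk = b"\xa5" * chunk_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # make sure the data really reaches the disk


def run_load(directory, workers=4, bytes_per_worker=8 << 20):
    """Run `workers` writer threads concurrently; return the resulting file sizes."""
    paths = [os.path.join(directory, f"load_{i}.bin") for i in range(workers)]
    threads = [
        threading.Thread(target=write_worker, args=(p, bytes_per_worker))
        for p in paths
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [os.path.getsize(p) for p in paths]


if __name__ == "__main__":
    # Hypothetical usage: small files in a temporary directory.
    with tempfile.TemporaryDirectory() as d:
        print(run_load(d))
```

To stress a real array, the directory would need to be on the 3ware volume and the sizes much larger than the 16 GB of RAM, so that the page cache cannot absorb the writes.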
The servers are 10 x:
Woodcrest 5160 3GHz (Dual Core + Dual Xeon) (1333 FSB)
Supermicro servers
http://www.supermicro.com/products/system/1U/6015/SYS-6015P-8R.cfm
Motherboard
http://www.supermicro.com/products/motherboard/Xeon1333/5000P/X7DBP-8.cfm
(BIOS 1.1c)
16 GB FB-DIMM RAM 667MHz - Approved and personally tested by Supermicro USA
3ware 9550SX-4
4x500GB SATA Seagate Drives/16MB cache.
HINTS
====
The crashes ONLY happen if we enable all 4 cores in the BIOS (Dual core =
enabled).
Our tests run 100% perfectly if we disable the second core of each Xeon! (i.e.
one core from each Xeon)
My questions
=========
Are there any "known" problems with Dual Core Xeons under load - e.g.
microcode issues ? kernel bugs ?
From the kernel perspective, is there any difference in operating code (i.e.
ignoring any superficial stuff like /proc/cpuinfo) for Dual Core Xeons?
I assumed that Dual Core would use the exact same code as the SMP kernel. Is
this correct? I'm told it's not.
Are there any specific patches for Dual Core? (I did notice in RH AS 4 a
change log that stated something like "improved scheduling for Dual Core".)
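As a side note on the /proc/cpuinfo question, here is a small sketch that groups the logical CPUs the kernel reports by physical package and core. It assumes the x86 `physical id` and `core id` fields are present in /proc/cpuinfo; the function name and the sample text are made up for illustration and are not output from the machines described here.

```python
from collections import defaultdict


def parse_topology(cpuinfo_text):
    """Map each physical package id to the set of core ids reported in it."""
    packages = defaultdict(set)
    # Each logical CPU is described by one block; blocks are separated by a blank line.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "physical id" in fields and "core id" in fields:
            packages[int(fields["physical id"])].add(int(fields["core id"]))
    return dict(packages)


# Made-up sample: two packages with two cores each.
SAMPLE = """\
processor\t: 0
physical id\t: 0
core id\t\t: 0

processor\t: 1
physical id\t: 0
core id\t\t: 1

processor\t: 2
physical id\t: 1
core id\t\t: 0

processor\t: 3
physical id\t: 1
core id\t\t: 1
"""

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the sample when /proc is unavailable
    for package, cores in sorted(parse_topology(text).items()):
        print(f"package {package}: cores {sorted(cores)}")
```

With dual core enabled on a two-socket board, one would expect two packages with two cores each; with the second core disabled, one core per package.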
Things I've tried
===========
I have tried most combinations of BIOS settings, e.g. ACPI disabled in BIOS,
kernel parameters acpi=off noacpi noapic etc., all of which make no
difference - the machines all crash unless I disable Dual Core.
I've had extensive contact with Supermicro, 3ware and now Intel, all of
whom are blaming each other.
I've also recompiled the FC5 source RPM, with the exact same results.
I'm told that AMD had a similar problem with one of their dual cores, but
this was fixed long ago and I assume that fix was specific to AMD chips and
would not apply to Intel due to differences in architecture.
Any suggestions for helping me solve these crash problems would be very
much appreciated.
Thanks in advance.
Albert.
BIOS Output on boot:
Phoenix TrustedCore(tm) Server
Copyright 1985-2005 Phoenix Technologies Ltd.
All Rights Reserved
Supermicro X7DBP-8/X7DBP-I BIOS Rev 1.1b
CPU = 2 Processors Detected, Cores per Processor = 2
Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
Intel(R) Xeon(R) CPU 5160 @ 3.00GHz
DRAM Type : DDR2-667, FSB at 1333MHz
16384M System RAM Passed
4096 KB L2 Cache
System BIOS shadowed
Video BIOS shadowed
I will post some crash traces from our serial console server as a reply to
this message shortly.
.... 'no' to all your questions, but in my experience:
I bought a computer with a SuperMicro OEM motherboard, the H8DCE
http://www.supermicro.com/Aplus/motherboard/Opteron/nForce/H8DCE.cfm
and I had nothing but the same trouble you describe, with this
motherboard and a 3ware 9500S-8.
Using the onboard SATA was not a problem - only with the 3ware.
I sent the card back to 3ware, upgraded the BIOS on the motherboard, and
updated the firmware in the drives - all with no luck.
Head over to https://www.3ware.com/
those guys are on top of things!
Then I put a Tyan S2850G2N in the server: no problems!
http://www.tyan.com/PRODUCTS/html/tomcatk8s.html
I question the OEM line of SuperMicro.
ed