I have a system with a Tyan 2885 motherboard (S2885-ANRF) and dual Opteron 244 processors. Each processor has 1 GB of memory, for a total of 2 GB. The system uses a SATA hard drive. I am running the latest stock release of the SMP version of the FC3 kernel for x86_64. The output of uname -a follows:
Linux maggie 2.6.11-1.14_FC3smp #1 SMP Thu Apr 7 19:36:23 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
The computer is the worldly node for a small cluster of computers. It is responsible for building code that is run on the cluster, and a shell script starts the build process. Occasionally the script crashes when it is started, and the following messages are placed in /var/log/messages:
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4000(0000000000000008).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4010(0000000000000009).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4018(0000000000401b80).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4020(000000000000000b).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4028(0000000000000220).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4030(000000000000000c).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4038(0000000000000220).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4040(000000000000000d).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4048(00000000000001f7).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4050(000000000000000e).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4058(00000000000001f7).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4060(0000000000000017).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4070(000000000000000f).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4078(00007ffffffff098).
May 11 16:26:56 maggie kernel: mm/memory.c:97: bad pmd ffff81002f6a4098(000034365f363878).
If the shell script is restarted immediately after the crash, it works. I have been seeing "bad pmd" messages for quite some time, and the build script is not the only thing that triggers them.
I'm pretty sure the memory in the machine is fine. I have noticed a thread on the linux-kernel mailing list discussing this problem, but I haven't had a chance to post there yet.
I have appended the output of lspci and lsmod to the end of this message.
Has anyone on this list noticed similar messages?
The linux-kernel thread indicates that the problem is specific to x86_64 and seems to be highlighted by Tyan hardware.
Any insight would be greatly appreciated.

Pete
/bin/lspci
00:06.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8111 PCI (rev 07)
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-8111 LPC (rev 05)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-8111 IDE (rev 03)
00:07.2 SMBus: Advanced Micro Devices [AMD] AMD-8111 SMBus 2.0 (rev 02)
00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-8111 ACPI (rev 05)
00:07.5 Multimedia audio controller: Advanced Micro Devices [AMD] AMD-8111 AC97 Audio (rev 03)
00:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
00:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
00:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X APIC (rev 01)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
02:07.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
02:09.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5703X Gigabit Ethernet (rev 02)
03:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:00.1 USB Controller: Advanced Micro Devices [AMD] AMD-8111 USB (rev 0b)
03:0b.0 Unknown mass storage controller: Silicon Image, Inc. (formerly CMD Technology Inc) SiI 3114 [SATALink/SATARaid] Serial ATA Controller (rev 02)
03:0c.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
04:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-8151 System Controller (rev 13)
04:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8151 AGP Bridge (rev 13)
05:00.0 VGA compatible controller: nVidia Corporation NV34 [GeForce FX 5200] (rev a1)
/sbin/lsmod
Module                  Size  Used by
md5                     5953  1
ipv6                  297665  22
parport_pc             32809  0
lp                     16145  0
parport                45773  2 parport_pc,lp
autofs4                24521  0
sunrpc                169017  1
pcmcia                 30549  0
yenta_socket           25033  0
rsrc_nonstatic         11969  1 yenta_socket
pcmcia_core            57241  3 pcmcia,yenta_socket,rsrc_nonstatic
video                  20169  0
button                  9185  0
battery                12233  0
ac                      6857  0
ohci1394               38361  0
ieee1394              385721  1 ohci1394
ohci_hcd               25429  0
i2c_amd8111             8129  0
i2c_core               28353  1 i2c_amd8111
hw_random               7393  0
snd_intel8x0           38977  0
snd_ac97_codec         91537  1 snd_intel8x0
snd_pcm_oss            62193  0
snd_mixer_oss          22209  1 snd_pcm_oss
snd_pcm               109257  3 snd_intel8x0,snd_ac97_codec,snd_pcm_oss
snd_timer              29897  1 snd_pcm
snd                    65417  6 snd_intel8x0,snd_ac97_codec,snd_pcm_oss,snd_mixer_oss,snd_pcm,snd_timer
soundcore              12641  1 snd
snd_page_alloc         13513  2 snd_intel8x0,snd_pcm
r8169                  33485  0
tg3                    91717  0
floppy                 68881  0
dm_snapshot            19713  0
dm_zero                 4033  0
dm_mirror              25553  0
ext3                  148561  2
jbd                    69105  1 ext3
dm_mod                 69761  6 dm_snapshot,dm_zero,dm_mirror
sata_sil               11589  2
libata                 54601  1 sata_sil
sd_mod                 20929  3
scsi_mod              155665  2 libata,sd_mod