PJS = Peter J. Stieber

PJS>> I have a system with a Tyan 2885 motherboard
PJS>> (S2885-ANRF) that uses dual Opteron 244 processors.
PJS>> Each processor has 1 GB of memory for a total of
PJS>> 2 GB. I am using a SATA HD. I am running the latest
PJS>> stock release of the SMP version of the FC3 kernel
PJS>> for x86_64. uname -a output follows:
PJS>>
PJS>> Linux maggie 2.6.11-1.14_FC3smp #1 SMP
PJS>> Thu Apr 7 19:36:23 EDT 2005
PJS>> x86_64 x86_64 x86_64 GNU/Linux
PJS>>
PJS>> The computer is the worldly node for a small cluster of
PJS>> computers. It is responsible for building a code that
PJS>> is run on the cluster. A shell script is used to
PJS>> start the build process. Occasionally when the script
PJS>> is started it crashes and the following messages are
PJS>> placed in /var/log/messages (sorry for the ugly line
PJS>> wrap):
PJS>>
PJS>> May 11 16:26:56 maggie kernel: mm/memory.c:97:
PJS>> bad pmd ffff81002f6a4000(0000000000000008).
DJ = Dave Jones

DJ> Please grab the latest test kernel from
DJ> http://people.redhat.com/davej/kernels/Fedora/FC3
DJ> and try to reproduce this. It contains debugging code
DJ> that hopefully will help nail this.
Thanks Dave. I loaded the kernel:
Linux maggie 2.6.11-1.24_FC3smp #1 SMP Tue May 10 19:12:22 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux
I'm trying to force the problem to occur, but as was reported on the linux-kernel list, it isn't obvious how to make the problem rear its ugly head.
Are you looking for /var/log/messages output when it happens?
Thanks again for the help. I'm very willing to serve as a debug test bed as my worldly node is a Tyan S2885 Thunder K8W motherboard running the SMP version of x86_64 FC3 and my compute nodes are Tyan S2850 Tomcat K8S motherboards running the non-SMP version of x86_64 FC3.
I will reply to this thread when the problem pops up.
The problem is occurring again with Dave's test kernel.
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d008(0000000000000008).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d018(0000000000000009).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d020(0000000000401b80).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d028(000000000000000b).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d030(00000000000001f4).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d038(000000000000000c).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d040(00000000000001f4).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d048(000000000000000d).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d050(00000000000001f7).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d058(000000000000000e).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d060(00000000000001f7).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d068(0000000000000017).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d078(000000000000000f).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d080(00007ffffffff0a4).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d0a0(5f36387800000000).
May 14 10:00:18 maggie kernel: collect2:14167: mm/memory.c:98: bad pmd ffff81005856d0a8(0000000000003436).
and from today's logs:
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d38(00000037e5100a88).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d40(0000000000000003).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d48(00007ffffffffee9).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d50(00007ffffffffeea).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d58(00007ffffffffeeb).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d68(00007ffffffffeec).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d70(00007ffffffffeed).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d78(00007ffffffffeee).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d80(00007ffffffffeef).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d88(00007ffffffffef0).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d90(00007ffffffffef1).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898d98(00007ffffffffef2).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898da0(00007ffffffffef3).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898da8(00007ffffffffef4).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898db0(00007ffffffffef5).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898db8(00007ffffffffef6).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898dc0(00007ffffffffef7).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898dc8(00007ffffffffef8).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898dd8(0000000000000010).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898de0(00000000078bfbff).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898de8(0000000000000006).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898df0(0000000000001000).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898df8(0000000000000011).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e00(0000000000000064).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e08(0000000000000003).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e10(0000000000400040).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e18(0000000000000004).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e20(0000000000000038).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e28(0000000000000005).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e30(0000000000000009).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e38(0000000000000007).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e48(0000000000000008).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e58(0000000000000009).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e60(0000000000417b10).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e68(000000000000000b).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e78(000000000000000c).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e88(000000000000000d).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898e98(000000000000000e).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898ea8(0000000000000017).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898eb8(000000000000000f).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898ec0(00007ffffffffee2).
May 15 04:25:49 maggie kernel: sh:30541: mm/memory.c:98: bad pmd ffff810062898ee0(34365f3638780000).
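For anyone who wants to sift through output like the above, here is a minimal Python sketch that pulls the process name, PID, pmd address, and reported value out of each line, and also shows the value's eight bytes as printable ASCII. It is only a reading aid and not part of the kernel or of Dave's debugging code. The names `BadPmd`, `parse_bad_pmd`, and `ascii_view` are made up for illustration, and the regular expression assumes the two line layouts seen in this thread (with and without the process:pid prefix).

```python
import re
from typing import NamedTuple, Optional


class BadPmd(NamedTuple):
    """One 'bad pmd' report (hypothetical container, for illustration only)."""
    process: str   # "?" when the line has no process:pid prefix
    pid: int       # 0 when the line has no process:pid prefix
    address: int   # address printed before the parentheses
    value: int     # value printed inside the parentheses


# Assumed layout, based only on the lines quoted in this thread:
#   ... kernel: [proc:pid: ]mm/memory.c:NN: bad pmd ADDR(VALUE).
LINE_RE = re.compile(
    r"kernel: (?:(?P<proc>\S+):(?P<pid>\d+): )?"
    r"mm/memory\.c:\d+: bad pmd "
    r"(?P<addr>[0-9a-f]{16})\((?P<val>[0-9a-f]{16})\)"
)


def parse_bad_pmd(line: str) -> Optional[BadPmd]:
    """Return the parsed report, or None if the line is not a 'bad pmd' line."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return BadPmd(
        process=m.group("proc") or "?",
        pid=int(m.group("pid") or 0),
        address=int(m.group("addr"), 16),
        value=int(m.group("val"), 16),
    )


def ascii_view(value: int) -> str:
    """Show the 8 bytes of value in little-endian (x86_64 memory) order,
    with non-printable bytes replaced by '.'."""
    raw = value.to_bytes(8, "little")
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


if __name__ == "__main__":
    sample = ("May 14 10:00:18 maggie kernel: collect2:14167: "
              "mm/memory.c:98: bad pmd ffff81005856d0a0(5f36387800000000).")
    rec = parse_bad_pmd(sample)
    if rec is not None:
        print(rec.process, rec.pid, hex(rec.address), ascii_view(rec.value))
```

Fed the May 14 line for ffff81005856d0a0, the value's bytes in memory order read `....x86_`, and the following entry (0000000000003436) reads `64......`.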
Dave,
I'm willing to provide what you need to debug, or try other test kernels.
I also posted to the linux-kernel list.
Pete