PJS = Peter J. Stieber
PJS>> I was under the impression the test kernel had
PJS>> some type of debug messages in it someone
PJS>> would be interested in. Was I wrong about that?

DJ = Dave Jones
DJ> No, you are correct. But nothing triggered with the latest builds.
PJS>> I guess you're saying I should go ahead and update
PJS>> and see what happens?

DJ> There's a number of other fixes in there which may have
DJ> caused the problem to go into hiding. I can't reproduce
DJ> it at all any more, and some others who were seeing it
DJ> haven't seen it recently either.
This morning I ran Memtest-86 v3.2 on the machine in question. I let it run for a little over 6 hours. It made 7 passes of the memory tests and had no errors.
Next I updated the kernel to 2.6.11-1.27_FC3smp. The problem happened pretty quickly for me:
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe000(0000000000401b80).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe008(000000000000000b).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe010(0000000000000220).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe018(000000000000000c).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe020(0000000000000220).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe028(000000000000000d).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe030(00000000000001f7).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe038(000000000000000e).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe040(00000000000001f7).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe048(0000000000000017).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe058(000000000000000f).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe060(00007ffffffff081).
May 23 16:34:36 maggie kernel: collect2:9582: mm/memory.c:98: bad pmd ffff8100190fe080(0034365f36387800).
The collect2 command is coming from the build sequence I described in earlier emails. If I try it a second time it runs successfully. I've seen sh cause it too.
Note that 34365f363878 in the last line = 46_68x in ASCII, which is x86_64 in reverse.
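For anyone who wants to double-check the decoding, here is a minimal Python sketch. It only uses the value from the last log line. The little-endian reading is my assumption: if the kernel printed a raw 8-byte word, an x86_64 machine would store the bytes in the opposite order from how the hex is printed, which would explain why the string looks reversed.

```python
# Value printed in the last "bad pmd" line of the log above.
value = 0x0034365F36387800

# Bytes in the order the hex digits are printed (most significant first).
as_printed = value.to_bytes(8, "big")
# Bytes in the order a little-endian CPU such as x86_64 would store them
# (assumption: the value is a raw 8-byte word read from memory).
in_memory = value.to_bytes(8, "little")


def printable(raw: bytes) -> str:
    """Drop NUL bytes and decode the remaining bytes as ASCII."""
    return raw.replace(b"\x00", b"").decode("ascii")


printed_text = printable(as_printed)
memory_text = printable(in_memory)

print(printed_text)  # 46_68x
print(memory_text)   # x86_64
```

If that assumption holds, the word is simply the NUL-delimited string "x86_64" sitting in memory, rather than anything that looks like a page table entry.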
Is there anything else I should be looking for?
I'd be willing to try any debug kernel to help find the problem.
Thanks for the help, Dave.
Pete