On Thursday 12 July 2007 22:24:36 Joshua Wise wrote:
> Some time after my first[1] patch to mce.c, I had continued testing using
> the test method described there (inject hundreds of thousands of single-bit
> errors while reading from /dev/mcelog), and I came across a strange issue.
> Once every couple test cycles (i.e., every hundred-thousand injects), the
> system would begin behaving very strangely. The serial port would stop
> responding entirely, and my SSH sessions would respond, but to one keystroke
> behind what I had typed. After two minutes, the watchdog timer would reboot
> the system.
Thanks for the analysis.
I guess at some point it would make sense to model the whole thing in spin or
similar and find even the last race. Anyone interested? @)
>
> At this point, mce_read() and mce_log() are blocking on each other, and will
> be for all eternity. mce_log() is waiting for the on_each_cpu() to complete
> so that the loop can complete, and on_each_cpu() (from mce_read()) is
> waiting for mce_log() to finish.
I suspect the right fix is to get rid of collect_tscs().
One possible way would be to get the TSCs from the mce_log table
before the synchronization instead of asking the CPUs.
Or switch it over to the NMI supporting RCU that was recently posted.
> -- there may be other edge cases other than
> this one. I'm actually surprised that this wasn't a ring buffer to start
> with -- it certainly seems like it wanted to be one.
The problem with a ring buffer is that it would lose old entries; but
for machine checks you really want the first entries because
the later ones might be just junk.
-Andi
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]