Re: [PATCH] timer tsc ensure we allow for initial tsc and tsc sync

On Fri, 2006-01-20 at 12:53 +0000, Andy Whitcroft wrote:
> We have been seeing "BUG: soft lockup detected on CPU#0!" messages
> from testing runs of mainline for some time.  This has only been
> showing on a very small subset of the systems under test, as it
> turns out the slower ones.
> 
> Resolving this issue is complex, not because the fix itself is
> complex but because of the timer rework which is currently pending
> in -mm.  As a result this patch is against 2.6.16-rc1.  So far
> we have had no such errors from runs against -mm, but I am unsure
> whether that system eliminates this issue, or is merely lucky as
> faster systems are currently with mainline.

Hey Andy, sorry for the slow reply.

The timekeeping rework is not going to go into 2.6.16 and is currently
out of -mm until I can resolve a few laptop issues. 


> John perhaps you could comment?  Also, how experimental is the timer
> code, is it likely to go into 2.6.16 or is it more experimental
> than that?  If so perhaps we need to try and slip a fix like this
> underneath it.

I'd definitely try to push a fix in for the issue. I'll just merge my code
around the fix.



> timer tsc ensure we allow for initial tsc and tsc sync
> 
> During early initialisation we select the timer source to use for
> the high resolution time source.  When selecting the TSC we don't
> take into account the initial value of the TSC; this leads to a
> jump in the clock at the next clock tick.  We also fail to take into
> account that the TSC synchronisation in an SMP system resets the TSC.
> 
> In both cases this will lead to the timer believing that 0-N TSC
> ticks have passed since the last real timer tick.  This will lead
> to the clock jumping ahead by nearly the maximum time represented
> by the lower 32 bits of the TSC.  For a 1GHz machine this is close to
> 4s; on slower boxes this can exceed 10s and trip the soft lockup tests.
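
For illustration only, the arithmetic behind those figures can be sketched
in C. This is not the kernel's timer code: low32_wrap_seconds() and
low32_delta() are hypothetical helpers, and the sketch assumes the elapsed
time is derived from an unsigned 32-bit difference of the low TSC word, as
the description above suggests.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the longest interval (in seconds) that the low
 * 32 bits of a counter running at 'hz' can represent before wrapping.
 */
static double low32_wrap_seconds(double hz)
{
    return 4294967296.0 / hz;   /* 2^32 ticks divided by ticks per second */
}

/*
 * Hypothetical helper: ticks "elapsed" between two snapshots of the low
 * 32 bits. The subtraction is unsigned, so if the counter was reset
 * (now_low < last_low) the result wraps to a value close to 2^32
 * instead of going negative.
 */
static uint32_t low32_delta(uint32_t now_low, uint32_t last_low)
{
    return (uint32_t)(now_low - last_low);
}
```

Under these assumptions low32_wrap_seconds(1e9) is about 4.29, and a
400MHz box (an example frequency, not one from the report) gives about
10.7. A snapshot of 0xFFFFFFF0 followed by a reading of 100 after a reset
yields low32_delta(100, 0xFFFFFFF0) == 116, while a snapshot of 100
followed by a reading of 50 yields a delta of 4294967246 ticks, which is
nearly the full 32-bit range.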


This sounds very similar to bugme bug #5366:
http://bugzilla.kernel.org/show_bug.cgi?id=5366


There's a test patch in there that maybe you could try?


thanks
-john
