Re: sched_clock() uses are broken

On Tue, 2 May 2006, Russell King wrote:

> On Tue, May 02, 2006 at 06:43:45PM +0200, Andi Kleen wrote:
> > Russell King <[email protected]> writes:
> > > 
> > > However, this is not the case.  On x86 with the TSC, it returns a 54-bit
> > > number.  This means that when t1 < t0, time_passed_ns becomes a very
> > > large number which no longer represents the elapsed time.
> > 
> > Good point. For a 1GHz system this would happen every ~0.57 years.
> > 
> > The problem is that there is AFAIK no non-destructive[1] way to find out
> > how many bits the TSC has.
> > 
> > Destructive would be to overwrite it with -1 and see how many stick.
> > 
> > > All uses in kernel/sched.c seem to be afflicted by this problem.
> > > 
> > > There are several solutions to this - the most obvious being that we
> > > need a function which returns the nanosecond difference between two
> > > sched_clock() return values, and this function needs to know how to
> > > handle the case where sched_clock() has wrapped.
> > 
> > Ok it can be done with a simple test.

Better yet, the sched_clock() implementation just needs to return a value
shifted left so that the wraparound always happens at 64 bits, and the
difference between two consecutive samples is always correct.
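
Roughly like this (a sketch only; read_ns_counter() and COUNTER_BITS are
made-up names standing in for whatever the platform actually provides, with
a 54-bit nanosecond value assumed here):

#include <stdint.h>

#define COUNTER_BITS	54			/* width of the underlying value */
#define SHIFT		(64 - COUNTER_BITS)	/* pad it out to 64 bits */

extern uint64_t read_ns_counter(void);		/* hypothetical raw ns counter */

static uint64_t sched_clock_shifted(void)
{
	/* wraps at 2^64 instead of 2^COUNTER_BITS */
	return read_ns_counter() << SHIFT;
}

static uint64_t sched_clock_delta_ns(uint64_t t1, uint64_t t0)
{
	/*
	 * Unsigned 64-bit subtraction handles the wrap automatically;
	 * shifting back down recovers nanoseconds, correct for any
	 * interval shorter than 2^COUNTER_BITS ns (about 208 days here).
	 */
	return (t1 - t0) >> SHIFT;
}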

> > > 
> > > IOW:
> > > 
> > > 	t0 = sched_clock();
> > > 	/* do something */
> > > 	t1 = sched_clock();
> > > 
> > > 	time_passed = sched_clock_diff(t1, t0);
> > > 
> > > Comments?
> > 
> > Agreed it's a problem, but probably a small one. At worst you'll get
> > a small scheduling hiccup every half year, which should hardly be
> > that big an issue.

... on x86 that is.

> > Might choose to just ignore it with a big fat comment?
> 
> You're right assuming you have a 64-bit TSC, but ARM has at best a
> 32-bit cycle counter which rolls over about every 179 seconds - which
> gives a range of values from sched_clock() from 0 to 178956970625 or
> 0x29AAAAAA81.
> 
> That's rather more of a problem than having it happen every 208 days.

Yet that counter isn't necessarily nanosecond based, so rescaling the
returned value to nanoseconds requires expensive divisions.  Those could be
done only once within sched_clock_diff() instead of once in each of the two
sched_clock() calls.
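
For illustration, a sketch of what that could look like with a 32-bit cycle
counter (read_cycle_counter() and COUNTER_HZ are invented names; a 24 MHz
counter is assumed):

#include <stdint.h>

#define COUNTER_HZ	24000000ULL		/* assumed 24 MHz cycle counter */
#define NSEC_PER_SEC	1000000000ULL

extern uint32_t read_cycle_counter(void);	/* hypothetical raw 32-bit counter */

static uint64_t sched_clock_raw(void)
{
	/* return raw ticks, no scaling here at all */
	return read_cycle_counter();
}

static uint64_t sched_clock_diff(uint64_t t1, uint64_t t0)
{
	/* 32-bit unsigned subtraction copes with the ~179 second rollover */
	uint32_t ticks = (uint32_t)t1 - (uint32_t)t0;

	/* the expensive scaling happens once per measured interval */
	return ((uint64_t)ticks * NSEC_PER_SEC) / COUNTER_HZ;
}

The multiply can't overflow 64 bits (2^32 * 10^9 is below 2^64), and the
division is paid once per interval instead of once per sample.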


Nicolas
