Re: no idle tick
Idle power savings do not by themselves justify HZ=0.
We'll get the same idle power consumption with HZ=1.
Indeed, within measurement error, we'll get the same
idle power consumption with HZ=10.
Linux should probably default to HZ=100, and have
the capability to speed up to HZ=1000 at run-time
if applications request it; and it should slow down
to HZ=10 in deep idle.
If we keep HZ=10 in idle rather than going all
the way to HZ=0, it allows the C-state promotion code
to work without any special cases to wake the system
when idle just to promote to a deeper C-state --
i.e. like it works today.
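A minimal sketch of the rate policy described above, for illustration only. The names pick_tick_hz() and tick_period_us() are hypothetical and do not exist in the kernel, and letting deep idle win over an application's request for the fast rate is an assumption, not something settled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Tick rates named in the text; the identifiers are hypothetical. */
enum { HZ_DEEP_IDLE = 10, HZ_DEFAULT = 100, HZ_FAST = 1000 };

/*
 * Pick the tick rate for the current system state.
 * deep_idle:      we have been idle long enough to count as "deep idle"
 * fast_requested: some application asked for the fast rate
 * Assumption: deep idle takes precedence over a fast-rate request.
 */
unsigned int pick_tick_hz(bool deep_idle, bool fast_requested)
{
	if (deep_idle)
		return HZ_DEEP_IDLE;	/* never 0: ticks still drive C-state promotion */
	if (fast_requested)
		return HZ_FAST;
	return HZ_DEFAULT;
}

/* Length of one tick in microseconds for a given rate. */
unsigned int tick_period_us(unsigned int hz)
{
	assert(hz > 0);		/* HZ=0 is exactly the case this policy avoids */
	return 1000000u / hz;
}
```

With these numbers, deep idle wakes the processor once every 100 ms instead of every 10 ms at the default rate.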
Re: multiple LAPIC rates on SMP
This concept doesn't work when it is needed (C3)
and isn't needed when it works (C1/C2).
This is because the LAPIC timer stops in C3,
and the latencies in C1/C2 are so low that
it doesn't matter what the tick rate is.
Re: using TSC to patch things up
Nope. TSC is variable on some processors with P-states,
and on some processors it stops in C3.
I'm not happy about this reality either.
Re: LAPIC timer vs P-states
On the systems I'm aware of, LAPIC timer is based
on the bus speed rather than the core speed. So
today it should be constant or zero -- that is until
some HW guy decides to throttle the bus at run-time
to save power. Based on the history of the TSC --
frozen in C3 and sometimes variable with MHz changes --
it would not surprise me a bit to see the LAPIC, now
frozen in C3, become variable in some future power
saving state that varies bus speed.
Re: re-calibrating the un-frozen LAPIC timer
I think we're on thin ice if we endeavor to continue
to use the LAPIC timer. The multiple LAPIC rates
on SMP concept is defunct (above), so the only benefit
of using the LAPIC timer is that it is lower latency
to re-program it when we re-program the global rate.
But then we have to do this on all logical processors
and we have to add the code to correct it with a
stable reference time-source.
This must be compared to simply using the stable
reference time-source in the first place, and perhaps
not changing its rate as frequently.
Re: what to do?
A proposal:
1. Disable LAPIC timer use on uni-processor.
It adds no value, and breaks if C3 is supported.
2. Disable LAPIC timer use on SMP, via
Venki's timer broadcast patch, or similar.
3. Transparently use HZ=10 in "deep idle"
This can be done the same way that C-state
promotions are done -- when we recognize
that we're still idle after a long time,
take steps to get into a deeper state.
E.g. we might say that entry to C3 or C4
is "deep idle", or better yet, we might
base this on the advertised latency of
the C-states since low latency states will
not notice clock ticks and high-latency
states will become ineffective if ticks
are too frequent.
4. Apply "boot-time dynamic HZ" patch, and default
to hz=100.
5. Move to real "run-time dynamic HZ" where the
system HZ can be changed by programs that need
it changed.
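To make item 3 concrete, here is a rough sketch of a latency-based "deep idle" test. Everything in it is an assumption for illustration: the function name cstate_is_deep_idle() is hypothetical, and the 1/100 threshold is an arbitrary placeholder that would need to be tuned against measurements.

```c
#include <assert.h>
#include <stdbool.h>

#define USEC_PER_SEC	1000000ul

/*
 * Hypothetical threshold: call a C-state "deep" once its advertised
 * exit latency reaches 1/DEEP_IDLE_RATIO of the current tick period.
 */
#define DEEP_IDLE_RATIO	100ul

/*
 * exit_latency_us: latency the C-state advertises, in microseconds
 * hz:              tick rate currently in effect
 */
bool cstate_is_deep_idle(unsigned int exit_latency_us, unsigned int hz)
{
	assert(hz > 0);
	unsigned long tick_period_us = USEC_PER_SEC / hz;

	/* Low-latency states do not notice ticks; high-latency states do. */
	return (unsigned long)exit_latency_us * DEEP_IDLE_RATIO >= tick_period_us;
}
```

Under this placeholder threshold, a state advertising 50 usec of latency is not "deep" at HZ=100 but is at HZ=1000, which matches the intuition that frequent ticks hurt the high-latency states first.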
thoughts?
-Len