Thought you might find this note useful ...
-------------------------------------------------
TSC and Power Management Events on AMD Processors
Nov 2, 2005
Rich Brunner, AMD Fellow
-------------------------------------------------
Current AMD Opteron(tm) and Athlon(tm)64 processors provide
power management mechanisms that independently adjust the
performance state ("P-state") and power state ("C-state") of
the processor[1][2]; these state changes can affect a
processor core's Time Stamp Counter (TSC) which some
operating systems may use as a part of their time keeping
algorithms. Most modern operating systems are well aware of
the effect of these state changes on the TSC and the
potential for TSC drift[3] across multiple processor cores
and properly account for it. Although cores may drift with
respect to one another, an individual core's TSC is always
monotonically increasing. This drift can *not* occur on
single-processor single-core platforms.
This note reviews a few corner cases that an Operating
System should consider when using the TSC to derive or
interpolate time. It also highlights AMD's long-term
direction for the TSC.
Applications should avoid using the TSC directly for
timekeeping and instead rely on the appropriate operating
system calls. Using the TSC directly means that an
application is not protected from TSC drift and does not
benefit from the logic in the operating system that works
around it; as a result, applications that use the TSC
directly can be confused by TSC drift.
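As an illustration of relying on an operating system call rather than
reading the TSC, here is a minimal sketch for a POSIX system. The helper
names monotonic_ns and elapsed_ns are invented for this example;
clock_gettime() and CLOCK_MONOTONIC are the standard POSIX interfaces.
Which hardware timer backs the call is left to the operating system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: nanoseconds from the OS monotonic clock.
 * Returns 0 if the clock cannot be read. */
uint64_t monotonic_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Elapsed nanoseconds between two readings from monotonic_ns().
 * The monotonic clock never runs backwards, so end >= start. */
uint64_t elapsed_ns(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return end - start;
}
```

An application would call monotonic_ns() before and after the interval
of interest and pass both readings to elapsed_ns().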
P-state Changes
===============
P-state changes are performed by changing a processor core's
input voltage and clock reference rate, which in turn changes
the rate at which the TSC increments. The effect of P-state
changes on the TSC exists in all 7th-generation[7] and
8th-generation[8] AMD processors. If unaccounted for by the
operating system, this can lead to TSC drift across multiple
processor cores. Also, on current AMD Dual-core processors,
the input voltage and frequency of each core are changed in
lockstep. Modern operating systems are well aware
of the restrictions and effect of P-state changes on the TSC
in current AMD processors and already properly account for
this when using the TSC to derive or interpolate time.
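The accounting described above can be pictured as follows. This is a
minimal sketch under stated assumptions, not any particular operating
system's implementation: the structure tsc_clock and the functions
tsc_clock_read_ns and tsc_clock_set_khz are names invented for this
example, and the OS is assumed to know the new TSC rate (in kHz) and the
TSC value at each P-state change.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-core state for interpolating time from the TSC. */
struct tsc_clock {
    uint64_t base_tsc; /* TSC value at the last rate change         */
    uint64_t base_ns;  /* nanoseconds accumulated up to base_tsc    */
    uint32_t khz;      /* TSC increment rate since base_tsc, in kHz */
};

/* Convert a TSC reading to nanoseconds using the current rate.
 * The multiplication can overflow over very long intervals; a real
 * implementation would re-base periodically. */
uint64_t tsc_clock_read_ns(const struct tsc_clock *c, uint64_t tsc)
{
    assert(c->khz != 0 && tsc >= c->base_tsc);
    /* cycles / kHz = milliseconds, so scale by 1e6 to get ns. */
    return c->base_ns + (tsc - c->base_tsc) * 1000000ull / c->khz;
}

/* Called at a P-state change: fold the cycles counted at the old
 * rate into base_ns, then continue from 'tsc' at the new rate. */
void tsc_clock_set_khz(struct tsc_clock *c, uint64_t tsc, uint32_t new_khz)
{
    c->base_ns = tsc_clock_read_ns(c, tsc);
    c->base_tsc = tsc;
    c->khz = new_khz;
}
```

For example, 2,000,000 cycles counted at 2 GHz followed by 1,000,000
cycles counted at 1 GHz add up to 2 ms, although the raw TSC advanced by
only 3,000,000.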
C1-state Change
===============
The power savings from entering C1-state are enhanced by a
feature only recently enabled on multi-processor and
multi-core platforms, C1-clock ramping. This feature
significantly reduces the power consumption of an idle core
that issues the HLT instruction by dividing down its clock
rate relative to its current P-state's input voltage and
clock reference rate. When dividing the core's clock rate
down, C1-clock ramping adjusts the TSC increment so that the
TSC appears to continue incrementing at the undivided clock
reference rate of the current P-state. BIOS enables and
configures the value of the divisor by programming the PMM7
registers in the processor's integrated Northbridge. The
operating system initiates the mechanism by issuing the HLT
instruction. As each core in an AMD Dual-core processor has
its own clock-grid, only the core that issues the HLT is
affected.
The adjustment of a core's TSC increment guards against most
causes of drift. However, in certain circumstances,
C1-clock ramping can still cause TSC drift among the
processor cores. While the clock grid is divided down,
various events, like cache probes, can cause the core grid
to temporarily return to the original rate to process the
event and then eventually go back to the divided rate; the
TSC increment is properly adjusted in each direction.
However, it is the dynamic switching of the size of the
increment as the core clock grid transitions up and down
through its ramping that eventually leads to TSC drift
across multiple processor cores.
TSC drift due to C1-clock ramping can occur only on
8th-generation[8] AMD multi-processor platforms and
uni-processor dual-core platforms. This drift can *not*
occur on single-processor single-core platforms. It is
generally noticeable only when the operating system uses the
TSC as either the only source of time or as a fast timer to
interpolate between periodic timer interrupts. C1-clock
ramping is a recent feature and at this moment is used
mostly by single-processor platforms. On multi-processor
platforms, TSC usage is minimized as most operating systems
prefer HPET[5] or the ACPI PM Timer[6] over TSC. As a
result, this TSC drift has been observed primarily on
single-processor dual-core platforms which do not expose
HPET and which are running an operating system that is using
TSC on that platform. Action is required for these operating
systems as outlined in the "Solutions" section below, but,
fortunately, many of them already provide simple boot
configuration options that allow the TSC to be bypassed
(such as "notsc" and "clock=pmtmr") to work around this
problem.
C2 and C3 State Change
======================
The core-clock grid can be divided up and down when entering
and exiting C2 and C3 states and the TSC is adjusted
accordingly. However, the clock grids of all cores on a
processor are ramped up and down in lockstep, so the TSC can
never drift between the multiple cores of a single
processor. Furthermore, AMD supports C2 and C3 states only
on uni-processor mobile systems. As a result, entering
and exiting C2 and C3 states does not cause TSC drift among
processor cores.
STPCLK-Throttling
=================
STPCLK-throttling is supported on 8th-generation AMD
uni-processor and multi-processor platforms; it is supported
for 7th-generation processors only on uni-processor
platforms. STPCLK-throttling reduces the power consumption
of the entire platform by dividing down the clock rate of
all processor cores in all processors on the platform. The
southbridge initiates a STPCLK-throttling message to all
processors based on external temperature sensors, timers, or
other external events that have been designed into the
platform. Platform vendors typically use STPCLK-throttling
as a safeguard to quickly cool a platform due to abnormal
thermal conditions that occur on the platform or in the
environment the platform is in. (Unfortunately, no
notification is given to the OS when STPCLK-throttling
occurs and only chipset-specific methods exist to detect
whether a platform is planning on using it.)
The BIOS enables the STPCLK-throttling mechanism in the
southbridge and programs the processor's response to it in
the PMM5 registers of the integrated northbridge. (Many
BIOSes by default program PMM5 for this even if the chipset
is not able or configured to generate the STPCLK-throttling
message.) STPCLK-throttling ramps up and down the clock
grid of all cores on a processor equally; therefore it can
not cause TSC drift on a uni-processor platform.
STPCLK signaling will reach processors in a multi-processor
platform at different times, and each processor can ramp up
and down at different times and for different durations than
the others; therefore TSC drift can occur on such platforms.
However, as stated earlier, the usage of STPCLK-throttling is
very recent and the possibility is reduced because many
operating systems avoid using TSC on multi-processor
platforms or fall back to using TSC only when the platform
does not expose an HPET. Also, STPCLK-throttling is by
nature an infrequent platform event.
Future TSC Directions and Solutions
===================================
Future AMD processors will provide a TSC that is P-state and
C-State invariant and unaffected by STPCLK-throttling. This
will make the TSC immune to drift. Because using the TSC
for fast timer APIs is a desirable feature that helps
performance, AMD has defined a CPUID feature bit that
software can test to determine if the TSC is
invariant. Issuing a CPUID instruction with an %eax register
value of 0x8000_0007, on a processor whose base family is
0xF, returns "Advanced Power Management Information" in the
%eax, %ebx, %ecx, and %edx registers. Bit 8 of the return
%edx is the "TscInvariant" feature flag which is set when
TSC is P-state, C-state, and STPCLK-throttling invariant; it
is clear otherwise.
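The feature test described above can be sketched as follows. The function
names are invented for this example, and the register values are passed
in as plain integers so that the decoding stands apart from however the
CPUID instruction is actually issued. The check on the largest extended
function number is standard CPUID practice and is an addition not spelled
out in this note.

```c
#include <stdint.h>

#define CPUID_FN_ADV_POWER_MGMT 0x80000007u /* %eax input value     */
#define TSC_INVARIANT_BIT       8           /* bit position in %edx */

/* Function 0x8000_0007 may be queried only if the largest extended
 * function number (returned in %eax by CPUID function 0x8000_0000)
 * is at least 0x8000_0007. */
int adv_power_mgmt_available(uint32_t max_ext_fn)
{
    return max_ext_fn >= CPUID_FN_ADV_POWER_MGMT;
}

/* Returns 1 if the %edx value returned by CPUID function
 * 0x8000_0007 has the TscInvariant flag set, 0 otherwise. */
int tsc_invariant_from_edx(uint32_t edx)
{
    return (int)((edx >> TSC_INVARIANT_BIT) & 1u);
}
```

Software would call tsc_invariant_from_edx() only after
adv_power_mgmt_available() has returned non-zero.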
The rate of the invariant TSC is implementation-dependent
and will likely *not* be the frequency of the processor
core; however, its period should be short enough such that
it is not possible for two back-to-back rdtsc instructions
to return the same value. Software which is trying to
measure actual processor frequency or cycle-performance
should use Performance Event 76h, CPU Clocks not Halted,
rather than the TSC to count CPU cycles.
Current Solutions to TSC Drift due to C1-clock ramping
======================================================
In general, end users are likely to experience and notice
TSC drift only on single-processor dual-core platforms which
do not expose HPET and which are running an older operating
system that is using TSC on that platform. On such platforms
which run Linux, the end user
can correct the problem by specifying the appropriate boot
option switch to bypass the TSC such as "notsc" or
"clock=pmtmr". Equivalent solutions exist for other
operating systems[4].
Until TSC becomes invariant, AMD recommends that operating
system developers avoid TSC as a fast timer source on
affected systems. (AMD recommends that the operating system
should favor these time sources in a prioritized manner:
HPET first, then ACPI PM Timer, then PIT.) The following
pseudo-code shows one way of determining when to use TSC:
use_AMD_TSC() { // returns TRUE if ok to use TSC
    if (CPUID.base_family < 0xf) {
        // TSC drift doesn't exist on 7th Gen or less
        // However, OS still needs to consider effects
        // of P-state changes on TSC
        return TRUE;
    } else if (CPUID.AdvPowerMgmtInfo.TscInvariant) {
        // Invariant TSC on 8th Gen or newer, use it
        // (assume all cores have invariant TSC)
        return TRUE;
    } else if ((number_processors == 1) && (number_cores == 1)) {
        // OK to use TSC on uni-processor-uni-core
        // However, OS still needs to consider effects
        // of P-state changes on TSC
        return TRUE;
    } else if ( (number_processors == 1) &&
                (CPUID.effective_family == 0x0f) &&
                !C1_ramp_8gen() ) {
        // Use TSC on 8th Gen uni-proc with C1_ramp off
        // However, OS still needs to consider effects
        // of P-state changes on TSC
        return TRUE;
    } else {
        return FALSE;
    }
}
C1_ramp_8gen() {
    // Check if C1-Clock ramping enabled in PMM7.CpuLowPwrEnh
    // On 8th-Generation cores only. Assume BIOS has setup
    // all Northbridges equivalently.
    return (1 & read_pci_byte(bus=0,dev=0x18,fcn=3,offset=0x87));
}
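The CPUID.base_family and CPUID.effective_family terms in the pseudo-code
can be derived from the %eax value returned by CPUID function 1. The
following is a minimal sketch assuming the standard family-field layout;
the function names are invented for this example.

```c
#include <stdint.h>

/* Base family: bits 11:8 of %eax from CPUID function 1. */
uint32_t cpuid_base_family(uint32_t eax)
{
    return (eax >> 8) & 0xFu;
}

/* Effective family: the extended family (bits 27:20) is added
 * only when the base family field is 0xF. */
uint32_t cpuid_effective_family(uint32_t eax)
{
    uint32_t base = cpuid_base_family(eax);
    uint32_t ext  = (eax >> 20) & 0xFFu;

    return (base == 0xFu) ? base + ext : base;
}
```

With an extended family of zero, a base family of 0xF yields the
effective family 0x0F that the pseudo-code tests for.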
When an operating system can not avoid using TSC in the
short-term, the operating system will need to either
re-synchronize the TSC of the halted core when exiting halt
or disable C1-clock ramping. The pseudo-code for disabling
C1-clock ramping follows:
if ( !use_AMD_TSC() &&
     (CPUID.effective_family == 0x0f) &&
     C1_ramp_8gen() ) {
    for (i=0; i < number_processors; ++i) {
        // Do for all NorthBridges in platform
        tmp = read_pci_byte(bus=0,dev=0x18+i,fcn=3,offset=0x87);
        tmp &= 0xFC; // clears pmm7[1:0]
        write_pci_byte(bus=0,dev=0x18+i,fcn=3,offset=0x87,tmp);
    }
}
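The read_pci_byte and write_pci_byte primitives are left abstract in the
pseudo-code. As one possible reading, and assuming the platform is
accessed through PCI configuration mechanism #1 (I/O ports 0xCF8 and
0xCFC), which this note does not specify, the address arithmetic and the
bit mask can be sketched as below. All function names are invented for
this example, and the port I/O itself is omitted because it requires
kernel privilege.

```c
#include <assert.h>
#include <stdint.h>

/* CONFIG_ADDRESS value (written to I/O port 0xCF8) that selects the
 * dword containing 'offset' in the given bus/device/function. */
uint32_t pci_cfg_addr(uint32_t bus, uint32_t dev, uint32_t fn,
                      uint32_t offset)
{
    assert(bus < 256 && dev < 32 && fn < 8 && offset < 256);
    return 0x80000000u | (bus << 16) | (dev << 11) | (fn << 8)
                       | (offset & 0xFCu);
}

/* I/O port holding the requested byte within CONFIG_DATA (0xCFC). */
uint16_t pci_cfg_data_port(uint32_t offset)
{
    return (uint16_t)(0xCFCu + (offset & 3u));
}

/* New value for the byte at offset 0x87 with pmm7[1:0] cleared. */
uint8_t pmm7_clear_c1_ramp(uint8_t pmm7)
{
    return (uint8_t)(pmm7 & 0xFCu);
}
```

For the first Northbridge (bus 0, device 0x18, function 3, offset 0x87)
this gives a CONFIG_ADDRESS of 0x8000C384 and a data port of 0xCFF.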
Current Solutions to TSC Drift due to STPCLK-Throttling
=======================================================
TSC drift due to STPCLK-throttling can occur only on
8th-generation AMD multi-processor platforms. Furthermore,
the possibility is greatly reduced because many operating
systems avoid using TSC on multi-processor platforms or
fall back to using TSC only when the platform does not expose
an HPET. Lastly, STPCLK-throttling is an infrequent platform
event. However, end users running Linux can guard against
the possibility by specifying the appropriate boot option
switch to bypass the TSC such as "notsc" or "clock=pmtmr".
An operating system that can not avoid using TSC on a
multi-processor platform may choose to work around the
possibility of TSC drift due to STPCLK-throttling. It is
very unlikely that the TSC-drift accumulated in 1 second by
asserting and de-asserting STPCLK-throttling is significant.
Therefore an operating system could choose to re-adjust the
TSC value of a processor core relative to an external time
source once a second -- the cores need not be adjusted in
lockstep. This would guard against the possibility of TSC
drift among multiple processor cores.
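The once-a-second re-adjustment can be sketched as follows. This is only
an illustration under stated assumptions: the structure tsc_sync and both
functions are invented for this example, the external time source is
assumed to supply nanoseconds, and the correction is kept as a software
offset rather than written to the TSC itself. A real implementation would
also have to make sure corrected time never appears to run backwards.

```c
#include <stdint.h>

/* Hypothetical per-core correction state. */
struct tsc_sync {
    uint64_t offset;       /* added (mod 2^64) to the raw TSC        */
    uint64_t last_sync_ns; /* external time of the last adjustment   */
    int      synced;       /* 0 until the first adjustment is made   */
};

/* Re-adjust at most once per second: choose 'offset' so that the
 * corrected TSC equals the tick count implied by the external clock.
 * Returns 1 if an adjustment was made, 0 if it was too early. */
int tsc_sync_adjust(struct tsc_sync *s, uint64_t raw_tsc,
                    uint64_t ext_ns, uint32_t tsc_khz)
{
    if (s->synced && ext_ns - s->last_sync_ns < 1000000000ull)
        return 0;

    /* ms * kHz = ticks; the sub-millisecond remainder is scaled
     * separately to avoid overflowing 64 bits. */
    uint64_t expected = ext_ns / 1000000ull * tsc_khz
                      + ext_ns % 1000000ull * tsc_khz / 1000000ull;

    s->offset = expected - raw_tsc;
    s->last_sync_ns = ext_ns;
    s->synced = 1;
    return 1;
}

/* Corrected TSC value for a raw reading on this core. */
uint64_t tsc_sync_read(const struct tsc_sync *s, uint64_t raw_tsc)
{
    return raw_tsc + s->offset;
}
```

Each core keeps its own tsc_sync state, so the cores need not be adjusted
at the same moment.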
Footnotes
=========
[1] Throughout this discussion, a processor is defined as a
physical socketed chip package containing one or more
on-die CPU cores; a processor plugs into a socket on a
platform motherboard.
[2] These are described in the "BIOS and Kernel Developer's
Guide for AMD Athlon(tm) 64 and AMD Opteron(tm)
Processors", Publication 26094
[3] TSC drift occurs when the computed (expected) difference
between the TSCs of two cores is no longer a constant
value but varies by a significant amount to the shock
and surprise of the operating system.
[4] 32-bit Windows XP SP2, 64-bit Windows XP, and Windows
2003 SP1 provide the "/usepmtimer" switch in the
boot.ini to override using the TSC on single-processor
dual-core platforms; these operating systems do not rely
upon TSC on multi-processor platforms.
[5] HPET High Precision Event Timer is defined in the
"Advanced Configuration and Power Interface
Specification, Revision 3.0"
[6] ACPI Power Management Timer is defined in the "Advanced
Configuration and Power Interface Specification,
Revision 3.0"
[7] AMD's 7th generation processors return a CPUID base
family value of '6'. These include AMD Athlon, AthlonXP,
AthlonMP, and Duron.
[8] AMD's 8th generation processors return an effective
CPUID family of '0x0F'. These include AMD Opteron,
Athlon64, and Turion.