TSC and Power Management Events on AMD Processors

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thought you might find this note useful ...

     -------------------------------------------------
     TSC and Power Management Events on AMD Processors
                        Nov 2, 2005
                  Rich Brunner, AMD Fellow
     -------------------------------------------------


Current AMD Opteron(tm)  and Athlon(tm)64 processors provide
power  management mechanisms  that independently  adjust the
performance state ("P-state") and power state ("C-state") of
the  processor[1][2];  these  state  changes  can  affect  a
processor  core's  Time   Stamp  Counter  (TSC)  which  some
operating systems  may use as  a part of their  time keeping
algorithms.  Most modern operating systems are well aware of
the  effect  of these  state  changes  on  the TSC  and  the
potential for  TSC drift[3] across  multiple processor cores
and properly account for  it.  Although cores may drift with
respect to  one another, an individual core's  TSC is always
monotonically  increasing.  This  drift can  *not*  occur on
single-processor single-core platforms.

This  note reviews  a  few corner  cases  that an  Operating
System  should consider  when  using the  TSC  to derive  or
interpolate  time.   It   also  highlights  AMD's  long-term
direction for the TSC.

Applications  should  avoid   using  the  TSC  directly  for
timekeeping  and instead rely  on the  appropriate operating
system  calls.   Using  the   TSC  directly  means  that  an
application  is not  protected from  TSC-drift and  does not
benefit  from   the  logic   in  the  operating   system  to
work-around it; as a result, applications using TSC directly
could get confused by TSC-drift.



P-state Changes
===============

P-state changes are performed by changing a processor core's
input voltage  and clock  reference rate; thus  this changes
the rate at which the  TSC increments. The effect of P-state
changes  on  the TSC  exists  in  all 7th-generation[7]  and
8th-generation[8] AMD processors.  If unaccounted for by the
operating system, this can lead to TSC drift across multiple
processor cores. Also,  on current AMD Dual-core processors,
the input voltage and frequency of each core is changed in a
locked-step manner.  Modern operating systems are well aware
of the restrictions and effect of P-state changes on the TSC
in current  AMD processors and already  properly account for
this when using the TSC to derive or interpolate time.




C1-state Change
===============

The power  savings from entering C1-state are  enhanced by a
feature  only   recently  enabled  on   multi-processor  and
multi-core   platforms,  C1-clock  ramping.    This  feature
significantly reduces the power  consumption of an idle core
that issues  the HLT instruction by dividing  down its clock
rate  relative to  its current  P-state's input  voltage and
clock reference  rate.  When dividing the  core's clock rate
down, C1-clock ramping adjusts the TSC increment so that the
TSC appears to continue  incrementing at the undivided clock
reference  rate of  the current  P-state.  BIOS  enables and
configures the value of  the divisor by programming the PMM7
registers  in the  processor's integrated  Northbridge.  The
operating system initiates the  mechanism by issuing the HLT
instruction.  As each core in an AMD Dual-core processor has
its own  clock-grid, only  the core that  issues the  HLT is
affected.

The adjustment of a core's TSC increment guards against most
causes  of   drift.   However,  in   certain  circumstances,
C1-clock  ramping  can  still  cause  TSC  drift  among  the
processor  cores.  While  the  clock grid  is divided  down,
various events,  like cache probes, can cause  the core grid
to temporarily  return to the  original rate to  process the
event and then  eventually go back to the  divided rate; the
TSC  increment  is  properly  adjusted  in  each  direction.
However,  it is  the dynamic  switching of  the size  of the
increment  as the core  clock grid  transitions up  and down
through  its  ramping that  eventually  leads  to TSC  drift
across multiple processor cores.

TSC  drift  due  to  C1-clock  ramping  can  occur  only  on
8th-generation[8]   AMD    multi-processor   platforms   and
uni-processor  dual-core platforms.   This  drift can  *not*
occur  on  single-processor  single-core platforms.   It  is
generally noticeable only when the operating system uses the
TSC as either the only source  of time or as a fast timer to
interpolate  between  periodic  timer interrupts.   C1-clock
ramping  is a  recent feature  and  at this  moment is  used
mostly  by single-processor  platforms.   On multi-processor
platforms, TSC usage is  minimized as most operating systems
prefer  HPET[5] or  the ACPI  PM  Timer[6] over  TSC.  As  a
result,  this  TSC  drift  has been  observed  primarily  on
single-processor  dual-core platforms  which  do not  expose
HPET and which are running an operating system that is using
TSC on that platform. Action is required for these operating
systems as  outlined in the "Solutions"  section below, but,
fortunately,  many  of  them  already  provide  simple  boot
configuration  options that  allow  the TSC  to be  bypassed
(such  as "notsc"  and  "clock=pmtmr") to  work around  this
problem.




C2 and C3 State Change
======================

The core-clock grid can be divided up and down when entering
and  exiting  C2 and  C3  states  and  the TSC  is  adjusted
accordingly.   However, the  clock-grid  of all  cores on  a
processor are ramped up and down in lockstep, so the TSC can
never  drift   between  the  multiple  cores   of  a  single
processor.  Furthermore, AMD supports  C2 and C3 states only
on an  uni-processor mobile  system.  As a  result, entering
and exiting C2 and C3  states does not cause TSC drift among
processor cores.




STPCLK-Throttling
=================

STPCLK-throttling   is  supported   on   8th-generation  AMD
uni-processor and multi-processor platforms; it is supported
for   7th-generation   processors   only  on   uni-processor
platforms.  STPCLK-throttling  reduces the power consumption
of the  entire platform by  dividing down the clock  rate of
all processor cores in  all processors on the platform.  The
southbridge  initiates a  STPCLK-throttling  message to  all
processors based on external temperature sensors, timers, or
other  external  events that  have  been  designed into  the
platform.  Platform  vendors typically use STPCLK-throttling
as a  safeguard to quickly  cool a platform due  to abnormal
thermal  conditions that  occur on  the platform  or  in the
environment   the  platform   is  in.    (Unfortunately,  no
notification  is  given  to  the OS  when  STPCLK-throttling
occurs  and only  chipset-specific methods  exist  to detect
whether a platform is planning on using it.)

The  BIOS  enables the  STPCLK-throttling  mechanism in  the
southbridge and  programs the processor's response  to it in
the  PMM5 registers  of the  integrated  northbridge.  (Many
BIOSes by default program PMM5  for this even if the chipset
is not able or  configured to generate the STPCLK-throttling
message.)   STPCLK-throttling ramps  up and  down  the clock
grid of all  cores on a processor equally,  therefore it can
not cause TSC drift on a uni-processor platform.

STPCLK signaling will  reach processors in a multi-processor
platform at different times,  and each processor can ramp up
and down at different times and different durations than the
others;  therefore TSC  drift can  occur on  such platforms.
However as stated earlier, the usage of STPCLK-throttling is
very  recent and  the  possibility is  reduced because  many
operating  systems   avoid  using  TSC   on  multi-processor
platforms or  fallback to using  TSC only when  the platform
does  not expose  an  HPET.  Also,  STPCLK-throttling is  by
nature an infrequent platform event.




Future TSC Directions and Solutions
===================================

Future AMD processors will provide a TSC that is P-state and
C-State invariant and unaffected by STPCLK-throttling.  This
will make  the TSC immune  to drift.  Because using  the TSC
for  fast  timer APIs  is  a  desirable  feature that  helps
performance,  AMD  has  defined  a CPUID  feature  bit  that
software   can   test   to   determine   if   the   TSC   is
invariant. Issuing a CPUID instruction with an %eax register
value of  0x8000_0007, on a  processor whose base  family is
0xF, returns "Advanced  Power Management Information" in the
%eax, %ebx, %ecx,  and %edx registers.  Bit 8  of the return
%edx is  the "TscInvariant" feature  flag which is  set when
TSC is P-state, C-state, and STPCLK-throttling invariant; it
is clear otherwise.

The  rate of the  invariant TSC  is implementation-dependent
and  will likely  *not* be  the frequency  of  the processor
core; however,  its period should be short  enough such that
it is  not possible for two  back-to-back rdtsc instructions
to  return the  same  value.  Software  which  is trying  to
measure  actual  processor  frequency  or  cycle-performance
should  use Performance  Event 76h,  CPU Clocks  not Halted,
rather than the TSC to count CPU cycles.




Current Solutions to TSC Drift due to C1-clock ramping 
======================================================

In  general,  it  is  likely  that  end  users  should  only
experience   and  notice   TSC  drift   on  single-processor
dual-core platforms  which do not expose HPET  and which are
running an older operating system  that is using TSC on that
platform. On  such platforms which  run Linux, the  end user
can correct  the problem by specifying  the appropriate boot
option  switch  to  bypass   the  TSC  such  as  "notsc"  or
"clock=pmtmr".    Equivalent  solutions   exist   for  other
operating systems[4].

Until TSC  becomes invariant, AMD  recommends that operating
system  developers  avoid TSC  as  a  fast  timer source  on
affected systems. (AMD  recommends that the operating system
should  favor these  time sources  in a  prioritized manner:
HPET first,  then ACPI PM  Timer, then PIT.)   The following
pseudo-code shows one way of determining when to use TSC:

 use_AMD_TSC() { // returns TRUE if ok to use TSC

   if (CPUID.base_family < 0xf) {
             // TSC drift doesn't exist on 7th Gen or less
             // However, OS still needs to consider effects
             // of P-state changes on TSC
             return TRUE;

   } else if (CPUID.AdvPowerMgmtInfo.TscInvariant) {
             // Invariant TSC on 8th Gen or newer, use it
             // (assume all cores have invariant TSC)
             return TRUE;

   } else if ((number_processors == 1)&&(number_cores == 1)){
             // OK to use TSC on uni-processor-uni-core
             // However, OS still needs to consider effects
             // of P-state changes on TSC
             return TRUE;

   } else if ( (number_processors == 1) && 
               (CPUID.effective_family == 0x0f) &&
               !C1_ramp_8gen                       ){
             // Use TSC on 8th Gen uni-proc with C1_ramp off 
             // However, OS still needs to consider effects
             // of P-state changes on TSC
             return TRUE;

   } else {
             return FALSE;
   }

 }

 C1_ramp_8gen() {
    // Check if C1-Clock ramping enabled in  PMM7.CpuLowPwrEnh
    // On 8th-Generation cores only. Assume BIOS has setup
    // all Northbridges equivalently.

    return (1 & read_pci_byte(bus=0,dev=0x18,fcn=3,offset=0x87));
 }




When  an operating  system can  not avoid  using TSC  in the
short-term,  the  operating   system  will  need  to  either
re-synchronize the TSC of  the halted core when exiting halt
or disable C1-clock  ramping.  The pseudo-code for disabling
C1-clock ramping follows:

 if ( !use_AMD_TSC() && 
      (CPUID.effective_family == 0x0f) &&
      C1_ramp_8gen()                       ){

    for (i=0; i < number_processors; ++i){
       // Do for all NorthBridges in platform
       tmp = read_pci_byte(bus=0,dev=0x18+i,fcn=3,offset=0x87);
       tmp &= 0xFC;    // clears pmm7[1:0]
       write_pci_byte(bus=0,dev=0x18+i,fcn=3,offset=0x87,tmp)
      }

 }



Current Solutions to TSC Drift due to STPCLK-Throttling 
=======================================================

TSC  drift  due  to  STPCLK-throttling  can  occur  only  on
8th-generation AMD  multi-processor platforms.  Furthermore,
the  possibility is greatly  reduced because  many operating
systems  avoid  using TSC  on  multi-processor platforms  or
fallback to using TSC only when the platform does not expose
an HPET. Lastly, STPCLK-throttling is an infrequent platform
event. However,  end users running Linux,  can guard against
the  possibility by specifying  the appropriate  boot option
switch to bypass the TSC such as "notsc" or "clock=pmtmr".

An  operating system  that  can  not avoid  using  TSC on  a
multi-processor  platform  may  choose  to work  around  the
possibility of  TSC drift  due to STPCLK-throttling.   It is
very unlikely that the  TSC-drift accumulated in 1 second by
asserting and de-asserting STPCLK-throttling is significant.
Therefore an operating system  could choose to re-adjust the
TSC value of  a processor core relative to  an external time
source once  a second -- the  cores need not  be adjusted in
lockstep.  This  would guard against the  possibility of TSC
drift among multiple processor cores.




Footnotes
=========
[1] Throughout this discussion,  a processor is defined as a
    physical  socketed chip package  containing one  or more
    on-die CPU cores;  a processor plugs into a  socket on a
    platform motherboard.

[2] These are described  in the "BIOS and Kernel Developer's
    Guide  for   AMD  Athlon(tm)  64   and  AMD  Opteron(tm)
    Processors", Publication 26094

[3] TSC drift occurs when the computed (expected) difference
    between the  TSCs of two  cores is no longer  a constant
    value but  varies by a  significant amount to  the shock
    and surprise of the operating system.

[4] 32-bit  Windows XP SP2,  64-bit Windows XP,  and Windows
    2003  SP1  provide   the  "/usepmtimer"  switch  in  the
    boot.ini to  override using the  TSC on single-processor
    dual-core platforms; these operating systems do not rely
    upon TSC on multi-processor platforms.

[5] HPET  High Precision  Event  Timer  is  defined in  the
    "Advanced    Configuration     and    Power    Interface
    Specification, Revision 3.0"

[6] ACPI Power Management  Timer is defined in the "Advanced
    Configuration   and   Power   Interface   Specification,
    Revision 3.0"

[7] AMD's 7th  generation  processors return  a CPUID  base
    family value of '7'. These include AMD Athlon, AthlonXP,
    AthlonMP, and Duron.

[8] AMD's  8th generation  processors  return an  effective
    CPUID family of '0x0F'. These include AMD Opteron, 
    Athlon64, and Turion.
 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux