VMI Interface Proposal Documentation for I386, Part 3

  Time Interface.

    In a virtualized environment, virtual machines (VMs) will time-share
    the system with each other and with other processes running on the
    host system.  Therefore, a VM's virtual CPUs (VCPUs) will be
    executing on the host's physical CPUs (PCPUs) for only some portion
    of time.  This section of the VMI exposes a paravirtual view of
    time to the guest operating systems so that they may operate more
    effectively in a virtual environment.  The interface also provides
    a way for the VCPUs to set alarms in this paravirtual view of time.

    Time Domains:

    a) Wallclock Time:

    Wallclock time exposed to the VM through this interface indicates
    the number of nanoseconds since epoch, 1970-01-01T00:00:00Z (ISO
    8601 date format).  If the host's wallclock time changes (say, when
    an error in the host's clock is corrected), so does the wallclock
    time as viewed through this interface.

    b) Real Time:

    Another view of time accessible through this interface is real
    time.  Real time always progresses except for when the VM is
    stopped or suspended.  Real time is presented to the guest as a
    counter which increments at a constant rate defined (and presented)
    by the hypervisor.  All the VCPUs of a VM share the same real time
    counter.

    The unit of the counter is called "cycles".  The unit and initial
    value (corresponding to the time the VM enters paravirtual mode)
    are chosen by the hypervisor so that the real time counter will not
    roll over in any practical length of time.  It is expected that the
    frequency (cycles per second) is chosen such that this clock
    provides a "high-resolution" view of time.  The unit can only
    change when the VM (re)enters paravirtual mode.
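
    As a rough illustration of how a guest might consume the counter,
    the sketch below converts the difference between two real time
    counter readings into nanoseconds.  The function name, the sample
    readings, and the 1 GHz frequency are assumptions made for the
    example; they are not defined by this proposal.

```python
NSEC_PER_SEC = 1_000_000_000

def cycles_to_ns(cycles: int, cycles_per_sec: int) -> int:
    """Convert a cycle count to nanoseconds using integer arithmetic."""
    if cycles_per_sec <= 0:
        raise ValueError("frequency must be positive")
    # Multiply first so that sub-second precision is not lost.
    return cycles * NSEC_PER_SEC // cycles_per_sec

# Hypothetical values: two readings of a counter that ticks at 1 GHz.
start, end = 5_000_000_000, 5_002_500_000
elapsed_ns = cycles_to_ns(end - start, 1_000_000_000)
```

    Because the unit can change only when the VM (re)enters
    paravirtual mode, a guest could cache the frequency between such
    transitions.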

    c) Stolen time and Available time:

    A VCPU is always in one of three states: running, halted, or ready.
    The VCPU is in the 'running' state if it is executing.  When the
    VCPU executes the HLT interface (VMI_Halt), it enters the 'halted'
    state and remains halted until there is some work pending for the
    VCPU (e.g. an alarm expires, or host I/O completes on behalf of
    virtual I/O).  At this point, the VCPU enters the 'ready' state
    (waiting for the hypervisor to reschedule it).  Finally, whenever
    the VCPU is in neither the 'running' state nor the 'halted' state,
    it is in the 'ready' state.
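
    The state rules above can be read as a small state machine.  The
    following is a minimal sketch of that reading; the event names
    ("halt", "preempt", "work_pending", "schedule") are hypothetical
    labels chosen for the example and are not part of the interface.

```python
from enum import Enum

class State(Enum):
    RUNNING = "running"
    HALTED = "halted"
    READY = "ready"

# Transitions described in the text, keyed by (current state, event).
# Event names are illustrative only.
TRANSITIONS = {
    (State.RUNNING, "halt"): State.HALTED,         # guest executes the halt call
    (State.RUNNING, "preempt"): State.READY,       # hypervisor deschedules the VCPU
    (State.HALTED, "work_pending"): State.READY,   # alarm expiry, I/O completion
    (State.READY, "schedule"): State.RUNNING,      # hypervisor runs the VCPU again
}

def step(state: State, event: str) -> State:
    """Return the next state, or raise if the text describes no such move."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value} on {event}")

# A halted VCPU passes through 'ready' before it runs again.
state = State.RUNNING
for event in ("halt", "work_pending", "schedule"):
    state = step(state, event)
```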

    For example, consider the following sequence of events, with times
    given in real time:

    (Example 1)

    At 0 ms, VCPU executing guest code.
    At 1 ms, VCPU requests virtual I/O.
    At 2 ms, Host performs I/O for virtual I/O.
    At 3 ms, VCPU executes VMI_Halt.
    At 4 ms, Host completes I/O for virtual I/O request.
    At 5 ms, VCPU begins executing guest code, vectoring to the interrupt
             handler for the device initiating the virtual I/O.
    At 6 ms, VCPU preempted by hypervisor.
    At 9 ms, VCPU begins executing guest code.

    From 0 ms to 3 ms, VCPU is in the 'running' state.  At 3 ms, VCPU
    enters the 'halted' state and remains in this state until the 4 ms
    mark.  From 4 ms to 5 ms, the VCPU is in the 'ready' state.  At 5
    ms, the VCPU re-enters the 'running' state until it is preempted by
    the hypervisor at the 6 ms mark.  From 6 ms to 9 ms, VCPU is again
    in the 'ready' state, and finally 'running' again after 9 ms.

    Stolen time is defined per VCPU to progress at the rate of real
    time when the VCPU is in the 'ready' state, and does not progress
    otherwise.  Available time is defined per VCPU to progress at the
    rate of real time when the VCPU is in the 'running' and 'halted'
    states, and does not progress when the VCPU is in the 'ready'
    state.

    So, for the above example, the following table indicates these time
    values for the VCPU at each ms boundary:

    Real time (ms)    Stolen time (ms)    Available time (ms)
     0                 0                   0
     1                 0                   1
     2                 0                   2
     3                 0                   3
     4                 0                   4
     5                 1                   4
     6                 1                   5
     7                 2                   5
     8                 3                   5
     9                 4                   5
    10                 4                   6

    Notice that at any point:
       real_time == stolen_time + available_time
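
    The table can be reproduced mechanically from the state changes in
    Example 1.  The sketch below is one way to do that accounting at
    1 ms granularity; the data layout and function names are
    assumptions made for illustration, and the real time counter is
    taken to start at 0 as in the example.

```python
# State of the VCPU over time in Example 1, as (start_ms, state).
TIMELINE = [(0, "running"), (3, "halted"), (4, "ready"),
            (5, "running"), (6, "ready"), (9, "running")]

def state_at(t_ms):
    """Return the state in effect during the interval [t_ms, t_ms + 1)."""
    current = TIMELINE[0][1]
    for start, state in TIMELINE:
        if start <= t_ms:
            current = state
    return current

def account(until_ms):
    """Return (real, stolen, available) rows for each ms boundary."""
    stolen = available = 0
    rows = [(0, 0, 0)]
    for t in range(until_ms):
        if state_at(t) == "ready":
            stolen += 1        # stolen time advances only while 'ready'
        else:
            available += 1     # 'running' and 'halted' count as available
        rows.append((t + 1, stolen, available))
    return rows

rows = account(10)
```

    Each row satisfies real == stolen + available, because every
    millisecond is charged to exactly one of the two counters.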

    Stolen time and available time are also presented as counters in
    "cycles" units.  The initial value of the stolen time counter is 0.
    This implies the initial value of the available time counter is the
    same as the real time counter.

    Alarms:

    Alarms can be set (armed) against the real time counter or the
    available time counter.  Alarms can be programmed to expire once
    (one-shot) or on a regular period (periodic).  They are armed by
    specifying an absolute counter value as the expiry and, in the
    case of a periodic alarm, a non-zero period expressed as a relative
    counter value.  [TBD: The method of wiring the alarms to an
    interrupt vector is dependent upon the virtual interrupt controller
    portion of the interface.  Currently, the alarms may be wired as if
    they are attached to IRQ0 or the vector in the local APIC LVTT.
    This way, the alarms can be used as drop-in replacements for the
    PIT or local APIC timer.]

    Alarms are per-VCPU mechanisms.  An alarm set by vcpu0 will fire
    only on vcpu0, while an alarm set by vcpu1 will fire only on vcpu1.
    If an alarm is set relative to available time, its expiry is a
    value relative to the available time counter of the VCPU that set
    it.

    The interface includes a method to cancel (disarm) an alarm.  On
    each VCPU, one alarm can be set against each of the two counters
    (real time and available time).  A VCPU in the 'halted' state
    becomes 'ready' when the counter of any of its alarms reaches that
    alarm's expiry.

    An alarm "fires" by signaling the virtual interrupt controller.  An
    alarm will fire as soon as possible after the counter value is
    greater than or equal to the alarm's current expiry.  However, an
    alarm can fire only when its VCPU is in the 'running' state.

    If the alarm is periodic, it is armed using a sequence of expiry
    values,

     E(i) = e0 + p * i ,  i = 0, 1, 2, 3, ...

    where 'e0' is the expiry specified when setting the alarm and 'p'
    is the period of the alarm.  Initially, E(0) is used as the expiry.
    When the alarm fires, the next expiry value in the sequence that is
    greater than the current value of the counter is used as the
    alarm's new expiry.
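
    The re-arming rule can be written as a short calculation.  This is
    a sketch of the rule as stated above, not a prescribed
    implementation; the function and parameter names are assumptions.

```python
def next_expiry(e0: int, period: int, now: int) -> int:
    """Smallest E(i) = e0 + period * i (i >= 0) that is greater than now."""
    if period <= 0:
        raise ValueError("a periodic alarm needs a non-zero period")
    if now < e0:
        return e0
    # Number of whole periods elapsed since e0, plus one, gives the
    # first index whose expiry lies strictly after the counter value.
    i = (now - e0) // period + 1
    return e0 + period * i

# Hypothetical case: e0 = 3, p = 2, and the alarm fires late, when the
# counter already reads 8.  The expiry at 7 is skipped, not queued.
rearmed = next_expiry(3, 2, 8)
```

    Under this rule a late firing does not produce a burst of
    catch-up interrupts, since any expiry values already passed are
    skipped.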

    One-shot alarms have only one expiry.  When a one-shot alarm fires,
    it is automatically disarmed.

    Suppose an alarm is set relative to real time with expiry at the 3
    ms mark and a period of 2 ms.  It will expire on these real time
    marks: 3, 5, 7, 9.  Note that even if the alarm does not fire
    during the 5 ms to 7 ms interval, the alarm can fire at most once
    during the 7 ms to 9 ms interval (unless, of course, it is
    reprogrammed).

    If an alarm is set relative to available time with expiry at the 1
    ms mark (in available time) and with a period of 2 ms, then it will
    expire on these available time marks: 1, 3, 5.  In the scenario
    described in example 1, those available time values correspond to
    these values in real time: 1, 3, 6.
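
    The correspondence between available time and real time can be
    checked by walking the intervals of Example 1.  The following is a
    sketch under the assumption that both counters start at 0; the
    interval list and the function name are illustrative only.

```python
# Intervals from Example 1 as (start_ms, end_ms, state).  The last one is
# open-ended in the example, so a large end value stands in for it.
INTERVALS = [(0, 3, "running"), (3, 4, "halted"), (4, 5, "ready"),
             (5, 6, "running"), (6, 9, "ready"), (9, 1000, "running")]

def real_time_for_available(target):
    """Earliest real time (ms) at which available time reaches target."""
    available = 0
    for start, end, state in INTERVALS:
        if state == "ready":
            continue  # available time does not progress while 'ready'
        span = end - start
        if available + span >= target:
            return start + (target - available)
        available += span
    raise ValueError("target lies beyond the modelled timeline")

marks = [real_time_for_available(a) for a in (1, 3, 5)]
```

    Note that the third expiry coincides with the 6 ms mark, at which
    point the VCPU is preempted.  Since an alarm can fire only while
    its VCPU is 'running', that alarm would presumably fire no earlier
    than the 9 ms mark, when the VCPU resumes.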
