On Jul 06, 2007, at 11:42:33, Alan Stern wrote:
On Thu, 5 Jul 2007, Kyle Moffett wrote:
Umm, this thread is NOT ABOUT HIBERNATING!!! Please go back and
read the subject, specifically the "suspend to RAM" parts :-D.
But it _is_ about the freezer; see the "Remove process freezer"
part. :-) Since the freezer is used during hibernation,
hibernation is a legitimate topic.
Except Linus already decreed (and I heartily agree) that hibernation
and suspend-to-RAM were fundamentally completely different operations
and therefore any attempts to share code were basically just making a
big muddy mess of things. Would a thread "Remove phase-of-the-moon
calculations from network-recv code" be relevant to lunar observation
just because the two had to do with the phases of the moon? No!
When your hardware can put itself to sleep and atomically preserve
memory as it does so, you don't need an atomic copy. For Real
Suspend(TM) (IE: Suspend-to-RAM), the list of things to do is
short and simple:
1) Stop DMA and put most hardware into low-power states (stops
all interrupt sources)
2) Ensure that the other CPUs have finished any trailing
interrupt handlers and put them to sleep
3) Put the interrupt-controllers into low-power stat
4) Go to sleep
Your short and simple list omits a few crucial items:
A) Decide what to do about remote wakeup requests.
Why do we care? If the wakeup request arrives before we go to sleep,
we obviously aren't asleep and so can't wake up. If it arrives after
we go to sleep then it will wake us up. Anything that depends on a
wakeup arriving mid-sequence is 100% masochistic race condition.
B) Prevent I/O requests from resuming devices that have been
suspended.
(1a) As I describe below, step (1) includes setting NO_BIND and NO_IO
flags on devices as they are processed. Anybody who wants to do IO
while those flags are set should just go sleep on a waitqueue.
C) Prevent devices and drivers from being registered or
unregistered; in particular decide what to do about hot-plug or hot-
unplug events.
(1b) Again, that's where the NO_BIND flag comes in. If its set then
any device probe events must sleep, otherwise they can go through.
D) Block driver bind or unbind calls.
See points (1a) and (1b) above.
Any of these things is capable of screwing up the course of
events. (In fact A _should_ be allowed to abort a suspend.)
If any of those things screw up suspend-to-RAM then it is 100% the
drivers fault and no "process freezer" is going to fix it, end of
story. And "A" cannot be made reliable. At some point you shut off
interrupts right before going to sleep, and at that point any remote
wakeup event is just going to get dropped until you actually enter
sleep mode and the hardware takes over again. If you miss a wakeup
event then whatever sent it should just retry, just as with *every*
other kind of network packet.
How about a freezer whose job it is to "wait for pending hard
interrupts to complete when we have already guaranteed that we
won't get any more"? That part should be really *REALLY* easy.
You don't need to care about either userspace processes or kernel
threads at all. Specifically, Step 1 consists of:
suspend_device(dev)
{
set_no_bind_flag(dev);
for (dev->subdevices)
suspend_device(dev);
set_no_io_flag(dev);
wait_for_in_progress_dma(dev);
turn_off_interrupts(dev);
go_to_low_power_state(dev);
}
After you've set the "no_bind" flag, you won't get any *new*
subdevices trying to bind,
So what happens if a new subdevice arrives at the wrong time? Do
you block instead of binding it? While holding a mutex needed to
suspend the parent device?
That would be a driver bug. If you have asynchronous probing then
proper suspend handling includes being able to postpone driver probe
events until after resume. If you have synchronous probing then the
problem doesn't exist because "set_no_bind_flag" is just telling the
device not to raise any more device probe interrupts.
What about drivers trying to bind to existing devices?
While binding it will clearly be holding a mutex/spinlock on the
parent device, so the suspend process will wait for it. When binding
is done the suspend_device() code will take the device lock and tell
everything else to postpone further bind requests as above.
What happens to I/O requests submitted after the "no_io" flag is
set? The driver will have to block them, effectively creating its
own little "freezer".
Oh, so you're calling every waitqueue in the kernel a "freezer" now?
We do these things at the driver level *all* *the* *time*. For
instance, you can't submit new IOs to an ATA controller while it's
renegotiating the bus speed, but that's never been a problem before.
When all the leaf devices are off, the parent device can be turned
off because everything waiting on the leaf devices is blocked on
them and won't unblock until the parent device *AND* the leaf
device are turned on again, in that order.
This is a lot like what we already do. The differences are:
There is nothing corresponding to your "no-bind" flag.
Most drivers don't have anything like your "no_io" flag; they
assume that nobody will be around to submit an I/O request.
Most drivers have an implicit NO_BIND flag: The device's interrupts
are off and/or its in a low-power state. USB is already terribly
buggy with regards to suspend: If you hotplug a device during
suspend (like the touchpad in my powerbook powering down/up), then
the USB stack will basically hang that controller. The device is off
and the hotplug triggers interrupts and IO, *EVEN* *WITHOUT*
*USERSPACE*.
So if your driver doesn't already have a proper way of blocking IO
during suspend then it probably doesn't suspend 50% of the time
anyways. A bug which bites *every* *time* is easy to fix, one which
only bites when things hit a race condition is much harder.
Resuming is basically running the whole process in reverse.
Runtime-suspend is achieved by not setting the 'no_io' or
'no_bind' flags and putting selective device-subtrees to sleep
without doing anything to the rest of the system.
Nobody doubts that suspend can be made to work without the freezer.
The point is that doing it this way dumps a bunch of extra
responsibility on drivers.
That responsibility has been there ever since suspend-to-RAM support
was added. Nobody ever denied that writing a proper driver wasn't
tricky. You have to simultaneously be able to handle handle hot-
unplug, IO errors, interrupts, IO requests, suspend-to-RAM, and
hibernation. If your driver mutual-exclusion is buggy then it
probably already bites you during hotplug or other similar
scenarios. Let's at least make the problems much more reproducible
so we can fix the drivers properly instead of continuing to kludge
around it for all eternity.
Cheers,
Kyle Moffett
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]