On Jul 20, 2007, at 1:17 PM, Alan Stern wrote:
On Fri, 20 Jul 2007, Milton Miller wrote:
On Jul 20, 2007, at 11:20 AM, Alan Stern wrote:
On Fri, 20 Jul 2007, Milton Miller wrote:
We can't do this unless we have frozen tasks (this way, or another)
before carrying out the entire operation.
What can't we do? We've already worked with the drivers to quiesce the
hardware and put any information needed to resume the device in RAM.
Now we ask them to put their device in low power mode so we can go to
sleep. Even if we schedule, the only thing userspace could touch is
memory.
Userspace can submit I/O requests. Someone will have to audit every
driver to make sure that such I/O requests don't cause a quiesced
device to become active. If the device is active, it will make the
memory snapshot inconsistent with the on-device data.
If a driver is waking a device between the time it was told by
hibernation "suspend all operations and save your device state to ram"
and "resume your device" then it is a buggy driver.
That's exactly my point. As far as I know nobody has done a survey,
but I bet you'd find _many_ drivers are buggy either in this way or the
converse (forcing an I/O request to fail immediately instead of waiting
until the suspend is over when it could succeed). They have this bug
because they were written -- those which include any suspend/resume
support at all -- under the assumption that they could rely on the
freezer.
And that's why Rafael said "We can't do this unless we have frozen
tasks (this way, or another) before carrying out the entire operation."
Until the drivers are fixed -- which seems like a tremendous job --
none of this will work.
So this is in the way of removing the freezer... but we are not
relying on doing any I/O other than: suspend device operation, save
state to RAM, later put the device in low power mode for S3 and/or S4,
and finally restore and resume to running.
I argue the process can make the I/O request after we write to disk;
we just can't service it. If we are suspended, it will go to the
request queue, and eventually the process will wait on the normal
throttling mechanisms until the driver is woken up.
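As a rough illustration of the "queue it, don't service it" idea, here
is a minimal userspace model in C. It is a sketch under stated
assumptions: struct toy_dev, toy_submit(), toy_suspend() and
toy_resume() are hypothetical names, not kernel interfaces, and a
fixed array stands in for a request queue. Requests arriving while the
device is suspended are neither passed to the hardware (waking a
quiesced device) nor failed immediately; they are held and replayed on
resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: none of these names exist in the kernel. */
#define TOY_MAX_PENDING 16

struct toy_dev {
	bool suspended;                 /* true between quiesce and resume */
	int pending[TOY_MAX_PENDING];   /* requests deferred while suspended */
	size_t npending;
	int serviced;                   /* requests that reached the "hardware" */
	int last;                       /* most recently serviced request */
};

static void toy_service(struct toy_dev *d, int req)
{
	d->serviced++;
	d->last = req;
}

/* Never wake a quiesced device and never fail just because we are
 * suspended: defer instead. Returns -1 only if the deferral array is
 * full (a real driver would put the caller to sleep instead). */
static int toy_submit(struct toy_dev *d, int req)
{
	if (!d->suspended) {
		toy_service(d, req);
		return 0;
	}
	if (d->npending == TOY_MAX_PENDING)
		return -1;
	d->pending[d->npending++] = req;
	return 0;
}

static void toy_suspend(struct toy_dev *d)
{
	d->suspended = true;
}

/* Clear the flag, then replay deferred requests in submission order. */
static void toy_resume(struct toy_dev *d)
{
	d->suspended = false;
	for (size_t i = 0; i < d->npending; i++)
		toy_service(d, d->pending[i]);
	d->npending = 0;
}
```

This only models the request-queue path; I/O paths that bypass a queue
(ioctl, sysfs) would each need the same treatment.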
Many drivers don't have request queues. Even for the ones that do,
there are I/O pathways that bypass the queue (think of ioctl or sysfs).
So it's not a flag in make_request; fine.
Actually, my point was more "what kernel services do the drivers need
to transition from quiesced to low power for ACPI S4 or
suspend-to-RAM?" We can't let them allocate memory then (though we do
give them a "we are going to suspend" call while they still can), but
does "run this tasklet" help? What timer facilities are needed?
Some drivers need the ability to schedule. Some will need the ability
to allocate memory (although GFP_ATOMIC is probably sufficient). Some
will need timers to run.
Can they allocate the memory in advance? (Call them when we know we
want to suspend, they make the allocations they will need; we later
call them again to release the allocations).
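The allocate-in-advance idea can be sketched the same way. This is a
hypothetical userspace model (struct prealloc_dev and the prealloc_*
functions are illustrative names, and malloc() stands in for whatever
allocator a driver would really use): the buffer is obtained while
allocation is still safe, the later save step only copies into it, and
a final call frees it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical driver state; names are illustrative, not kernel APIs. */
struct prealloc_dev {
	unsigned char regs[64];   /* stand-in for device register state */
	unsigned char *save_buf;  /* allocated early, used during save */
};

/* "We are going to suspend": allocation is still allowed here. */
static int prealloc_prepare(struct prealloc_dev *d)
{
	d->save_buf = malloc(sizeof d->regs);
	return d->save_buf ? 0 : -1;
}

/* Later save step: must not allocate, only copy into the buffer. */
static void prealloc_save(struct prealloc_dev *d)
{
	memcpy(d->save_buf, d->regs, sizeof d->regs);
}

/* Resume: put the saved state back into the device. */
static void prealloc_restore(struct prealloc_dev *d)
{
	memcpy(d->regs, d->save_buf, sizeof d->regs);
}

/* Final call after resume: release what prepare allocated. */
static void prealloc_complete(struct prealloc_dev *d)
{
	free(d->save_buf);
	d->save_buf = NULL;
}
```

The point of the split is that only prealloc_prepare() can fail for
lack of memory, and it runs while userspace and swapping are still
active.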
If you need timers, you probably want some scheduling?
Do we need to differentiate init (POR by BIOS) and resume from
quiesced (for reboot, kexec start/resume)? I hope not.
Yes we do.
Can you elaborate? Note I was not asking about resume-from-low-power
vs init-from-POR. We still get that distinction.
How do these drivers work today when we kexec?
The reason I'm asking is that it's hard to tell the first kernel what
happened. We can say "we powered off, and we were restarted", but it
becomes much harder when each device may or may not have a driver in
the save kernel, and we have to differentiate, for each device, whether
it was initialized and later quiesced by the jump kernel during save or
never touched. And we need to tell the resume-from-hibernate code "I
touched it" vs. "no, I didn't" and "we resumed from S4" vs. "no, it was
from S5".
This is why I've been proposing that we don't create the suspend image
with devices in the low power state, but only in a quiesced state
similar to the initial state.
I'm proposing a sequence like:
(1) start allocating pinned memory to reduce the saved image size
(2) allocate and map blocks to save the maximum image (we know how much
    RAM is not in (1), so we know the max size)
(3) tell drivers we are going to suspend. Userspace is still running,
    swapping is still active, etc.; now is the time to allocate memory
    to save device state.
(4) do what we want to slow down userspace making requests (i.e. run
    the freezer today)
(5) call drivers for a "save opportunity" while still scheduling with
    interrupts on. From this point, any new request should be queued
    or the process put on a wait queue.
(6) suspend timers, turn off interrupts
(7) call drivers with interrupts off (final save)
(8) jump to the other kernel to save the image
(9) call drivers to transition to low power
(10) finish the operations for platform suspend on hibernate
(11) call drivers to resume, telling them whether it is from
    suspend-to-RAM or suspend-to-disk, possibly in two stages
    (interrupts off with no scheduling, then interrupts on with
    scheduling allowed)
(12) unfreeze processes, kill the thread holding the extra memory
    reserved in (1)
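The ordering of the proposed sequence can be modelled as a table of
per-driver callbacks. This is only a sketch: struct seq_driver_ops, its
members and run_sequence() are hypothetical names, the platform steps
are merely logged, and error unwinding (resuming drivers that already
saved) is omitted. It shows which steps call into drivers and that the
resume call is told whether the system came back from RAM or from disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the proposed sequence; not a kernel interface. */
enum resume_source { FROM_RAM, FROM_DISK };

struct seq_driver_ops {
	int  (*prepare)(void);               /* (3): may allocate, may sleep */
	int  (*save_opportunity)(void);      /* (5): interrupts on, may schedule */
	int  (*final_save)(void);            /* (7): interrupts off */
	int  (*low_power)(void);             /* (9): enter low power */
	void (*resume)(enum resume_source);  /* (11): told where we came from */
};

static int trace[16];
static size_t ntrace;

static void log_step(int step)
{
	if (ntrace < 16)
		trace[ntrace++] = step;
}

/* Returns the failing step number, or 0 on success. In reality the
 * machine sleeps between (10) and (11); here it just runs straight on. */
static int run_sequence(const struct seq_driver_ops *ops,
			enum resume_source src)
{
	log_step(1);                   /* pin memory to shrink the image */
	log_step(2);                   /* reserve blocks for the image */
	log_step(3);
	if (ops->prepare())
		return 3;
	log_step(4);                   /* slow down userspace (freezer) */
	log_step(5);
	if (ops->save_opportunity())
		return 5;
	log_step(6);                   /* stop timers, interrupts off */
	log_step(7);
	if (ops->final_save())
		return 7;
	log_step(8);                   /* other kernel writes the image */
	log_step(9);
	if (ops->low_power())
		return 9;
	log_step(10);                  /* platform suspend */
	log_step(11);
	ops->resume(src);
	log_step(12);                  /* thaw, drop reserved memory */
	return 0;
}

/* Trivial example driver used to exercise the model. */
static int demo_ok(void) { return 0; }
static int demo_fail(void) { return -1; }
static enum resume_source demo_last_src;
static void demo_resume(enum resume_source src) { demo_last_src = src; }

static const struct seq_driver_ops demo_ops = {
	.prepare = demo_ok,
	.save_opportunity = demo_ok,
	.final_save = demo_ok,
	.low_power = demo_ok,
	.resume = demo_resume,
};
```

Keeping (7) and (9) as separate callbacks is what would force the split
between "quiesce for snapshot" and "go to low power" discussed below.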
So I'm asking what needs to happen in 9. If we have to turn interrupts
on and schedule, that's ok. If the low power state is the initial
state then fine.
Note that in (11) we could further differentiate "from image restore in
S4", "from image restore in S5", and "from failed image save", but
what needs to happen differently?
I'm guessing that the work that will take some time is separating the
go-to-low-power operations from the quiesce-for-snapshot operations,
as it sounds like this is done with one driver call today? Making this
separation will give us our driver audit :-), but only if we decide on
the requirements before we start.
milton