Re: [linux-pm] Re: Hibernation considerations

On Sat, 21 Jul 2007, Alan Stern wrote:

On Fri, 20 Jul 2007 david@lang.hm wrote:

How would you prevent tasks from being scheduled?  How would you
prevent drivers from deadlocking because in order to put their device
in a low-power state they need to acquire a lock which is held by a
user task?

you give up on the suspend because you have no way of getting the user
task to give up the lock.

Once the deadlock has occurred it's too late.  You can't give up; in
fact you can't do anything at all.  The system has hung.

however, kernel locks should not be held by user tasks, user tasks are not
expected to behave in rational ways, allowing them to compete with kernel
tasks for locks is a sure way to get a deadlock or indefinite stall.

What on Earth are you talking about?  "Kernel locks should not be held
by user tasks"?  Then who _should_ hold them?  You are aware, I hope,
that down() and mutex_lock() can be called only in process context?

what locks are accessed this way?

Lots of them.  For example, most drivers won't want a suspend to occur
right in the middle of an I/O transfer.  To prevent this, the driver
might use a mutex.  The task doing the I/O (which will be a user task)
acquires the mutex during a transfer and the suspend routine acquires
the mutex while quiescing the device.

wait a minute here, it's possible we are misunderstanding each other.

as I see it.

if userspace can acquire locks that prevent the kernel from shutting off (or doing anything else in particular) then it's possible for misbehaving userspace code to stop the kernel by simply choosing to never release the lock.

this would be a trivial DoS from userspace.

now, if you are talking instead about the fact that when userspace makes a system call, the execution of that system call involves acquiring locks that are released before the system call completes, you have a very different situation.

if you have locks that are held across system calls then you should already have problems, because you can't count on userspace ever taking whatever action is appropriate to release the lock.

what am I missing that concerns you so much?

Does it really (fundamentally) require scheduling tasks, particularly in
the case that the devices have already been put in the "quiesced" state?

I can't say for sure.  That's the way we have been doing it.  It
wouldn't be easy to change, because the driver would have to busy-wait
during delays -- which would mean it would need to use different code
for system-wide suspend and runtime suspend.

please define terms so that we are all on the same page

Please read Documentation/power/devices.txt.

I have done so.

what do you mean by
system-wide suspend

That's what you would call standby, suspend-to-RAM, or hibernate.  The
entire system goes to sleep.

runtime suspend

That's when an individual device is placed in a low-power state to
save energy while it isn't being used.  The system as a whole remains
awake and the device will be resumed the next time it is needed for
anything.

thanks for the definitions.

having read through Documentation/power/devices.txt I remain convinced that you are making a fundamental mistake.

you are designing a system that will only work if everything (every driver, every state transition) participates fully in the process at all times. You started with the facts 'this is the info that ACPI provides and this is how it is designed to be used' and worked from there instead of looking to see what the kernel really needed and figuring out how to provide a good interface for that, one that happens to be implemented (today) with ACPI. (a proper power management framework shouldn't care if you have ACPI, APM, or some other method of controlling the devices)

this leads to resume functions that can only work if the proper suspend function was called, rather than making 'resume' just mean 'go to full operation', which is the same thing that gets called when the device is first initialized. internally it can examine the hardware and follow different paths depending on what it finds the current state of the hardware is, but the outside world (including the rest of the kernel) should not care. the fact that the rest of the kernel needs to know if it should call 'resume' or 'initialize' is a failure in the abstraction.

in fact, a better abstraction would be something like

report_power_modes
  which would return a series of modes (sorted only by modeID)
  modeID, %power_used_in_this_mode, %capability_in_this_mode

(I would make mode 0 always be complete power off, and mode 1 always be full capacity)

report_power_mode_speed

which would return a matrix giving how long it takes to transition from any mode to any other mode. this should be a relative number, not an absolute number, since it will be different at different clock speeds.

set_operational_mode(modeID)

which would take you from whatever mode you are in now to the requested mode.

most devices would report the simple list of modes

0,0,0
1,100,100

with a mode_speed matrix of
  0 1
  ---
0|0 1
1|1 0
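
as a rough sketch only, the three calls and the simple two-mode device above could be written in C along the following lines. every struct, field and table name here is hypothetical (none of it is an existing kernel API), `hw_mode` is just a stand-in for real hardware state, and the sketch assumes mode IDs are dense so that a modeID can be used directly as a matrix index.

```c
#include <stddef.h>

/* Hypothetical types: an illustration of the proposal, not a kernel API. */
struct power_mode {
    unsigned int mode_id;        /* 0 = complete power off, 1 = full capacity */
    unsigned int power_pct;      /* % of full power used in this mode */
    unsigned int capability_pct; /* % of full capability in this mode */
};

struct power_dev {
    const struct power_mode *modes; /* sorted by mode_id, IDs assumed dense */
    size_t n_modes;
    const unsigned int *speed;      /* n_modes x n_modes, row = from, col = to */
    unsigned int hw_mode;           /* stand-in for the real hardware state */
};

/* The "simple list of modes" most devices would report. */
static const struct power_mode simple_modes[] = {
    { 0,   0,   0 },
    { 1, 100, 100 },
};

/* Relative transition costs for the two-mode device. */
static const unsigned int simple_speed[] = {
    0, 1,
    1, 0,
};

/* Hands back the mode table and returns the number of entries in it. */
size_t report_power_modes(const struct power_dev *dev,
                          const struct power_mode **out)
{
    *out = dev->modes;
    return dev->n_modes;
}

/* Relative (not absolute) cost of going from one mode to another. */
unsigned int report_power_mode_speed(const struct power_dev *dev,
                                     unsigned int from, unsigned int to)
{
    return dev->speed[from * dev->n_modes + to];
}

/* Goes to the requested mode from whatever mode the device is in now.
 * Returns 0 on success, -1 if the device has no such mode. */
int set_operational_mode(struct power_dev *dev, unsigned int mode_id)
{
    if (mode_id >= dev->n_modes)
        return -1;
    dev->hw_mode = mode_id; /* a real driver would program the hardware here */
    return 0;
}
```

the point of the shape is that the caller never says which mode the device is currently in: 'initialize' and 'resume' both become set_operational_mode(dev, 1).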

it may be that there is more info needed for the power management engine to decide what modes it wants to put things into; if so, identify what type of info you need and add another column to the modes list.

for example:

you may want to add a flag for 'does this mode allow downstream devices to operate?'

you may want to make a mode for 'this mode doesn't allow any new requests, but continues to process pending requests' and have a flag that indicates this

currently it looks like there's no way to find out what modes are available, and you have to know what mode something is in currently before you can request it change to a different mode. both of these prevent effective power management without encoding intimate knowledge of the capability of the particular hardware in your management tool.

some of this may be discoverable via the ACPI interface (it's not talked about much in the devices.txt file), but the mode setting is still wrong.

note that in the example above it's acceptable for a driver to cache what mode it thinks the device is in, but it needs to properly set the new mode even if its cached data is incorrect.

this approach would allow the transition of ALL drivers to the new mode of operation in one fell swoop, and then adding additional power management features is just adding to the existing list rather than implementing new functions.
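
one way to read the caching rule, sketched in C under stated assumptions: the cache is treated as a hint, and the hardware is consulted before deciding whether anything needs to be written. the struct, the helper names and the `hw_writes` counter are all hypothetical, and `hw_mode` stands in for a real device register.

```c
/* Hypothetical driver state: an illustration only, not a kernel API. */
struct cached_dev {
    unsigned int hw_mode;     /* stand-in for the real hardware register */
    unsigned int cached_mode; /* what the driver believes the mode is */
    unsigned int hw_writes;   /* counts writes, for illustration */
};

/* Stand-ins for real register accessors. */
static unsigned int hw_read_mode(const struct cached_dev *d)
{
    return d->hw_mode;
}

static void hw_write_mode(struct cached_dev *d, unsigned int mode)
{
    d->hw_mode = mode;
    d->hw_writes++;
}

/* Sets the requested mode correctly even when the cache is stale:
 * the decision to write is based on what the hardware reports,
 * never on the cached value alone. */
int set_mode_checked(struct cached_dev *d, unsigned int want)
{
    unsigned int actual = hw_read_mode(d);

    if (actual != want)
        hw_write_mode(d, want);

    d->cached_mode = want; /* cache is refreshed, whatever it held before */
    return 0;
}
```

if the cache says 'full operation' but the hardware was actually powered off behind the driver's back, a request for full operation still results in a write to the hardware.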

David Lang