Re: [patch 00/10] PI-futex: -V1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Esben Nielsen <[email protected]> wrote:

> On Sun, 26 Mar 2006, Ingo Molnar wrote:
> 
> > i mentioned it further down in the text - PRIO_PROTECT support (which is
> > priority ceiling) is planned for pthread mutexes. It needs no further
> > kernel changes, it's a pure userspace thing.
> 
> Wouldn't this always include a call to sched_setscheduler() even for 
> the fast path? And it would also involve assigning a priority to all 
> locks up front.

correct. Priority ceiling based boosting can be called "manual PI" or 
"open-coded PI". It is quite error-prone for more complex apps, because 
not only has each thread its own priority (which must be well-designed), 
but also every lock used by these threads must be assigned a maximum 
priority that it will boost threads to. That priority is redundant and 
can be mis-assigned.

priority ceiling locks also involve two additional syscalls per critical 
section, so they are significantly slower than the atomic-ops based 
PI-futex solution. [and thus it could degrade worst-case behavior by 
virtue of higher locking overhead alone.]

> There are only 2 good reasons to choose this, as far as I can see. One 
> is that it is more deterministic: The "fast path" is almost as slow as 
> the slow path. So you will not be surprised by a sudden increase CPU 
> use because timing is moved slightly. This is on the other hand 
> something which can happen with PI.

correct. That on the other hand is also a disadvantage: PI is in essence 
'optional boosting on an as-needed basis', while priority-ceiling is 
unconditional boosting. Unconditional boosting will hurt average 
latencies, which are often just as important as the worst-case 
latencies.

For example: there is a high-prio, a medium-prio and a low-prio task.  
There is a critical section lock, only used by the high-prio and the 
low-prio task.

Under the priority ceiling logic, we manually assign 'high-prio' to the 
lock [which step is redundant information, and which is an additional 
source of coding/design bugs]. When the low-prio task takes the lock, 
that will unconditionally boost it to high-prio, and that task might 
delay the medium-prio task - even if there is no high-prio task running 
right then.

If PI is used, then the kernel will automatically boost the low-prio 
task to high-prio, if the high-prio task wants to take that lock too.  
Otherwise the low-prio task will remain on low-prio - not delaying the 
medium-prio task. So the average delays of the medium-prio task improve.  

[ Furthermore, theoretically, if the application has a workflow pattern 
  in which the medium-prio and high-prio tasks will never run at the 
  same time, then priority ceiling logic would degrade the worst-case 
  latency of the medium-prio task too. ]

[ for completeness: PI is transitive as well, so if at such a moment the 
  highprio task preempts the medium prio task (which preempted the 
  lowprio task holding the lock), and the highprio task wants to take 
  the lock, then the lowprio task will be boosted to high-prio and can 
  finish the critical section to hand over the lock to the highprio 
  task. ]

i.e. PI, as implemented by this patchset, is clearly superior to 
priority ceiling logic.

furthermore, there's a usability advantage to PI too: programmers often 
find it more natural to assign priorities to a given 'workflow' 
(task/thread), and not to every lock. I.e. i think PI is more natural. 
(and hence easier to design for)

but nevertheless, the PRIO_PROTECT API for POSIX mutexes does exist, and 
we can (and will) support it from userspace. All that it needed was to 
make sure that setscheduler() is fully PI-aware: it must not decrease 
the priority of a task to below the highest-priority waiter, and it must 
'remember' the setscheduler() priority [and restore to it] after PI 
effects wear off. This is implemented by our PI code anyway, so we get 
correct priority ceiling (PRIO_PROTECT) behavior 'for free'.

	Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux