Re: [patch 2/3] mutex subsystem: fastpath inlining

On Tue, 27 Dec 2005, Ingo Molnar wrote:

> 
> * Nicolas Pitre <nico@cam.org> wrote:
> 
> > Some architectures, notably ARM for instance, might benefit from 
> > inlining the mutex fast paths. [...]
> 
> what is the effect on text size? Could you post the before- and 
> after-patch vmlinux 'size kernel/test.o' output in the nondebug case, 
> with Arjan's latest 'convert a couple of semaphore users to mutexes' 
> patch applied? [make sure you've got enough of those users compiled in, 
> so that the inlining cost is truly measured. Perhaps also do 
> before/after 'size' output of a few affected .o files, without mixing 
> kernel/mutex.o into it, like vmlinux does.]

Theory should be convincing enough.  First of all, all the semaphore 
fast paths are always inlined currently, on all architectures I've 
looked at.  A down() fast path is always looking like this:

        mrs     ip, cpsr
        orr     lr, ip, #128
        msr     cpsr_c, lr
        ldr     lr, [r0]
        subs    lr, lr, #1
        str     lr, [r0]
        msr     cpsr_c, ip
        movmi   ip, r0
        blmi    __down_failed

So our starting point for comparison is 9 instructions for every down() 
occurence in the kernel.  Same thing for up().  Every instruction is 
invariably 4 bytes.

Now let's look at the typical mutex_lock():

        mov     r4, #0
        swp     r3, r4, [r0]
        cmp     r3, #1
        blne      __mutex_lock_noinline

This is 4 instructions.  Further more, the first "mov r4, #0" can often 
be eliminated when gcc can cse the constant 0 from another 
register.  We're talking about 3 instructions then, down from 9 !

We therefore saves between 20 and 24 bytes of kernel .text for every 
down() and every up() simply going with mutexes.

Now if the mutex_lock and mutex_unlock were not inlined, the above 3 or 
4 instructions would become one or two per call site, which is still a 
gain in space, however not as important as the one provided by the move 
from semaphores to mutexes.  It however would be more costly in terms of 
cycles since a function prologue and epilogue is somewhat costly on ARM, 
especially with frame pointer enabled (I'll let RMK elaborate on his 
reasons for not disabling them).

And for mutex_lock_interruptible(), the inlined fastpath is not bigger 
than the non-inlined one, considering that the return value has to be 
tested (the test is done twice in the non-inlined case: once inside the 
function, and once outside of it) while the inlined version needs only 
one test.  They are therefore equivalent in terms of space.

Nicolas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: [patch 2/3] mutex subsystem: fastpath inlining
  - From: Ingo Molnar <mingo@elte.hu>

References:
- [patch 00/11] mutex subsystem, -V7
  - From: Ingo Molnar <mingo@elte.hu>
- [patch 2/3] mutex subsystem: fastpath inlining
  - From: Nicolas Pitre <nico@cam.org>
- Re: [patch 2/3] mutex subsystem: fastpath inlining
  - From: Ingo Molnar <mingo@elte.hu>

Prev by Date: Re: recommended mail clients
Next by Date: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface
Previous by thread: Re: [patch 2/3] mutex subsystem: fastpath inlining
Next by thread: Re: [patch 2/3] mutex subsystem: fastpath inlining
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]