Re: [patch 2/3] mutex subsystem: fastpath inlining

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 27 Dec 2005, Ingo Molnar wrote:

> 
> * Nicolas Pitre <[email protected]> wrote:
> 
> > Some architectures, notably ARM for instance, might benefit from 
> > inlining the mutex fast paths. [...]
> 
> what is the effect on text size? Could you post the before- and 
> after-patch vmlinux 'size kernel/test.o' output in the nondebug case, 
> with Arjan's latest 'convert a couple of semaphore users to mutexes' 
> patch applied? [make sure you've got enough of those users compiled in, 
> so that the inlining cost is truly measured. Perhaps also do 
> before/after 'size' output of a few affected .o files, without mixing 
> kernel/mutex.o into it, like vmlinux does.]

Theory should be convincing enough.  First of all, all the semaphore 
fast paths are always inlined currently, on all architectures I've 
looked at.  A down() fast path is always looking like this:

        mrs     ip, cpsr
        orr     lr, ip, #128
        msr     cpsr_c, lr
        ldr     lr, [r0]
        subs    lr, lr, #1
        str     lr, [r0]
        msr     cpsr_c, ip
        movmi   ip, r0
        blmi    __down_failed

So our starting point for comparison is 9 instructions for every down() 
occurence in the kernel.  Same thing for up().  Every instruction is 
invariably 4 bytes.

Now let's look at the typical mutex_lock():

        mov     r4, #0
        swp     r3, r4, [r0]
        cmp     r3, #1
        blne      __mutex_lock_noinline

This is 4 instructions.  Further more, the first "mov r4, #0" can often 
be eliminated when gcc can cse the constant 0 from another 
register.  We're talking about 3 instructions then, down from 9 !

We therefore saves between 20 and 24 bytes of kernel .text for every 
down() and every up() simply going with mutexes.

Now if the mutex_lock and mutex_unlock were not inlined, the above 3 or 
4 instructions would become one or two per call site, which is still a 
gain in space, however not as important as the one provided by the move 
from semaphores to mutexes.  It however would be more costly in terms of 
cycles since a function prologue and epilogue is somewhat costly on ARM, 
especially with frame pointer enabled (I'll let RMK elaborate on his 
reasons for not disabling them).

And for mutex_lock_interruptible(), the inlined fastpath is not bigger 
than the non-inlined one, considering that the return value has to be 
tested (the test is done twice in the non-inlined case: once inside the 
function, and once outside of it) while the inlined version needs only 
one test.  They are therefore equivalent in terms of space.


Nicolas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux