On Thu, 22 Dec 2005, Ingo Molnar wrote:
>
> * Nicolas Pitre <[email protected]> wrote:
>
> > This patch moves the core mutex code over to the atomic helpers from
> > previous patch. There is no change for i386 and x86_64, except for
> > the forced unlock state that is now done outside the spinlock (doing
> > so doesn't matter since another CPU could have locked the mutex right
> > away even if it was unlocked inside the spinlock). This however
> > brings great improvements on ARM for example.
>
> i'm wondering how much difference it makes on ARM - could you show us
> the before and after disassembly of the fastpath, to see the
> improvement?
The only way to do an atomic decrement on 99% of all ARM processors in
the field requires disabling interrupts. So for instance:
void __mutex_lock(struct mutex *lock)
{
if (atomic_dec_return(&lock->count) < 0)
__mutex_lock_failed(lock);
}
Would produce:
__mutex_lock_atomic:
mrs r1, cpsr @ local_irq_save
orr r3, r1, #128
msr cpsr_c, r3
ldr r3, [r0, #0]
sub r3, r3, #1
str r3, [r0, #0]
msr cpsr_c, r1 @ local_irq_restore
cmp r3, #0
movge pc, lr
b __mutex_lock_failed
I can measure 23 cycles on an XScale processor for the first 8
instructions which corresponds to the "if (atomic_dec_return(v) < 0)".
It was suggested that a preempt_disable()/preempt_enable() would be
sufficient and probably faster than the IRQ disable... which turned not
to be true for non-XScale ARM variants. It would take 14 instructions
on all variants, and on XScale it needs 20 cycles.
Now with my patch applied, it looks like this:
void __mutex_lock(struct mutex *lock)
{
if (atomic_xchg(&lock->count, 0) != 1)
__mutex_lock_failed(lock);
}
with the following assembly:
__mutex_lock_atomic:
mov r3, #0
swp r2, r3, [r0]
cmp r2, #1
moveq pc, lr
b __mutex_lock_failed
The equivalent of the first 8 instructions in the first example is now
down to 3. And when gcc can cse the constant 0 (which is not
possible with the first example) then it would be only 2 instructions,
which is really nice to inline. And the above takes only 8 cycles on an
XScale instead of 20-23 cycles.
> your patches look OK to me, only one small detail sticks out: i'd
> suggest to rename the atomic_*_contended macros to be arch_mutex_*_...,
> i dont think any other code can make use of it.
OK.
> Also, it would be nice
> to see the actual ARM patches as well, which make use of the new
> infrastructure.
Well, with the generic functions based on atomic_xchg() the generated
code is pretty good actually. I don't think I could pack it more with a
special handler. Well for ARM version 6 I have a kunning idea though...
maybe for tomorrow.
> could you resend them against my latest queue that i just posted? I'll
> look at integrating them tomorrow.
Yes, please find them in following emails.
Nicolas
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]