On Tue, 2006-01-03 at 15:07 +0000, David Howells wrote:
> Ingo Molnar <[email protected]> wrote:
>
> > this is version -V11 of the generic mutex subsystem, against v2.6.15.
>
> When compiling for x86 with no mutex debugging, I see:
>
> (gdb) disas mutex_lock
> Dump of assembler code for function mutex_lock:
> 0xc02950d0 <mutex_lock+0>: lock decl (%eax)
> 0xc02950d3 <mutex_lock+3>: js 0xc02950ef <.text.lock.mutex>
> 0xc02950d5 <mutex_lock+5>: ret
> End of assembler dump.
> (gdb) disas 0xc02950ef
> Dump of assembler code for function .text.lock.mutex:
> 0xc02950ef <.text.lock.mutex+0>: call 0xc0294ffb <__mutex_lock_noinline>
> 0xc02950f4 <.text.lock.mutex+5>: jmp 0xc02950d5 <mutex_lock+5>
> 0xc02950f6 <.text.lock.mutex+7>: call 0xc029509f <__mutex_unlock_noinline>
> 0xc02950fb <.text.lock.mutex+12>: jmp 0xc02950db <mutex_unlock+5>
> End of assembler dump.
>
> Can you arrange .text.lock.mutex+0 here to just jump directly to
> __mutex_lock_noinline? Otherwise we have an unnecessarily extended return
> path.
A jmp is effectively free on x86, i.e. zero cycles. Any trickery is more
likely to cost cycles, because it makes the CPU do unexpected things.
>
> You may not want to make the JS go directly there, but you could have that go
> to a JMP to __mutex_lock_noinline rather than having a CALL followed by a JMP
> back to a return instruction.
Unbalanced call/ret pairs are REALLY expensive on x86. Current x86
processors all predict the target of a ret from a special internal
call stack, so breaking the call/ret symmetry causes a branch
misprediction, i.e. 40+ cycles.
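
To make the call/ret point concrete, here is a toy model of such an
internal call stack, written as plain C. It is only a sketch: the names
(struct ras, do_call, do_ret, RAS_DEPTH) and the depth of 16 are made up
for illustration, and real predictors differ in depth and in how they
recover. It models a ret that does not match the most recent call; it
does not model any specific code sequence from this thread.

```c
#include <stdio.h>

#define RAS_DEPTH 16	/* hypothetical depth, not taken from any real CPU */

/* Toy return-address stack: call pushes, ret pops and compares. */
struct ras {
	unsigned long slot[RAS_DEPTH];
	int top;		/* number of valid entries */
	int mispredicts;	/* rets whose predicted target was wrong */
};

/* A call records the address of the instruction following it. */
void do_call(struct ras *r, unsigned long return_addr)
{
	if (r->top < RAS_DEPTH)
		r->slot[r->top++] = return_addr;
}

/* A ret is predicted from the top entry; a mismatch counts as a miss. */
void do_ret(struct ras *r, unsigned long actual_target)
{
	unsigned long predicted = r->top > 0 ? r->slot[--r->top] : 0;

	if (predicted != actual_target)
		r->mispredicts++;
}

/* Three nested calls, each matched by its own ret. */
int run_balanced(void)
{
	struct ras r = { .top = 0, .mispredicts = 0 };

	do_call(&r, 0x1000);
	do_call(&r, 0x2000);
	do_call(&r, 0x3000);
	do_ret(&r, 0x3000);
	do_ret(&r, 0x2000);
	do_ret(&r, 0x1000);
	return r.mispredicts;
}

/*
 * Same three calls, but the innermost function drops its own return
 * address and returns straight to 0x2000. The predictor still holds
 * 0x3000 on top, so every following ret is off by one entry.
 */
int run_unbalanced(void)
{
	struct ras r = { .top = 0, .mispredicts = 0 };

	do_call(&r, 0x1000);
	do_call(&r, 0x2000);
	do_call(&r, 0x3000);
	do_ret(&r, 0x2000);	/* predicted 0x3000 */
	do_ret(&r, 0x1000);	/* predicted 0x2000 */
	return r.mispredicts;
}
```

In this model run_balanced() reports no misses, while run_unbalanced()
reports a miss on both rets, since one skipped entry leaves the
predictor out of step for every ret after it.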