Arjan van de Ven <[email protected]> wrote:
> > Can you arrange .text.lock.mutex+0 here to just jump directly to
> > __mutex_lock_noinline? Otherwise we have an unnecessarily extended return
> > path.
>
> jmp is free on x86. eg zero cycles. Any trickery is more likely to cost
> because of doing unexpected things.
I'm talking about replacing:
<caller>: call
mutex_lock: lock decl
mutex_lock: js
.text.lock.mutex: call
__mutex_lock: ret
.text.lock.mutex: jmp
mutex_lock: ret
<caller>: ...
With:
<caller>: call
mutex_lock: lock decl
mutex_lock: js
.text.lock.mutex: jmp
__mutex_lock: ret
<caller>: ...
Or:
<caller>: call
mutex_lock: lock decl
mutex_lock: js
__mutex_lock: ret
<caller>: ...
This sort of thing is done by the compiler when it does tail-calling.
> > You may not want to make the JS go directly there, but you could have that
> > go to a JMP to __mutex_lock_noinline rather than having a CALL followed by
> > a JMP back to a return instruction.
>
> unbalanced call/ret pairs are REALLY expensive on x86. The current x86
> processors all do branch prediction on the ret based on a special
> internal call stack, breaking the symmetry is thus a branch prediction
> miss, eg 40+ cycles
In what way would this be unbalanced? You end up with one fewer CALL and one
fewer RET instruction. And why would the CPU think this is any different from
a function with a conditional branch in it? Eg:
<caller>: call
mutex_lock: lock decl
mutex_lock: js
mutex_lock: ret
<caller>: ...
The only route to __mutex_lock would be through mutex_lock...
Are there docs on this feature of the x86 anywhere?
David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]