Re: [patch 1/3] mutex subsystem: trylock

Nicolas Pitre wrote:

I provided you with the demonstration last week:


I didn't really find it convincing.

- the non SMP (ARM version < 6) is using xchg.

- xchg on ARM version < 6 is always faster and smaller than any preemption disable.


Maybe true, but as I said, if you have preemption enabled, then there
are far more preempt_xxx operations in other places than semaphores,
which impact both size and speed.

- xchg on ARM is the same size as the smallest solution you may think of
when preemption is disabled and never slower (prove me otherwise? if you wish).


I was going from your numbers where you said it was IIRC a cycle faster.

- all xchg based primitives are "generic" code already.


And yet there is a need for architecture specific code. Also, having
both xchg and cmpxchg variants means that you have two different sets
of possible instruction interleavings, and xchg has much more subtle
cases, as you demonstrated.
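
To make the interleaving point concrete, here is a rough userspace sketch
of the two trylock styles using C11 atomics. It is not the code from the
patch set: the type and function names (sketch_mutex, trylock_cmpxchg,
trylock_xchg) are made up for illustration, and the counter convention
(1 = unlocked, 0 = locked, negative = locked with waiters) is an
assumption.

```c
#include <stdatomic.h>

/* Hypothetical lock word, assumed convention:
 * 1 = unlocked, 0 = locked, negative = locked with waiters. */
typedef struct {
    atomic_int count;
} sketch_mutex;

/* cmpxchg style: the only transition attempted is 1 -> 0.
 * On failure the lock word is left untouched. Returns 1 on success. */
static int trylock_cmpxchg(sketch_mutex *m)
{
    int expected = 1;
    return atomic_compare_exchange_strong(&m->count, &expected, 0);
}

/* xchg style: unconditionally write 0 and look at what was there.
 * If the old value was negative, we have just erased the "waiters"
 * marker and must put it back, which opens a second window in which
 * other CPUs can observe or change the word. Returns 1 on success. */
static int trylock_xchg(sketch_mutex *m)
{
    int prev = atomic_exchange(&m->count, 0);

    if (prev < 0) {
        /* Restore the contended marker we clobbered. */
        prev = atomic_exchange(&m->count, prev);
        if (prev < 0)
            prev = 0;   /* still held by someone else: trylock fails */
    }
    return prev;
}
```

The cmpxchg variant has one atomic step and no repair path; the xchg
variant needs the extra exchange whenever it hits a contended lock, and
that repair path is where the additional interleavings come from.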

And I think you didn't look at the overall patch set before talking about "lots of ugliness", did you? The fact is that the code,

Yes I did and I think it is more ugly than my proposal would be.

especially the core code, is much cleaner now than it was when everything was seemingly "generic" since the current work on architecture abstractions still allows optimizations in a way that is so much cleaner and controlled than what happened with the semaphore code, and the debugging code can even take advantage of it without polluting the core code.

It happens that i386, x86_64 and ARM (if v6 or above) currently have their own tweaks to save space and/or cycles in a pretty strictly defined way. The effort is currently spent on making sure if other architectures want to use one of their own tricks for those they can do it without affecting the core code which remains 95% of the whole thing (which is not the case for semaphores), and the currently provided architecture specific versions are _never_ bigger nor slower than any generic version would be. Otherwise what would be the point?

I don't think size is a great argument, because the operations should
be out of line anyway to save icache (your numbers showed a pretty
large increase when they're inlined).

And as for speed, I'm not arguing that you can't save a couple of
cycles, but I'm weighing that against the maintainability of a generic
implementation, which has definite advantages. Whereas I don't think
you could demonstrate any real speed improvement from saving a few
cycles in slightly faster semaphore operations on CONFIG_PREEMPT kernels?

If someone ever did find the need for an arch specific variant that
really offers advantages, then there is nothing to stop that being
added.

--
SUSE Labs, Novell Inc.
