Re: Why Semaphore Hardware-Dependent?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



David Howells wrote:
Nick Piggin <[email protected]> wrote:


+	while ((tmp = atomic_read(&sem->count)) >= 0) {
+		if (tmp == atomic_cmpxchg(&sem->count, tmp,
+				   tmp + RWSEM_ACTIVE_READ_BIAS)) {


NAK for FRV.  Do not use atomic_cmpxchg() there as it isn't strictly atomic
(FRV only has one strictly atomic operation: SWAP).  Please leave FRV as using
the spinlock version which is more efficient on UP.

From what I can read of the asm and the documentation, it is atomic. If
it were not atomic then it is badly broken.

BTW. if atomic_cmpxchg is slower than a local_irq_disable+local_irq_enable
on your architecture then you have probably not implemented cmpxchg well
because you may as well just implement it with local_irq_disable.

Please also show benchmarks to show the performance difference between your
version and the old version before Ingo messed it up and made everything
unconditionally out of line without cleaning the inline asm up.

I'm not so interested in counting cycles so much as consolidating duplicated
code and reduce complexity and icache footprint. I'll leave you to benchmark
Ingo's changes if you're concerned about them.

Of course I will benchmark the end results when I finish the patch, though.

If you are going to generalise, you should get rid of everything barring the
spinlock-based version and stick with that.  It will cost you performance
under some circumstances, but it's better under others than attempting to use
atomic_cmpxchg() which may not really exist on all archs.

atomic_cmpxchg exists on all architectures.

I'm happy to go with the spinlock version (it is even simpler), but I will
have to benchmark that. I have seen small slowdowns there in high contention
situations but it was improved with my rwsem scalability patch. If
performance is no different between the two, then the spinlock version is a
no brainer.

You've also caused another problem: the spinlock based version permits up to
2^31 - 1 readers at one time, the inline optimised version, on a 32-bit arch,
will only permit up to 2^16 - 1 at most.  By doing this to x86_64, you've
reduced the number of processes it can support.

Yep, Christoph pointed this out. I'll fix it.

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com -
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux