Re: [PATCH] WorkStruct: Implement generic UP cmpxchg() where an arch doesn't support it

On Wed, Dec 06, 2006 at 11:16:55AM -0800, Christoph Lameter wrote:
> On Wed, 6 Dec 2006, Russell King wrote:
> 
> > On Wed, Dec 06, 2006 at 10:56:14AM -0800, Christoph Lameter wrote:
> > > I'd really appreciate a cmpxchg that is generically available for 
> > > all arches. It will allow lockless implementation for various performance 
> > > critical portions of the kernel.
> > 
> > Let's recap on cmpxchg.
> > 
> > For CPUs with no atomic operation other than SWP, it is not lockless.
> 
> But then it also just requires disabling/enabling interrupts on UP, which may 
> be cheaper than an atomic operation.

No.  SWP is atomic on the CPU it's being issued on, especially wrt
interrupts.  Only on one ARM CPU (which is UP) does it have a
questionable use, and there we do it via interrupt disable/enable.

> > For CPUs with load locked + store conditional, it is expensive.
> 
> Because it locks the bus? I am not that familiar with those architectures 
> but it seems that those will have a general problem anyways.

No.  That certainly would be bad for performance.  I can speak
authoritatively about the ARM implementation.

When you use a special "ldrex" (load exclusive) instruction, the
CPU remembers the "address + cpu" pair.  If another access occurs
to the same address, this state is reset.

Only if this state is preserved will a "strex" (store exclusive)
instruction succeed.  This instruction returns status indicating
whether it succeeded.

So, to implement cmpxchg, you need to do this:

	; r1 = temporary register
	; r2 = address
	; r4 = new value
	; r3 = returned status (0 = stored, 1 = not stored)
	ldrex	r1, [r2]
	cmp	r1, old_value
	strexeq	r3, r4, [r2]
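
For comparison, a minimal C11 sketch of the same compare-and-store
operation.  The name cmpxchg_sketch is hypothetical and this is not the
kernel's own cmpxchg(); on ARM cores that have ldrex/strex, compilers
typically lower the atomic call to a sequence much like the one above,
wrapped in a retry loop for the case where strex fails.

```c
#include <stdatomic.h>

/*
 * Hypothetical helper: returns the value found at *addr.  The store of
 * new_value happened only if that value equals old_value.
 */
static int cmpxchg_sketch(atomic_int *addr, int old_value, int new_value)
{
	int expected = old_value;

	/* On failure, 'expected' is overwritten with the current value. */
	atomic_compare_exchange_strong(addr, &expected, new_value);
	return expected;
}
```

A caller decides whether the exchange took place by comparing the return
value with the old value it passed in.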

> > If you want an operation for performance critical portions of the
> > kernel, please allow architecture maintainers the freedom to use their
> > best performance enhancements.
> 
> And thereby denying the kernel developers to use a simple atomic SMP 
> operation? Adding additional defines for each arch and each performance 
> critical piece of kernel logic?

No.  If you read what I said, you'll see that you can _cheaply_ use
cmpxchg in a ll/sc based implementation.  Take an atomic increment
operation.

	do {
		old = load_locked(addr);
	} while (store_exclusive(old, old + 1, addr));

On a cmpxchg architecture, that "store_exclusive" (loosely) becomes your
cmpxchg instruction, comparing against the first arg and, if equal, storing
the second.  The "load_locked" macro becomes a standard pointer deref.
Ergo, x86 becomes:

	do {
		load value
		manipulate it
		conditional store
	} while not stored

On ll/sc, the load_locked() macro is the load locked instruction.  The
store_exclusive() macro is the exclusive store and it doesn't need to
use the first parameter at all.  Ergo, ARM becomes:

	do {
		ldrex r1, [r2]
		manipulate r1
		strex r0, r1, [r2]
	} while failed

Notice that both are optimal.
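
A minimal C11 sketch of how the two accessors could map onto a
cmpxchg-style machine.  The function bodies are assumptions made for
illustration, with C11 atomics standing in for the architecture's
compare-and-exchange instruction; store_exclusive() returns nonzero on
failure, mirroring the strex status.

```c
#include <stdatomic.h>

/* On a cmpxchg machine the "locked" load is just an ordinary load. */
static inline int load_locked(atomic_int *addr)
{
	return atomic_load_explicit(addr, memory_order_relaxed);
}

/*
 * Compare against 'old' and store 'new_value' if equal.
 * Returns 0 if the store happened, nonzero if the loop must retry.
 */
static inline int store_exclusive(int old, int new_value, atomic_int *addr)
{
	return !atomic_compare_exchange_weak(addr, &old, new_value);
}

/* The atomic increment from the pseudocode above. */
static void atomic_inc_sketch(atomic_int *addr)
{
	int old;

	do {
		old = load_locked(addr);
	} while (store_exclusive(old, old + 1, addr));
}
```

On an ll/sc machine the same two names would instead wrap the load
exclusive and store exclusive instructions, and store_exclusive() would
ignore its first argument.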

Now let's consider the cmpxchg case.

	do {
		val = *addr;
	} while (cmpxchg(val, val + 1, addr));

The x86 case is _identical_ to the ll/sc based implementation.  Absolutely
entirely.  No impact whatsoever.

Let's look at the ll/sc case.  The cmpxchg code implemented on this has
to reload the original value, compare it, and if equal store the new value.
So:

	do {
		val = *addr;
		(r2 = addr)
		ldrex r1, [r2]
		compare r1, r0
		strexeq r4, r3, [r2] (store exclusive if equal)
	} while store failed or compare condition failed

Note how the cmpxchg has _forced_ the ll/sc implementation to become
more complex.
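
The extra work can be counted with a toy, single-CPU model of an
exclusive monitor.  Everything in this sketch is hypothetical (the
struct, the toy_* helpers, the counters); it models no real hardware and
only tallies the memory accesses each style of increment makes when
nothing else touches the location.

```c
#include <stdbool.h>

struct toy_mem {
	int value;
	bool monitor_armed;	/* set by load exclusive */
	unsigned loads;
	unsigned stores;
};

static int toy_load(struct toy_mem *m)		/* plain load */
{
	m->loads++;
	return m->value;
}

static int toy_ldrex(struct toy_mem *m)		/* load exclusive */
{
	m->loads++;
	m->monitor_armed = true;
	return m->value;
}

static int toy_strex(struct toy_mem *m, int v)	/* 0 = stored, 1 = failed */
{
	if (!m->monitor_armed)
		return 1;
	m->monitor_armed = false;
	m->value = v;
	m->stores++;
	return 0;
}

/* Increment written directly against ll/sc. */
static void inc_llsc(struct toy_mem *m)
{
	int old;

	do {
		old = toy_ldrex(m);
	} while (toy_strex(m, old + 1));
}

/* cmpxchg emulated on top of ll/sc; nonzero means "not stored". */
static int toy_cmpxchg(struct toy_mem *m, int old, int new_value)
{
	if (toy_ldrex(m) != old)
		return 1;
	return toy_strex(m, new_value);
}

/* Increment written against the cmpxchg interface. */
static void inc_cmpxchg(struct toy_mem *m)
{
	int val;

	do {
		val = toy_load(m);
	} while (toy_cmpxchg(m, val, val + 1));
}
```

In this model the uncontended ll/sc increment performs one load and one
store, while the cmpxchg-based increment performs two loads, a compare,
and one store.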

So, let's recap.

Implementing ll/sc based accessor macros allows both ll/sc _and_ cmpxchg
architectures to produce optimal code.

Implementing a cmpxchg based accessor macro allows cmpxchg architectures
to produce optimal code, but leaves ll/sc architectures with non-optimal
code.

See my point?

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of: