On Wed, 20 Jun 2007, Jarek Poplawski wrote:
>
> I don't agree with this (+ I know it doesn't matter).
>
> The real bug is what Chuck Ebbert wrote: "Spinlocks aren't fair".
> And here they are simply lawlessly not fair.
Well, that's certainly a valid standpoint. I wouldn't claim you're
_wrong_.
At least in theory.
In *practice*, fair spinlocks are simply not possible. Not as in "it's
hard to do", but as in "you simply cannot do it".
The thing is, most spinlocks get held for a rather short time (if that
wasn't the case, you'd use something like a mutex), and it's acually a
short time not on a "human scale", but on a "CPU core scale".
IOW, the cost of doing the cacheline bouncing is often equal to, or bigger
than, the actual cost of the operation that happens inside the spinlock!
What does this result in? It automatically means that software simply
*cannot* do fair spinlocks, because all the timing costs are in parts that
software doesn't even have any visibility into, or control over!
Yeah, you could add artificial delays, by doing things like cycle counting
around the spinlock operation to see if you got the lock when it was
_contended_, or whether you got a lock that wasn't, and then adding some
statistics etc, but at that point, you don't have a spinlock any more, you
have something else. I don't know what to call it.
And you could add flags like "this spinlock is getting a lot of
contention", try some other algorithm etc. But it's all complicated and
against the whole *point* of a spinlock, which is to get in and out as
fast as humanly possible.
So in practice, spinlock fairness is inherently tied to the hardware
behaviour of the cache coherency algorithm.
Which gets us to the next level: we can consider hardware that isn't
"fair" in its cache coherency to be buggy hardware.
That's also a perfectly fine standpoint, and it's actually one that I have
a lot of sympathy for. I think it's much easier to some degree to do
fairness in the cache coherency than at any other level, because it's
something where the hardware really *could* do things like counting
bouncing etc.
However, while it's a perfectly fine standpoint, it's also totally
unrealistic. First off, the hardware doesn't even know whether the
spinlock "failed" or not on any architecture platform I'm aware of. On
x86, the spinlock operation under Linux is actually just an atomic
decrement, and it so happens that the rules for failure was that it didn't
go negative. But that's just a Linux internal rule - the hardware doesn't
know.
So you'd have to actualyl have some specific sequence with specific
fairness rules (maybe you could make the rule be that a failed atomic
"cmpxchg" counts as a real failure, and if you give higher priority to
cores with lots of failures etc).
IOW, it's certainly possible *in*theory* to try to make hardware that has
fairness guarantees. However, any hardware designer will tell you that
(a) they have enough problems as it is
(b) it's simply not worth their time, since there are other (simpler)
things that are much much more important.
So the end result:
- hardware won't do it, and they'd be crazy to really even try (apart
from maybe some really simplistic stuff that doesn't _guarantee_
anything, but maybe helps fairness a bit)
- software cannot do it, without turning spinlocks into something really
slow and complicated, at which point they've lost all meaning, and you
should just use a higher-level construct like a mutex instead.
In other words, spinlocks are optimized for *lack* of contention. If a
spinlock has contention, you don't try to make the spinlock "fair". No,
you try to fix the contention instead!
That way, you fix many things. You probably speed things up, _and_ you
make it fairer. It might sometimes take some brains and effort, but it's
worth it.
The patch I sent out was an example of that. You *can* fix contention
problems. Does it take clever approaches? Yes. It's why we have hashed
spinlocks, RCU, and code sequences that are entirely lockless and use
optimistic approaches. And suddenly you get fairness *and* performance!
It's a win-win situation. It does require a bit of effort, but hey, we're
good at effort.
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]