On Wed, 27 Jun 2007, Davide Libenzi wrote:
> >
> > Now, I have good reason to believe that all Intel and AMD CPU's have a
> > stricter-than-documented memory ordering, and that your spinlock may
> > actually work perfectly well. But it still worries me. As far as I can
> > tell, there's a theoretical problem with your spinlock implementation.
>
> Nice catch ;) But wasn't Intel suggesting in not relying on the old
> "strict" ordering rules?
Actually, both Intel and AMD engineers have been talking about making the
documentation _stricter_, rather than looser. They apparently already are
pretty damn strict, because not being stricter than the docs imply just
ends up exposing too many potential problems in software that didn't
really follow the rules.
For example, it's quite possible to do loads out of order, but guarantee
that the result is 100% equivalent with a totally in-order machine. One
way you do that is to keep track of the cacheline for any speculative
loads, and if it gets invalidated before the speculative instruction has
completed, you just throw the speculation away.
End result: you can do any amount of speculation you damn well please at a
micro-architectural level, but if the speculation would ever have been
architecturally _visible_, it never happens!
(Yeah, that is just me in my non-professional capacity of hw engineer
wanna-be: I'm not saying that that is necessarily what Intel or AMD
actually ever do, and they may have other approaches entirely).
> IOW shouldn't an mfence always be there? Not only loads could leak up
> into the wait phase, but stores too, if they have no dependency with the
> "head" and "tail" loads.
Stores never "leak up". They only ever leak down (ie past subsequent loads
or stores), so you don't need to worry about them. That's actually already
documented (although not in those terms), and if it wasn't true, then we
couldn't do the spin unlock with just a regular store anyway.
(There's basically never any reason to "speculate" stores before other mem
ops. It's hard, and pointless. Stores you want to just buffer and move as
_late_ as possible, loads you want to speculate and move as _early_ as
possible. Anything else doesn't make sense).
So I'm fairly sure that the only thing you really need to worry about in
this thing is the load-load ordering (the load for the spinlock compare vs
any loads "inside" the spinlock), and I'm reasonably certain that no
existing x86 (and likely no future x86) will make load-load reordering
effects architecturally visible, even if the implementation may do so
*internally* when it's not possible to see it in the end result.
Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]