Re: [rfc][patch 3/3] x86: optimise barriers

On Fri, 12 Oct 2007, Jarek Poplawski wrote:
> 
> First it looks like a really great thing that it's revealed at last.
> But then... there is probably some confusion: did we have to use
> ineffective code for so long?

I think the chip manufacturers really wanted to keep their options open.

Having the option to re-order loads in architecturally visible ways was 
something that they probably felt they really wanted to have. On the other 
hand:

 - I bet they had noticed that things break, and some applications depend 
   on fairly strong ordering (not necessarily in Linux-land, but..)

   I suspect hw manufacturers go through life hoping that "software 
   improves". They probably thought that getting rid of the old 16-bit 
   windows would mean that less people depended on undefined behaviour. 

   And I suspect that they started noticing that no, with threads and 
   JVM's and things, *more* people started depending on fairly strong 
   memory ordering.

 - I suspect Intel in particular noticed that they can do a lot of very 
   aggressive re-ordering at a microarchitectural level, but can still 
   guarantee that *architecturally* they never show it (dynamic detection 
   of reordered loads being replayed on cache dirty events etc).

IOW, I suspect that both Intel and AMD noticed that while they had wanted 
to keep their options open, those options weren't really realistic, and 
not something that the market wanted (aggressive use of threading wants 
*stricter* memory ordering, not looser), and they could work well enough 
with a fairly strict memory model.

> So, maybe linux needs something like this, instead of waiting few
> years with each new model for vendors goodwill? IMHO, even for less
> popular processors, this could be checked under some debugging option
> at the system start (after disabling suspicios barrier for a while
> plus some WARN_ONs).

Quite frankly, even *within* Intel and AMD, there are damn few people who 
understand exactly what the memory ordering requirements and guarantees 
are and historically were for the different CPU's.

I would bet that had you asked a random (but still competent) Intel/AMD 
engineer that wasn't really intimately involved with the actual design of 
the cache protocols and memory pipelines, they would absolutely not have 
been able to tell you how the CPU actually worked.

So no, there's no way a software person could have afforded to say "it 
seems to work on my setup even without the barrier". On a dual-socket 
setup with s shared bus, that says absolutely *nothing* about the 
behaviour of the exact same CPU when used with a multi-bus chipset. Not to 
mention another revisions of the same CPU - much less a whole other 
microarchitecture.

So yes, I've personally been aware for about a year that the memory 
ordering was going to likely be documented, but no way was I going to 
depend on it until Intel and AMD were ready to state so *publicly*. 
Because before that happens, they may have noticed errata etc that made it 
not work out.

Also, please note that we didn't even just change the barriers immediately 
when the docs came out. I want to do it soon - still *early* in the 2.6.24 
development cycle - exactly because bugs happen, and if somebody notices 
something strange, we'll have more time to perhaps decide that "oops, 
there's something bad going on, let's undo this for the real 2.6.24 
release until we can figure out the exact pattern".

		Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: [rfc][patch 3/3] x86: optimise barriers
  - From: Jarek Poplawski <[email protected]>

References:
- Re: [rfc][patch 3/3] x86: optimise barriers
  - From: Jarek Poplawski <[email protected]>

Prev by Date: RE: [PATCH try #3] Blackfin BF54x Input Keypad controller driver
Next by Date: Re: [PATCH] task containersv11 add tasks file interface fix for cpusets
Previous by thread: Re: [rfc][patch 3/3] x86: optimise barriers
Next by thread: Re: [rfc][patch 3/3] x86: optimise barriers
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]