JD wrote:
> Correct James. The clobbering of the cache by 2 different threads
> does not depend on whether or not the cpu is hyperthreaded.
> Any two threads can achieve this clobbering on any cpu, and it is
> often the case.

This last sentence is true, but with normal multitasking, and no
multi-threading, each software thread gets a slice of the processor
time to itself – usually several million clock cycles, these days¹. So
the thread has a chance to fill the Level 1 cache with its own data
before another thread gets a look-in. With multi-threading, each
thread is *constantly* clobbering the other’s data.

> The only situation where hyperthreading will show noticeable
> improvement of execution speed is where the threads are all
> children of the same process and are well behaved and work
> almost entirely on the parent process' data space, with proper
> synchronization. However, if the parent data space and text
> space is larger than the cache, then the sibling threads can
> still cause cache refill every time a sibling accesses a different
> data space than other siblings. Ditto with the instruction cache.
> Different threads have a different set of instructions.

This does not appear to match reality for all processors. The
Pentium 4 was both the first generally available processor with
multi-threading and a pretty poor example of it, so a lot of people
got a poor first impression. Even there, there were cases where
multi-threading made a lot of sense: if, for example, the algorithm
was such that you were going to get mostly cache misses *anyway*, then
you might as well have two threads hanging around waiting for data as
one.

Other processors (current Core i7 and i5, for example) tend not to
have such a microscopic Level 1 cache, so there’s more chance of both
working sets fitting in cache at the same time.²

http://www.realworldtech.com/beta/forums/index.cfm?action=detail&id=89001&threadid=89001&roomid=2
(and the following thread) gives a link to an Intel benchmark claiming
a 50%+ performance improvement due to hyperthreading on Atom. Linus
Torvalds³ effectively says “it’s easy to get 50% performance
improvements if the CPU can’t make good use of all its resources with
just one thread.”

I’d note, too, that Bulldozer’s FPU is effectively multi-threaded, and
that doesn’t use the Level 1 data cache *at all*: the data all comes
from Level 2. AMD apparently believes they can get enough out-of-order
execution to hide the latency.

> My basic attitude is forget hyperthreading. IMHO it is largely
> a hype!

You know, I’d actually agree with that on the desktop⁴ – but for
different reasons. The number of hardware threads has mushroomed over
the last ten years, but desktop software is still largely
single-threaded. It’s still fairly rare for desktop software to be
able to make efficient use of six or eight threads. The main
exceptions are things like transcoding and compression – and few
people buy desktops to do that – and compiling large software
projects, like the Linux kernel. Personally, I prefer to let the
Fedora Project do most of that for me!

Hope this helps,

James.
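P.S. If anyone wants to watch the Level 1 clobbering for themselves,
here is a rough sketch. It assumes a Linux box with gcc, a 32 KB L1
data cache, and that logical CPUs 0 and 1 are hyperthread siblings of
one core while 0 and 2 sit on different cores; check “lscpu -e” (or
/sys/devices/system/cpu/cpu0/topology/thread_siblings_list) for your
own machine and adjust the numbers. Each thread walks its own ~24 KB
buffer, which fits in L1 on its own but not alongside its sibling’s,
so the “same core” run should come out noticeably slower than the
“separate cores” run.

/* ht_cache_demo.c - rough illustration only.
 * Two threads each hammer their own ~24 KB buffer, which fits in a
 * 32 KB L1 data cache alone but not alongside the other thread's buffer.
 * Pin them to hyperthread siblings (same physical core) and they fight
 * over L1; pin them to separate cores and they each get their own.
 * The CPU numbers 0/1 and 0/2 are assumptions - check `lscpu -e`.
 * Build: gcc -O2 -pthread ht_cache_demo.c -o ht_cache_demo
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <time.h>

#define BUF_BYTES (24 * 1024)
#define PASSES    200000

struct job { int cpu; volatile unsigned char buf[BUF_BYTES]; };

static void *worker(void *arg)
{
    struct job *j = arg;

    /* Pin this thread to the requested logical CPU (errors ignored
     * to keep the sketch short). */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(j->cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    /* Walk the buffer over and over so it wants to live in L1. */
    unsigned sum = 0;
    for (long p = 0; p < PASSES; p++)
        for (long i = 0; i < BUF_BYTES; i += 64)  /* one touch per cache line */
            sum += j->buf[i];

    return (void *)(long)sum;  /* keep the loop from being optimised away */
}

static double run_pair(int cpu_a, int cpu_b)
{
    static struct job a, b;
    a.cpu = cpu_a;
    b.cpu = cpu_b;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    pthread_t ta, tb;
    pthread_create(&ta, NULL, worker, &a);
    pthread_create(&tb, NULL, worker, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    /* Assumed topology: 0 and 1 are SMT siblings, 0 and 2 are separate cores. */
    printf("same core (SMT siblings): %.2f s\n", run_pair(0, 1));
    printf("separate cores:           %.2f s\n", run_pair(0, 2));
    return 0;
}

If your L1 data cache is bigger than 32 KB, just grow BUF_BYTES until
the two buffers no longer fit in it together.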
¹ IF the thread needs it.

² You don’t need the entire program in cache, just the bits that the
program is currently using.

³ As far as we can tell, yes, *that* Linus. He certainly has the same
use of language, the same arguing style, and knows the things the real
Linus would.

⁴ Servers often do have enough software threads to make use of all the
hardware threads they can get – see Sun’s Niagara for an example. And
single-core Atoms benefit from hyperthreading to improve latency.

--
E-mail: james@     | “My aunt’s camel has fallen in the mirage.”
aprilcottage.co.uk | -- “Soul Music”, Terry Pratchett.