On 09/25/2010 12:35 PM, James Wilkinson wrote:
> Michael Miles wrote:
>> Thanks for the clear-up. My question is with hyperthreading: if
>> each core does double duty, so to speak, by looking after two threads,
>> would it not do basically the same work as one core running full bore
>> on one thread? Is there a speed difference (faster, slower)?
>
> Good question. The answer is “it depends, but it’s usually faster”.
>
> Reasons why it can be faster:
> * Most modern processors can despatch up to three or four instructions
>   at a time (IF the front end can identify enough instructions that
>   logically can be run at the same time), but will have six to ten
>   execution units to actually run the instructions¹. Therefore, one
>   thread might be able to make use of execution units the other thread
>   isn’t using.
>
> * Compared to CPU speed, it takes a seriously long time to get data
>   from main memory. If one thread is waiting for data to arrive, the
>   other one can make full use of the processor.
>
> * Most modern CPUs do out-of-order execution, which means they can
>   often find things to do while waiting for data to come from (L2/L3)
>   cache. That’s not guaranteed, though, so the other thread might get
>   more resources to play with.
>
>   On the other hand, Atom isn’t out-of-order, and can’t do anything
>   while it’s waiting for data from Level 2 cache. So the other thread
>   has full run of the core.
>
> Why it can be slower:
> * The cache memory is having to look after two sets of data, not just
>   one, which means there’ll be a lot more cache misses. The worst-case
>   example would be something like two threads, each of which is
>   regularly hitting a different 6K of data, on a Pentium 4 with only 8K
>   of Level 1 data cache. Each thread will be constantly replacing the
>   other’s data, meaning each thread is continually having to wait for
>   data from Level 2 cache.
>
> This effect was especially noticeable on Pentium 4-based CPUs: a lot of
> high-end benchmarks would be run with SMT turned off.
>
> Hope this helps,
>
> James.
>
> ¹ The execution units are specialised: if a thread is 100% integer,
> the FPU units won’t be of any use to it.

Correct, James. The clobbering of the cache by two different threads does
not depend on whether or not the CPU is hyperthreaded. Any two threads can
cause this clobbering on any CPU, and it often happens.

The only situation where hyperthreading will show a noticeable improvement
in execution speed is where the threads are all children of the same
process, are well behaved, and work almost entirely in the parent process'
data space, with proper synchronization. However, if the parent's data and
text space is larger than the cache, then the sibling threads can still
cause a cache refill every time one sibling accesses a different part of
the data space than the other siblings. Ditto with the instruction cache:
different threads have a different set of instructions.

My basic attitude is: forget hyperthreading. IMHO it is largely hype!
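To make the cache-clobbering point concrete, here is a minimal sketch (not
from either poster) of the worst-case scenario James describes: two threads,
each walking its own buffer that is larger than half the L1 data cache,
pinned to the two logical CPUs of one physical core. The CPU numbers 0 and 1
are an assumption -- on many Linux machines the two hyperthread siblings of
core 0 are logical CPUs 0 and N/2 instead; check
/sys/devices/system/cpu/cpu0/topology/thread_siblings_list for the real
pairing. Build with: gcc -O2 -pthread thrash.c -o thrash

/* thrash.c -- sketch of two threads thrashing a shared L1 data cache. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE (24 * 1024)  /* per-thread working set; two of these
                                 overflow a small shared L1 data cache */
#define PASSES   200000

struct arg { int cpu; volatile long sum; };

static void *worker(void *p)
{
    struct arg *a = p;

    /* Pin this thread to one logical CPU (assumed hyperthread sibling). */
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(a->cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    /* Touch every cache line of the private buffer, over and over.  When
       both threads share one core, each pass tends to evict the other
       thread's lines from the L1 data cache. */
    char *buf = calloc(1, BUF_SIZE);
    long sum = 0;
    for (int pass = 0; pass < PASSES; pass++)
        for (int i = 0; i < BUF_SIZE; i += 64)   /* one touch per 64-byte line */
            sum += buf[i]++;

    a->sum = sum;   /* keep the work from being optimised away */
    free(buf);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    struct arg a1 = { .cpu = 0 }, a2 = { .cpu = 1 };  /* assumed SMT siblings */

    pthread_create(&t1, NULL, worker, &a1);
    pthread_create(&t2, NULL, worker, &a2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("done (%ld, %ld)\n", a1.sum, a2.sum);
    return 0;
}

Timing this once with the two threads pinned to sibling hyperthreads and once
with them pinned to two separate physical cores (or run one at a time) should
show the difference: on a chip with a small L1 like the Pentium 4 example
above, the shared-core run pays for the constant mutual eviction, while the
separate-core run does not.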