Re: [RFC] x86-64: Use SSE for copy_page and clear_page

On Wed, 1 Jun 2005, Denis Vlasenko wrote:

> However, it is valid only if program writes in every byte in a cacheline.
> Then sufficiently smart CPU may avoid reading from main RAM.
> (I am not sure that today's CPUs are smart enough. K6s were not)

nobody does this yet on regular stores...

so-called "non-temporal" stores actually go through the write-combiners 
(which is why Andi is referring to them as write-combining stores)... the 
write-combiners have byte-enables so they can detect if a full line is 
dirty or not.

if a write-combiner is flushed before it's full, the behaviour i've
measured on all k8/p-m/p4 is a read-modify-write *at the memory
interface*.  that typically runs at a much slower rate than the same
merge would in the cache itself.

in theory DDR supports byte-enabled writes to memory, so there should be
no need for a read-modify-write sequence.  however, all of these
processors (and/or their northbridges, as appropriate) save pins on their
package: they have no pins for the DDR byte enables (which are hardwired
to enabled on the mobo).
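
as a rough illustration of the byte-enable bookkeeping described above,
here is a toy software model of a single write-combining buffer.  it is
only a sketch: the 64-byte line size and every name in it (wc_buffer,
wc_store, wc_flush) are assumptions made for the example, not anything
taken from real hardware or from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WC_LINE 64   /* assumed line size in bytes */

/* hypothetical model of one write-combining buffer */
struct wc_buffer {
    uint8_t  data[WC_LINE];
    uint64_t byte_enable;   /* bit i set => data[i] has been written */
};

enum wc_flush_kind { WC_FULL_LINE_WRITE, WC_READ_MODIFY_WRITE };

static void wc_reset(struct wc_buffer *wc)
{
    memset(wc, 0, sizeof *wc);
}

/* record a store of len bytes at offset off within the line */
static void wc_store(struct wc_buffer *wc, size_t off,
                     const void *src, size_t len)
{
    assert(off + len <= WC_LINE);
    memcpy(wc->data + off, src, len);
    for (size_t i = 0; i < len; i++)
        wc->byte_enable |= (uint64_t)1 << (off + i);
}

/*
 * flush the buffer into mem_line.  with no byte enables on the memory
 * side, a partially filled buffer has to be merged with the old line
 * contents: read the line, patch in the dirty bytes, write it back.
 */
static enum wc_flush_kind wc_flush(struct wc_buffer *wc, uint8_t *mem_line)
{
    enum wc_flush_kind kind;

    if (wc->byte_enable == UINT64_MAX) {
        /* every byte is dirty: the line can be written blind */
        memcpy(mem_line, wc->data, WC_LINE);
        kind = WC_FULL_LINE_WRITE;
    } else {
        uint8_t merged[WC_LINE];

        memcpy(merged, mem_line, WC_LINE);            /* read   */
        for (size_t i = 0; i < WC_LINE; i++)          /* modify */
            if (wc->byte_enable & ((uint64_t)1 << i))
                merged[i] = wc->data[i];
        memcpy(mem_line, merged, WC_LINE);            /* write  */
        kind = WC_READ_MODIFY_WRITE;
    }
    wc_reset(wc);
    return kind;
}
```

filling all 64 bytes before the flush takes the blind-write path; leaving
even a 4-byte hole forces the read-modify-write path, which is the slow
case on the memory interface.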

(you can see this behaviour with any of the movnt* instructions or with
maskmov* ... just leave holes in the lines and watch the store cost go
through the roof.)
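
a minimal sketch of that experiment, assuming an x86 compiler with SSE2
intrinsics (gcc or clang), a 64-byte cache line, and a line-aligned
buffer much larger than the caches.  nt_fill, time_nt_fill and the choice
of which dword to skip are all hypothetical; the intrinsics
_mm_stream_si32 (movnti) and _mm_sfence are the real SSE2 ones.  no
timings are claimed here, and the result will depend heavily on the cpu
and memory controller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <emmintrin.h>   /* _mm_stream_si32 (movnti), _mm_sfence */

#define LINE_BYTES      64                /* assumed cache-line size */
#define DWORDS_PER_LINE (LINE_BYTES / 4)
#define HOLE_DWORD      7                 /* arbitrary dword to skip */

/*
 * stream value over `lines` cache lines using movnti.
 * if hole is non-zero, one dword per line is left unwritten, so every
 * write-combining buffer gets flushed while only partially filled.
 * returns the number of dwords actually stored.
 */
static size_t nt_fill(int *buf, size_t lines, int value, int hole)
{
    size_t stored = 0;

    assert((uintptr_t)buf % LINE_BYTES == 0);   /* must be line-aligned */

    for (size_t l = 0; l < lines; l++) {
        int *line = buf + l * DWORDS_PER_LINE;

        for (int i = 0; i < DWORDS_PER_LINE; i++) {
            if (hole && i == HOLE_DWORD)
                continue;
            _mm_stream_si32(line + i, value);
            stored++;
        }
    }
    _mm_sfence();   /* make the streamed stores globally visible */
    return stored;
}

/* cpu time in seconds for one pass over the buffer, via clock() */
static double time_nt_fill(int *buf, size_t lines, int hole)
{
    clock_t t0 = clock();

    nt_fill(buf, lines, 0x5a5a5a5a, hole);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

to compare, call time_nt_fill() on the same buffer once with hole = 0 and
once with hole = 1, repeating each a few times; on the processors
discussed above the second case is the one expected to be much slower.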

-dean