Re: [RFC] x86-64: Use SSE for copy_page and clear_page

> The SSE clear page function is almost twice as fast as the kernel's
> current clear_page, while the copy_page implementation is roughly a 
> third faster.  This is likely due to the fact that SSE instructions 
> can keep the 256 bit wide L2 cache bus at a higher utilisation than 
> 64 bit movs are able to.  Comments?

Any use of write combining is wrong here because it forces
the destination out of cache, which causes performance issues later on.
Believe me, we went through this years ago.

If you can code up a better function for P4 that does not use
write combining, I would be happy to add it. I never tuned the
functions for P4.

One simple experiment would be to just test if P4 likes the
simple rep ; movsq / rep ; stosq loops and enable them.
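
For reference, a minimal user-space sketch of what such string-instruction
loops look like. This is not the kernel's clear_page/copy_page; the names
clear_page_rep, copy_page_rep, selftest_page_ops and DEMO_PAGE_SIZE are
made up for illustration, and it assumes GCC/Clang inline asm on x86-64
with the direction flag clear (as the ABI guarantees on function entry).
Both loops use ordinary cached stores, so the destination stays in cache.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#if !defined(__x86_64__)
#error "this sketch uses x86-64 string instructions"
#endif

#define DEMO_PAGE_SIZE 4096	/* hypothetical; assumes 4K pages */

/* Zero one page with rep stosq: RCX quadwords of RAX stored at [RDI]. */
static void clear_page_rep(void *page)
{
	size_t n = DEMO_PAGE_SIZE / 8;

	__asm__ volatile("rep stosq"
			 : "+D"(page), "+c"(n)
			 : "a"((uint64_t)0)
			 : "memory");
}

/* Copy one page with rep movsq: RCX quadwords from [RSI] to [RDI]. */
static void copy_page_rep(void *to, const void *from)
{
	size_t n = DEMO_PAGE_SIZE / 8;

	__asm__ volatile("rep movsq"
			 : "+D"(to), "+S"(from), "+c"(n)
			 :
			 : "memory");
}

/* Correctness check only; returns 0 on success, -1 on failure. */
static int selftest_page_ops(void)
{
	unsigned char *src = malloc(DEMO_PAGE_SIZE);
	unsigned char *dst = malloc(DEMO_PAGE_SIZE);
	int ok;
	size_t i;

	if (!src || !dst) {
		free(src);
		free(dst);
		return -1;
	}

	memset(src, 0xAB, DEMO_PAGE_SIZE);
	memset(dst, 0xCD, DEMO_PAGE_SIZE);

	copy_page_rep(dst, src);
	ok = memcmp(dst, src, DEMO_PAGE_SIZE) == 0;

	clear_page_rep(dst);
	for (i = 0; i < DEMO_PAGE_SIZE; i++)
		ok &= (dst[i] == 0);

	free(src);
	free(dst);
	return ok ? 0 : -1;
}
```

The self-test only checks that the loops are correct. Whether they are
faster on P4 would have to be measured separately on the actual hardware.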

-Andi