Re: [RFC] x86-64: Use SSE for copy_page and clear_page

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Benjamin LaHaise schrieb:

>On Mon, May 30, 2005 at 10:05:28PM +0200, Michael Thonke wrote:
>  
>
>>No it doesn't like this sample here at all,I'll get segmentationfault on
>>that run.
>>    
>>
>
>Grab a new copy -- one of the routines had an unaligned store instead of 
>aligned for the register save.
>
>		-ben
>
>  
>
Hi Benjamin,

Here are the results with the new copy.

    *RUN 1: cc -o xmm64.o xmm64.c*

    ioGL64NX_EMT64 ~ # ./xmm64.o
    SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
    buffer = 0x2aaaaade7000
    clear_page() tests
    clear_page function 'warm up run'        took 13632 cycles per page
    clear_page function 'kernel clear'       took 6599 cycles per page
    clear_page function '2.4 non MMX'        took 6482 cycles per page
    clear_page function '2.4 MMX fallback'   took 6367 cycles per page
    clear_page function '2.4 MMX version'    took 6644 cycles per page
    clear_page function 'faster_clear_page'  took 6088 cycles per page
    clear_page function 'even_faster_clear'  took 5692 cycles per page
    clear_page function 'xmm_clear'  took 4270 cycles per page
    clear_page function 'xmma_clear'         took 6351 cycles per page
    clear_page function 'xmm2_clear'         took 4710 cycles per page
    clear_page function 'xmma2_clear'        took 6198 cycles per page
    clear_page function 'xmm3_clear'         took 6583 cycles per page
    clear_page function 'nt clear  '         took 4746 cycles per page
    clear_page function 'kernel clear'       took 6158 cycles per page

    copy_page() tests
    copy_page function 'warm up run'         took 9210 cycles per page
    copy_page function '2.4 non MMX'         took 6740 cycles per page
    copy_page function '2.4 MMX fallback'    took 6697 cycles per page
    copy_page function '2.4 MMX version'     took 9178 cycles per page
    copy_page function 'faster_copy'         took 11360 cycles per page
    copy_page function 'even_faster'         took 10133 cycles per page
    copy_page function 'xmm_copy_page_no'    took 8885 cycles per page
    copy_page function 'xmm_copy_page'       took 8725 cycles per page
    copy_page function 'xmma_copy_page'      took 9964 cycles per page
    copy_page function 'xmm3_copy_page'      took 7176 cycles per page
    copy_page function 'v26_copy_page'       took 6879 cycles per page
    copy_page function 'nt_copy_page'        took 10858 cycles per page


    *RUN 2: gcc -o xmm64.o xmm64.c*

    ioGL64NX_EMT64 ~ # ./xmm64.o
    SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
    buffer = 0x2aaaaade7000
    clear_page() tests
    clear_page function 'warm up run'        took 13981 cycles per page
    clear_page function 'kernel clear'       took 6708 cycles per page
    clear_page function '2.4 non MMX'        took 6505 cycles per page
    clear_page function '2.4 MMX fallback'   took 6235 cycles per page
    clear_page function '2.4 MMX version'    took 7251 cycles per page
    clear_page function 'faster_clear_page'  took 6390 cycles per page
    clear_page function 'even_faster_clear'  took 5932 cycles per page
    clear_page function 'xmm_clear'  took 4876 cycles per page
    clear_page function 'xmma_clear'         took 6379 cycles per page
    clear_page function 'xmm2_clear'         took 5264 cycles per page
    clear_page function 'xmma2_clear'        took 6373 cycles per page
    clear_page function 'xmm3_clear'         took 6651 cycles per page
    clear_page function 'nt clear  '         took 5186 cycles per page
    clear_page function 'kernel clear'       took 6326 cycles per page

    copy_page() tests
    copy_page function 'warm up run'         took 9537 cycles per page
    copy_page function '2.4 non MMX'         took 6776 cycles per page
    copy_page function '2.4 MMX fallback'    took 7407 cycles per page
    copy_page function '2.4 MMX version'     took 8812 cycles per page
    copy_page function 'faster_copy'         took 10992 cycles per page
    copy_page function 'even_faster'         took 10232 cycles per page
    copy_page function 'xmm_copy_page_no'    took 8918 cycles per page
    copy_page function 'xmm_copy_page'       took 9579 cycles per page
    copy_page function 'xmma_copy_page'      took 9854 cycles per page
    copy_page function 'xmm3_copy_page'      took 7602 cycles per page
    copy_page function 'v26_copy_page'       took 6811 cycles per page
    copy_page function 'nt_copy_page'        took 10958 cycles per page

    *RUN 3: gcc -pipe -march=nocona -O2 -o xmm64.o xmm64.c
    *
    SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
    buffer = 0x2aaaaade7000
    clear_page() tests
    clear_page function 'warm up run'        took 13626 cycles per page
    clear_page function 'kernel clear'       took 6780 cycles per page
    clear_page function '2.4 non MMX'        took 6755 cycles per page
    clear_page function '2.4 MMX fallback'   took 6283 cycles per page
    clear_page function '2.4 MMX version'    took 6764 cycles per page
    clear_page function 'faster_clear_page'  took 5764 cycles per page
    clear_page function 'even_faster_clear'  took 5240 cycles per page
    clear_page function 'xmm_clear'  took 4532 cycles per page
    clear_page function 'xmma_clear'         took 6352 cycles per page
    clear_page function 'xmm2_clear'         took 4983 cycles per page
    clear_page function 'xmma2_clear'        took 6211 cycles per page
    clear_page function 'xmm3_clear'         took 6748 cycles per page
    clear_page function 'nt clear  '         took 5166 cycles per page
    clear_page function 'kernel clear'       took 6201 cycles per page

    copy_page() tests
    copy_page function 'warm up run'         took 9651 cycles per page
    copy_page function '2.4 non MMX'         took 6724 cycles per page
    copy_page function '2.4 MMX fallback'    took 6905 cycles per page
    copy_page function '2.4 MMX version'     took 9722 cycles per page
    copy_page function 'faster_copy'         took 9738 cycles per page
    copy_page function 'even_faster'         took 9609 cycles per page
    copy_page function 'xmm_copy_page_no'    took 8846 cycles per page
    copy_page function 'xmm_copy_page'       took 8591 cycles per page
    copy_page function 'xmma_copy_page'      took 8250 cycles per page
    copy_page function 'xmm3_copy_page'      took 7879 cycles per page
    copy_page function 'v26_copy_page'       took 7512 cycles per page
    copy_page function 'nt_copy_page'        took 10424 cycles per page

    RUN 4: *gcc -pipe -march=nocona -O2 -fPIC -o xmm64.o xmm64.c*

    SSE test program $Id: fast.c,v 1.6 2000/09/23 09:05:45 arjan Exp $
    buffer = 0x2aaaaade7000
    clear_page() tests
    clear_page function 'warm up run'        took 13713 cycles per page
    clear_page function 'kernel clear'       took 6655 cycles per page
    clear_page function '2.4 non MMX'        took 6448 cycles per page
    clear_page function '2.4 MMX fallback'   took 6270 cycles per page
    clear_page function '2.4 MMX version'    took 7001 cycles per page
    clear_page function 'faster_clear_page'  took 5671 cycles per page
    clear_page function 'even_faster_clear'  took 5366 cycles per page
    clear_page function 'xmm_clear'  took 4737 cycles per page
    clear_page function 'xmma_clear'         took 6464 cycles per page
    clear_page function 'xmm2_clear'         took 5214 cycles per page
    clear_page function 'xmma2_clear'        took 6371 cycles per page
    clear_page function 'xmm3_clear'         took 6660 cycles per page
    clear_page function 'nt clear  '         took 5066 cycles per page
    clear_page function 'kernel clear'       took 6314 cycles per page

    copy_page() tests
    copy_page function 'warm up run'         took 9464 cycles per page
    copy_page function '2.4 non MMX'         took 7179 cycles per page
    copy_page function '2.4 MMX fallback'    took 6928 cycles per page
    copy_page function '2.4 MMX version'     took 9091 cycles per page
    copy_page function 'faster_copy'         took 9996 cycles per page
    copy_page function 'even_faster'         took 9824 cycles per page
    copy_page function 'xmm_copy_page_no'    took 8724 cycles per page
    copy_page function 'xmm_copy_page'       took 8920 cycles per page
    copy_page function 'xmma_copy_page'      took 8859 cycles per page
    copy_page function 'xmm3_copy_page'      took 7794 cycles per page
    copy_page function 'v26_copy_page'       took 7808 cycles per page
    copy_page function 'nt_copy_page'        took 9264 cycles per page

    Do you need more results or tests Benjamin?

    Greets and best regards
        Michael

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux