Re: [BUG mm] "fixed" i386 memcpy inlining buggy

On Wednesday 06 April 2005 16:18, Richard B. Johnson wrote:
> 
> Attached is inline ix86 memcpy() plus test code that tests its
> corner-cases. The in-line code makes no jumps, but uses longword
> copies, word copies and any spare byte copy. It works at all
> offsets, doesn't require alignment but would work fastest if
> both source and destination were longword aligned.

Yours is:

        "shr $1, %%ecx\n"       \
        "pushf\n"               \
        "shr $1, %%ecx\n"       \
        "pushf\n"               \   <=== not needed
        "rep\n"                 \
        "movsl\n"               \
        "popf\n"                \   <=== not needed
        "adcl %%ecx, %%ecx\n"   \
        "rep\n"                 \
        "movsw\n"               \
        "popf\n"                \
        "adcl %%ecx, %%ecx\n"   \
        "rep\n"                 \
        "movsb\n"               \

You struggle too much for that movsw.
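
For reference, here is a rough C model of what that sequence computes, as I read it. The function names are made up for illustration, and plain byte loops stand in for the string instructions, so this is only a sketch of the arithmetic, not a replacement for the asm. The two shr instructions peel off bit 0 and bit 1 of the count, and rep movsl does not touch the flags, which is why the inner pushf/popf pair is marked as not needed above.

```c
#include <stddef.h>

/* Byte loop standing in for one rep-prefixed string instruction.
 * Returns the number of bytes copied. */
static size_t copy_bytes(unsigned char *d, const unsigned char *s, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        d[i] = s[i];
    return len;
}

/* Hypothetical model of the quoted sequence (name is illustrative only). */
static void *copy_long_word_byte(void *to, const void *from, size_t n)
{
    unsigned char *d = to;
    const unsigned char *s = from;
    size_t bit0 = n & 1;         /* carry out of the first  shr $1 */
    size_t bit1 = (n >> 1) & 1;  /* carry out of the second shr $1 */
    size_t longs = n >> 2;       /* count left in ecx for rep movsl */
    size_t off = 0;

    off += copy_bytes(d + off, s + off, longs * 4); /* rep movsl */
    off += copy_bytes(d + off, s + off, bit1 * 2);  /* ecx = bit1; rep movsw */
    copy_bytes(d + off, s + off, bit0);             /* ecx = bit0; rep movsb */
    return to;
}
```

Whatever the count, this path goes through three rep-prefixed instructions, even when the word or byte step has nothing to copy.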

The -mm one (which happens to be mine) is:

        "movl %%ecx,%4"
        "shr $2,%%ecx"
        "rep ; movsl"
        "movl %4,%%ecx"
        "andl $3,%%ecx"
        "jz 1f"     /* pay 2 byte penalty for a chance to skip microcoded rep */
        "rep ; movsb"
        "1:"

and I can still drop that jz. It is there just to have
a chance to skip the rep movsb, which was measured to be slow
enough to matter. rep movs instructions are a bit slow to start,
and on small blocks this is measurable.
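
To make the trade-off concrete, here is a small C model of the -mm sequence. It is a sketch under my own assumptions: the name mm_copy_model is hypothetical, byte loops stand in for the string instructions, and the return value just counts how many rep-prefixed instructions the asm would execute for a given length. It says nothing about actual cycle counts.

```c
#include <stddef.h>

/* Hypothetical model, not the kernel source: copies n bytes the way the
 * -mm sequence does and returns the number of rep-prefixed instructions
 * that sequence would execute. */
static int mm_copy_model(void *to, const void *from, size_t n)
{
    unsigned char *d = to;
    const unsigned char *s = from;
    size_t body = (n >> 2) * 4;  /* bytes moved by rep ; movsl (shr $2) */
    size_t tail = n & 3;         /* andl $3 */
    size_t i;
    int reps = 1;                /* rep ; movsl always runs */

    for (i = 0; i < body; i++)
        d[i] = s[i];

    if (tail != 0) {             /* jz 1f skips this when tail == 0 */
        for (i = 0; i < tail; i++)
            d[body + i] = s[body + i];
        reps++;                  /* rep ; movsb */
    }
    return reps;
}
```

With the jz in place, a length that is a multiple of 4 executes one rep-prefixed instruction and any other length executes two. Dropping the jz would make it two in every case, in exchange for removing a conditional branch.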

However, maybe it is even better without the jz. I need to
benchmark the 'cold path' (i.e. where the branch predictor
has no data to predict it) somehow.
--
vda
