Re: [patch 4/7] Immediate Values - i386 Optimization

* Denys Vlasenko ([email protected]) wrote:
> On Tuesday 18 September 2007 22:07, Mathieu Desnoyers wrote:
> > i386 optimization of the immediate values which uses a movl with code patching
> > to set/unset the value used to populate the register used as variable source.
> > 
> > Changelog:
> > - Use text_poke_early with cr0 WP save/restore to patch the bypass. We are doing
> >   non atomic writes to a code region only touched by us (nobody can execute it
> >   since we are protected by the immediate_mutex).
> > - Put immediate_set and _immediate_set in the architecture independent header.
> 
> > +struct __immediate {
> > +	long var;		/* Pointer to the identifier variable of the
> > +				 * immediate value
> > +				 */
> > +	long immediate;		/*
> > +				 * Pointer to the memory location of the
> > +				 * immediate value within the instruction.
> > +				 */
> > +	long size;		/* Type size. */
> > +};
> 
> 
> > +		case 2:							\
> > +			asm (	".section __immediate, \"a\", @progbits;\n\t" \
> > +					".long %1, (0f)+2, 2;\n\t"	\
> > +					".previous;\n\t"		\
> > +					"1:\n\t"			\
> > +					".align 2;\n\t"			\
> > +					"0:\n\t"			\
> > +					"mov %2,%0;\n\t"		\
> > +				: "=r" (value)				\
> > +				: "m" (name##__immediate),		\
> > +				  "i" (0));				\
> 
> Instead of letting gcc use whatever instruction it sees fit best
> for accessing the variable (like add/cmp/test...)
> now we force it to use mov imm,reg first. Maybe with preceding nop
> due to "align 2".
> 

Yes, this is true. So, the following branch:

char x;

void testb(void)
{
        if (x > 5)
                testa();
}

Would turn into:

  56:   b0 00                   mov    $0x0,%al
  58:   3c 05                   cmp    $0x5,%al
  5a:   7e 05                   jle    61 <testb+0x11>

Rather than:

  56:   80 3d 00 00 00 00 05    cmpb   $0x5,0x0
  5d:   7e 05                   jle    64 <testb+0x14>

> And then we use 12 more bytes in __immediate section
> *for each* place where you read the variable.
> 

Yes, but keep in mind that this section is only used when updating the
variable. The read side never touches it, so it does not consume data
cache on hot paths.
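
As a rough illustration of why the table only matters on the update side,
here is a user-space sketch in C. It is not code from the patch: struct
imm_entry and imm_update() are hypothetical names, an ordinary byte buffer
stands in for kernel text, and the locking and page-protection handling
(immediate_mutex, text_poke_early) is left out entirely. The only point is
that the table is walked when a value changes, never when it is read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the patch's struct __immediate. */
struct imm_entry {
	const void *var;	/* identifies the immediate value */
	uint8_t *immediate;	/* address of the operand bytes in the code */
	size_t size;		/* operand size in bytes: 1, 2 or 4 */
};

/*
 * Rewrite the immediate operand of every site that refers to 'var'.
 * Returns the number of sites patched. Bytes are stored low byte
 * first, which is the order x86 uses for immediate operands.
 */
size_t imm_update(struct imm_entry *tab, size_t n,
		  const void *var, uint32_t value)
{
	size_t i, b, patched = 0;

	for (i = 0; i < n; i++) {
		if (tab[i].var != var)
			continue;
		assert(tab[i].size == 1 || tab[i].size == 2 ||
		       tab[i].size == 4);
		for (b = 0; b < tab[i].size; b++)
			tab[i].immediate[b] = (uint8_t)(value >> (8 * b));
		patched++;
	}
	return patched;
}
```

Applied to the "b0 00" sequence shown earlier, an entry whose immediate
field points one byte past the opcode would have its 00 byte overwritten
with the new value, while the code executing the mov never looks at the
table.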

> Do you plan to use the same approach on x86_64?
> I mean, longs there are twice as long.
> 

Yup. It's a tradeoff between memory footprint and active cacheline
footprint. When GCC optimizes for size and we see kernel speedups, it is
not because the kernel "consumes" less memory, but because there is less
junk polluting the cachelines. So unless you are an embedded system
developer worried about a few K of data, I really don't see why you
worry about this. Oh, and by the way, I provide the ability to disable
immediate values in the EMBEDDED menu.
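
To put a number on "a few K", the table size can be worked out from the
entry layout. The sketch below is only an illustration: the struct names
and imm_table_bytes() are hypothetical, and fixed-width integers model the
three longs of struct __immediate (4 bytes each on i386, 8 bytes each on
x86_64). The count of read sites is whatever a given kernel build contains;
the patch discussion does not state one.

```c
#include <stddef.h>
#include <stdint.h>

/* Three 4-byte longs, as on i386: 12 bytes per read site. */
struct imm_entry32 {
	uint32_t var;
	uint32_t immediate;
	uint32_t size;
};

/* Three 8-byte longs, as on x86_64: 24 bytes per read site. */
struct imm_entry64 {
	uint64_t var;
	uint64_t immediate;
	uint64_t size;
};

/* Total table size in bytes for a given number of read sites. */
size_t imm_table_bytes(size_t sites, size_t entry_size)
{
	return sites * entry_size;
}
```

For example, a made-up figure of 1000 read sites would cost
imm_table_bytes(1000, sizeof(struct imm_entry32)) bytes on i386 and twice
that with the 64-bit layout, all of it outside the code executed by
readers.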

> Can this be made conditional, on CONFIG_CC_OPTIMIZE_FOR_SIZE perhaps?

No. As I just stated, only embedded developers would have an interest in
disabling this feature, because they have so little memory available on
their systems. The memory consumed by the immediate values table is
outside the hot path cachelines and therefore does not impact overall
performance.

> --
> vda

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68