Re: Performance analysis of Linux Kernel Markers 0.20 for 2.6.17

On Sun, 2006-10-01 at 20:07 -0400, Mathieu Desnoyers wrote:
> * Nicholas Miell ([email protected]) wrote:
> > To summarize in chart form:
> > 
> >               JoC     JoCo    2NOP    1NOP
> > empty loop    1.17    2.50    0.50    2.50
> > memcpy        2.12    0.07    0.03    0.43
> > 
> > JoC 	= Jump over call - generic
> > JoCo	= Jump over call - optimized
> > 2NOP	= "data16 data16 nop; data16 nop"
> > 1NOP	= NOP with ModRM
> > 
> > I left out your "nop; lea 0(%esi), %esi" because it isn't actually a NOP
> > (the CPU will do actual work even if it has no effect, and on AMD64,
> > that insn is "nop; lea 0(%rdi), %esi", which will truncate RDI+0 to fit
> > 32 bits.)
> > 
> > The performance of NOP with ModRM doesn't surprise me -- AFAIK, only the
> > most recent of Intel CPUs actually special case that to be a true
> > no-work-done NOP.
> > 
> > It'd be nice to see the results of "jump to an out-of-line call with the
> > jump replaced by a NOP", but even if it performs well (and it should,
> > the argument passing and stack alignment overhead won't be executed in
> > the disabled probe case), actually using it in practice would be
> > difficult without compiler support (call instructions are easy to find
> > thanks to their relocations, which local jumps don't have).
> > 
> 
> Hi,
> 
> Just to make sure we see things the same way: the JoC approach is similar to
> the out-of-line call in that the argument passing and stack alignment are not
> executed when the probe is disabled.
> 

Yeah, I assumed that.

For the jump-over-call, you'll always have to do a test and a
conditional jump (even when the probe is disabled). The test costs
cycles, and the conditional jump takes up "useless" space in the
branch predictor that a real branch could have used.
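
To make the shape of that concrete, here is a minimal sketch in C. It is
not the actual marker code; marker_enabled, probe() and traced_work() are
names made up for illustration, and __builtin_expect is the GCC builtin
used as a branch hint.

```c
/* Hypothetical per-marker enable flag (illustrative name only). */
static volatile int marker_enabled = 0;

/* Hypothetical counter so the effect of the probe can be observed. */
static int probe_hits = 0;

/* Hypothetical probe; stands in for whatever the marker would call. */
static void probe(int value)
{
    probe_hits += value;
}

/*
 * Jump over call: the load of the flag, the test and the conditional
 * branch run on every pass, even when the marker is disabled. Only the
 * argument setup and the call itself are skipped.
 */
static int traced_work(int x)
{
    if (__builtin_expect(marker_enabled, 0))
        probe(x);
    return x * 2;
}
```

With marker_enabled left at 0, traced_work() never reaches probe(), but
the compare and branch are still in the hot path.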

For an unconditional-call-replaced-by-NOP, you'll always be doing the
work involved in the setup and cleanup for a function call, but there's
no conditional branching (which is a win, as your test results
demonstrate).
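
As a rough sketch of the patching step, assuming a 5-byte x86
"call rel32" at the probe site and the 5-byte "2NOP" sequence from the
chart: the code below works on an ordinary byte buffer, not on live
kernel text, and the function names are hypothetical. Patching running
code safely has concerns that are not modeled here.

```c
#include <stdint.h>
#include <string.h>

enum { CALL_LEN = 5 };  /* "call rel32": opcode 0xE8 + 4-byte displacement */

/* "data16 data16 nop; data16 nop": 0x66 is the data16 prefix, 0x90 is nop. */
static const uint8_t nop5[CALL_LEN] = { 0x66, 0x66, 0x90, 0x66, 0x90 };

/*
 * Disable a probe: save the original call bytes, then overwrite them
 * with the NOP sequence. Returns 0 on success, or -1 if the site does
 * not start with a call rel32 opcode.
 */
static int disable_call_site(uint8_t *site, uint8_t saved[CALL_LEN])
{
    if (site[0] != 0xE8)
        return -1;
    memcpy(saved, site, CALL_LEN);
    memcpy(site, nop5, CALL_LEN);
    return 0;
}

/* Re-enable a probe by restoring the saved call bytes. */
static void enable_call_site(uint8_t *site, const uint8_t saved[CALL_LEN])
{
    memcpy(site, saved, CALL_LEN);
}
```

Note that only the call instruction is replaced; whatever the compiler
emitted around it to set up arguments still runs when the probe is off.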

For the ideal case, you'd have a single unconditional jump to an
out-of-line function call, which you'd replace with a single NOP. No
unnecessary work (beyond the NOP instruction itself) gets done in the
disabled probe case, and in the enabled case, you don't have to do any
tests to see if the probe should be run. It should be an improvement all
around, if we could just get gcc to do the hard part of replacing the
unconditional jump with a NOP for us.
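
A sketch of what the site bytes would look like in each state, assuming
a 5-byte x86 "jmp rel32" to an out-of-line stub that does the argument
setup and the call and then jumps back. The function name and the use of
32-bit addresses are assumptions for illustration; this only computes
the bytes and does not patch anything.

```c
#include <stdint.h>
#include <string.h>

enum { JMP_LEN = 5 };  /* "jmp rel32": opcode 0xE9 + 4-byte displacement */

/* "data16 data16 nop; data16 nop", same length as the jump it replaces. */
static const uint8_t nop5[JMP_LEN] = { 0x66, 0x66, 0x90, 0x66, 0x90 };

/*
 * Produce the bytes for a probe site located at site_addr.
 * Disabled: a 5-byte NOP, so execution falls straight through.
 * Enabled: an unconditional jump to the out-of-line stub at stub_addr.
 */
static void write_probe_site(uint8_t out[JMP_LEN], uint32_t site_addr,
                             uint32_t stub_addr, int enabled)
{
    uint32_t rel;

    if (!enabled) {
        memcpy(out, nop5, JMP_LEN);
        return;
    }

    /* rel32 is relative to the address of the next instruction;
     * unsigned wraparound gives the right encoding for backward jumps. */
    rel = stub_addr - (site_addr + JMP_LEN);

    out[0] = 0xE9;
    out[1] = (uint8_t)(rel & 0xff);          /* little-endian displacement */
    out[2] = (uint8_t)((rel >> 8) & 0xff);
    out[3] = (uint8_t)((rel >> 16) & 0xff);
    out[4] = (uint8_t)((rel >> 24) & 0xff);
}
```

In neither state is there a flag to load or a condition to test, which
is the whole point.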

-- 
Nicholas Miell <[email protected]>
