Re: [patch] uninline init_waitqueue_*() functions

* Andrew Morton <[email protected]> wrote:

> Ingo Molnar <[email protected]> wrote:
> >
> > > > i also tried a config with the best size settings (disabling 
> > > > FRAME_POINTER, enabling CC_OPTIMIZE_FOR_SIZE), and this gives:
> > > > 
> > > >   text            data    bss     dec         filename
> > > >   20777768        6076042 3081864 29935674    vmlinux.x32.size.before
> > > >   20748140        6076178 3081864 29906182    vmlinux.x32.size.after
> > > > 
> > > > or a 34.8 bytes win per callsite (29K total).
> > > > 
> > > 
> > > With gcc-4.1.0 on i686, uninlining those three functions as per the 
> > > below patch _increases_ the allnoconfig vmlinux's .text from 833456 
> > > bytes to 833728.
> > 
> > that's just the effect of CONFIG_REGPARM and CONFIG_CC_OPTIMIZE_FOR_SIZE 
> > not being set in an allnoconfig. Once i set them the text size evens 
> > out:
> > 
> >  431348   60666   27276  519290   7ec7a vmlinux.x32.mini.before
> >  431359   60666   27276  519301   7ec85 vmlinux.x32.mini.after
> > 
> > compiling without CONFIG_REGPARM on i686 (if you care about text size) 
> > makes little sense. It penalizes function calls artificially.
> 
> OK, but what happened to the 35-bytes-per-callsite saving?

there are 3 effects i can see:

firstly, allnoconfig implies SMP off, so the spinlock init in 
init_waitqueue_head() is compiled out and the function shrinks to two 
instructions.
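
to illustrate the first point, here is a minimal userspace sketch. It is 
not the kernel's definition: the sketch_ names and the SKETCH_SMP macro 
are made up for this example, and the lock is modelled as a plain 
integer. It only shows why the UP variant boils down to two stores:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

/* Hypothetical wait queue head; the lock exists only in the SMP variant. */
struct sketch_wait_queue_head {
#ifdef SKETCH_SMP
	unsigned int lock;		/* assumption: 1 means "unlocked" */
#endif
	struct list_head task_list;
};

static inline void sketch_init_waitqueue_head(struct sketch_wait_queue_head *q)
{
	assert(q != NULL);
#ifdef SKETCH_SMP
	q->lock = 1;			/* extra store, SMP builds only */
#endif
	/* Empty list: both pointers refer back to the head itself. */
	q->task_list.next = &q->task_list;
	q->task_list.prev = &q->task_list;
}
```

built without -DSKETCH_SMP, the body is just the two pointer stores, so 
a call (argument setup plus the call instruction) is unlikely to be 
smaller than the inlined code.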

secondly, the savings depend on the size of the calling function. 
Uninlining from a small function (that makes use of the inlined 
function) can be a loss if the function call causes more register 
clobbering. For large functions we clobber all registers anyway, so the 
call adds no extra register saves on the stack, etc.

the allnoconfig kernel makes use of these waitqueue ops in smaller 
kernel-core functions as well (sleep_on(), etc.), where the uninlining 
is a loss. Furthermore, a UP kernel tends to have smaller functions 
too.

larger kernel configs include more non-core subsystems/drivers as well, 
which tend to have larger function sizes. There the uninlining win is 
larger.

there's a third, smaller effect too: gcc manages to do tail-merging of 
the init_waitqueue_head calls, in a handful of cases, further reducing 
the cost. I found no such tail-merges done for the 53 callsites in the 
allnoconfig kernel.
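
for reference, the arithmetic behind the numbers quoted in this thread 
can be sketched as below. The helper names are made up for this 
example. Dividing the 272-byte allnoconfig growth by the 53 callsites 
assumes both figures describe the same build, which the thread does not 
confirm, and the callsite count for the large config is only implied by 
the quoted 34.8 bytes per callsite, not stated.

```c
#include <assert.h>

/* .text change in bytes (after - before); negative means it shrank. */
static long text_delta(long before, long after)
{
	return after - before;
}

/* Average .text change per callsite, in bytes. */
static double per_callsite(long delta, long callsites)
{
	assert(callsites > 0);
	return (double)delta / (double)callsites;
}

/* Callsite count implied by a total saving and a per-callsite saving. */
static double implied_callsites(long total_saved, double saved_per_site)
{
	assert(saved_per_site > 0.0);
	return (double)total_saved / saved_per_site;
}
```

with the figures above: text_delta(20777768, 20748140) is -29628 (the 
"29K total" win), text_delta(833456, 833728) is 272, and 
text_delta(431348, 431359) is 11. per_callsite(272, 53) comes to 
roughly 5.1 bytes of growth per callsite, and 
implied_callsites(29628, 34.8) to roughly 851 callsites.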

	Ingo