Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Tue, 3 Jan 2006, Martin J. Bligh wrote:
>
> Well, it's not just one function, is it? It'd seem that if you unlined
> a whole bunch of stuff (according to this theory) then normal
> macro-benchmarks would go faster? Otherwise it's all just rather
> theoretical, is it not?

One of the problems with code size optimizations is that they 
fundamentally show much less impact under pretty much any traditional 
benchmark.

In user space, for example, the biggest impact of size optimization tends 
to be in things like load time and low-memory situations. Yet, that's 
never what is benchmarked (yeah, people will benchmark some low-memory 
kernel situations by compating different swap-out algorithms etc against 
each other - but they'll almost never compare the perceived speed of your 
desktop when you start swapping).

Now, I'll happily also admit that code and data _layout_ is often a lot 
more effective than just code size optimizations. That's especially true 
with low-memory situations where the page size being larger than many of 
the code sequences, you can make a bigger impact by changing utilization 
than by changing absolute size.

But even then it's actually really hard to measure. Cache effects tend to 
be hard _anyway_ to measure, it's really hard when the interesting case is 
the cold-cache case and can't even do simple microbenchmarks that repeat a 
million times to get stable results.

So the best we can usually do is "microbenchmarks don't show any 
noticeable _worse_ behaviour, and code size went down by 10%".

Just as an example: there's an old paper on the impact of the OS design on 
the memory subsystem, and one of the tables is about how many cycles per 
instruction is spent on cache effects (this was Ultrix vs Mach, the load 
was the average of a workload that was either specint or looked a lot 
like it).

			I$	D$

	Ultrix user:	0.07	0.08
	Mach user:	0.07	0.08
	Ultrix system:	0.43	0.23
	Mach system:	0.57	0.29

Now, that's an oldish paper, and we really don't care about Ultrix vs Mach 
nor about the particular hw (Decstation), but the same kind of basic 
numbers have been repeated over an over. Namely that system code tends to 
have very different behaviour from user code wrt caches. Caches may have 
gotten better, but code has gotten bigger, and cores have gotten faster.

And in particular, they tend to be _much_ worse. Something you'll seldom 
see as clearly in micro-benchmarks, if only because under those benchmarks 
things will generally be more cached - especially on the I$ side.

So I should probably try to find something slightly more modern, but what 
it boils down to is that at least one well-cited paper that is fairly easy 
to find claims that about _half_a_cycle_ for each instruction was spent on 
I$ misses in system code on a perfectly regular mix of programs. And that 
the cost of I$ was actually higher than the cost of D$ misses.

Now, most of the _time_ tends to be spent in user mode. That is probably 
part of the reason for why system mode gets higher cache misses (another 
is that obviously you'd hope that the user program can optimize for its 
particular memory usage, while the kernel can't). But the result remains: 
I$ miss time is a noticeable portion of system time.

Now, I claim that I$ miss overhead on a system would probably tend to have 
a pretty linear relationship with the size of the static code. There 
aren't that many big hot loops or small code that fits in the I$ to skew 
that trivial rule.

So at a total guess, and taking the above numbers (that are questionable, 
but hey, they should be ok as a starting point for WAGging), reducing code 
size by 10% should give about 0.007 cycles per instruction in user space. 
Whee. Not very noticeable. But on system code (the only thing the kernel 
can help), it's actually a much more visible 0.05 CPI.

Yeah, the math is bogus, the numbers may be irrelevant, but it does show 
why I$ should matter, even though it's much harder to measure.

			Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux