Re: [PATCH] speed up on find_first_bit for i386 (let compiler do the work)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Fri, 29 Jul 2005, David Woodhouse wrote:
>
> On Thu, 2005-07-28 at 10:25 -0700, Linus Torvalds wrote:
> > Basic rule: inline assembly is _better_ than random compiler extensions. 
> > It's better to have _one_ well-documented extension that is very generic 
> > than it is to have a thousand specialized extensions.
> 
> Counterexample: FR-V and its __builtin_read8() et al.

There are arguably always counter-examples, but your arguments really are 
pretty theoretical.

Very seldom does compiler extensions end up being (a) timely enough and 
(b) semantically close enough to be really useful.

> Builtins can also allow the compiler more visibility into what's going
> on and more opportunity to optimise.

Absolutely. In theory. In practice, not so much. All the opportunity to 
optimize often ends up being lost in semantic clashes, or just because 
people can't use the extension because it hasn't been there since day one.

The fact is, inline asms are pretty rare even when we are talking about
every single possible assembly combination. They are even less common when
we're talking about just _one_ specific case of them (like something like
__builtin_ffs()).

What does this mean? It has two results: (a) instruction-level scheduling 
and register allocation just isn't _that_ important, and the generic "asm" 
register scheduling is really plenty good enough. The fact that in theory 
you might get better results if the compiler knew exactly what was going 
on is just not relevant: in practice it's simply not _true_. The other 
result is: (b) the compiler people don't end up seeing something like the 
esoteric builtins as a primary thing, so it's not like they'd be tweaking 
and regression-testing everything _anyway_.

So I argue very strongly that __builtin_xxx() is _wrong_, unless you have 
very very strong reasons for it:

 - truly generic and _very_ important stuff: __builtin_memcpy() is
   actually very much worth it, since it's all over, and it's so generic 
   that the compiler has a lot of choice in how to do it.

 - stuff where the architecture (or the compiler) -really- sucks with
   inline asms, and has serious problems, and the thing is really 
   important. Your FR-V example _might_ fall into this category (or it 
   might not), and ia64 has the problem with instruction packing and
   scheduling and so __builtin's have a bigger advantage.

Basically, on most normal architectures, there's seldom any reason at
_all_ to use builtins except for things like memcpy. On x86, I think the
counter-example might be if you want to schedule MMX code from C - which
is a special case because it doesn't follow my "rule (a)" above. But we 
don't do that in the kernel, really, or we just schedule it out-of-line.

			Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux