On Fri, Jan 12, 2007 at 09:59:40AM +0000, Alan wrote:
> On Fri, 12 Jan 2007 07:02:13 +0100
> Nick Piggin <[email protected]> wrote:
> > Just noticed this while looking at a bug.
> > Avoid an expensive integer divide 3 times per CPU per tick.
> Integer divide is cheap on some modern processors, and multibit shift
> isn't on all embedded ones.

Well, the integer divide unit is non-pipelined on P4, K8, Core2 and probably
most processors, AFAIK. So the 3 divs would take 240 cycles on a P4,

> How about putting back scale = 1 and using
> scale += scale;
> instead of the shift and getting what ought to be even better results
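
For illustration, a minimal userspace sketch of the doubling suggested above.
It is not taken from the patch: the function names and the three-step loop
are assumptions made for the example. It only shows that scale += scale
walks through 1, 2, 4 without a shift, and that for unsigned values dividing
by such a scale matches a right shift by the step index.

```c
#include <assert.h>

/* Hypothetical helper: build 2^steps by repeated addition (no shift). */
unsigned long scale_by_doubling(unsigned int steps)
{
	unsigned long scale = 1;
	unsigned int i;

	for (i = 0; i < steps; i++)
		scale += scale;	/* same result as scale <<= 1 */
	return scale;
}

/*
 * Hypothetical check: for an unsigned value, dividing by the doubled
 * scale gives the same result as shifting right by the step index.
 */
void check_equivalence(unsigned long value)
{
	unsigned long scale = 1;
	unsigned int i;

	for (i = 0; i < 3; i++) {
		assert(value / scale == value >> i);
		scale += scale;
	}
}
```

Calling check_equivalence() with any unsigned long passes all three
assertions; the equivalence does not hold in general for negative signed
values, where division and arithmetic shift round differently.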

Yes, I guess we can do this as well, good idea. I'll make a
quick userspace benchmark and post some numbers with my next
