Chuck Ebbert <[email protected]> wrote:
>
>
> This patch makes restore_fpu() an inline. When L1/L2 cache are saturated
> it makes a measurable difference.
>
> Results from profiling Volanomark follow. Sample rate was 2000 samples/sec
> (HZ = 250, profile multiplier = 8) on a dual-processor Pentium II Xeon.
>
>
> Before:
>
>  10680 restore_fpu            333.7500
>   8351 device_not_available   203.6829
>   3823 math_state_restore      59.7344
>  -----
>  22854
>
>
> After:
>
>  12534 math_state_restore     130.5625
>   8354 device_not_available   203.7561
>  -----
>  20888
>
>
> Patch is "obviously correct" and cuts 9% of the overhead. Please apply.

hm. What context switch rate is that thing doing?

Is the benchmark actually doing floating point stuff?

We do have the `used_math' optimisation in there which attempts to avoid
doing the FP save/restore if the app isn't actually using math. But
<ancient recollections> there's code in glibc startup which always does a
bit of float, so that optimisation is always defeated. There was some
discussion about periodically setting tasks back into !used_math state to
try to restore the optimisation for tasks which only do a little bit of FP,
but nothing actually got done.
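
For illustration, the lazy save/restore idea described above can be modelled
in a few lines of userspace C. This is only a sketch under stated
assumptions, not the kernel code: struct task, struct cpu, switch_to() and
fp_instruction() are hypothetical names invented here, the `ts' field merely
stands in for the CR0.TS bit, and the trap path is collapsed into a single
function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; none of these are the kernel's types. */
struct fpu_state { double regs[8]; };

struct task {
    bool used_math;           /* has the task ever executed an FP instruction? */
    struct fpu_state saved;   /* FPU image written at switch-out */
};

struct cpu {
    struct task *current;
    struct fpu_state fpu;     /* live FPU registers */
    bool ts;                  /* models CR0.TS: the next FP instruction traps */
    bool fpu_loaded;          /* current task loaded the FPU since switch-in */
    unsigned long traps, saves, restores;
};

/* Context switch: save only if the outgoing task really loaded the FPU. */
static void switch_to(struct cpu *c, struct task *next)
{
    if (c->current && c->fpu_loaded) {
        c->current->saved = c->fpu;
        c->saves++;
    }
    c->current = next;
    c->fpu_loaded = false;
    c->ts = true;             /* defer the restore until it is needed */
}

/* Call before each simulated FP instruction; traps at most once per slice. */
static void fp_instruction(struct cpu *c)
{
    assert(c->current != NULL);
    if (!c->ts)
        return;               /* FPU already belongs to this task */
    c->traps++;               /* plays the role of device_not_available */
    c->ts = false;
    if (c->current->used_math) {
        c->fpu = c->current->saved;            /* role of restore_fpu() */
        c->restores++;
    } else {
        c->fpu = (struct fpu_state){ { 0 } };  /* first use: fresh state */
        c->current->used_math = true;
    }
    c->fpu_loaded = true;
}
```

In this model a task that never calls fp_instruction() costs no saves and no
restores across any number of switch_to() calls. A task that does even one
FP instruction at startup keeps used_math set for good, so each later time
slice in which it touches the FPU takes the trap and the restore path.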

> Next step should be to physically place math_state_restore() after
> device_not_available(). Would such a patch be accepted? (Yes it
> would be ugly and require linker script changes.)

Depends on the benefit/ugly ratio ;)