Hello all,
I discovered that the delay loop used there (which simply found a new place
in 2.6.16-mm1; it existed much longer) is inefficient,
since it does an unnecessary subtraction *in the main loop* which also hits
asm code:
00000024 <delay_tsc>:
24: 53 push %ebx
25: 89 c3 mov %eax,%ebx
27: 0f 31 rdtsc
29: 89 c1 mov %eax,%ecx
2b: f3 90 pause
2d: 0f 31 rdtsc
2f: 29 c8 sub %ecx,%eax
31: 39 d8 cmp %ebx,%eax
33: 72 f6 jb 2b <delay_tsc+0x7>
35: 5b pop %ebx
36: c3 ret
With such a patch:
--- linux-2.6.16-mm1/arch/i386/lib/delay.c.orig 2006-03-23 12:52:45.000000000 +0100
+++ linux-2.6.16-mm1/arch/i386/lib/delay.c 2006-03-24 12:53:01.000000000 +0100
@@ -44,10 +44,12 @@
unsigned long bclock, now;
rdtscl(bclock);
+ /* offset with bclock to have very simple comparison below */
+ loops += bclock;
do {
rep_nop();
rdtscl(now);
- } while ((now-bclock) < loops);
+ } while (now < loops);
}
/*
the result is:
00000024 <delay_tsc>:
24: 89 c1 mov %eax,%ecx
26: 0f 31 rdtsc
28: 01 c1 add %eax,%ecx
2a: f3 90 pause
2c: 0f 31 rdtsc
2e: 39 c8 cmp %ecx,%eax
30: 72 f8 jb 2a <delay_tsc+0x6>
32: c3 ret
Improvement: no unnecessary stuff after having hit the timer target value
(read: faster), better power saving, 4 bytes less opcodes.
The patch above could be considered weird since it fiddles with "loops"
directly. A possibly cleaner way would be to introduce a new variable
end = bclock + loops.
Comments? Anything that I'm missing here as to why it hasn't been done that
way before?
If this holds water then I'll submit a final patch soon.
Thanks!
Andreas Mohr
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]