Hi,
I found _spin_lock used a LOCK instruction to make the following operation
"decb %0" atomic. As you know, LOCK instruction alone takes almost 70 clock
cycles to finish and this add lots of cost to the _spin_lock. However
_spin_unlock does not use this LOCK instruction and it uses "movb $1,%0"
instead since 4-byte writes on 4-byte aligned addresses are atomic.
So I want rewrite the _spin_lock defined spinlock.h
(/linux/include/asm-i386) as follows to reduce the overhead of _spin_lock
and make it more efficient.
#define spin_lock_string \
"\n1:\t" \
"cmpb $0,%0\n\t" \
"jle 2f\n\t" \
"movb $0, %0\n\t" \
"jmp 3f\n" \
"2:\t" \
"rep;nop\n\t" \
"cmpb $0, %0\n\t" \
"jle 2b\n\t" \
"jmp 1b\n" \
"3:\n\t"
Compared with the original version as follows, LOCK instruction is removed.
I rebuilt the Intel e1000 Gigabit driver with this _spin_lock. There is
about 2% throughput improvement.
#define spin_lock_string \
"\n1:\t" \
"lock ; decb %0\n\t" \
"jns 3f\n" \
"2:\t" \
"rep;nop\n\t" \
"cmpb $0,%0\n\t" \
"jle 2b\n\t" \
"jmp 1b\n" \
"3:\n\t"
Do you think I can get a better performance if I dig further?
Any ideas will be greatly appreciated,
L.Y.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
|
|