Re: About a change to the implementation of spin lock in 2.6.12 kernel.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Wed, Jul 13, 2005 at 07:20:06PM -0700, [email protected] wrote:
> Hi,
> 
> I found _spin_lock used a LOCK instruction to make the following 
> operation "decb %0" atomic. As you know, LOCK instruction alone takes 
> almost 70 clock cycles to finish and this add lots of cost to the 
> _spin_lock. However _spin_unlock does not use this LOCK instruction and 
> it uses "movb $1,%0" instead since 4-byte writes on 4-byte aligned 
> addresses are atomic.

_spin_unlock does not need locked operations because when it is run, the
code is already known to be the only one to hold the lock, so it can
release it without checking what others do.

> So I want rewrite the _spin_lock defined spinlock.h 
> (/linux/include/asm-i386) as follows to reduce the overhead of _spin_lock 
> and make it more efficient.

It does not work. You cannot write an inter-cpu atomic test-and-set with
several unlocked instructions.

> #define spin_lock_string \
>        "\n1:\t" \
>        "cmpb $0,%0\n\t" \
>        "jle 2f\n\t" \

==> here, another thread or CPU can get the lock simultaneously.

>        "movb $0, %0\n\t" \
>        "jmp 3f\n" \
>        "2:\t" \
>        "rep;nop\n\t" \
>        "cmpb $0, %0\n\t" \
>        "jle 2b\n\t" \
>        "jmp 1b\n" \
>        "3:\n\t"
> 
> Compared with the original version as follows, LOCK instruction is 
> removed. I rebuilt the Intel e1000 Gigabit driver with this _spin_lock. 
> There is about 2% throughput improvement.
> #define spin_lock_string \
>            "\n1:\t" \
>            "lock ; decb %0\n\t" \
>            "jns 3f\n" \
>            "2:\t" \
>            "rep;nop\n\t" \
>            "cmpb $0,%0\n\t" \
>            "jle 2b\n\t" \
>            "jmp 1b\n" \
>            "3:\n\t"
> 
> Do you think I can get a better performance if I dig further?
> 
> Any ideas will be greatly appreciated,

well, of course with those methods you can improve performance, but you
lose the warranty that you're alone to get a lock, and that's bad.

another similar method to get a lock in some very controlled environment
is as follows :

  1:  cmp $0, %0
      jne 1b
      mov $CPUID, %0
      membar
      cmp $CPUID, %0
      jne 1b

This only works with same speed CPUs and interrupts disabled. But in todays
environments, this is very risky (hyperthreaded CPUs, etc...). However, this
is often OK for more deterministic CPUs such as microcontrollers.

Regards,
Willy

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux