* Ingo Molnar <[email protected]> wrote:
> > > > > This change causes the memory access of the "easy" spin-loop
> > > > > portion to be more agressive: after the REP; NOP we'd not do
> > > > > the 'easy-loop' with a simple CMPB, but we'd re-attempt the
> > > > > atomic op.
> > > >
> > > > It looks as if this is going to overflow of the lock counter,
> > > > no?
> > >
> > > hm, what do you mean? There's no lock counter.
> >
> > I mean, the repeated calls to decb will pretty soon make lock->slock
> > wrap around.
>
> ugh, indeed, bad thinko on my part. I'll rework this.
how about the patch below? Boot-tested on 32-bit. As a side-effect this
change also removes the 255 CPUs limit from the 32-bit kernel.
Ingo
------------------------->
Subject: [patch] x86: fix spin-loop starvation bug
From: Ingo Molnar <[email protected]>
Miklos Szeredi reported very long pauses (several seconds, sometimes
more) on his T60 (with a Core2Duo) which he managed to track down to
wait_task_inactive()'s open-coded busy-loop. He observed that an
interrupt on one core tries to acquire the runqueue-lock but does not
succeed in doing so for a very long time - while wait_task_inactive() on
the other core loops waiting for the first core to deschedule a task
(which it wont do while spinning in an interrupt handler).
The problem is: both the spin_lock() code and the wait_task_inactive()
loop uses cpu_relax()/rep_nop(), so in theory the CPU should have
guaranteed MESI-fairness to the two cores - but that didnt happen: one
of the cores was able to monopolize the cacheline that holds the
runqueue lock, for extended periods of time.
This patch changes the spin-loop to assert an atomic op after every REP
NOP instance - this will cause the CPU to express its "MESI interest" in
that cacheline after every REP NOP.
Signed-off-by: Ingo Molnar <[email protected]>
---
include/asm-i386/spinlock.h | 27 ++++++++++-----------------
include/asm-x86_64/spinlock.h | 33 ++++++++++++++++-----------------
2 files changed, 26 insertions(+), 34 deletions(-)
Index: linux-cfs-2.6.22-rc5.q/include/asm-i386/spinlock.h
===================================================================
--- linux-cfs-2.6.22-rc5.q.orig/include/asm-i386/spinlock.h
+++ linux-cfs-2.6.22-rc5.q/include/asm-i386/spinlock.h
@@ -35,15 +35,12 @@ static inline int __raw_spin_is_locked(r
static inline void __raw_spin_lock(raw_spinlock_t *lock)
{
asm volatile("\n1:\t"
- LOCK_PREFIX " ; decb %0\n\t"
- "jns 3f\n"
- "2:\t"
- "rep;nop\n\t"
- "cmpb $0,%0\n\t"
- "jle 2b\n\t"
+ LOCK_PREFIX " ; btrl %[zero], %[slock]\n\t"
+ "jc 3f\n"
+ "rep; nop\n\t"
"jmp 1b\n"
"3:\n\t"
- : "+m" (lock->slock) : : "memory");
+ : [slock] "+m" (lock->slock) : [zero] "Ir" (0) : "memory");
}
/*
@@ -59,27 +56,23 @@ static inline void __raw_spin_lock_flags
{
asm volatile(
"\n1:\t"
- LOCK_PREFIX " ; decb %[slock]\n\t"
+ LOCK_PREFIX " ; btrl %[zero], %[slock]\n\t"
"jns 5f\n"
"2:\t"
"testl $0x200, %[flags]\n\t"
"jz 4f\n\t"
STI_STRING "\n"
- "3:\t"
- "rep;nop\n\t"
- "cmpb $0, %[slock]\n\t"
- "jle 3b\n\t"
+ "rep; nop\n\t"
CLI_STRING "\n\t"
"jmp 1b\n"
"4:\t"
- "rep;nop\n\t"
- "cmpb $0, %[slock]\n\t"
- "jg 1b\n\t"
+ "rep; nop\n\t"
"jmp 4b\n"
"5:\n\t"
: [slock] "+m" (lock->slock)
- : [flags] "r" (flags)
- CLI_STI_INPUT_ARGS
+ : [zero] "Ir" (0),
+ [flags] "r" (flags)
+ CLI_STI_INPUT_ARGS
: "memory" CLI_STI_CLOBBERS);
}
#endif
Index: linux-cfs-2.6.22-rc5.q/include/asm-x86_64/spinlock.h
===================================================================
--- linux-cfs-2.6.22-rc5.q.orig/include/asm-x86_64/spinlock.h
+++ linux-cfs-2.6.22-rc5.q/include/asm-x86_64/spinlock.h
@@ -26,14 +26,15 @@ static inline void __raw_spin_lock(raw_s
{
asm volatile(
"\n1:\t"
- LOCK_PREFIX " ; decl %0\n\t"
+ LOCK_PREFIX " ; btrl %[zero], %[slock]\n\t"
"jns 2f\n"
- "3:\n"
- "rep;nop\n\t"
- "cmpl $0,%0\n\t"
- "jle 3b\n\t"
+ "rep; nop\n\t"
"jmp 1b\n"
- "2:\t" : "=m" (lock->slock) : : "memory");
+ "2:\t"
+ : [slock] "+m" (lock->slock)
+ : [zero] "Ir" (0)
+ : "memory"
+ );
}
/*
@@ -44,24 +45,22 @@ static inline void __raw_spin_lock_flags
{
asm volatile(
"\n1:\t"
- LOCK_PREFIX " ; decl %0\n\t"
+ LOCK_PREFIX " ; btrl %[zero], %[slock]\n\t"
"jns 5f\n"
- "testl $0x200, %1\n\t" /* interrupts were disabled? */
+ "testl $0x200, %[flags]\n\t" /* were interrupts disabled? */
"jz 4f\n\t"
"sti\n"
- "3:\t"
- "rep;nop\n\t"
- "cmpl $0, %0\n\t"
- "jle 3b\n\t"
+ "rep; nop\n\t"
"cli\n\t"
"jmp 1b\n"
- "4:\t"
- "rep;nop\n\t"
- "cmpl $0, %0\n\t"
- "jg 1b\n\t"
+ "rep; nop\n\t"
"jmp 4b\n"
"5:\n\t"
- : "+m" (lock->slock) : "r" ((unsigned)flags) : "memory");
+ : [slock] "+m" (lock->slock)
+ : [zero] "Ir" (0),
+ [flags] "r" ((unsigned)flags)
+ : "memory"
+ );
}
#endif
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]