Andy Isaacson <[email protected]> writes:
> On Sat, Oct 15, 2005 at 06:49:56AM -0600, Eric W. Biederman wrote:
>> Andy Isaacson <[email protected]> writes:
>> > I believe (but have not verified) that GCC simply inhibits
>> > dead-store-elimination when the address of the variable has been taken,
>> > so this theoretical possibility is not a real danger under gcc. And in
>> > any case, it doesn't apply to your check_nmi_watchdog, because you've
>> > got function calls after the assignment.
>>
>> It comes very close to applying to check_nmi_watchdog as both
> [snip]
>
> You prodded me into looking in a bit more depth at the code around this,
> and I'm actually a bit concerned that volatile may not be enough of a
> guarantee that other CPUs will see the correct value. I grant that this
> is a mostly theoretical concern, but let's take a look at the code:
>
> +static __init void nmi_cpu_busy(void *data)
> +{
> +	volatile int *endflag = data;
> +	local_irq_enable();
> +	while (*endflag == 0)
> +		barrier();
> +}
> static int __init check_nmi_watchdog(void)
> {
> +	volatile int endflag = 0;
> ...
> +	if (nmi_watchdog == NMI_LOCAL_APIC)
> +		smp_call_function(nmi_cpu_busy, (void *)&endflag, 0, 0);
> ...
> +	endflag = 1;
> 	printk("OK.\n");
> 	if (nmi_watchdog == NMI_LOCAL_APIC)
> 		nmi_hz = 1;
> +	kfree(prev_nmi_count);
> 	return 0;
> }
>
> So CPU#0 does a smp_call_function, does some work, then sets endflag,
> does a printk, sets a static variable, calls kfree, then leaves the
> stack frame.
>
> Meanwhile, CPU#1 - CPU#N are polling waiting for endflag to make a
> transition from 0->1.
>
> Nothing that CPU#0 does after setting endflag is a guarantee that its
> store to endflag will be seen by other CPUs. In particular, if the
> caller immediately zeros that stack location (not an unlikely
> happenstance), then it's possible that the two stores to endflag might
> be coalesced by a write buffer in CPU#0's bus interface.
Yes, but if the two stores are at all separated in time things are better. And
the constant reads of endflag are going to ping-pong the cache
line between all of the cpus, which will tend to get the store pushed
out as soon as possible. No other magic is necessary on x86.
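
If one did not want to rely on that behaviour, the explicit-barrier
variant would look roughly like this. This is only an untested sketch,
not what is in the patch:

/* untested sketch: poll a plain int and pair smp_rmb()/smp_wmb() explicitly */
static __init void nmi_cpu_busy(void *data)
{
	int *endflag = data;

	local_irq_enable();
	while (*endflag == 0) {
		cpu_relax();
		smp_rmb();	/* also a compiler barrier, so *endflag is reloaded */
	}
}

/* and on the writer side, in check_nmi_watchdog(): */
	endflag = 1;
	smp_wmb();	/* order this store before whatever the caller does next */

On x86 the smp_wmb()/smp_rmb() pair collapses to compiler barriers
anyway, which is the sense in which no other magic is needed here.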
> In particular, a NUMA x86 system is rather likely to be unfair to remote
> nodes in this case (where the other local CPUs are polling too).
In the NUMA systems I am familiar with, the system will slow to a crawl
rather than keep performing while being unfair.
>> If there is a better idiom to synchronize the cpus which does
>> not mark the variable volatile I could see switching to that.
>>
>> There is a theoretical race there as some cpu might not see endflag
>> get set and the next function on the stack could set that same
>> stack slot to 0. I can see value in fixing that, if there is a simple
>> and clear solution. Currently I am having a failure of imagination.
>
> Exactly (he says; after writing the same thing above I re-read the rest
> of your post).
>
> I would imagine that some kind of write memory barrier is appropriate,
> but I'm not up to date on what's in fashion in the kernel code these
> days.
Probably a counter, to ensure the loop exits. The code prints
a message if it fails, and possibly the boot breaks, so I don't expect
bugs in this code to persist for long without someone screaming.
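
Roughly what I mean, as an untested sketch; the bound is an arbitrary
illustrative number, not something I have tuned:

static __init void nmi_cpu_busy(void *data)
{
	volatile int *endflag = data;
	/* arbitrary illustrative bound so a missed store cannot spin us forever */
	unsigned long count = 10 * 1000 * 1000;

	local_irq_enable();
	while (*endflag == 0 && --count)
		cpu_relax();
}

The point is only that a cpu which never sees the store falls out of
the loop eventually instead of spinning on a dead stack slot forever.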
Andrew has forwarded me the first bug report already.
http://bugzilla.kernel.org/show_bug.cgi?id=5462
It looks like we have a bug in the watchdog timers rather than one of
the expected failure modes. After I get some more information I will
be able to see what the practical consequences are.
Eric