Re: NMI problems with Dell SMP Xeons

Andi Kleen (on Wed, 7 Jun 2006 09:20:23 +0200) wrote:
>On Wednesday 07 June 2006 06:49, Keith Owens wrote:
>> Following a suggestion by Brendan Trotter, I ran some more tests to
>> track down the problem with sending NMI IPI on Dell Xeons.
>>
>> BIOS Logical    OS ACPI    Cpus seen by  IPI 2           NMI IPI
>> Processor                  BIOS  OS                      (APIC_DM_NMI)
>>
>> Enabled         Enabled    4     4       Not delivered   Delivered as NMI
>> Enabled         Disabled   4     2       Machine reset   Machine reset
>> Disabled        Enabled    2     2       Not delivered   Delivered as NMI
>> Disabled        Disabled   2     2       Not delivered   Delivered as NMI
>>
>> So the killer combination with this motherboard is when the BIOS knows
>> about logical processors but the OS does not.  Sending IPI 2 or NMI IPI
>> with that combination kills the machine.  Brendan suggested that the
>> BIOS is seeing the broadcast NMI on the logical processors which are
>> not under OS control and that the BIOS cannot cope.
>
>How did you manage that? Normally the OS should use all CPUs
>known to BIOS. Or did you boot with special boot options to limit it?

Two ways:

(1) Boot a kernel built with CONFIG_ACPI=n, so the OS only finds the 2 cpus
    in the MP table (MPT) instead of the 4 listed by ACPI.

(2) Boot a kernel built with CONFIG_ACPI=y, but with maxcpus=2 on the
    command line.

In both cases, send_IPI_allbutself() with IPI 2 or an NMI will result
in a hard reset.

>> Should we change the x86_64 send_IPI_allbutself() so it is only
>> delivered to cpus that the OS knows about, instead of doing a general
>> broadcast?
>
>Hmm, we should be doing that already to avoid races for CPU hotplug.  But 
>maybe it's not working correctly for KDB.

This problem is not KDB specific, although that is where it was first
noticed.  Any code that sends a broadcast IPI 2 or an NMI IPI will
crash these Dell boxes when there is a mismatch between the cpus known
to the BIOS and the cpus known to the OS.
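
To make the failure condition concrete, here is a small user-space sketch
of the rule the test matrix suggests.  The struct and function names are
made up for illustration; nothing here is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* One row of the test matrix: cpus known to the BIOS and to the OS. */
struct cpu_view {
    int bios_cpus;
    int os_cpus;
};

/*
 * Hypothetical predicate, not a kernel function.  A broadcast IPI goes
 * to every cpu the hardware exposes, so it is only suspect when the
 * BIOS knows about more cpus than the OS has under its control.
 */
static bool broadcast_hits_unowned_cpus(struct cpu_view v)
{
    assert(v.bios_cpus > 0 && v.os_cpus > 0);
    return v.bios_cpus > v.os_cpus;
}
```

Of the rows in the matrix, only the 4/2 row (BIOS logical processors
enabled, OS ACPI disabled) makes the predicate true, and that is the row
that reset the machine.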

>Does it go away when you
>enable CPU hotplug?

HOTPLUG_CPU was already on in all of my test kernels.

>Anyway, it should be a SMOP to force it. I wouldn't have a problem with
>always using sequence IPIs and getting rid of the broadcasts.
>There were benchmarks at some point and there wasn't a noticeable
>difference.

I will try forcing send_IPI_allbutself() to use the mask version rather
than the broadcast shortcut.  Later tonight ...
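
A rough user-space model of the difference between the two approaches,
assuming a simple bitmask of cpus.  The types, names and cpu numbering
are hypothetical and only illustrate the idea; this is not the kernel's
send_IPI_allbutself() or its cpumask_t.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_CPUS 32

typedef uint32_t model_cpumask;    /* bit n set => cpu n (hypothetical) */

struct model_machine {
    model_cpumask present;  /* cpus the hardware/BIOS exposes */
    model_cpumask online;   /* cpus the OS actually brought up */
};

static model_cpumask bit_of(int cpu)
{
    assert(cpu >= 0 && cpu < MODEL_MAX_CPUS);
    return (model_cpumask)1 << cpu;
}

/* Broadcast shortcut: every present cpu except the sender. */
static model_cpumask targets_broadcast(const struct model_machine *m, int self)
{
    return m->present & ~bit_of(self);
}

/* Mask version: only cpus the OS has online, except the sender. */
static model_cpumask targets_mask(const struct model_machine *m, int self)
{
    return m->online & ~bit_of(self);
}

/* Cpus that would receive the IPI without being under OS control. */
static model_cpumask stray_targets(const struct model_machine *m,
                                   model_cpumask targets)
{
    return targets & ~m->online;
}
```

With present = 0xF and online = 0x3 (4 cpus known to the BIOS, 2 to the
OS), the broadcast from cpu 0 reaches two cpus outside OS control, while
the mask version reaches none.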
