Re: [Fastboot] [PATCH] i386: move apic init in init_IRQs


Vivek Goyal <[email protected]> writes:

> Hi Eric,
>
> I had a couple of observations.
>
> [..]
>>  #ifdef CONFIG_X86_IO_APIC
>>  	{
>> @@ -1046,9 +1050,11 @@ static unsigned int calibration_result;
>>  
>>  void __init setup_boot_APIC_clock(void)
>>  {
>> +	unsigned long flags;
>>  	apic_printk(APIC_VERBOSE, "Using local APIC timer interrupts.\n");
>>  	using_apic_timer = 1;
>>  
>> +	local_irq_save(flags);
>>  	local_irq_disable();
>>  
>
> Should the local_irq_disable() call go away once local_irq_save() got
> introduced?

Nope.  The irqs need to be disabled.  The save just allows this
to be called in a context where irqs start out disabled.  It is
just a save.

>> +	/*
>> +	 * Should not be necessary because the MP table should list the boot
>> +	 * CPU too, but we do it for the sake of robustness anyway.
>> +	 * Makes no sense to do this check in clustered apic mode, so skip it
>> +	 */
>> +	if (!check_phys_apicid_present(boot_cpu_physical_apicid)) {
>> +		printk("weird, boot CPU (#%d) not listed by the BIOS.\n",
>> +				boot_cpu_physical_apicid);
>
>
> I am testing kdump on i386 and I am hitting this message while second kernel
> is booting. I am doing testing with 2.6.14-rc4-mm1. Logs are pasted below.

The check has been there for a while.  All it is saying is that
our boot cpu has apicid #1.   So I suspect you are either on
an Opteron system or a hyperthreaded Xeon system.

> Also kdump testing fails almost 50% of the time on my machine with
> 2.6.14-rc4-mm1.  It works fine with 2.6.14-rc4 though.

Is the failure that happens 50% of the time the one shown in the
boot log below?

The problem in that boot log appears to be a glitch in how we handle
the apicid of a boot cpu that the BIOS does not report to the
kernel.

> Second kernel is unable to come up. earlyprintk on serial console showed
> a kernel BUG in setup_local_APIC(). Details are included in the logs below.

> Second kernel boot log.

The BUG is weird.  I don't think apic.c even goes to line 1479,
unless the BUG is inline in one of the other functions called
by setup_local_APIC().

	/*
	 * Double-check whether this APIC is really registered.
	 */
	if (!apic_id_registered())
		BUG();


apic_id_registered expands to:
static inline int apic_id_registered(void)
{
	return physid_isset(GET_APIC_ID(apic_read(APIC_ID)), phys_cpu_present_map);
}
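
To make the check concrete, here is a minimal user-space sketch of
what that inline does.  It is an illustration, not kernel code: the
names physid_mask, get_apic_id(), physid_set_bit(), physid_test_bit()
and apic_id_registered_sketch() are made up for this example, the raw
register value is passed in as an argument instead of being read from
the APIC, and the 4-bit ID field at bits 24-27 is an assumption that
matches the "c1 e8 18 ... 83 e0 0f" (shift right by 24, mask with 0xf)
visible in the Code: bytes of the oops below.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_APICS 256 /* assumed size, large enough for this sketch */

/* Hypothetical stand-in for phys_cpu_present_map: one bit per APIC ID. */
typedef struct {
	uint32_t bits[MAX_APICS / 32];
} physid_mask;

/* Extract the ID field (assumed to be bits 24-27) from a raw APIC_ID value. */
static unsigned int get_apic_id(uint32_t apic_id_reg)
{
	return (apic_id_reg >> 24) & 0x0F;
}

/* Mark an APIC ID as present in the map. */
static void physid_set_bit(unsigned int id, physid_mask *map)
{
	assert(id < MAX_APICS);
	map->bits[id / 32] |= (uint32_t)1 << (id % 32);
}

/* Return 1 if the APIC ID is present in the map, 0 otherwise. */
static int physid_test_bit(unsigned int id, const physid_mask *map)
{
	assert(id < MAX_APICS);
	return (int)((map->bits[id / 32] >> (id % 32)) & 1u);
}

/* Model of the check: is the ID the hardware reports in the present map? */
static int apic_id_registered_sketch(uint32_t apic_id_reg,
				     const physid_mask *map)
{
	return physid_test_bit(get_apic_id(apic_id_reg), map);
}
```

With only APIC ID 3 set in the map, a register value of 0x03000000
passes the check and 0x01000000 does not.  The second case is the one
that would reach BUG().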

Which indicates to me that there is something wrong in the
logic of:
	if (!check_phys_apicid_present(boot_cpu_physical_apicid)) {
		printk("weird, boot CPU (#%d) not listed by the BIOS.\n",
				boot_cpu_physical_apicid);
		physid_set(hard_smp_processor_id(), phys_cpu_present_map);
	}

Currently we are referring to the boot cpu's apicid with 3 different
expressions (boot_cpu_physical_apicid, hard_smp_processor_id(), and
GET_APIC_ID(apic_read(APIC_ID))), and one of them appears to be wrong.
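
One way to see how a disagreement between those values could lead to
the BUG() is the sketch below.  It is a user-space model, not kernel
code: fixup_then_check() and the idmask type are invented for
illustration, the IDs are passed in as plain arguments, and the values
in the note after the code are hypothetical rather than measured on
the failing machine.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified present map: one bit per APIC ID 0..31 (an assumption). */
typedef uint32_t idmask;

/*
 * Model of the fixup followed by the later registration check.
 *   boot_id: the ID tested against the present map
 *   set_id:  the ID that the fixup adds to the map
 *   hw_id:   the ID the local APIC reports when the check runs
 * Returns 1 if the check would pass, 0 if it would hit BUG().
 */
static int fixup_then_check(idmask present, unsigned int boot_id,
			    unsigned int set_id, unsigned int hw_id)
{
	assert(boot_id < 32 && set_id < 32 && hw_id < 32);

	/* The fixup: if the boot ID is missing, add set_id to the map. */
	if (!(present & ((idmask)1 << boot_id)))
		present |= (idmask)1 << set_id;

	/* The later check: is the hardware-reported ID in the map? */
	return (int)((present >> hw_id) & 1u);
}
```

Starting from a map with only ID 3 set, fixup_then_check(1u << 3, 1, 1, 1)
returns 1, while fixup_then_check(1u << 3, 1, 0, 1) returns 0.  If the
ID that gets set differs from the ID the hardware reports, the fixup
does not help and the check still fails.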

That is as far as I can get at the moment.

Eric


>
> # SysRq : Trigger a crashdump
> I'm in purgatory
> Linux version 2.6.14-rc4-mm1-16M ([email protected]) (gcc version 3.4.3
> 20041212 (Red Hat 3.4.3-9.EL4)) #1 PREEMPT Wed Oct 19 13:55:24 IST 2005
> BIOS-provided physical RAM map:
>  BIOS-e820: 0000000000000100 - 000000000009d000 (usable)
>  BIOS-e820: 000000000009d000 - 00000000000a0000 (reserved)
>  BIOS-e820: 0000000000100000 - 000000002fffa480 (usable)
>  BIOS-e820: 000000002fffa480 - 0000000030000000 (ACPI data)
>  BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
> user-defined physical RAM map:
>  user: 0000000000000000 - 00000000000a0000 (usable)
>  user: 0000000001000000 - 000000000142d000 (usable)
>  user: 00000000014cd400 - 0000000004000000 (usable)
> 0MB HIGHMEM available.
> 64MB LOWMEM available.
> found SMP MP-table at 0009e140
> early console enabled
> DMI 2.1 present.
> ACPI: LAPIC (acpi_id[0x00] lapic_id[0x03] enabled)
> Processor #3 6:10 APIC version 17
> ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
> Processor #0 6:10 APIC version 17
> WARNING: NR_CPUS limit of 1 reached.  Processor ignored.
> ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
> Processor #1 6:10 APIC version 17
> WARNING: NR_CPUS limit of 1 reached.  Processor ignored.
> ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
> Processor #2 6:10 APIC version 17
> WARNING: NR_CPUS limit of 1 reached.  Processor ignored.
> ACPI: IOAPIC (id[0x0e] address[0xfec00000] gsi_base[0])
> IOAPIC[0]: apic_id 14, version 17, address 0xfec00000, GSI 0-15
> ACPI: IOAPIC (id[0x0d] address[0xfec01000] gsi_base[16])
> IOAPIC[1]: apic_id 13, version 17, address 0xfec01000, GSI 16-31
> ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
> Enabling APIC mode:  Flat.  Using 2 I/O APICs
> Using ACPI (MADT) for SMP configuration information
> Allocating PCI resources starting at 10000000 (gap: 04000000:fc000000)
> Built 1 zonelists
> Initializing CPU#0
> Kernel command line: ro root=/dev/sda7 rhgb console=ttyS0,38400 irqpoll init 3
> earlyprintk=ttyS0,38400 memmap=exactmap memmap=640K@0K memmap=4276K@16384K
> memmap=44235K@21301K elfcorehdr=21300K
> Misrouted IRQ fixup and polling support enabled
> This may significantly impact system performance
> weird, boot CPU (#1) not listed by the BIOS.
> ------------[ cut here ]------------
> kernel BUG at ÿÿÿÿ:1479!
> invalid operand: 0000 [#1]
> PREEMPT
> last sysfs file:
> Modules linked in:
> CPU:    0
> EIP:    0060:[<c1012b17>]    Not tainted VLI
> EFLAGS: 00010046   (2.6.14-rc4-mm1-16M)
> EIP is at setup_local_APIC+0x26/0x18c
> eax: 00000000   ebx: 00040011   ecx: 00000c06   edx: 00000000
> esi: 00000011   edi: c13a9800   ebp: 01445007   esp: c13b5fbc
> ds: 007b   es: 007b   ss: 0068
> Process swapper (pid: 0, threadinfo=c13b4000 task=c133faa0)
> Stack: c13a9800 01445007 c101ac30 00000000 01429900 c13c1c49 c12e8a4c 00000001
>        00000003 c13b66cf c12e5d5d c13eddc0 c133b9fc 00000078 c13b6342 c13eddc0
>        c1000199
> Call Trace:
>  [<c101ac30>] printk+0x17/0x1b
>  [<c13c1c49>] APIC_init+0x5a/0xf6
>  [<c13b66cf>] start_kernel+0xb3/0x1cd
>  [<c13b6342>] unknown_bootoption+0x0/0x1b6
> Code: e4 f7 0f 30 c3 56 53 83 ec 0c 8b 1d 30 d0 ff ff a1 20 d0 ff ff c1 e8 18 0f
> b6 f3 83 e0 0f 0f a3 05 e0 03 3f c1 19 c0 85 c0 75 02 <0f> 0b c7 05 e0 d0 ff ff
> ff ff ff ff 8b 0d c4 03 3f c1 a1 d0 d0
>  <0>Kernel panic - not syncing: Attempted to kill the idle task!
>
>
> Thanks
> Vivek