Re: 2.6.12 hangs on boot

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Friday 29 July 2005 09:09, Andrew Morton wrote:
> "Alexander Y. Fomichev" <[email protected]> wrote:
> > G' day
> >
> > I've been trying to switch from 2.6.12-rc3 to 2.6.12 on Dual EM64T 2.8
> > GHz [ MoBo: Intel E7520, intel 82801 ]
> > but kernel hangs on boot right after records:
> >
> > Booting processor 2/1 rip 6000 rsp ffff8100023dbf58
> > Initializing CPU#2
> >
> > ( below is a link to full boot trace, actually -git3 but no differences)
> > http://sysadminday.org.ru/2.6.12-hang-on-boot/2.6.12-git3-hang
> >
> > An attempt to enable debug:
> > +CONFIG_ACPI_DEBUG=y
> > +CONFIG_DEBUG_SLAB=y
> > +CONFIG_DEBUG_PREEMPT=y
> > +CONFIG_DEBUG_SPINLOCK=y
> > +CONFIG_DEBUG_SPINLOCK_SLEEP=y
> > +CONFIG_DEBUG_KOBJECT=y
> > +CONFIG_DEBUG_INFO=y
> > +CONFIG_INIT_DEBUG=y
> > gives rather strange result, kernel boots successfully ( with a lot of
> > debuging messages of course but i couldn't find something suspicious )
> > http://sysadminday.org.ru/2.6.12-hang-on-boot/2.6.12-git3-debug
> >
> > config for 2.6.12 have been taken from previous one, only
> > 'make oldconfig' has been made.
> > http://sysadminday.org.ru/2.6.12-hang-on-boot/2.6.12-git3.config
> >
> > Hang 100% reproducible on at least two of my EM64T hosts.
> > ( actualy the same configuration as of MoBo/CPU )
>
> Is this still happening in 2.6.13-rc4?
>
> If so, could you please test 2.6.13-rc4 plus the below fix?
>
> Thanks.
>
>
> From: [email protected] (Eric W. Biederman)
>
> sync_tsc was using smp_call_function to ask the boot processor to report
> it's tsc value.  smp_call_function performs an IPI_send_allbutself which is
> a broadcast ipi.  There is a window during processor startup during which
> the target cpu has started and before it has initialized it's interrupt
> vectors so it can properly process an interrupt.  Receveing an interrupt
> during that window will triple fault the cpu and do other nasty things.
>
> Why cli does not protect us from that is beyond me.
>
> The simple fix is to match ia64 and provide a smp_call_function_single.
> Which avoids the broadcast and is more efficient.
>
> This certainly fixes the problem of getting stuck on boot which was very
> easy to trigger on my SMP Hyperthreaded Xeon, and I think it fixes it for
> the right reasons.
>
> I believe this patch suffers from apicid versus logical cpu number
> confusion.  I copied the basic logic from smp_send_reschedule and I can't
> find where that translates from the logical cpuid to apicid.  So it isn't
> quite correct yet.  It should be close enough that it shouldn't be too hard
> to finish it up.
>
> More bug fixes after I have slept but I figured I needed to get this
> one out for review.
>
> Signed-off-by: Eric ic W. Biederman <[email protected]>
> Signed-off-by: Andrew Morton <[email protected]>
> ---
[skip]

I've not tried 2.6.13-rc4 itself because i notice changes has been commited
into Linus git tree under id: 3d483f47579461a4715db33c68ef8752e5a97a2d
http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=3d483f47579461a4715db33c68ef8752e5a97a2d
and this tree works well for me though previous one
[94d2ac66c12397e2ca7988dbf59f24a966d275cb] -- hangs. So i guess it is exactly
problem this patch solve.
Thank you and for your help.


-- 
Best regards.
        Alexander Y. Fomichev <[email protected]>
        Public PGP key: http://sysadminday.org.ru/gluk.asc
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux