Re: v2.6.21.5-rt19 (sched_getaffinity?)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 2007-07-08 at 20:53 -0700, Fernando Pablo Lopez-Lezcano wrote: 
> On Mon, 9 Jul 2007, Gabriel C wrote:
> > Fernando Lopez-Lezcano wrote:
> >> On Sat, 2007-07-07 at 11:24 +0200, Ingo Molnar wrote:
> >>> * Fernando Lopez-Lezcano <[email protected]> wrote:
> >>>>>> Changes since 2.6.21.5-rt18:
> >>>>>> - Fixed a nasty and hard to track down slowness / boot problem on SMP
> >>>>>> machines with CONFIG_NOHZ enabled. The problem was caused by the timer
> >>>>>> wheel base lock held during the get_next_timer_interrupt() call in the
> >>>>>> idle path, which eventually led to a bogus PI boosting of the idle task
> >>>>>> and in consequence a stale wrong scheduler selection for the affected 
> >>>>>> idle
> >>>>>> task.
> >>>>>> 
> >>>>>> Kudos to Carsten Emde, who patiently and meticulously isolated the
> >>>>>> problem and provided the traces, which allowed to identify the root 
> >>>>>> cause.
> >>>>>> 
> >>>>>> Problem solution: Prevent idle task boosting
> >>>>>> 
> >>>>> Maybe someone remember me whining about troubles with 2.6.21-rt2..18 on 
> >>>>> my Core2 T7200 laptop (fujitsu-siemens amilo i1520).
> >>>>> 
> >>>>> Althought I'm still with my fingers crossed, I can tell the good news 
> >>>>> are that 2.6.21.5-rt19 (and -rt20) does behave far better now on the 
> >>>>> very same box.
> >>>>> 
> >>>> Yes, it works much better indeed...
> >>>> 
> >>>> Ingo: is there a place where I can read about the changes in different 
> >>>> rtxx releases? What is new/better/fixed in rt20? (I see scheduler stuff 
> >>>> in a diff from rt19 to rt20 but I don't really know what it means).
> >>>> 
> >>> and rt18 was a -rt-only NOHZ fix, that bug got introduced in rt11 when CFS 
> >>> was merged.
> >>> 
> >>> i _think_ Rui might have seen two separate problems. Perhaps by the time 
> >>> we fixed the first problem (which Rui saw since -rt2) we introduced the 
> >>> other one via -rt11 - which then got fixed in -rt19.
> >> 
> >> Ahh, CFS is now part of rt, I was obviously not paying attention... I'm
> >> really trying to provide a "stable" rt kernel for audio usage and
> >> including another subsystem into rt is - IMHO - not going to help.
> >> What's the chance of splitting things?
> >> 
> >>> btw., we'd love to get more feedback regarding CFS. CFS is a completely 
> >>> new scheduler for Linux. 
> >> 
> >> Then I'd rather have it separate from rt.
> >> 
> >>> It has a design centered around keeping application latencies down, so it 
> >>> is ultimately real-time friendly, and it should also make things work 
> >>> better for desktop-ish and audio-ish stuff as well. (even under 
> >>> SCHED_OTHER)
> >>> 
> >> 
> >> Maybe this is CFS related? (tail of a thread in the Planet CCRMA mailing
> >> list):
> >> 
> >> On Sun, 2007-07-08 at 15:26 -0400, Hector Centeno wrote:
> >> 
> >>> Ok, so just to confirm, that 2.6.21-0182.rt19.1.fc7.ccrmart works fine
> >>> on my desktop but on my laptop it makes Firefox and Tomboy to crash.
> >>> On the same laptop using 2.6.21-0182.rt17.1.fc7.ccrmart there is no
> >>> problem.
> >>> 
> >> I managed to completely hang firefox (fc7) with flash 9 installed
> >> (unkillable even with -9).
> >
> > Firefox with flash 9 does not work good , there are a lot bugs reported 
> > about ( just google ) and it hangs on vanilla or whatever other kernels 
> > as well. Not only Firefox but also Swiftfox, Opera, Epiphany etc.
> >
> > The most time Firefox dies when you use flash 9 and close a window or a 
> > tab.
> 
> More tests...
> 
> The problem is the rt kernel AFAICT, this goes beyond Flash 9, way 
> beyond:
> 
> _OpenOffice_ hangs with 2.6.21.5-rt20, works fine with stock Fedora 7 
> kernel. Flash 9 hangs with 2.6.21.5-rt20, works fine with the stock Fedora 
> 7 kernel. Same machine booting different kernels, I'd say it is the 
> kernel.
> 
> The only way out for a hung app is a reboot.
> 
> Ingo: what would be a good way to trace this? It makes the rt kernels not 
> very usable at least on this hardware (more tests tomorrow in the CCRMA 
> machines).
> 
> Same on 2.6.21.5-rt18 with CONFIG_NO_HZ not set.

I forgot to include the output of strace... and of course now I can't
repeat the openoffice hang. 

I do get flash 9 (I know, not the best example) and tomboy to hang as
reported by one of my Planet CCRMA users - flash 9 tested working on
stock fedora 7 kernel - and both seem to hang in the same system call:

sched_getaffinity(3528, 32,  <unfinished ...>

Full output of strace attached for both cases. 

Hopefully this will make the bug immediately obvious to someone :-)
[running on a laptop with the 7700 Intel cpu]
-- Fernando

Attachment: firefox.trace.gz
Description: GNU Zip compressed data

Attachment: tomboy.trace.gz
Description: GNU Zip compressed data


[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux