On Sun, 2007-07-08 at 20:53 -0700, Fernando Pablo Lopez-Lezcano wrote: > On Mon, 9 Jul 2007, Gabriel C wrote: > > Fernando Lopez-Lezcano wrote: > >> On Sat, 2007-07-07 at 11:24 +0200, Ingo Molnar wrote: > >>> * Fernando Lopez-Lezcano <[email protected]> wrote: > >>>>>> Changes since 2.6.21.5-rt18: > >>>>>> - Fixed a nasty and hard to track down slowness / boot problem on SMP > >>>>>> machines with CONFIG_NOHZ enabled. The problem was caused by the timer > >>>>>> wheel base lock held during the get_next_timer_interrupt() call in the > >>>>>> idle path, which eventually led to a bogus PI boosting of the idle task > >>>>>> and in consequence a stale wrong scheduler selection for the affected > >>>>>> idle > >>>>>> task. > >>>>>> > >>>>>> Kudos to Carsten Emde, who patiently and meticulously isolated the > >>>>>> problem and provided the traces, which allowed to identify the root > >>>>>> cause. > >>>>>> > >>>>>> Problem solution: Prevent idle task boosting > >>>>>> > >>>>> Maybe someone remember me whining about troubles with 2.6.21-rt2..18 on > >>>>> my Core2 T7200 laptop (fujitsu-siemens amilo i1520). > >>>>> > >>>>> Althought I'm still with my fingers crossed, I can tell the good news > >>>>> are that 2.6.21.5-rt19 (and -rt20) does behave far better now on the > >>>>> very same box. > >>>>> > >>>> Yes, it works much better indeed... > >>>> > >>>> Ingo: is there a place where I can read about the changes in different > >>>> rtxx releases? What is new/better/fixed in rt20? (I see scheduler stuff > >>>> in a diff from rt19 to rt20 but I don't really know what it means). > >>>> > >>> and rt18 was a -rt-only NOHZ fix, that bug got introduced in rt11 when CFS > >>> was merged. > >>> > >>> i _think_ Rui might have seen two separate problems. Perhaps by the time > >>> we fixed the first problem (which Rui saw since -rt2) we introduced the > >>> other one via -rt11 - which then got fixed in -rt19. > >> > >> Ahh, CFS is now part of rt, I was obviously not paying attention... I'm > >> really trying to provide a "stable" rt kernel for audio usage and > >> including another subsystem into rt is - IMHO - not going to help. > >> What's the chance of splitting things? > >> > >>> btw., we'd love to get more feedback regarding CFS. CFS is a completely > >>> new scheduler for Linux. > >> > >> Then I'd rather have it separate from rt. > >> > >>> It has a design centered around keeping application latencies down, so it > >>> is ultimately real-time friendly, and it should also make things work > >>> better for desktop-ish and audio-ish stuff as well. (even under > >>> SCHED_OTHER) > >>> > >> > >> Maybe this is CFS related? (tail of a thread in the Planet CCRMA mailing > >> list): > >> > >> On Sun, 2007-07-08 at 15:26 -0400, Hector Centeno wrote: > >> > >>> Ok, so just to confirm, that 2.6.21-0182.rt19.1.fc7.ccrmart works fine > >>> on my desktop but on my laptop it makes Firefox and Tomboy to crash. > >>> On the same laptop using 2.6.21-0182.rt17.1.fc7.ccrmart there is no > >>> problem. > >>> > >> I managed to completely hang firefox (fc7) with flash 9 installed > >> (unkillable even with -9). > > > > Firefox with flash 9 does not work good , there are a lot bugs reported > > about ( just google ) and it hangs on vanilla or whatever other kernels > > as well. Not only Firefox but also Swiftfox, Opera, Epiphany etc. > > > > The most time Firefox dies when you use flash 9 and close a window or a > > tab. > > More tests... > > The problem is the rt kernel AFAICT, this goes beyond Flash 9, way > beyond: > > _OpenOffice_ hangs with 2.6.21.5-rt20, works fine with stock Fedora 7 > kernel. Flash 9 hangs with 2.6.21.5-rt20, works fine with the stock Fedora > 7 kernel. Same machine booting different kernels, I'd say it is the > kernel. > > The only way out for a hung app is a reboot. > > Ingo: what would be a good way to trace this? It makes the rt kernels not > very usable at least on this hardware (more tests tomorrow in the CCRMA > machines). > > Same on 2.6.21.5-rt18 with CONFIG_NO_HZ not set. I forgot to include the output of strace... and of course now I can't repeat the openoffice hang. I do get flash 9 (I know, not the best example) and tomboy to hang as reported by one of my Planet CCRMA users - flash 9 tested working on stock fedora 7 kernel - and both seem to hang in the same system call: sched_getaffinity(3528, 32, <unfinished ...> Full output of strace attached for both cases. Hopefully this will make the bug immediately obvious to someone :-) [running on a laptop with the 7700 Intel cpu] -- Fernando
Attachment:
firefox.trace.gz
Description: GNU Zip compressed data
Attachment:
tomboy.trace.gz
Description: GNU Zip compressed data
- Follow-Ups:
- Re: v2.6.21.5-rt19 (sched_getaffinity?)
- From: Ingo Molnar <[email protected]>
- Re: v2.6.21.5-rt19 (sched_getaffinity?)
- References:
- v2.6.21.5-rt19
- From: Thomas Gleixner <[email protected]>
- Re: v2.6.21.5-rt19
- From: "Rui Nuno Capela" <[email protected]>
- Re: v2.6.21.5-rt19
- From: Fernando Lopez-Lezcano <[email protected]>
- Re: v2.6.21.5-rt19
- From: Ingo Molnar <[email protected]>
- Re: v2.6.21.5-rt19
- From: Fernando Lopez-Lezcano <[email protected]>
- Re: v2.6.21.5-rt19
- From: Gabriel C <[email protected]>
- Re: v2.6.21.5-rt19
- From: Fernando Pablo Lopez-Lezcano <[email protected]>
- v2.6.21.5-rt19
- Prev by Date: Re: partially mounted cifs filesystem
- Next by Date: Re: Documentation of kernel messages (Summary)
- Previous by thread: Re: v2.6.21.5-rt19
- Next by thread: Re: v2.6.21.5-rt19 (sched_getaffinity?)
- Index(es):