Re: [bug] SLOB crash, 2.6.24-rc2

Ingo Molnar wrote:
> * Nick Piggin <[email protected]> wrote:
>
>   
>> On Thursday 15 November 2007 21:43, Ingo Molnar wrote:
>>     
>>> * David Miller <[email protected]> wrote:
>>>       
>>>> From: Matt Mackall <[email protected]>
>>>> Date: Wed, 14 Nov 2007 17:37:13 -0600
>>>>
>>>>         
>>>>> No, the usual strategy for debugging problems -outside- SLOB is to
>>>>> switch to another allocator with more extensive debugging facilities.
>>>>>           
>>>> Ok, so the thing we still can do is do a dump_stack() at the list
>>>> debugging assertion trigger points.
>>>>         
>>> ok, i'll first try to trigger it again.
>>>       
>> I had implemented SLOB in userspace, so I resynched and think I found 
>> your problem. Sorry for the attachment format -- this mailer isn't the 
>> best. I'm really computer illiterate when it comes to userspace...
>>     
>
> thx, i'll try your fix in a minute.
>
>   
>> Anyway, I'm really happy to see you're testing and using SLOB upstream
>> :) Is there any particular reason that you're using it?
>>     
>
> i sometimes test SLOB for -rt, but this time it's the result of my 
> "automated random QA" effort, as part of arch/x86 maintenance/QA.
>
> the main trick is to build and boot random "make randconfig" 
> bzImages. That finds build bugs and a good deal of boot hang and crash 
> bugs as well. (it also found a compiler bug already) I can build and 
> boot about 1000 random kernels in 24 hours, and it's all fully 
> automated. I usually run it overnight - when a kernel does not come up 
> due to a bootup hang or crash (or the kernel log signals any exception 
> condition) then the script stops and i can fix it in the morning.
>
> The first step towards this was to get allyesconfig bzImage kernels to 
> build and boot fine. That effort took months (we had many problems in 
> this area) - i think you saw bugreports and fixes from me about that on 
> lkml.
>
> Once that worked reasonably well i made a small Kconfig patch that 
> forcibly selects a "minimum set" of drivers and kernel subsystems that 
> are needed to boot up a testsystem. Once a "make allnoconfig" and a 
> "make allyesconfig" bzImage kernel boots up fine on the testbox all 
> randconfig configs "inbetween" are supposed to build and boot fine as 
> well.
>
> I also have a patch that adds all the x86 boot options like nosmp, 
> maxcpus=1, nohz=off, hpet=disable to be selectable as .config options - 
> so those boot options are randomized as well.
>
> I also have a small patch that disables half a dozen drivers/features 
> that are not expected to work out of the box in a bzImage kernel. (such as 
> ISA drivers that assume the presence of hardware, or root filesystem 
> features such as NFSROOT)
>
> the resulting make randconfig kernel still has 99% of the degrees of 
> freedom that a stock make randconfig kernel has, so for all practical 
> purposes it's a fully random kernel - it just happens to boot on my 
> testsystem all the time.
>
> A successful bootup means the test system is able to boot up into a 
> stock Fedora 8 userspace and is able to bring up its network interfaces 
> and ssh out (automatically) to the build box to signal the completion of 
> a successful test cycle. The logs are also analyzed for lockdep 
> assertions (if lockdep is enabled - which it is in about 20% of the 
> randconfig kernels) and other kernel bugs.
>
> (just in case you were wondering about one of the reasons why the 
> arch/x86 unification merge went so smoothly, with nary a regression ;-) 
> Thomas is doing other types of automated QA of the x86 queue as well.)
>
> this method found the SG-list corruption bugs the following night after 
> Linus committed Jens' SG-list changes, so it's pretty good at finding 
> regressions as early as possible.
>
> 	Ingo
>   
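
The loop described above can be sketched roughly as follows. This is a minimal sketch, not Ingo's actual script, which is not shown in this thread. The `build` and `boot_ok` callables are hypothetical placeholders: in a real setup `build` might run "make randconfig" followed by a bzImage build, and `boot_ok` might boot the test box and wait for it to ssh back to the build box.

```python
from typing import Callable, Optional


def run_randconfig_qa(
    build: Callable[[int], bool],
    boot_ok: Callable[[int], bool],
    max_kernels: int = 1000,
) -> Optional[int]:
    """Build and boot random kernels until one of them fails.

    Returns the index of the first failing kernel, or None if all
    max_kernels iterations passed. Both callables are hypothetical
    stand-ins for the real build and boot steps.
    """
    for i in range(max_kernels):
        # Stop at the first build failure or boot hang/crash, so the
        # failing config is left in place for inspection in the morning.
        if not build(i) or not boot_ok(i):
            return i
    return None
```

Passing the build and boot steps in as functions keeps the stop-on-first-failure logic separate from the machine-specific parts, which would differ on every test setup.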

  How complete is the QA testing?  I was reading this interesting thread
and it occurred to me that this sounds like a useful distributed
computing application, i.e. a central server holding all valid Kconfig
combinations (how many are there?) for a particular release (-rc or
otherwise) across all architectures.  These would be allocated to clients
on request to be built, booted, etc.  Any errors would be fed back to the
central server.  I guess this would be a useful resource for
developers.  More importantly (and I don't know if this is the case
already!), a new Linux release (2.6.x) could be "certified" with some
level of testing on known hardware / architectures.
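
To make the idea concrete, here is a rough sketch of the server-side bookkeeping, under stated assumptions: the class name `ConfigServer` and its methods are hypothetical, configs are represented as plain strings, and there is no networking. Enumerating every valid combination is not practical with thousands of options, so the sketch assumes the server is seeded with a sample of configs instead.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class ConfigServer:
    """Hypothetical central server handing out kernel configs to clients."""

    def __init__(self, configs: List[str]) -> None:
        self.pending: Deque[str] = deque(configs)  # not yet handed out
        self.in_progress: Dict[str, str] = {}      # config -> client name
        self.results: Dict[str, str] = {}          # config -> "ok" or error text

    def request(self, client: str) -> Optional[str]:
        """Allocate the next untested config to a client, if any remain."""
        if not self.pending:
            return None
        config = self.pending.popleft()
        self.in_progress[config] = client
        return config

    def report(self, config: str, outcome: str) -> None:
        """Record a client's build/boot outcome for a config."""
        self.in_progress.pop(config, None)
        self.results[config] = outcome

    def failures(self) -> Dict[str, str]:
        """Return only the configs whose outcome was not "ok"."""
        return {c: r for c, r in self.results.items() if r != "ok"}
```

A client would call `request()`, build and boot the config it receives, then call `report()` with either "ok" or a description of the failure; `failures()` is what developers would look at.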

  tbh, I feel sorry for Ingo's machine compiling 1000 random kernels in
24h!  I'm surprised it hasn't called the Samaritans...

Dave.
