[RFC/SERIOUS] grilling troubled CPUs for fun and profit?

Hello all,

while looking for busy loops that could use cpu_relax(), I found the
following gems:

arch/i386/kernel/crash.c/crash_nmi_callback():

        /* Assume hlt works */
        halt();
        for(;;);

        return 1;
}

arch/i386/kernel/doublefault.c/doublefault_fn():

        for (;;) /* nothing */;
}

Let's assume that we have a less than moderate fan failure that causes
the CPU to heat up beyond the critical limit...
That might result in - you guessed it - crashes or doublefaults.
In which case we enter the corresponding handler and do... what?
Exactly, we accelerate the CPU's happy march into bit heaven by letting it
execute a busy-loop under a non-working fan.
Thanks, your users will be very happy, I think ;)
(especially since it was "just" a simple fan failure that could have been
entirely remedied by buying another fan for $3)


The same thing applies to
arch/i386/kernel/smp.c/stop_this_cpu(), although there it's less catastrophic,
since the CPU is most likely running under normal working conditions.

IMHO on any critical CPU failure we should:
- try to log it (might be difficult with a broken CPU, though)
- optionally somehow directly alert the user
- STOP the system, COMPLETELY (that way people WILL take notice, hopefully
  before it's too late and actual damage has occurred)
- make DAMN SURE that the (possibly already broken) CPU won't have a
  less than nice time once the system is stopped

Am I completely missing something here?

If this is an issue, then maybe we should consolidate those places into
one function that safely(!) halts a CPU, optionally disabling APIC etc.
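
To make the idea more concrete, here is a rough user-space sketch of what
such a consolidated helper might look like. Everything in it is an
assumption for illustration only: safe_halt_cpu(), struct cpu_halt_ops and
the sim_* stand-ins are made-up names, not existing kernel interfaces. In a
real kernel the hooks would presumably map to disabling local interrupts,
some APIC shutdown routine, and the hlt instruction. The one point it tries
to show is that the CPU gets re-halted after every wakeup (a halted CPU can
still be woken, e.g. by an NMI) instead of falling through into a hot
for(;;) spin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hooks, so the halt policy can be exercised without hardware. */
struct cpu_halt_ops {
        void (*irq_disable)(void);   /* mask maskable interrupts */
        void (*apic_disable)(void);  /* optional, may be NULL */
        bool (*halt_once)(void);     /* halt until woken; returns false only
                                        in simulation, to end the loop */
};

/*
 * Quiesce the CPU, then halt it again after every wakeup.
 * Returns the number of wakeups seen; on real hardware halt_once() would
 * always report a wakeup, so this function would never return.
 */
int safe_halt_cpu(const struct cpu_halt_ops *ops, bool disable_apic)
{
        int wakeups = 0;

        assert(ops != NULL && ops->irq_disable != NULL && ops->halt_once != NULL);

        ops->irq_disable();
        if (disable_apic && ops->apic_disable != NULL)
                ops->apic_disable();

        /* Never spin: go straight back to the halted state after a wakeup. */
        while (ops->halt_once())
                wakeups++;

        return wakeups;
}

/* User-space stand-ins that only record what was requested. */
static int sim_irq_off, sim_apic_off, sim_wakeups_left;

static void sim_irq_disable(void)  { sim_irq_off = 1; }
static void sim_apic_disable(void) { sim_apic_off = 1; }
static bool sim_halt_once(void)    { return sim_wakeups_left-- > 0; }

static const struct cpu_halt_ops sim_ops = {
        .irq_disable  = sim_irq_disable,
        .apic_disable = sim_apic_disable,
        .halt_once    = sim_halt_once,
};
```

With the simulated hooks, setting sim_wakeups_left to some number and
calling safe_halt_cpu(&sim_ops, true) lets one check that each spurious
wakeup leads back into a halt rather than into a busy-loop. Whether
disabling the APIC is always safe or desirable at that point is exactly the
kind of thing I'd like feedback on.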

Oh, and once you have finished processing my mail here, you could optionally
also look at my report about almost unusably broken USB:
http://lkml.org/lkml/2006/6/19/54
(no replies yet despite advanced breakage)

Thanks!

Andreas Mohr