Fault tolerance/bad patch, [was Re: [PATCH 29/30] [PATCH] PCI Hotplug: fake NULL pointer dereferences in IBM Hot Plug Controller Driver]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Mon, Jun 19, 2006 at 02:43:35PM -0700, Greg KH wrote:
> From: Eric Sesterhenn <[email protected]>
> 
> Remove checks for value, since the hotplug core always provides
> a valid value.
> 
> -	if (hotplug_slot && value) {
> +	if (hotplug_slot) {

This may be the wrong place to bring up a philosphical issue,
but this example was just too good to pass up.  This patch violates
the general dictates of high-reliability/fault-tolerant programming.

If someone in the future changes the hotplug core so that it 
sometimes returns a null value, this code will potentially crash
and/or do other bad things (corrupt, invalid state, etc.)
This means that this routine will no longer be "robust" in the face of
changes in other parts of the kernel. 

I can hear the objections:
-- Performance. B.S. This routine is not performance critical, it will
   get called once a week, once a month or less often; a few extra
   cycles are utterly irrelevant.

-- Many eyes/shallow bugs. A patch that breaks things would be rejected.
   B.S. Patches that break things get into the kernel all the time.

-- If its broken, testing will find it and fix it. B.S.  The hotplug 
   routines are lightly tested, infrequently tested.  There's not much 
   hardware that does hotplug, and its expensive. Home enthsiasts 
   won't find this.

   Worse, the null-value condition could be a rare corner case that won't 
   show up during "normal" operation, i.e. during "normal" hotplug testing.
   It may only get triggered in some obscuare case e.g. bad pci card
   or sysadmin error, or due to error in a user-land tool.

Hotplug ops are rare; they typically are not done until they are needed 
(e.g. because the card failed), and that is a really bad time to discover 
that you've crashed the kernel.  The whole point of hotplug is to *not* 
have to reboot. 

(If I may blather and pompously pontificate some more: This is also 
why Linux could never be used in life-critical/safety-critical apps, 
like health care, machine control, automoitive, aviation, satellites,
or the Mars rover.  Code that controls life-critical apps has zillions 
of safety checks for situations that "can't possibly happen". But
every insider knows "can't happen" does happen: take the Hyabusa 
asteroid mission as a recent example.  My goal in pontificating is not 
to reject this one patch, but to change the cowboy attitude that makes 
people think that patches like this are OK.)

Thus, I can't begin to imagine why anyone would want to remove
robustness with a patch like this, and gain absolutely nothing in
return.


--linas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux