Re: 2.6.16-rc4: known regressions

On Wed, 22 Feb 2006, David Zeuthen wrote:
> 
> Oh, you know, I don't think that's exactly how it works; HAL is pretty
> much at the mercy of what changes go into the kernel. And, trust me,
> the changes we need to cope with from your so-called stable API are not
> so nice. 

Why do you "cope"?

Start complaining. If kernel changes screw up something, COMPLAIN. Loudly. 
They shouldn't.

> It also means I have to release-note that newer HAL releases require
> newer kernel and udev releases, and that's alright.

It's _somewhat_ ok to have a well-defined one-way dependency. It's sad, 
but inevitable sometimes.

For example, the kernel does have a dependency on the compiler used to 
compile it. We try to avoid it as far as possible, but we've slowly been 
updating it, first from 1.40 to 2.75 to 2.9x and now to 3.1. But the 
kernel obviously shouldn't have any other run-time dependencies, because 
everything else is "on top of" the kernel.

What is NOT ok is to have a two-way dependency. If user-space HAL code 
depends on a new kernel, that's ok, although I suspect users would hope 
that it wouldn't be "kernel of the week", but more a "kernel of the last 
few months" thing. 

But if you have a TWO-WAY dependency, you're screwed. That means that you 
have to upgrade in lock-step, and that just IS NOT ACCEPTABLE. It's 
horrible for the user, but even more importantly, it's horrible for 
developers, because it means that you can't say "a bug happened" and do 
things like try to narrow it down with bisection or similar.

> For just one example of API breaking see
> 
>  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=175998

So the kernel obviously shouldn't be just randomly changing the type 
numbers around. 

The _real_ bug seems to be that some people think it is OK to make this 
kind of user-visible change without even considering the downstream, or 
indeed, without even telling anybody else (like Andrew or me) about it.

> This breaks stuff for end users in a stable distribution. Not good.

Indeed. Not good at all.

And yes, some of it may be just HAL being a fragile mess, and some of it 
may end up being just user-level code that must be made to be more robust 
("I see a new type I don't understand" "Ok, assume a lowest common 
denominator, and stop whining about it"). 

But a lot of it is definitely some kernel people being _waayy_ too 
cavalier about userspace-visible changes.
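
The "lowest common denominator" fallback on the user-level side could look
roughly like the sketch below. It is only an illustration, not HAL's actual
code: the enum, the function name and the choice of which numbers to
recognize are all assumptions made for the example (0 and 5 are the SCSI
peripheral device type codes for direct-access and CD-ROM devices). The
point is only that a value the code has never seen ends up in a generic
class instead of being treated as an error.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device classes, used only in this sketch. */
enum dev_class { DEV_GENERIC, DEV_DISK, DEV_CDROM };

/*
 * Classify the text read from a numeric "type" attribute.
 * Anything unparsable or unknown falls back to DEV_GENERIC
 * rather than failing: that is the lowest common denominator.
 */
static enum dev_class classify_type(const char *text)
{
	char *end;
	long type;

	errno = 0;
	type = strtol(text, &end, 10);
	if (errno != 0 || end == text)
		return DEV_GENERIC;	/* not a number at all */

	switch (type) {
	case 0:
		return DEV_DISK;	/* direct-access device */
	case 5:
		return DEV_CDROM;	/* CD-ROM device */
	default:
		return DEV_GENERIC;	/* new type we don't understand */
	}
}
```

With this shape, a kernel that starts reporting a type the tool does not
know about degrades the tool to generic handling; it does not stop the
device from working.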

> I think maintaining a stable syscall interface makes sense. Didn't you
> once say that only the syscall interface was supposed to be stable? Or
> was that Greg KH? I can't remember...

It's _not_ just system calls. It's any user-visible stuff. That very much 
includes /proc, /sys, and any "kernel pipes" aka netlink etc bytestreams.

What is not stable is the _internal_ data structures. We break external 
modules, and we sometimes break even in-kernel drivers etc with abandon, 
if that is what it takes to fix something or make it prettier.

So fcntl and ioctl numbers etc are _inviolate_, because they are part of 
the system interface. As is /proc and /sys. We don't change them just 
because it's "convenient" to change them in the kernel. 

If /sys needs an extended type to describe the command set of a device, we 
do NOT just change an existing attribute in /sys. 
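
One way to picture the alternative, seen from user space: the existing
attribute keeps its meaning forever, and any extended information goes
into a new attribute next to it. The sketch below assumes a made-up
attribute called "type_ext" for that purpose; neither the name nor the
helper functions come from the kernel or from HAL. Old tools keep reading
"type" and never notice; new tools try "type_ext" first and fall back.

```c
#include <stdio.h>

/*
 * Read a decimal integer from <dir>/<name>.
 * Returns 0 on success, -1 if the file is missing or unparsable.
 */
static int read_int_attr(const char *dir, const char *name, long *out)
{
	char path[512];
	FILE *f;
	int n, ok;

	n = snprintf(path, sizeof(path), "%s/%s", dir, name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* path did not fit */

	f = fopen(path, "r");
	if (!f)
		return -1;	/* attribute does not exist */

	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Prefer the hypothetical new attribute "type_ext"; on an older
 * kernel that lacks it, use the unchanged "type" attribute.
 */
static int read_device_type(const char *dir, long *out)
{
	if (read_int_attr(dir, "type_ext", out) == 0)
		return 0;
	return read_int_attr(dir, "type", out);
}
```

The same binary then works on both sides of the kernel upgrade, which is
what keeps the dependency one-way.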

> And I also think that breaking things like sysfs can be alright as long
> as you coordinate it with major users of it, e.g. udev and HAL.

The major users are USERS. Not developers. It doesn't help to "coordinate" 
things, when what gets screwed is the end-user who no longer can upgrade 
his kernel without worrying that something might break.

THIS IS WHY WE MUST MAKE THE KERNEL INTERFACES STABLE!

If users cannot upgrade their kernels safely, we will have two totally
unacceptable end results:

 - users won't upgrade. They don't dare to, because it's too painful, and 
   they don't understand HAL or hotplug, or whatever. 

   If a developer cannot see that this is unacceptable, then that 
   developer is a nincompoop and needs to be educated.

 - users upgrade, and generate bug reports and waste other developers' time 
   because those other developers didn't realize that the HAL cabal had 
   decided that that breakage was "ok".

   Or worse, they don't generate the bug reports, and then six months from 
   now, when they test again, and it's still broken, they generate a 
   really bad one ("it doesn't work") when everybody - including the HAL 
   cabal - has forgotten what it was all about.

   Again, if a developer cannot see that this is unacceptable, then that 
   developer is not playing along, and needs to have his mental compass 
   re-oriented.

The fact is, regressions are about 10x more costly than fixing old bugs. 
They cause problems downstream that just waste everybody's time. It's a 
_hell_ of a lot more efficient to spend extra time to keep old interfaces 
stable than it is to cause regressions.

> One day perhaps sysfs will be "just right" and you can mark it as being
> stable. I just don't think we're there yet. And I see no reason
> whatsoever to paint things as black and white as you do.

Nothing will _ever_ be "just right", and this has been going on too long. 
We had better get our act together.

		Linus