[linux-elitists] Stable distro kernel rant

Greg KH greg@kroah.com
Sat Jan 24 11:37:01 PST 2004


On Fri, Jan 23, 2004 at 11:40:41PM -0800, Rick Moen wrote:
> Quoting Greg KH (greg@kroah.com):
> 
> > > Still scared?
> > 
> > Very much so.  Nothing like backporting fixes to a kernel base that is
> > almost 2 years old (looks like 2.4.18 came out in Feb 2002) to cause
> > major headaches.
> 
> And yet, oddly, they keep doing that in the apparent belief that it's
> the easiest and most effective way to maintain systems constructed on it.
> I suppose they could be dead wrong; they've only been at this eight
> years or so.

Well, I do think they are wrong.  More on that below.

> How long have you maintained a major Linux distribution, by the way
> (just for comparison's sake)?

Hm, let's see.  I maintained a relatively popular Linux distro, Immunix,
for about 2 years.  I was in charge of their kernel package, and the
security updates for pretty much the whole distro.  I dealt with
customers and backported my fair share of drivers and fixes to an "old"
kernel version.

I then moved on to my current employer.  I've been here about 3 years
now.  Every day I deal with the kernel, old backports, customers, and
bugs.  Not to mention I receive a zillion different bug reports daily
from real users, the large majority of whom are using old kernels.

Almost all problems I've seen can be fixed by just telling the user to
upgrade to the latest kernel.  Bugs are constantly fixed in newer
kernels; that's why they are released.  Many of those bugs are known to
only 1 person at the time (or to nobody, when a developer spots the
chance of a bug before anyone hits it).  People using "stable" kernels
from distros never get those bug fixes :(

Now as to why distros insist on remaining on an "old stable" kernel?  I
think they do so for a number of different reasons, all bad for the
user:
	- not enough manpower to fully test new kernel updates.
	- feeling of goodness by working with a known kernel.
	- don't want to disrupt kernel interface APIs so that third
	  party modules still work properly.

For example, take a look at that Debian kernel.  It's based off of
2.4.18 with, I'm guessing, all of the relevant security updates.  That's
great, but the state of the drivers for that kernel remains at what was
available 2 years ago.  Which is fine if you are running 2 year old
hardware, and will never buy a new network card, SCSI controller, USB
device, etc.  For users like that, great, stick with that kernel.  For
the rest of the world, you will have to move on and use a newer kernel.

Remember what started this thread?  Someone complained that their kernel
did not support their latest devices.  These devices weren't even
available when the 2.4.18 kernel was released.  So how that person
expected their 2 year old kernel to support their month old device is
beyond me...


Ok, another rant about old kernels while I'm at it...

A number of distros today are coming out with "enterprise" versions.  In
these they tell the customer that they will stick with a specific
kernel/glibc/gcc/everything else for X number of years.  Managers like
the sound of that, as it looks like "stable == good".  But here's what
happens in reality:
	- enterprise kernel is released.  Works great for all hardware
	  shipping at this point in time.
	- time passes...
	- customer buys new machine, and wants to install the distro on
	  it.  Finds out that there is no support for their built-in
	  network/scsi/whiz_bang_bus_of_the_moment device.  So they
	  complain to the hardware vendor and distro.

At this point in time the proper thing to do would be to just upgrade
the kernel to the latest one, which properly supports that new device.
But instead the engineers are required to backport support to that old
kernel.  Now for 1 or 2 devices this isn't a problem...  But that does
not scale.

	- customer reports a bug in the enterprise kernel.
	- engineer realizes this bug is already fixed in the newer
	  kernel and backports the fix.

Now those two steps repeat a lot.  Combined with the influx of new
driver/device support, the enterprise kernel starts to look like a
monstrosity of patches upon patches upon a sagging, ageing base.  This
turns into a nightmare of maintenance quite quickly.  People who are
supposed to get new hardware working on kernels like this soon sink into
large fits of despair (I can walk up behind some people at work and
whisper 2.4.9 and watch them physically flinch, it's that bad.)

Another real-life situation, names XXXed out to protect the guilty.  I
was told,
	"Customer YYY is insisting on using distro XXX and their kernel
	causes a bug when running on our new box ZZZ.  The support
	people found out that if they replace the individual drivers,
	the bug still seems to happen at times, but if they replace the
	entire scsi and usb code in the distro kernel with the latest
	stuff from the stable kernel tree at kernel.org the bug goes
	away.  Please fix this problem as it is a show stopper."

Think for a second about the lunacy of it all...

(Big props go out to the person who did solve the above problem, he did
a wonderful job in figuring out the issue as it was quite complex and
large.)

Now yeah, I know that some big vendors only validate their product on
specific distros with specific kernels (Oracle, DB2, etc.) and they are
the people driving distros to try to maintain these horrible kernels for
people.  But if they took the time to keep up to date too, it wouldn't
be a problem.


So, what to do?

I think that if you have new hardware, you have to use the latest
kernel.  Trying to get even a 6 month old kernel running on a brand new
box is sometimes impossible.  If you have old hardware, then sure, older
kernels usually work just fine.


Sorry this rant got a bit long, it's just that it constantly bugs me, as
I deal with it every day (and have for the past 5 years).  So yes, I
think I am qualified to say this :)

Oh, and then there are the project managers who come to me and ask
why I cannot get support for feature X into the currently shipping
distro Y release for hardware that has never seen the light of day
outside of a lab (never mind the fact that _I_ haven't been able to get
specs or hardware to test that feature...)  But that's worthy of a rant
all on its own...

greg k-h

(oh yeah, all opinions related above are my own, and do not reflect the
opinions of my employer, etc, etc, etc...)


