Todd Zullinger writes:
> Thank you for the detailed reply Sam. :) Excuse me for asking more questions (potentially dumb questions at that).
>
> Sam Varshavchik wrote:
>> I don't know if you've ever upgraded Fedora from one release to the next. The upgrade process is as slow as molasses, even though all the metadata is right there.
>
> No, I avoid upgrades. I always do fresh installs as a matter of habit. Point taken, though. I have read a lot of complaints about slow upgrades at the dependency resolution stage.
>
>> A few years ago the base distro was much smaller than it is now. The size of a typical Linux distro has really ballooned. Some of the algorithms in rpm scale horribly. It wasn't such a big deal when a typical Linux distro was only a few hundred packages, but now it's a few thousand packages, with dependencies that are much more complicated, and rpm is now really blowing apart at the seams.
>
> I haven't looked at the code, but is it rpm or yum that's really bogging down? Or aren't you making much of a distinction when you say rpm?
I'd break it down as about 70% yum vs. 30% rpm. Yum is really taking its sweet time figuring out what it needs to do. But even after it's done that, and downloaded everything, rpm still tends to spin its wheels if it has a large list of packages to chew through.
>> Furthermore, rpm, as is, does not implement remote repositories.
>
> Does it need to? Does dpkg do this?
>
>> With a large repository, like Fedora, even a compressed XML file is going to end up being rather huge. Then, you have to uncompress it and parse it. And, XML parsing is also not exactly a light task.
>
> But somehow or another you need to deal with a sizable chunk of data to make reasonable decisions regarding dependencies. The tough part about rpm development is trying to be backward compatible and still make forward progress. I don't envy the guys hacking on rpm.
You do /not/ need that much info in the first step. All you need is just a list of the names of the packages available on the remote repository. You reconcile that against the list of packages you already have downloaded the metadata for, and you then know what's new.
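Something along these lines -- a rough sketch only; the file names and the plain newline-delimited list format here are made up for illustration, not anything yum actually publishes:

    # Rough sketch of the first-step reconciliation. The file names and
    # the newline-delimited list format are made up for illustration.

    def load_list(path):
        # One package name+version per line, read into a set.
        with open(path) as f:
            return set(line.strip() for line in f if line.strip())

    remote = load_list("remote-pkglist.txt")   # small, freshly downloaded
    local = load_list("cached-pkglist.txt")    # left over from the last sync

    new_or_updated = remote - local    # only these need their metadata fetched
    removed = local - remote           # cached metadata that can be discarded

    print("fetch metadata for %d packages" % len(new_or_updated))
    print("drop metadata for %d packages" % len(removed))

Two set differences, and you know exactly which packages' metadata to fetch and which cached entries to throw away.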
Meanwhile, primary.xml.gz is actually a voluminous XML file that contains not just each package's name and version, but also all sorts of extra info. And you have to download the whole thing every time. The current, sqlite-based version of yum does not help, either: I see that primary.sqlite.bz2 is about twice as large as primary.xml.gz.
So, all this talk of a database-based yum, and it turns out that you end up having to download /twice/ as much data as you used to? Someone explain to me what we're supposed to be gaining here.
Let's look at repodata. Right now, for Fedora updates (7/i386/repodata), we have this:
total 16904
drwxr-xr-x 2 root root    4096 2007-07-30 12:23 .
drwxr-xr-x 3 root root  159744 2007-07-30 12:23 ..
-rw-r--r-- 1 root root 2676161 2007-07-30 12:23 filelists.sqlite.bz2
-rw-r--r-- 1 root root 2703076 2007-07-30 12:22 filelists.xml.gz
-rw-r--r-- 1 root root 4603154 2007-07-30 12:23 other.sqlite.bz2
-rw-r--r-- 1 root root 5249048 2007-07-30 12:22 other.xml.gz
-rw-r--r-- 1 root root 1122990 2007-07-30 12:23 primary.sqlite.bz2
-rw-r--r-- 1 root root  732021 2007-07-30 12:22 primary.xml.gz
-rw-r--r-- 1 root root    1953 2007-07-30 12:23 repomd.xml
From what I see yum doing, it downloads the primary file, the other file, and possibly filelists, /every/ time a single package gets added to the repository. Even though 99% of the content is the same as before.
This does not look like an optimal design to me. You should /not/ have to download /everything/ every time a single package changes.
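For what it's worth, repomd.xml is already tiny and carries one checksum per metadata file, so a client can cheaply tell /that/ something changed -- but the granularity is the whole file, which is exactly the problem. A rough sketch of reading it (this parses the actual repomd.xml format; the cached/fresh file paths are made up):

    # Rough sketch: compare the per-file checksums in two repomd.xml
    # snapshots. The cached/fresh file paths are made up.
    import xml.etree.ElementTree as ET

    NS = "{http://linux.duke.edu/metadata/repo}"

    def checksums(path):
        # Map each metadata type (primary, filelists, other...) to its checksum.
        root = ET.parse(path).getroot()
        return dict((d.get("type"), d.find(NS + "checksum").text)
                    for d in root.findall(NS + "data"))

    old = checksums("cached-repomd.xml")   # kept from the previous sync
    new = checksums("fresh-repomd.xml")    # just downloaded, ~2K

    for mdtype, csum in new.items():
        if old.get(mdtype) == csum:
            print("%s unchanged, keep the cached copy" % mdtype)
        else:
            print("%s changed, re-download the whole file" % mdtype)

Note what's missing: nothing in there tells you /which/ packages changed, so the smallest unit you can re-download is the entire primary file.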
>> Remote package repositories could've been implemented much better. When I had some free time some time ago, I quickly hacked up a package manager for some of my internally-developed software. I found that I could do a similar kind of package metadata synchronization much more efficiently than yum/rpm.
>
> Isn't the harder part doing this in a way that doesn't completely break backward compatibility though? And then you have to spend a bunch of years adding new code to deal with the odd sorts of deps that packagers come up with in the wild (versioned obsoletes on a multilib system sounds fun :). Someone posted to the fedora-devel list a month or so ago saying they'd created a super fast depsolver using php and mysql. Once all of the various cases they'd missed were explained, things didn't go much further. (And no, I'm not at all suggesting that applies to your work -- it's obvious that you know more than that and that you actually created a working system. :)
In my case, I had no intention of bending over backwards in order to stay compatible with rpm. The whole point was to do this better: have a clean start and a clean design, and then later provide a shim layer that imports rpm's dependencies. And my design handles dependencies in a far more sophisticated way than rpm does. All the extra hackery that's done now for kernel packages, which support third-party out-of-tree kernel modules using a yum plug-in -- all of that is broomed away, and the additional logic becomes incorporated in the overall design, rather than being an aftermarket add-on hack. Ditto for the epoch hack -- my solution fixes the original underlying reason for having an epoch in the first place.
And, of course, php+mysql will always have a lot of overhead. No matter what you do there, you will always be left in the dust by carefully-designed, compiled C++ code. No matter how you twist and turn, you'll always have to: compile the php code, interpret the php code, generate SQL, send the SQL over a communication channel to the mysql db engine, have the SQL parsed and a query plan formed, have it processed by the mysql engine, and finally get the resulting data back. The C++ equivalent: run already-compiled code. Done.
>> [...] metadata file you want to download, you can use HTTP 1.1 partial chunk request feature to download just the bits of the metadata file that you want.
>
> Perhaps you should bribe someone to implement this in yum as a proof of concept?
Well, I can point them to how HTTP 1.1 chunking works, how to autodetect whether the HTTP server supports it, the logic to gracefully fall back to "Plan B" if the repository's HTTP server is running an old Apache without HTTP 1.1 support, and what to do next. That's about all I can do. I won't write the code; I have plenty of other coding work that keeps me busy.
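What I'm describing is most naturally done with an HTTP 1.1 byte-range request: the client sends a Range header, a cooperating server answers 206 Partial Content, and an old server just sends the whole file with a plain 200 -- which is your "Plan B". A rough sketch (the URL is made up):

    # Rough sketch: ask for just the first 64K of a metadata file.
    # A server that supports ranges answers 206 Partial Content; one
    # that doesn't sends the whole file with a 200. The URL is made up.
    import urllib.request

    url = "http://repo.example.com/repodata/primary.xml.gz"
    req = urllib.request.Request(url, headers={"Range": "bytes=0-65535"})

    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        if resp.status == 206:
            print("server honored the range: got %d bytes" % len(body))
        else:
            print("no range support, fell back to a full download: %d bytes"
                  % len(body))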
>> But then, after all is said and done, no amount of tweaks to rpm can compensate for stupid and broken packaging. Right now, due to indirect dependencies, grub requires *GTK* runtime libraries to be installed. On my headless machine, I now have to plop down a crapload of x.org and GTK RPMs, because grub requires them, due to its intermediate dependencies.
>
> Yeah. This was caused by policy more than by incompetence. The folks at Red Hat's legal department asked that all of the trademarked logos be kept in one package, for easier tracking and removal by downstream users of Fedora's packages (or something like that).
It's not that trademarked logos must be kept in one package. It's just that the package, for some reason I still can't fathom, must depend on the gtk2 code libraries. Why would a package that supposedly contains nothing more than a bunch of logo image files have a hard dependency on a package that contains system libraries? That just does not compute.
I haven't really looked at it, but the probable story is that gtk2-engine or the gnome-themes package also includes some shell script that the logos package needs for some reason. So rather than separating it out into a subpackage, which would be the proper thing to do, you have to install the whole bloody thing; and because gtk2 requires all the xorg core libraries, those end up getting sucked down the drain as well.
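If anyone wants to trace such a chain on their own box, something like this rough sketch would do it -- it walks the installed Requires graph breadth-first by shelling out to rpm (assumes an rpm-based system; grub and gtk2 as in my example above):

    # Rough sketch: walk the installed Requires graph breadth-first to
    # find how one package (e.g. grub) ends up dragging in another
    # (e.g. gtk2). Shells out to rpm; assumes an rpm-based system.
    import subprocess
    from collections import deque

    def rpm_lines(args):
        out = subprocess.run(["rpm"] + args, capture_output=True, text=True)
        return [l for l in out.stdout.splitlines() if l.strip()]

    def requires(pkg):
        # Direct dependencies: rpm reports capabilities, which we map
        # back to the installed packages that provide them.
        deps = set()
        for line in rpm_lines(["-q", "--requires", pkg]):
            cap = line.split()[0]
            if cap.startswith("rpmlib("):
                continue   # rpm-internal capabilities, not real packages
            for provider in rpm_lines(["-q", "--whatprovides", cap]):
                if not provider.startswith("no package"):
                    deps.add(provider)
        return deps

    def chain(start, target_prefix):
        # Breadth-first search; returns the first dependency path found.
        parent = {start: None}
        queue = deque([start])
        while queue:
            pkg = queue.popleft()
            if pkg.startswith(target_prefix):
                path = []
                while pkg is not None:
                    path.append(pkg)
                    pkg = parent[pkg]
                return list(reversed(path))
            for dep in requires(pkg):
                if dep not in parent:
                    parent[dep] = pkg
                    queue.append(dep)
        return None

    print(chain("grub", "gtk2"))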
Although this does not have any direct relevance to the overall issue of rpm's design, it is demonstrative of the same kind of inefficient inattention to detail.