Re: Back Again

Todd Zullinger writes:

Thank you for the detailed reply Sam. :)

Excuse me for asking more questions (potentially dumb questions at
that).

Sam Varshavchik wrote:
I don't know if you've ever upgraded Fedora from one release to the
next.  The upgrade process is as slow as molasses, even though all
the metadata is right there.

No, I avoid upgrades.  I always do fresh installs as a matter of
habit.  Point taken though.  I have read a lot of complaints of slow
upgrades at the dependency resolution stage.

A few years ago the base distro was much smaller than it is now. The
size of a typical Linux distro has really ballooned. Some of the
algorithms in rpm scale horribly. It wasn't such a big deal when a
typical Linux distro was only a few hundred packages, but now it's a
few thousand packages, with dependencies that are much more
complicated, and rpm is now really blowing apart at the seams.

I haven't looked at the code, but is it rpm or yum that's really
bogging down?  Or aren't you making much of a distinction when you say
rpm?

I'd break it down as about 70% yum vs. 30% rpm. Yum is really taking its sweet time figuring out what it needs to do. But even after it's done that, and downloaded everything, rpm still tends to spin its wheels, if it has a large list of packages to chew through.

Furthermore, rpm, as is, does not implement remote repositories.

Does it need to?  Does dpkg do this?

With a large repository, like Fedora, even a compressed XML file is
going to end up being rather huge. Then, you have to uncompress it
and parse it.  And, XML parsing is also not exactly a light task.

But somehow or another you need to deal with a sizable chunk of data
to make reasonable decisions regarding dependencies.  The tough part
about rpm development is trying to be backward compatible and still
make forward progress.  I don't envy the guys hacking on rpm.

You do /not/ need that much info in the first step. All you need is just a list of the names of the packages available on the remote repository. You reconcile that against the list of packages whose metadata you have already downloaded, and you then know what's new.
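
A rough sketch of that first step, in Python. The package identifiers and the `reconcile` helper are made up for illustration, and it assumes each entry in the list is a full name-version-release string, so that an updated package shows up as a new entry:

```python
def reconcile(remote_ids, cached_ids):
    """Compare the repository's package list with the locally cached one.

    Returns the identifiers whose metadata still has to be fetched ("new")
    and the ones that have disappeared from the repository ("gone").
    """
    remote = set(remote_ids)
    cached = set(cached_ids)
    return {
        "new": sorted(remote - cached),
        "gone": sorted(cached - remote),
    }


# Hypothetical identifiers, not taken from any real repository.
remote_list = ["pkg-a-1.0-1", "pkg-b-1.1-1", "pkg-c-2.0-3"]
cached_list = ["pkg-a-1.0-1", "pkg-b-1.0-1", "pkg-c-2.0-3"]

result = reconcile(remote_list, cached_list)
print(result)
```

Only the entries under "new" would need their full metadata downloaded; everything else is already on disk.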

Meanwhile, primary.xml.gz is actually a voluminous XML file that contains not just each package's name and version, but also all sorts of extra info. And you have to download the whole thing every time. And the current, sqlite-based version of yum does not help. I see that primary.sqlite.bz2 is about one and a half times as large as primary.xml.gz.

So, all this talk of a database-based yum, and it turns out that you end up having to download /more/ data than you used to before? Someone explain to me what we're supposed to be doing here.

Let's look at repodata. Right now, for Fedora updates, in 7/i386/repodata, we have this:

total 16904
drwxr-xr-x 2 root root    4096 2007-07-30 12:23 .
drwxr-xr-x 3 root root  159744 2007-07-30 12:23 ..
-rw-r--r-- 1 root root 2676161 2007-07-30 12:23 filelists.sqlite.bz2
-rw-r--r-- 1 root root 2703076 2007-07-30 12:22 filelists.xml.gz
-rw-r--r-- 1 root root 4603154 2007-07-30 12:23 other.sqlite.bz2
-rw-r--r-- 1 root root 5249048 2007-07-30 12:22 other.xml.gz
-rw-r--r-- 1 root root 1122990 2007-07-30 12:23 primary.sqlite.bz2
-rw-r--r-- 1 root root  732021 2007-07-30 12:22 primary.xml.gz
-rw-r--r-- 1 root root    1953 2007-07-30 12:23 repomd.xml

From what I see yum doing, it downloads the primary file, the other file, and possibly filelists, /every/ time a single package gets added to the repository, even though 99% of the content is the same as before.

This, in my opinion, is not really an optimal design. You should /not/ have to download /everything/ every time a single package changes.
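
For a sense of scale, the byte counts in the listing above can be added up as follows. This is only arithmetic on that listing; which files a given client actually fetches is an assumption here, and the grouping by format is for illustration:

```python
# Byte counts copied from the repodata listing above.
SIZES = {
    "filelists.sqlite.bz2": 2676161,
    "filelists.xml.gz": 2703076,
    "other.sqlite.bz2": 4603154,
    "other.xml.gz": 5249048,
    "primary.sqlite.bz2": 1122990,
    "primary.xml.gz": 732021,
    "repomd.xml": 1953,
}


def total(suffix):
    """Sum the sizes of all metadata files ending in the given suffix."""
    return sum(size for name, size in SIZES.items() if name.endswith(suffix))


xml_total = total(".xml.gz")         # primary + other + filelists, XML flavor
sqlite_total = total(".sqlite.bz2")  # same three files, sqlite flavor
primary_ratio = SIZES["primary.sqlite.bz2"] / SIZES["primary.xml.gz"]

print(f"XML set:    {xml_total} bytes")
print(f"sqlite set: {sqlite_total} bytes")
print(f"primary sqlite/xml ratio: {primary_ratio:.2f}")
```

By these numbers, a client that fetches all three XML files pulls about 8.7 million bytes per refresh, no matter how small the change to the repository was.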

Remote package repositories could've been implemented much better.
When I had some free time some time ago, I quickly hacked up a
package manager for some of my internally-developed software. I
found that I could do a similar kind of package metadata
synchronization much more efficiently than yum/rpm.

Isn't the harder part doing this in a way that doesn't completely
break backward compatibility though?  And then you have to spend a
bunch of years adding new code to deal with the odd sorts of deps that
packagers come up with in the wild (versioned obsoletes on a multilib
system sounds fun :).

Someone posted to the fedora-devel list a month or so ago saying
they'd created a super fast depsolver using php and mysql.  Once all
of the various cases they'd missed were explained, things didn't go
much further.  (And no, I'm not at all suggesting that applies to your
work -- it's obvious that you know more than that and that you
actually created a working system. :)

In my case, I had no intention of bending over backwards in order to stay compatible with rpm. The whole point was to do this better, with a clean start and a clean design, and then later provide a shim layer that imports rpm's dependencies. And my design handles dependencies in a far more sophisticated way than rpm does. All the extra hackery that's done now with kernel packages, which support third-party out-of-tree kernel modules using a yum plugin -- all of that is swept away, and the additional logic becomes part of the overall design rather than an aftermarket add-on hack. Ditto for the epoch hack -- my solution fixes the original underlying reason for having an epoch in the first place.

And, of course, php+mysql will always have a lot of overhead. No matter what you do there, you will always be left in the dust by carefully designed, compiled C++ code. No matter how you twist and turn, you'll always have to: compile php code, interpret php code, generate SQL, send the SQL over a communication channel to the mysql db engine, have your SQL parsed, a query plan formed and then executed by the mysql engine, and finally have the resulting data returned. The C++ equivalent: run already-compiled code. Done.

metadata file you want to download, you can use HTTP 1.1 partial
chunk request feature to download just the bits of the metadata file
that you want.

Perhaps you should bribe someone to implement this in yum as a proof
of concept?

Well, I can point them to how HTTP 1.1 chunking works, and how to gracefully autodetect if the HTTP server supports HTTP 1.1 chunking, and the logic to gracefully fall back to "Plan B", if the repository's HTTP server is running old Apache without HTTP 1.1 support, and what to do next. That's about all I can do. I won't write the code, I have plenty of other coding work that keeps me busy.
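
A minimal sketch of that detect-and-fall-back logic, in Python. It assumes the partial download feature discussed here corresponds to HTTP/1.1 byte-range requests (the `Range` request header, answered with status 206 and a `Content-Range` header); the function names and the fallback of slicing a full response locally are illustrative choices, not anything yum does:

```python
import urllib.request


def make_range_request(url, start, length):
    """Build a request asking for `length` bytes beginning at offset `start`."""
    end = start + length - 1  # byte ranges are inclusive on both ends
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def classify_response(status, headers):
    """Decide whether the server honored the range request.

    "partial": 206 with Content-Range, the body is just the requested slice.
    "full":    200, the server ignored Range and sent the whole file (Plan B).
    "error":   anything else.
    """
    lowered = {key.lower(): value for key, value in headers.items()}
    if status == 206 and "content-range" in lowered:
        return "partial"
    if status == 200:
        return "full"
    return "error"


def fetch_slice(url, start, length):
    """Fetch one slice of a remote file, falling back to a full download."""
    with urllib.request.urlopen(make_range_request(url, start, length)) as resp:
        mode = classify_response(resp.status, dict(resp.headers.items()))
        body = resp.read()
    if mode == "partial":
        return body
    if mode == "full":
        # Plan B: the server sent everything, so cut out the slice locally.
        return body[start:start + length]
    raise RuntimeError(f"unexpected HTTP status {resp.status}")
```

A client could try `fetch_slice` once per repository, remember whether the answer came back as "partial" or "full", and skip range requests for servers that do not honor them.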

But then, after all is said and done, no amount of tweaks to rpm can
compensate for stupid and broken packaging. Right now, due to
indirect dependencies, grub requires *GTK* runtime libraries to be
installed. On my headless machine, I now have to plop down a
crapload of x.org and GTK RPMs, because grub requires them, due to
its intermediate dependencies.

Yeah.  This was caused by policy more than by incompetence.  The folks
at Red Hat's legal department asked that all of the trademarked logos
be kept in one package, for easier tracking and removal by downstream
users of Fedora's packages (or something like that).

The problem isn't that trademarked logos must be kept in one package. It's that the package, for some reason that I still can't fathom, must depend on gtk2 code libraries. Why would a package that supposedly contains nothing more than a bunch of logo image files have a hard dependency on a package that contains system libraries? That just does not compute.

I haven't really looked at it, but the probable story is that gtk2-engine or the gnome-themes package also includes some shell script that the logos package needs for some reason. So rather than separating it out into a subpackage, which would be the proper thing to do, you have to install the whole bloody thing, and because gtk2 requires all the xorg core libraries, those end up getting dragged in as well.
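
The way one intermediate dependency drags in everything behind it can be sketched as a transitive closure over the dependency graph. The graph below is hypothetical: it encodes the guess in the paragraph above with simplified package names, and is not taken from actual Fedora metadata:

```python
from collections import deque

# Hypothetical dependency edges, following the guess described above.
DEPENDS = {
    "grub": ["logos"],
    "logos": ["gtk2-engine"],
    "gtk2-engine": ["gtk2"],
    "gtk2": ["xorg-libs"],
    "xorg-libs": [],
}


def closure(package, depends):
    """Return every package pulled in, directly or indirectly, by `package`."""
    seen = set()
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for dep in depends.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


pulled_in = closure("grub", DEPENDS)
print(sorted(pulled_in))
```

In this toy graph, splitting the one file that logos needs into its own subpackage would cut the edge to gtk2-engine, and grub's closure would shrink to logos plus that subpackage.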

Although this has no direct relevance to the overall issue of rpm's design, it does demonstrate the same kind of inattention to detail.


