David Boles writes:
On 7/31/2007 7:03 PM, Sam Varshavchik wrote:
Todd Zullinger writes:
Sam Varshavchik wrote:
Nah, it's not closer. It's just that rpm is getting crappier every year, and is long overdue for replacement.
I could easily be mistaken, but AFAIK, the main difference in speed that end users notice between yum and apt is due to the fact that apt caches its metadata. In between runs of apt-get update, calls to apt-get use the data on disk without hitting the network. With yum, the update and upgrade steps from apt-get are both done in the update.
I don't know if you've ever upgraded Fedora from one release to the next. The upgrade process is as slow as molasses, even though all the metadata is right there.
Do you know just why an upgrade of a system 6 months old, or more, takes longer than a fresh install of a new release? You should study that situation. Start with package dependencies and then think about just what you might have changed and added from third-party sites. Then think some more.
Well, I did think. The system does not have anything beyond Fedora and Fedora Extras, plus my own RPMs. But why does it matter, anyway? Why does the presence of a foreign RPM cause such a nervous breakdown? At most it should result in an unsatisfied dependency. But why should this result in rpm spinning its wheels to such an extent?
Care for a really stupid example? Take a 2006 automobile. Examine it very closely. Then with a garage full of new 2007 parts make it a 2007 automobile. All the time making sure that everything fits and still works. Email us when you're finished.
No matter which parts you have in your automobile and where they came from, when you have to compare its parts with a fixed list of two thousand other parts from a reference model, it should take exactly the same amount of time whether all your parts are OEM or aftermarket. After all, your car has the same number of parts, whether original or replacement. So why would it matter?
At most, the complexity of what RPM has to do would be O(N), and it should really be O(log N). But it seems, judging unscientifically, that RPM's actual complexity is at least O(N^2).
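To make the complexity argument concrete, here is a minimal Python sketch contrasting a naive pairwise scan with a hash-set lookup when checking N installed parts against a fixed reference list. It is only an illustration of the two growth rates, not a description of how rpm or Anaconda actually work; the function names and part strings are hypothetical.

```python
# Hypothetical sketch: two ways to find installed parts missing from a reference list.

def missing_quadratic(installed, reference):
    """Naive check: scan the whole reference list for every installed part.

    About len(installed) * len(reference) comparisons, i.e. O(N^2)
    when both lists hold roughly N entries.
    """
    missing = []
    for part in installed:
        found = False
        for ref in reference:
            if part == ref:
                found = True
                break
        if not found:
            missing.append(part)
    return missing


def missing_linear(installed, reference):
    """Hash-based check: build a set once, then do O(1) average lookups.

    O(N) overall. Where a part came from (OEM or aftermarket) does not
    change the cost; only the number of parts does.
    """
    reference_set = set(reference)
    return [part for part in installed if part not in reference_set]


reference = ["engine-2007", "gearbox-2007", "radio-2007"]
installed = ["engine-2007", "gearbox-2006", "radio-aftermarket"]
print(missing_linear(installed, reference))
# ['gearbox-2006', 'radio-aftermarket']
```

Both functions return the same answer; only the number of comparisons differs as the lists grow.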
I'll tell you this. I mentioned before that I use my own package management tool internally to manage some homebrewed software. I have a compatibility shim that sucks out pretty much the entire contents of the system RPM database, and imports all of the dependencies into my internal package database. This is to allow my own packages, which might have, say, a dependency on something.so, to have their dependencies satisfied by an RPM.
Basically, I read all RPM resources, create a dummy package that provides those resources, then install the dummy package, so my internal package database contains all the RPM-provided resources. Each time I update some RPMs, I rerun the import script and upgrade the old dummy RPM compatibility package to a new one.
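The import step described above might look roughly like the following Python sketch. It assumes the RPM database has already been dumped, by whatever means (not shown), into a plain mapping from package name to the resources it provides. `DummyPackage`, the `rpm-compat` name and the integer version scheme are hypothetical, not the actual tool.

```python
# Hypothetical sketch of collapsing an RPM snapshot into one dummy package.
from dataclasses import dataclass


@dataclass(frozen=True)
class DummyPackage:
    name: str
    version: int
    provides: frozenset


def build_dummy(rpm_provides, previous=None, name="rpm-compat"):
    """Collapse every RPM-provided resource into one dummy package.

    rpm_provides: dict mapping RPM package name -> iterable of resources.
    previous: the dummy package installed last time, if any; the new one
    gets the next version number so it can be installed as an upgrade.
    """
    resources = frozenset(
        resource
        for provided in rpm_provides.values()
        for resource in provided
    )
    version = 1 if previous is None else previous.version + 1
    return DummyPackage(name=name, version=version, provides=resources)


snapshot = {
    "glibc": ["libc.so.6", "libm.so.6"],
    "zlib": ["libz.so.1"],
}
first = build_dummy(snapshot)
second = build_dummy({**snapshot, "openssl": ["libssl.so.6"]}, previous=first)
print(second.version, sorted(second.provides))
# 2 ['libc.so.6', 'libm.so.6', 'libssl.so.6', 'libz.so.1']
```

Bumping the version on each rerun is one simple way to let the package tool treat the new snapshot as an upgrade of the old one.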
This operation, you understand of course, is analogous to your example -- taking an old snapshot of the entire RPM database, comparing it to a new one, and reconciling any differences against resources required by my internal packages, to make sure that they don't break. This operation is also equivalent to what Anaconda has to do when it's about to upgrade the Fedora distro -- take the current RPM database, and reconcile it with the RPM database from the release you're updating to.
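The reconciliation described above can be sketched in Python as a few set operations: work out which resources disappeared between the old and new snapshots, then report which internal packages required something that is now gone. This is a hypothetical illustration assuming each snapshot is a plain set of resource names; it ignores versions and does not consider resources that internal packages provide to each other.

```python
# Hypothetical sketch: reconcile two RPM snapshots against internal requirements.

def reconcile(old_provides, new_provides, internal_requires):
    """Report internal packages whose requirements the upgrade would break.

    old_provides, new_provides: sets of resources from the two snapshots.
    internal_requires: dict mapping internal package -> set of resources.
    Returns (added, removed, broken), where broken maps each affected
    internal package to the sorted list of resources it would lose.
    """
    added = new_provides - old_provides
    removed = old_provides - new_provides
    broken = {}
    for package, requires in internal_requires.items():
        # Only requirements that were met before and vanish now count as breakage.
        lost = requires & removed
        if lost:
            broken[package] = sorted(lost)
    return added, removed, broken


old = {"libc.so.6", "libssl.so.5", "libz.so.1"}
new = {"libc.so.6", "libssl.so.6", "libz.so.1"}
mine = {
    "mytool": {"libc.so.6", "libssl.so.5"},
    "myutil": {"libz.so.1"},
}
added, removed, broken = reconcile(old, new, mine)
print(broken)
# {'mytool': ['libssl.so.5']}
```

Each step is a set difference, a set intersection or a single pass, so the work grows roughly linearly with the number of resources involved.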
It takes me, oh, maybe a minute or so to crunch everything together. The analogous step in Anaconda, "Preparing transaction", takes about 5-10 minutes.
And I actually have more work to do. RPM has, I believe, three resource classes to reconcile against each other -- provided resources, required resources, and conflicting resources. My internal package database has six resource classes to reconcile, twice as many.
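For the three RPM-style classes mentioned above (provides, requires, conflicts), a check over a whole package set can be done by building one index of providers and then making a single pass over all requirements and conflicts. The Python sketch below is a hypothetical illustration of that idea, not RPM's or Anaconda's code; version comparisons, file dependencies and the other three resource classes of the internal database are left out, and the package names are made up.

```python
# Hypothetical sketch: one-pass provides/requires/conflicts check.
from collections import defaultdict


def check(packages):
    """Check provides/requires/conflicts across a set of packages.

    packages: dict mapping package name -> dict with optional
    "provides", "requires" and "conflicts" sets of resource names.
    Returns (unsatisfied, conflicts) as sorted lists of tuples.
    """
    # Index every provided resource once: resource -> providing packages.
    providers = defaultdict(set)
    for name, spec in packages.items():
        for resource in spec.get("provides", ()):
            providers[resource].add(name)

    unsatisfied = []
    conflicts = []
    for name, spec in packages.items():
        for resource in spec.get("requires", ()):
            # .get() does not insert keys into the defaultdict.
            if not providers.get(resource):
                unsatisfied.append((name, resource))
        for resource in spec.get("conflicts", ()):
            # A package is not considered to conflict with itself.
            for other in providers.get(resource, ()):
                if other != name:
                    conflicts.append((name, other, resource))
    return sorted(unsatisfied), sorted(conflicts)


pkgs = {
    "glibc": {"provides": {"libc.so.6"}},
    "mytool": {"requires": {"libc.so.6", "libfoo.so.1"}},
    "mta-a": {"provides": {"MTA"}},
    "mta-b": {"provides": {"MTA"}, "conflicts": {"MTA"}},
}
print(check(pkgs))
# ([('mytool', 'libfoo.so.1')], [('mta-b', 'mta-a', 'MTA')])
```

Building the index first means each requirement is answered by a dictionary lookup instead of a scan over every installed package, which keeps the check proportional to the total number of declared resources.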
The performance degradation that I see in Anaconda is far more pronounced on less robust hardware. On my laptop, which is less than a year old and has a fairly speedy Pentium and 2 GB of RAM, Anaconda is about 2-3 times slower than my homegrown code. On an old box that I have, running a pair of roughly decade-old 500 MHz Celerons with 256 MB of RAM, rpm is dreadfully slow -- about 10-15 times slower than my homegrown code. There's something terribly inefficient in the way that Anaconda goes about its business. It should /not/ take that long to do its duty.
Some of it might be due to Anaconda being Python code, and my homegrown code being C++.