Fedora Users — Re: Back Again - RPM using a more efficient database format

Sam Varshavchik:
>>> With a large repository, like Fedora, even a compressed XML file is
>>> going to end up being rather huge. Then, you have to uncompress it and
>>> parse it. And, XML parsing is also not exactly a light task. 
 

Tim:
>> I thought there was supposed to be a move towards using a proper
>> database scheme, a long time ago, that would have sped up all of that?


Jim Cornette:
> I recall something discussed a while back regarding implementing a more 
> efficient database scheme. Current discussions on development seem to be 
> concentrating on removing features like rpm -ivh url and instead 
> requiring the use of wget to download the rpm and secondarily installing 
> the rpm from local directory. No mention regarding changing to the more 
> efficient database format.

I was under the impression that part of the reason for using something
SQL-based (see listing, below) was that it's faster to query than the
rather free-for-all structure of an XML file.  Supposedly it also means
being able to use a pre-existing database technique, rather than a
custom job on this special XML.

[root@bigblack ~]# ll /var/cache/yum/updates/
total 38356
-rw-r--r-- 1 root root        0 2007-08-01 12:34 cachecookie
-rw-r--r-- 1 root root 13102080 2007-07-27 23:43 filelists.sqlite
drwxr-xr-x 2 root root     4096 2007-06-22 16:23 headers
-rw-r--r-- 1 root root 21680128 2007-07-27 23:51 other.sqlite
drwxr-xr-x 2 root root    20480 2007-08-01 12:42 packages
-rw-r--r-- 1 root root  4373504 2007-08-01 12:34 primary.sqlite
-rw-r--r-- 1 root root     1953 2007-08-01 12:34 repomd.xml

The sqlite files didn't exist in the (much) older releases.  And I seem
to recall there was a tgz in there, though maybe the archive is now
discarded once unpacked?
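
To illustrate the difference I mean, here's a rough Python sketch.  It is
not how yum does it: the table layout, the XML layout, and the package
versions are all made up for the example.  It just shows that the XML
route has to parse the whole document before it can answer one
question, while the SQLite route asks an indexed table directly.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified metadata (names and versions invented
# for this example; the real yum metadata holds far more per package).
PACKAGES = [
    ("alpha", "1.0", "i386"),
    ("beta", "2.3", "noarch"),
    ("gamma", "0.9", "i386"),
]


def build_xml(packages):
    """Serialise the package list as a single XML document."""
    root = ET.Element("metadata")
    for name, version, arch in packages:
        ET.SubElement(root, "package", name=name, version=version, arch=arch)
    return ET.tostring(root, encoding="unicode")


def find_in_xml(xml_text, name):
    """Parse the entire document, then scan it for one package."""
    root = ET.fromstring(xml_text)
    for pkg in root.iter("package"):
        if pkg.get("name") == name:
            return pkg.get("version")
    return None


def build_db(packages):
    """Load the same list into an in-memory SQLite table indexed by name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE packages (name TEXT, version TEXT, arch TEXT)")
    conn.execute("CREATE INDEX idx_packages_name ON packages (name)")
    conn.executemany("INSERT INTO packages VALUES (?, ?, ?)", packages)
    conn.commit()
    return conn


def find_in_db(conn, name):
    """Look up one row through the index; no full parse of the data."""
    row = conn.execute(
        "SELECT version FROM packages WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

Both find_in_xml(build_xml(PACKAGES), "beta") and
find_in_db(build_db(PACKAGES), "beta") give the same answer; the
difference only starts to matter when the list is tens of megabytes.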

I think whatever method you use for working out what packages are
available is going to be a fair bit of data to transfer, though.  Unless
there's going to be some way of simply getting "new information since
{date}" from the server (date being supplied by your software, as the
last date it checked).

Heck, it could use an NNTP server for that.  ;-)  There's a thought: I
wonder how practical it would be to have two or three newsgroups on a
dedicated YUM NNTP network as the computer's way of working out what was
available as an update.  With NNTP you have a history, the ability to
get things since a certain date or ID number, and servers can easily
expunge old stuff, so a complete fetch doesn't include the last ten
versions of the same thing.

fc7.updates.headers for the smallest information the system needed to
know about each package to work out what to do.  Your updater would drag
in the new stuff, and put what it needed into its own local database.
Each run of something like yum update would only drag in a few messages,
rather than a new 2 meg package list for each change to the list.
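
As a rough sketch of that idea in Python (a toy model only, with no real
NNTP involved; the Article, Group and Client names are invented for the
example): the server numbers each message, drops the old message when a
package is superseded, and the client remembers the highest number it
has seen so that each run only pulls in what's new.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    number: int   # server-assigned, always increasing
    package: str
    version: str


@dataclass
class Group:
    """Toy stand-in for one newsgroup such as fc7.updates.headers."""
    articles: list = field(default_factory=list)
    next_number: int = 1

    def post(self, package, version):
        # Expunge any older article for the same package, then add the new one,
        # so a complete fetch never carries superseded versions.
        self.articles = [a for a in self.articles if a.package != package]
        self.articles.append(Article(self.next_number, package, version))
        self.next_number += 1

    def since(self, last_seen):
        """Return only the articles numbered above the client's last-seen mark."""
        return [a for a in self.articles if a.number > last_seen]


@dataclass
class Client:
    """Toy updater that keeps its own local database of known versions."""
    last_seen: int = 0
    local_db: dict = field(default_factory=dict)

    def update(self, group):
        # Fetch just the new articles and fold them into the local database.
        new_articles = group.since(self.last_seen)
        for article in new_articles:
            self.local_db[article.package] = article.version
            self.last_seen = max(self.last_seen, article.number)
        return len(new_articles)
```

A client that has already seen everything gets nothing back from
update(), and a brand-new client gets one message per package rather
than the whole history.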

fc7.updates.human for people to read details per package, if they
wanted, like we have the packages announce mailing list.  The headers
group would provide the message ID for your client to fetch particular
info, if you wanted (e.g. you saw updates available for "this", "that",
and "the other", and you decided you wanted to know what "that" was
before proceeding, on an interactive update).

fc7.updates.something.else if there needed to be more information...

-- 
[tim@bigblack ~]$ uname -ipr
2.6.22.1-33.fc7 i686 i386

Using FC 4, 5, 6 & 7, plus CentOS 5.  Today, it's FC7.

Don't send private replies to my address, the mailbox is ignored.
I read messages from the public lists.