Damian Menscher wrote:
We were taking advantage of the weekend to update our RHEL4.1 machine to RHEL4.2. It is worth mentioning that it is an x86_64 box. While running up2date, it hung on one of the packages. There was a useradd process sucking 99% of the CPU that couldn't be killed or traced. (It ignored SIGKILL and SIGSTOP, was not a zombie or in iowait, and [sl]trace on it just hung with no output and required a kill -9.) Killing its parent just got it adopted by init, with no change in behavior. A bug report against shadow-utils has been filed [1], though I believe that an unkillable process that uses CPU is an indication of a problem in the kernel.

Anyway, at least a reboot fixed that. But the damage was much worse: our RPM database appears to be mangled quite badly, and the standard trick of

    rm /var/lib/rpm/__db.00[1-3] && rpm --rebuilddb

doesn't help. In particular, the RPM database now thinks that we have *both* the old and new versions of several RPMs installed. For example:

    # rpm -q --qf "%{NAME}-%{VERSION}-%{RELEASE} %{ARCH}\n" rpm
    rpm-4.3.3-9_nonptl x86_64
    rpm-4.3.3-11_nonptl x86_64

We have 164 packages which are duplicated in this fashion. Meanwhile up2date wants to update them (since we have the old version) but can't (because we have the new version).

So... I'm looking for advice on how to handle this. My guess is that the cleanest thing to do is remove the newer of each pair from the database and then run up2date, which should upgrade everything (fixing any old files on disk). My proposed script for doing this is:

    for file in `rpm -qa --queryformat="%{NAME} %{ARCH}\n" | sort | uniq -c | grep -v " 1 " | cut -c 9- | cut -d" " -f1`; do
        rpm -q --last $file | head -1 | cut -d" " -f1
    done | xargs rpm -e --justdb
    up2date -p
    up2date

Does this look reasonable to everyone, or is there a better way to handle this problem?

Many thanks in advance for any advice,
If up2date was prevented from completing its task, duplicate rpms will show up in the database, as you are seeing. Usually, the rpms are still in the /var/spool/up2date directory. Since the rpms should still be on the computer, running "rpm -Fvh *.rpm --replacefiles --replacepkgs" in that directory might clean up the database errors. Alternatively, you could come up with a script to remove just the db entry for the older rpm versions. The option for removing just the database entry and not any files is --justdb.

In my experience with power outages and with shutting down the computer before up2date had finished its job, the old package entries were all that was actually left of the old packages on the system. The newer rpm had already replaced all the files that the older rpm put on the system. Removing the older rpm in the normal way (without --justdb) is therefore likely to remove files that the newer rpm installed on your system.

BTW, this advice was originally passed on by a Red Hat employee on one of the many lists. My system seems to be intact after removing only the database entries for the older packages. Thanks go to the original source of the information.
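For what it's worth, here is a rough sketch of how the "older entry" of each duplicated pair could be picked out. It is untested against a real damaged database, so treat it as a starting point only. The function name older_duplicates is made up for this example, and it assumes its input is one "NAME ARCH VERSION-RELEASE INSTALLTIME" line per package. It also assumes that the entry with the earlier install time is the old one, and it only prints candidates; it removes nothing.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this sketch).
# Input on stdin: one line per package, "NAME ARCH VERSION-RELEASE INSTALLTIME".
# Output: NAME-VERSION-RELEASE.ARCH for every entry that is NOT the most
# recently installed one of its NAME/ARCH group.
older_duplicates() {
    # Group by name and arch, oldest install time first within each group.
    sort -k1,1 -k2,2 -k4,4n |
    awk '
        {
            key = $1 " " $2
            # If this line belongs to the same group as the previous line,
            # the previous line is an older duplicate: print it.
            # The last (newest) line of each group is never printed.
            if (key == prev_key)
                print prev_name "-" prev_vr "." prev_arch
            prev_key = key; prev_name = $1; prev_arch = $2; prev_vr = $3
        }'
}

# Possible use (assumption: review the list by hand before acting on it):
#   rpm -qa --qf "%{NAME} %{ARCH} %{VERSION}-%{RELEASE} %{INSTALLTIME}\n" \
#       | older_duplicates > /tmp/older-entries.txt
```

Packages such as kernel and gpg-pubkey legitimately have several versions installed at once, so they will show up in the list and should be taken out by hand. I have not checked whether the rpm in RHEL4 accepts the NAME-VERSION-RELEASE.ARCH form on the command line, so try a few of the printed names with "rpm -q" before feeding any of them to "rpm -e --justdb".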
Jim
Damian Menscher

[1] https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=170087
-- QOTD: If it's too loud, you're too old.