Fedora Users — Re: smartd error messages: OfflineUncorrectableSector, CurrentPendingSector

Just for followup on this --

On Wed, 22 Jun 2005 19:43:21 +0900
I <rees@xxxxxxxxxxx> wrote

> Okay, on the stuff below, I used lvdisplay -m to show the extents. I
> used bc to figure out that I have 65536 sectors in an extent, and I was
> then able to tentatively locate the bad sector at 2735304 sectors in on
> my /var/tmp volume by subtracting out the starting sector of the LVM
> physical partition. 
> 
> (There's enough room there, I suppose I could just use lvm to shrink the
> partition, and then forget about the bad sector and not use LVM next
> time.)
> 
> I tried using tune2fs -l /dev/hda3 to figure out the logical blocks, but
> it says "bad magic in superblock". debugfs was similarly uncooperative.

To me it was a sort of stab in the dark, but I tried using the logical
volume name as the device, ergo, /dev/wkVolGrp/vartmp, and both tune2fs
and debugfs worked more or less as advertised. In other words, 

    tune2fs -l /dev/wkVolGrp/vartmp 

and

    debugfs /dev/wkVolGrp/vartmp

I say more or less because the math was a little off; I'm not sure why.
Anyway, I was able to confirm that the bad sectors were not in any file,
so I went ahead and used dd to write zeros.
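
For reference, the sector-to-block conversion behind that check can be
sketched in shell arithmetic. This is only an illustration: the function
name sector_to_block is made up here, the 4096-byte block size is taken
from the bs=4096 used with dd in this message (tune2fs -l reports the
real value as "Block size"), and 2735304 is the sector offset into the
volume quoted at the top.

```shell
# sector_to_block SECTOR_OFFSET [BLOCK_SIZE]
# Convert a 512-byte sector offset, counted from the start of the
# filesystem's device, into a filesystem block number.
# BLOCK_SIZE defaults to 4096 (an assumption; check tune2fs -l).
sector_to_block() {
    echo $(( $1 * 512 / ${2:-4096} ))
}

sector_to_block 2735304        # prints 341913
sector_to_block 2735304 1024   # prints 1367652
```

The resulting block number is what debugfs requests such as testb (is
the block in use?) and icheck (which inode owns it?) take as their
argument.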

I got mixed results. First I tried writing blocks offset from
/dev/wkVolGrp/vartmp, but that didn't seem to do anything. I checked
that I had the right blocks, using 

    dd if=/dev/wkVolGrp/vartmp bs=4096 skip=<65something/> of=/dev/null count=1

and then I wrote zeros to those blocks

    dd if=/dev/zero bs=4096 of=/dev/wkVolGrp/vartmp seek=<65something/> count=1

and synced and checked again, but saw no change. So I tried writing
individual sectors (all eight that I had confirmed gave read errors),
and that seemed to move things. Specifically, reading at offsets from
/dev/wkVolGrp/vartmp, I could no longer find the bad sectors. But
reading at offsets from /dev/hda

    dd if=/dev/hda bs=512 skip=72411953 of=/dev/null count=1

and so forth still gave me input errors. Mucking about, I finally found
that I would still get input errors in /dev/wkVolGrp/vartmp, but now
they were about 80 sectors above where they had been before. That was
Friday night. I was guessing this was one of those ATA caching games,
but I was out of time and went home.
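
Checking a run of sectors one at a time, as described above, is easy to
script. A minimal sketch, assuming a POSIX shell: probe_sectors is a
hypothetical helper name, not an existing tool, and it only wraps the
same dd read used in this message. Reads through a block device can be
batched or served from the kernel's cache, so a clean result is not
proof that the disk itself is clean.

```shell
# probe_sectors DEVICE FIRST COUNT
# Read COUNT consecutive 512-byte sectors starting at sector FIRST, one
# dd call per sector, and print the number of every sector dd fails on.
probe_sectors() {
    dev=$1
    sector=$2
    last=$(( $2 + $3 - 1 ))
    while [ "$sector" -le "$last" ]; do
        if ! dd if="$dev" of=/dev/null bs=512 skip="$sector" count=1 2>/dev/null; then
            echo "$sector"
        fi
        sector=$(( sector + 1 ))
    done
}

# Example (needs root, reads only):
#   probe_sectors /dev/hda 72411953 8
```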

This morning, I ran smartctl -t long /dev/hda, and all the errors in the
/dev/wkVolGrp/vartmp volume had gone away. In their place I get eight
errors at a fairly low address, 2273463 as reported by smartctl -a.
Fortunately (?), that's in the swap partition (/dev/hda2), which is a
regular physical partition rather than part of LVM, and I have enough
RAM to avoid swap for the time being. So I shut off swap and tried
writing zeros to those sectors
again:

    dd if=/dev/zero bs=512 of=/dev/hda seek=2273463 count=1

But I got I/O errors on the writes. I tried syncing anyway, but 
smartctl -a reports no changes.
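
A quick way to see which partition an LBA from smartctl falls in is to
compare it against the cylinder boundaries fdisk prints. The sketch
below hard-codes the boundaries from the fdisk -l output quoted later
in this thread. partition_of is a made-up name, and working in whole
cylinders is an approximation (it ignores the partition table's own
track at the very start of the disk).

```shell
# partition_of LBA
# Map a whole-disk LBA to a partition of this particular /dev/hda,
# using 255 * 63 = 16065 sectors per cylinder.
partition_of() {
    cyl=$(( $1 / 16065 + 1 ))      # fdisk numbers cylinders from 1
    if   [ "$cyl" -le 33 ];   then echo /dev/hda1
    elif [ "$cyl" -le 425 ];  then echo /dev/hda2
    elif [ "$cyl" -le 4865 ]; then echo /dev/hda3
    else echo "beyond last partition"
    fi
}

partition_of 2273463     # prints /dev/hda2 (swap)
partition_of 72411953    # prints /dev/hda3 (LVM)
```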

So, once again, 

> Anyone care to give me a clue where to go from here?

--
Joel Rees   <rees@xxxxxxxxxxx>
digitcom, inc.   株式会社デジコム
Kobe, Japan   +81-78-672-8800
** <http://www.ddcom.co.jp> **


> [...]

> On Wed, 15 Jun 2005 16:54:41 +0900
> Joel <rees@xxxxxxxxxxx> wrote
> 
> > > > I'm getting mail to root on a new install of FC3. (Haven't had time to
> > > > update it yet.)
> > > > 
> > > > The messages come in pairs, especially after booting up in the morning.
> > > > The first is the offline uncorrectable, and the second is the current
> > > > pending; the number of sectors is five.
> > > > 
> > > > I've been digging around in the manpage for smartd and smartctl and I
> > > > don't really see much about what should be done. One comment in a
> > > > mailing list post suggests -U 0 and -C 0 in smartd.conf to silence the
> > > > complaints, but I have the idea that would just be looking to lose data.
> > > > 
> > > > I've done smartctl -a /dev/hda and had a look at what that tells me.
> > > > 
> > > > Are there tools available to help figure out which files the problem
> > > > sectors are in so I can check what should be there and maybe push a
> > > > write on the sectors to force remapping?
> > >
> > > http://smartmontools.sourceforge.net/BadBlockHowTo.txt
> > 
> > Thanks, Alexander.
> > 
> > It occurs to me that I should have mentioned that I have an lvm
> > partition on this disk. I read the manpages for lvm, and I didn't find
> > anything that explained how to convert the lvm sizes and extents to
> > physical sector numbers. 
> > 
> > But it's a fresh install. I did remove the last logical partition and
> > cut two partitions in the space there, but I haven't done any resizing.
> > So I thought I could get a rough idea just by converting everything to
> > bytes.
> > 
> > (debugfs doesn't seem to work on lvm?)
> > 
> > But if my math is right, it looks like the sector giving errors is off
> > the end of the disk.
> > 
> > Here's the relevant stuff --
> > 
> > [...]
> > The output of fdisk:
> > 
> > -----------------------------------------------------------------------
> > [root@rees-linux ~]# fdisk -l
> > 
> > Disk /dev/hda: 40.0 GB, 40020664320 bytes
> > 255 heads, 63 sectors/track, 4865 cylinders
> > Units = cylinders of 16065 * 512 = 8225280 bytes
> > 
> >     Device Boot      Start         End      Blocks   Id  System
> > /dev/hda1   *           1          33      265041   83  Linux
> > /dev/hda2              34         425     3148740   82  Linux swap
> > /dev/hda3             426        4865    35664300   8e  Linux LVM
> > -----------------------------------------------------------------------
> > 
> > Calculations in bc: 
> > 
> > -----------------------------------------------------------------------
> > [root@rees-linux ~]# bc
> > bc 1.06
> > Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
> > This is free software with ABSOLUTELY NO WARRANTY.
> > For details type `warranty'.
> > 
> > lba=72411953
> > print 16065 * 512 * 4865
> > 40015987200
> > print 255*63
> > 16065
> > print 255*63*4865
> > 78156225
> > 
> > start=255*63*425
> > print start+lba
> > 79239578
> > print (start+lba)*512
> > 40570663936
> > -----------------------------------------------------------------------
> > 
> > Should I be printing start-lba there?
> > 
> > And, once I do have an idea in which lvm volume it's located, is there a
> > flag for debugfs that I've missed, to allow me to work on an lvm volume?
> > Or are there lvm tools?
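
On the start+lba question in the quoted bc session: the LBA that
smartctl reports is counted from the start of the whole disk, so the
offset into a partition is the LBA minus the partition's first sector,
not the sum. A rough sketch of that arithmetic in shell follows, using
only figures from the fdisk output above and the 65536 sectors per
extent mentioned at the top of this message. The variable names are
made up for illustration, and it assumes the LVM data area begins at
the very first sector of /dev/hda3. Real physical volumes reserve some
space for metadata at the front, so the extent arithmetic may be off by
that amount.

```shell
sectors_per_cyl=$(( 255 * 63 ))              # 16065
disk_sectors=$(( 40020664320 / 512 ))        # total sectors on /dev/hda
hda3_start=$(( sectors_per_cyl * 425 ))      # hda3 begins at cylinder 426
lba=72411953                                 # from smartctl

offset=$(( lba - hda3_start ))               # sectors into /dev/hda3
extent=$(( offset / 65536 ))                 # 65536 sectors per extent
within=$(( offset % 65536 ))                 # sectors into that extent

echo "disk has $disk_sectors sectors; LBA is on disk: $(( lba < disk_sectors ))"
echo "offset into hda3: $offset"
echo "physical extent $extent, sector $within within it"
```

This puts the sector 65584328 sectors into /dev/hda3, well inside the
disk's 78165360 sectors, at sector 48328 of physical extent 1000. That
remainder agrees with the 2735304-sector offset into /var/tmp given at
the top of this message, since 2735304 = 41 * 65536 + 48328.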