Fedora Users — Re: Forced FSCK on Bad Reboot

I wrote:
> Ext3 is a "journalling" filesystem (as are ReiserFS, XFS, JFS, and NTFS).
> Linux keeps a journal on disk of what it's doing, and makes sure that
> the journal reaches disk before it makes any changes to the filesystem
> structure on disk.

Mike McCarty wrote:
> Hmm. So if a power failure occurs during the update of the journal, the disc
> is corrupted anyway.

Erm... no.

Imagine a journal as being a bit like a paper tape. You can read
anywhere, but you can only meaningfully add to the end of the journal.
So if a power failure occurs during an update and corrupts what was
being written, you know that's the end of the journal, and you can treat
that last, incomplete entry as though the update never happened.
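For the programmers: here's a minimal Python sketch of the paper-tape
idea, assuming a made-up record layout (a length, a CRC32 checksum, then
the payload). It is not ext3's actual journal format, and the names
append_record and read_valid_records are hypothetical. The point is only
that a torn record at the tail fails its checksum, so recovery stops
there and treats it as never written.

```python
import struct
import zlib

# Hypothetical record layout: 4-byte payload length, 4-byte CRC32, payload.
HEADER = struct.Struct("<II")


def append_record(log: bytearray, payload: bytes) -> None:
    """Append one record; the end of the log is the only place we write."""
    log.extend(HEADER.pack(len(payload), zlib.crc32(payload)))
    log.extend(payload)


def read_valid_records(log: bytes) -> list:
    """Return every intact record, stopping at the first torn or corrupt one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = bytes(log[start:start + length])
        if len(payload) < length or zlib.crc32(payload) != crc:
            break  # torn tail: treat this update as though it never happened
        records.append(payload)
        offset = start + length
    return records


log = bytearray()
append_record(log, b"set inode 12 size=4096")
append_record(log, b"link inode 12 into dir 2")
torn = log[:-5]  # power fails while the last record is being written
print(read_valid_records(torn))  # [b'set inode 12 size=4096']
```

A flipped bit anywhere in the last record has the same effect as
truncating it: the checksum doesn't match, so that record is ignored.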

Look at it this way (in a fixed-width font):

time --->
        A                   B                   C
update journal,     |  update disks   | update journal,
open transaction    |                 | close transaction.

If the power dies at any point during A, the journal entry is incomplete
(corrupt), so Linux knows the update cannot have happened, and the
filesystem structure itself is still good.

By the time the system gets to the beginning of B, everything from A has
reached disk. If the power fails during B, you can go back to the journal
and redo the update from there.

If you have a power failure during C, it doesn't matter whether C's
write has partially reached disk or not reached it at all. As long as
you can detect the corruption, you know something is fishy, and you
replay the transaction from the journal. The data from A (and from B,
although you don't know that) is safe and properly written, and can be
relied on.

Once C's write has safely reached disk, everything's fine.
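Here's a toy Python model of the A/B/C diagram, under the same
simplifications: it follows this three-phase description, not ext3's
real on-disk journal format, and Record, run_transaction and recover are
made-up names. It crashes at each phase in turn and checks that recovery
always leaves the "disk" either entirely old or entirely new, never a
mixture. Recovery here redoes (rolls forward) any intact transaction
that was never closed.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    kind: str                                    # "open" or "close"
    txid: int
    changes: dict = field(default_factory=dict)  # block -> new contents
    valid: bool = True                           # False = torn write (bad checksum)


def run_transaction(disk, journal, txid, changes, crash_at=None):
    """Phases A, B and C from the diagram; crash_at stops partway through one."""
    # A: put the intended changes in the journal, opening the transaction.
    journal.append(Record("open", txid, dict(changes), valid=crash_at != "A"))
    if crash_at == "A":
        return
    # B: update the real filesystem structures (a crash leaves this half done).
    for i, (block, value) in enumerate(changes.items()):
        if crash_at == "B" and i == len(changes) // 2:
            return
        disk[block] = value
    # C: close the transaction (a crash here leaves a torn close record).
    journal.append(Record("close", txid, valid=crash_at != "C"))


def recover(disk, journal):
    """Redo every intact transaction that has no intact close record."""
    closed = {r.txid for r in journal if r.kind == "close" and r.valid}
    for r in journal:
        if r.kind == "open" and r.valid and r.txid not in closed:
            disk.update(r.changes)  # replaying is harmless even if B finished


old = {"inode": "old", "dirent": "old"}
new = {"inode": "new", "dirent": "new"}
for crash in ["A", "B", "C", None]:
    disk, journal = dict(old), []
    run_transaction(disk, journal, 1, new, crash_at=crash)
    recover(disk, journal)
    print(crash, disk)  # all old after a crash in A, all new otherwise
```

Replaying is safe because each journal entry records the final contents
of a block, so writing it a second time changes nothing.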

Besides, as I understand it, thanks to the magic of capacitors, the write
of an individual disk sector either happens completely or not at all.
Disks that write garbage on power-down are technically known as "broken".

> It's like a COBOL programmer back in the bad old
> days, who claimed that, since he always used databases which had
> journals and a separate commit call, his databases could never get
> corrupted. I argued and argued with this guy. Sadly, one day he found out
> I was correct, and had no recovery plan.

Like I say: all hardware sucks, all software sucks. Your COBOL guy ought
to have had at least two backup plans.

> It seems to me that you left out a lot of details.

That's usually the case when you're simplifying things. For example,
once you know that the changed data is safely in the journal, you can
reasonably rely on it being on disk, even before it has been written to
its final location.

So if a lot of changes are to be made to a directory (for example),
Linux can batch them: write the changes to the journal in one (not very)
long sequential write, and then write just the final version to its real
location on disk (that last step is what's called "checkpointing"). That
can be *much* faster than constantly writing each changed version in
place: lots of independent writes mean lots of seeks, which is what
really slows down spinning media.
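A rough cost model of that, in Python. It counts only seeks (any write
that isn't to the block right after the previous one) and ignores
rotational latency, journal descriptor and commit blocks, and everything
else. The journal location, the block numbers, and all the function
names are assumptions for illustration.

```python
def count_seeks(writes):
    """Count head movements: any write that isn't to the block after the last."""
    seeks, head = 0, None
    for block in writes:
        if head is None or block != head + 1:
            seeks += 1
        head = block
    return seeks


def write_through_seeks(updates):
    """Every change is written in place as soon as it's made."""
    return count_seeks(updates)


def journalled_seeks(updates, journal_start=1_000_000):
    """One sequential journal write, then each block's final version once."""
    final_blocks = sorted(set(updates))      # later versions replace earlier ones
    journal = [journal_start + i for i in range(len(final_blocks))]
    return count_seeks(journal + final_blocks)


# 100 changes bouncing between a directory block and an inode-table block.
updates = [10, 5000] * 50
print(write_through_seeks(updates), journalled_seeks(updates))  # 100 3
```

The exact numbers mean nothing; the shape does. Scattered in-place
updates pay a seek each, while the batched version pays roughly one seek
for the journal plus one per distinct block.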

And if the power goes, Linux just replays the journal.

Again, I'm simplifying. And I don't know half the details.

> Reading between the lines,
> I'll guess that what you are saying is ext3 uses a lot of disc cache with
> write-back rather than write-through policy,

That's normal on practically all OSes these days: it seriously helps
performance.

> and journals what it has done
> to the memory copy. Thus unwritten system buffers at power down don't
> corrupt the disc.
> 
> Frankly, I'd rather use write-through.

Perhaps, but it can seriously slow down disk operations. Note that ext2,
at least by default, does *not* use write-through either.
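To make the write-back versus write-through difference concrete, here's
a toy block cache in Python (a hypothetical class, not how the Linux
page cache is actually implemented). Rewriting the same block many times
costs one disk write per update under write-through, but only one write
at flush time under write-back. The flip side is visible too: anything
still sitting in dirty when the power goes never reaches the disk.

```python
class BlockCache:
    """Toy block cache: write-through vs write-back (not the real page cache)."""

    def __init__(self, write_through):
        self.write_through = write_through
        self.disk = {}         # what has actually reached the platters
        self.dirty = {}        # block -> newest contents still only in RAM
        self.disk_writes = 0

    def write(self, block, data):
        if self.write_through:
            self.disk[block] = data      # every update goes straight to disk
            self.disk_writes += 1
        else:
            self.dirty[block] = data     # later writes overwrite earlier ones

    def flush(self):
        """Write each dirty block once, e.g. on a periodic sync or an unmount."""
        for block, data in self.dirty.items():
            self.disk[block] = data
            self.disk_writes += 1
        self.dirty.clear()


for mode in (True, False):
    cache = BlockCache(write_through=mode)
    for n in range(1000):
        cache.write(7, n)                # e.g. repeatedly updating one inode
    cache.flush()
    print("write-through" if mode else "write-back", cache.disk_writes)
```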

> In any case, I don't see any argument for not using an extended fsck on
> a reboot after improper shutdown, which was my original question.

It sounds to me as though you've got lots of experience with computers,
and have an accordingly low opinion of their reliability. As a result,
you want data-critical stuff to be relatively simple and obviously safe,
and all possible checks to take place.

It's just ... modern computers *aren't* simple. Eventually, you have to
treat some of this stuff as black boxes, and rely on its own internal
error checking.

Yes, filesystem corruption still happens. But it's not noticeably more
likely to happen as a result of improper shutdowns. Periodically running
a full fsck is sensible anyway: see man tune2fs, and look at the -c
(maximum mount count) and -i (interval between checks) options.
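Roughly, the -c and -i settings combine like this. This is a simplified
Python sketch of the rule as the tune2fs man page describes it (0 or -1
disables the mount-count check, 0 disables the time check); e2fsck has
other reasons to run a full check that aren't modelled here, and the
function name is made up.

```python
import time

DAY = 24 * 60 * 60


def full_check_due(mount_count, max_mount_count, last_checked,
                   check_interval, now=None):
    """Approximate the periodic-check rule set with tune2fs -c and -i.

    max_mount_count of 0 or -1 disables the mount-count trigger;
    check_interval of 0 disables the time trigger. Times are in seconds.
    """
    if now is None:
        now = time.time()
    by_count = max_mount_count > 0 and mount_count >= max_mount_count
    by_time = check_interval > 0 and now >= last_checked + check_interval
    return by_count or by_time


now = 1_000 * DAY
print(full_check_due(25, 30, now - 10 * DAY, 180 * DAY, now))   # False
print(full_check_due(30, 30, now - 10 * DAY, 180 * DAY, now))   # True: mounts
print(full_check_due(3, -1, now - 200 * DAY, 180 * DAY, now))   # True: age
```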

A quick look at /etc/rc.d/rc.sysinit suggests that running
touch /forcefsck
as root will force a full fsck at the next boot.

You could always edit that script and /etc/init.d/halt so that
/forcefsck is created at boot time and removed at normal shutdown. That
way the file only survives an improper shutdown, and the next boot runs
a full fsck.
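The flag-file trick is simple enough to sketch in Python. The two
functions are hypothetical stand-ins for the edited rc.sysinit and halt
scripts, and they work in a scratch directory rather than touching the
real /forcefsck.

```python
import os
import tempfile

FLAG = "forcefsck"   # stands in for /forcefsck, relative to a scratch root


def on_boot(root):
    """Hypothetical rc.sysinit edit: report whether to force fsck, then arm the flag."""
    flag = os.path.join(root, FLAG)
    force_full_fsck = os.path.exists(flag)  # still there => no clean shutdown
    open(flag, "w").close()                 # like `touch /forcefsck`
    return force_full_fsck


def on_clean_shutdown(root):
    """Hypothetical halt edit: only a normal shutdown removes the flag."""
    os.remove(os.path.join(root, FLAG))


with tempfile.TemporaryDirectory() as root:
    print(on_boot(root))      # False: first boot, no flag yet
    on_clean_shutdown(root)
    print(on_boot(root))      # False: last shutdown was clean
    # The power fails here, so on_clean_shutdown() never runs.
    print(on_boot(root))      # True: the flag survived, so do a full fsck
```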

Note that edits to rc.sysinit or halt won't survive an RPM update of
initscripts.

Hope this helps,

James.
-- 
E-mail address: james | "!" sez I.  And "?".  After a few speechless seconds
@westexe.demon.co.uk  | I come out with "%^&*".  Unless I come up with
                      | something plausible soon I'm going to run out of
                      | special characters.  -- Ben at lspace.org