Testing framework — Linux Kernel

Hi,

For some time I have been working on this file system test framework.
Now I have an implementation of it, and the explanation is below.
Any comments are welcome.

Introduction:
The testing tools and benchmarks currently available do not take into
account the repair and recovery aspects of file systems. The test
framework described here focuses on repair and recovery capabilities
of file systems. Since most file systems use 'fsck' to recover from
file system inconsistencies, the test framework characterizes file
systems based on outcomes of running 'fsck'.

Overview:
In brief, the model is: prepare a file system, record the state of the
file system, corrupt it, run repair and recovery tools, and finally
compare the recovered file system against its initial state and report
the result.

Prepare Phase:
This is the first phase in the model. Here we prepare a file system to
carry out subsequent phases. A new file system image is created with
the specified name. The 'mkfs' program is run on this image, and then
the file system is populated sufficiently and aged. This state of the
file system is considered the ideal state.
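
As a rough illustration of the image-creation step (a Python sketch,
not the code in the attached tarball): the function names are
hypothetical, and the -F and -q flags are the mke2fs ones, so other
file systems would need their own mkfs options. Populating and aging
require a mounted image and are left out.

```python
import os
import subprocess


def create_image(path, size_bytes):
    """Create a sparse file of size_bytes to hold the file system image."""
    with open(path, "wb") as f:
        f.truncate(size_bytes)


def mkfs_command(path, fstype="ext2"):
    """Build the mkfs command line.

    -F lets mke2fs run on a regular file, -q keeps it quiet.
    These flags are specific to mke2fs (ext2/ext3/ext4).
    """
    return ["mkfs." + fstype, "-F", "-q", path]


def prepare(path, size_bytes, fstype="ext2"):
    """Create the image and run mkfs on it (populate/age not shown)."""
    create_image(path, size_bytes)
    subprocess.run(mkfs_command(path, fstype), check=True)
```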

Corruption Phase:
The file system prepared in the prepare phase is corrupted to simulate
a system crash or, more generally, an inconsistency in the file system.
We are mainly interested in corrupting the metadata. A random
corruption would give us results like those of fs_mutator or
fs_fuzzer. However, the corruption would then vary between test runs,
so comparing file systems would be both unfair and tedious. So, we
would like to have a mechanism by which the corruption can be
replayed, ensuring that almost the same amount of corruption is
reproduced across test runs. The techniques for corruption are:

Higher level perspective/approach:
In this approach the file system is viewed as a tree of nodes, where
nodes are either files or directories. The metadata corresponding to
some randomly chosen nodes of the tree is corrupted. Nodes which are
corrupted are recorded so that the corruption can be replayed later.
This file system is called the source file system, while the file
system on which we need to replay the corruption is called the target
file system. The assumption is that the target file system contains a
set of files and directories which is a superset of that in the source
file system. Hence, to replay the corruption we need to identify which
nodes were corrupted in the source file system and corrupt the
corresponding nodes in the target file system.
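
The record-and-replay idea could look roughly like this in Python.
This is only a sketch under my own assumptions: nodes are identified
by path, a seeded PRNG makes the choice reproducible, and the function
names are hypothetical. How a chosen node's metadata is actually
damaged is not shown.

```python
import random


def choose_nodes(paths, count, seed):
    """Pick `count` nodes to corrupt, reproducibly for a given seed."""
    rng = random.Random(seed)
    # Sort first so the result does not depend on traversal order.
    return rng.sample(sorted(paths), count)


def replay_targets(corruption_log, target_paths):
    """Map recorded source nodes onto the target file system.

    Returns (present, missing). `missing` is non-empty only when the
    superset assumption does not hold for the target.
    """
    target = set(target_paths)
    present = [p for p in corruption_log if p in target]
    missing = [p for p in corruption_log if p not in target]
    return present, missing
```

The list returned by choose_nodes() is the corruption record; passing
it to replay_targets() with the node list of the target file system
tells us which nodes to corrupt there.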

A major disadvantage with this approach is that on-disk structures
(like superblocks, block group descriptors, etc.) are not considered
for corruption.

Lower level perspective/approach:
The file system is looked upon as a set of blocks (more precisely
metadata blocks). We randomly choose from this set of blocks to
corrupt. This overcomes the deficiency of the previous approach.
However, it makes replayable corruption difficult, and the approach
needs further thought.
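
One partial answer, sketched below in Python, is to drive the block
choice and the garbage written from a seeded PRNG. This is an
assumption of mine, not a solved design: it only replays on an image
with the same block layout, and the list of metadata block numbers is
taken as given (it would have to come from a file-system-specific
tool). The function name is hypothetical.

```python
import random


def corrupt_blocks(image_path, metadata_blocks, block_size, count, seed):
    """Overwrite `count` metadata blocks with seeded pseudo-random bytes.

    Returns the list of block numbers that were corrupted, which also
    serves as the corruption record.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(metadata_blocks), count)
    with open(image_path, "r+b") as img:
        for block in chosen:
            img.seek(block * block_size)
            img.write(bytes(rng.randrange(256) for _ in range(block_size)))
    return chosen
```

Running this twice with the same seed on two copies of the same image
damages the same blocks with the same bytes.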

The program could blend both approaches to trade off corruption
coverage against replayability.

Repair Phase:
The corrupted file system is repaired and recovered with 'fsck' or any
other tools; this phase considers the repair and recovery action on
the file system as a black box. The time the tool takes to repair the
file system is measured.
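
Treating the tool as a black box, the timing can be sketched in Python
as below. The function name is hypothetical. A non-zero exit status is
not treated as a failure, because fsck uses its exit status as a bit
mask (for example, 1 means errors were corrected and 4 means errors
were left uncorrected).

```python
import subprocess
import time


def timed_repair(command):
    """Run a repair command and return (elapsed_seconds, exit_status)."""
    start = time.monotonic()
    result = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    elapsed = time.monotonic() - start
    return elapsed, result.returncode
```

For ext2 the call might be timed_repair(["fsck.ext2", "-f", "-y",
"fs.img"]), where fs.img stands for the image under test.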

Comparison Phase:
The current state of the file system is compared with the ideal state
of the file system. The metadata of the file system is checked against
that of the ideal file system and the outcome is noted for the summary
of this test run. If the repair tool is 100% effective, then the
current state of the file system should be exactly the same as that of
the ideal file system. Simply checking for equality wouldn't be right
because it doesn't account for files moved to lost+found. Hence we
need to check node by node, for each node in the ideal state of the
file system.
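
A node-by-node check might look like the Python sketch below. The
state layout (a dict from path to a "meta" dict and a data "checksum")
and the idea of matching lost+found entries by checksum are my
assumptions for illustration. A node with both kinds of damage is
reported as data corruption only.

```python
def compare_states(ideal, current, lost_found_prefix="/lost+found/"):
    """Classify every node of the ideal state against the current state."""
    # Checksums of whatever the repair tool placed under lost+found.
    recovered = {
        entry["checksum"]
        for path, entry in current.items()
        if path.startswith(lost_found_prefix)
        and entry["checksum"] is not None
    }
    report = {"intact": [], "metadata_corrupt": [], "data_corrupt": [],
              "lost_and_found": [], "lost": []}
    for path, want in sorted(ideal.items()):
        got = current.get(path)
        if got is None:
            found = want["checksum"] in recovered
            key = "lost_and_found" if found else "lost"
        elif got["checksum"] != want["checksum"]:
            key = "data_corrupt"
        elif got["meta"] != want["meta"]:
            key = "metadata_corrupt"
        else:
            key = "intact"
        report[key].append(path)
    return report
```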

State Record:
The comparison phase requires that the ideal state of the file system
be known. Replicating the whole file system would eat up a lot of disk
space. Storing the state of the file system in memory would be costly
for huge file systems. So, we need to store the state of the file
system on disk in a form that doesn't take up a lot of space. We
record the metadata and store it in a file. One approach is to
replicate the metadata blocks of the source file system and store the
replica blocks in a single file called the state file. Additional
metadata, such as checksums of the data blocks, can be stored in the
same state file. However, this may store some unnecessary metadata in
the state file and so bloat it for huge source file systems. So,
instead of storing the metadata blocks themselves, we would summarize
the information in them before storing it in the state file.
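
To give an idea of what a summarized state file could hold, here is a
Python sketch that writes one JSON line per node. It is an assumption
on my part: it reads the metadata through a mounted file system with
lstat() rather than from the on-disk metadata blocks, and the chosen
fields and function names are hypothetical.

```python
import hashlib
import json
import os
import stat


def summarize_node(path):
    """Return a small summary of one node's metadata (and data checksum)."""
    st = os.lstat(path)
    entry = {"mode": st.st_mode, "uid": st.st_uid,
             "gid": st.st_gid, "size": st.st_size}
    if stat.S_ISREG(st.st_mode):
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                digest.update(chunk)
        entry["checksum"] = digest.hexdigest()
    return entry


def record_state(root, state_file):
    """Walk `root` and write one summary line per node to `state_file`."""
    with open(state_file, "w") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # fixed traversal order across runs
            for name in sorted(dirnames + filenames):
                full = os.path.join(dirpath, name)
                record = {"path": os.path.relpath(full, root)}
                record.update(summarize_node(full))
                out.write(json.dumps(record) + "\n")
```

The state file should live outside the tree being recorded.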

Summary Phase:
This is the final phase in the model. A report file is prepared which
summarizes the result of this test run. The summary contains:

Average time taken for recovery
Number of files lost at the end of each iteration
Number of files with metadata corruption at the end of each iteration
Number of files with data corruption at the end of each iteration
Number of files lost and found at the end of each iteration
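
Building such a report from per-iteration results could be sketched in
Python as follows. The dictionary keys and the report layout are
hypothetical, chosen only for this example.

```python
def summarize_run(iterations):
    """Format a text report from a list of per-iteration result dicts.

    Each dict is assumed to carry 'repair_seconds', 'lost',
    'metadata_corrupt', 'data_corrupt' and 'lost_and_found'.
    """
    total = sum(it["repair_seconds"] for it in iterations)
    lines = ["Average recovery time: %.3f s" % (total / len(iterations))]
    for number, it in enumerate(iterations, 1):
        lines.append(
            "Iteration %d: lost=%d metadata_corrupt=%d "
            "data_corrupt=%d lost_and_found=%d"
            % (number, it["lost"], it["metadata_corrupt"],
               it["data_corrupt"], it["lost_and_found"])
        )
    return "\n".join(lines)
```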

Putting it all together:
The Corruption, Repair and Comparison phases could be repeated a
number of times (each repetition is called an iteration) before the
summary of that test run is prepared.
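
The iteration loop itself is small. In the Python sketch below the
three phases are passed in as callables, which is my own structuring
for illustration and not necessarily how the attached program is
organized. The iteration index is handed to the corruption step so it
can derive a seed from it.

```python
def run_test(corrupt, repair, compare, iterations):
    """Repeat corrupt -> repair -> compare and collect the results.

    corrupt(i) damages the file system for iteration i,
    repair() returns the repair time in seconds,
    compare() returns a comparison report.
    """
    results = []
    for i in range(iterations):
        corrupt(i)
        seconds = repair()
        report = compare()
        results.append({"iteration": i + 1,
                        "repair_seconds": seconds,
                        "report": report})
    return results
```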

TODO:
Account for files in the lost+found directory during the comparison phase.
Support for other file systems (only ext2 is supported currently)
The state of the entire file system is stored, which may be huge, time
consuming and unnecessary. So, we could find better ways of storing
the state.

Comments are welcome!!

Thanks,
Karuna

Attachment: tf.tar.bz2
Description: BZip2 compressed data
