Fedora Users — Re: Understanding how dd works

On Wed, 2008-06-25 at 13:31 +0100, Dan Track wrote:
> Thanks for the heads up on this. If the data blocks don't have
> anything written into them, then what data is written into them when
> using dd? if I restore the dd image will the blocks then be in the
> same state i.e unwritten to?
> 
> Also following on from this if I create a file using dd let's say 2GB,
> how does the filesystem know that all these blocks belong to the file
> myfile.img, and where is the information stored to say that a block
> has data written into it or not?

It's important to understand that this has nothing to do with 'dd'; it's
simply how the Unix filesystem works, and since Linux is "culturally
derived" from Unix, it does the same thing. You would see the same
effect with 'cat', or with any other program that simply reads the data
and writes it back out.

The basic points are these (I'm skating over a lot for clarity):

1) The system maintains a list of every physical disk block assigned to
the file (thus one of the things the 'fsck' command checks is that every
block in the filesystem is either assigned to a file or is on the free
list).

2) When a process writes to a file it need not do so sequentially
because the lseek(2) operation allows it to move its "current position"
in the file. Furthermore, it's permissible to move the pointer beyond
the current end of the file. If a process does this by a large enough
amount and then writes data, the intervening space may have no disk
blocks assigned to it (depending on the distance moved and block
alignment). This is called a 'hole'. Files with holes in them are called
'sparse'.

3) The system keeps a separate count of the logical size of the file.
Because of the holes the logical size may be different from the physical
size. "ls -l" shows the logical size. "du" shows the space actually
allocated on disk, which for a sparse file is smaller.

4) When a process tries to read from a hole, the system simply returns
nulls for the corresponding bytes. However if a process writes nulls
into a file, the system does *not* make any effort to detect them as a
special case, so they are simply written as any other data and the
system will allocate blocks to them. This happens when 'dd' or 'cat'
copies a file, so the resulting file can occupy more disk space than the
original even though its logical size is the same. (GNU 'cp' is a
partial exception: by default it tries to detect a sparse source file
and recreate the holes; see its '--sparse' option.)
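
To make points 2) to 4) concrete, here is a minimal Python sketch,
assuming a Linux filesystem that supports holes. The helper names
(make_sparse, make_dense, sizes) are made up for illustration, and the
exact block counts you see will depend on the filesystem.

```python
import os
import tempfile

BLOCK = 512  # on Linux, st_blocks is counted in 512-byte units


def make_sparse(path, size):
    """Seek past the end of an empty file and write one byte, leaving a hole."""
    with open(path, "wb") as f:
        f.seek(size - 1)   # lseek(2) beyond the current end of the file
        f.write(b"x")      # only this last byte is real data
        f.flush()
        os.fsync(f.fileno())


def make_dense(path, size):
    """Write explicit null bytes; they are stored like any other data."""
    with open(path, "wb") as f:
        f.write(b"\0" * size)
        f.flush()
        os.fsync(f.fileno())


def sizes(path):
    """Return (logical size, allocated bytes), roughly 'ls -l' versus 'du'."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * BLOCK


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        sparse = os.path.join(d, "sparse.img")
        dense = os.path.join(d, "dense.img")
        make_sparse(sparse, 16 * 1024 * 1024)
        make_dense(dense, 16 * 1024 * 1024)
        print("sparse:", sizes(sparse))
        print("dense: ", sizes(dense))
        # Reading from the hole returns nulls, as in point 4).
        with open(sparse, "rb") as f:
            print("hole reads as nulls:", f.read(4096) == b"\0" * 4096)
```

On a typical Linux filesystem both files should report the same logical
size, while the allocated size of the sparse one should be far smaller.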

Note that 'rsync --sparse' will preserve holes when it can.
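
For illustration, a copy that avoids filling in holes can be sketched by
skipping over all-zero chunks instead of writing them. This is only a
sketch of the general idea, not how rsync itself is implemented; the
function name sparse_copy and the 4096-byte chunk size are assumptions.
Note that it also turns explicitly written nulls into holes, since it
cannot tell the two apart when reading.

```python
import os


def sparse_copy(src, dst, chunk=4096):
    """Copy src to dst, seeking over all-zero chunks instead of writing them."""
    zeros = b"\0" * chunk
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            if buf == zeros[:len(buf)]:
                # Advance the output position without writing: leaves a hole.
                fout.seek(len(buf), os.SEEK_CUR)
            else:
                fout.write(buf)
        # If the source ended in zeros, nothing was written at the end,
        # so set the logical size of the copy explicitly.
        fout.truncate(fin.tell())
```

The copy has the same logical size and contents as the source; whether
the skipped ranges really end up unallocated depends on the filesystem.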

Note also that if you're not careful you can back up a file or even a
filesystem that you can't restore, because the copy with its holes
filled in is too big for the space available. This is especially likely
when copying to some medium (e.g. a tape drive or non-UNIX disk system)
that can't handle sparse files.

Hope this helps.

poc
