On Wed, 2008-06-25 at 13:31 +0100, Dan Track wrote:
> Thanks for the heads up on this. If the data blocks don't have
> anything written into them, then what data is written into them when
> using dd? if I restore the dd image will the blocks then be in the
> same state i.e unwritten to?
>
> Also following on from this if I create a file using dd let's say 2GB,
> how does the filesystem know that all these blocks belong to the file
> myfile.img, and where is the information stored to say that a block
> has data written into it or not?

It's important to understand that this has nothing to do with 'dd'.
It's simply how the Unix filesystem works, and since Linux is
"culturally derived" from Unix, it does the same thing. You would see
the same effect just by using 'cp' or even 'cat'.

The basic points are these (I'm skating over a lot for clarity):

1) The system maintains a list of every physical disk block assigned
   to the file (thus one of the things the 'fsck' command checks is
   that every block in the filesystem is either assigned to a file or
   is on the free list).

2) When a process writes to a file it need not do so sequentially,
   because the lseek(2) operation allows it to move its "current
   position" in the file. Furthermore, it's permissible to move the
   pointer beyond the current end of the file. If a process does this
   by a large enough amount and then writes data, the intervening
   space may have no disk blocks assigned to it (depending on the
   distance moved and block alignment). This is called a 'hole'.
   Files with holes in them are called 'sparse'.

3) The system keeps a separate count of the logical size of the file.
   Because of the holes, the logical size may differ from the physical
   size. "ls -l" shows the logical size; "du" shows the disk space
   actually allocated.

4) When a process tries to read from a hole, the system simply returns
   nulls for the corresponding bytes.
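Points 2 to 4 can be tried out with a short script. What follows is only a rough sketch, not anything from the original discussion: the function names (make_sparse_file, sizes, demo) and the 16 MiB hole are made up for illustration. It assumes a Unix-like system where os.stat() reports st_blocks in 512-byte units, and a filesystem that supports holes. On a filesystem that doesn't, the physical figure will simply come out close to the logical one.

```python
import os
import tempfile

def make_sparse_file(path, hole_bytes, payload=b"end"):
    """Seek past end-of-file, then write, leaving a hole before the payload."""
    with open(path, "wb") as f:
        f.seek(hole_bytes)   # lseek(2) beyond the current end of the file
        f.write(payload)     # only this final region needs disk blocks

def sizes(path):
    """Return (logical, physical) sizes in bytes."""
    st = os.stat(path)
    logical = st.st_size            # the figure "ls -l" shows
    physical = st.st_blocks * 512   # allocated space, roughly what "du" shows
    return logical, physical

def demo(hole_bytes=16 * 1024 * 1024):
    """Create a sparse file in a temp directory and inspect it."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse.img")
        make_sparse_file(path, hole_bytes)
        logical, physical = sizes(path)
        with open(path, "rb") as f:
            head = f.read(4096)     # read from inside the hole
    return logical, physical, head

if __name__ == "__main__":
    logical, physical, head = demo()
    print("logical size :", logical)
    print("physical size:", physical)
    print("hole reads as nulls:", head == b"\0" * len(head))
```

The logical size is the hole plus the three payload bytes, and the bytes read from the hole come back as nulls. The physical figure depends on the filesystem, so the script prints it rather than assuming a value.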

However, if a process writes nulls into a file, the system does *not*
make any effort to detect them as a special case, so they are simply
written as any other data and the system will allocate blocks to them.
This happens when 'dd' (or 'cp' or 'cat') copies a file: the holes are
read as nulls and written out as real data, so the resulting file can
occupy more disk space than the original. Note that 'rsync --sparse'
will preserve holes when it can.

Note also that if you're not careful you can back up a file or even a
filesystem that you can't restore because it's too big, especially if
copying it to some medium (e.g. a tape drive or non-UNIX disk system)
that can't handle sparse files.

Hope this helps.

poc