On Wed, Jun 25, 2008 at 2:19 PM, Patrick O'Callaghan <pocallaghan@xxxxxxxxx> wrote:
> On Wed, 2008-06-25 at 13:31 +0100, Dan Track wrote:
>> Thanks for the heads up on this. If the data blocks don't have
>> anything written into them, then what data is written into them when
>> using dd? If I restore the dd image will the blocks then be in the
>> same state, i.e. unwritten to?
>>
>> Also following on from this if I create a file using dd let's say 2GB,
>> how does the filesystem know that all these blocks belong to the file
>> myfile.img, and where is the information stored to say that a block
>> has data written into it or not?
>
> It's important to understand that this has nothing to do with 'dd', it's
> simply how the Unix filesystem works, and since Linux is "culturally
> derived" from Unix, it does the same thing. You would see the same
> effect just by using 'cp' or even 'cat'.
>
> The basic points are these (I'm skating over a lot for clarity):
>
> 1) The system maintains a list of every physical disk block assigned to
> the file (thus one of the things the 'fsck' command checks is that every
> block in the filesystem is either assigned to a file or is on the free
> list).
>
> 2) When a process writes to a file it need not do so sequentially
> because the lseek(2) operation allows it to move its "current position"
> in the file. Furthermore, it's permissible to move the pointer beyond
> the current end of the file. If a process does this by a large enough
> amount and then writes data, the intervening space may have no disk
> blocks assigned to it (depending on the distance moved and block
> alignment). This is called a 'hole'. Files with holes in them are called
> 'sparse'.
>
> 3) The system keeps a separate count of the logical size of the file.
> Because of the holes the logical size may be different from the physical
> size. "ls -l" shows the logical size. "du" shows the real physical size
> and may be different.
>
> 4) When a process tries to read from a hole, the system simply returns
> nulls for the corresponding bytes. However if a process writes nulls
> into a file, the system does *not* make any effort to detect them as a
> special case, so they are simply written as any other data and the
> system will allocate blocks to them. This happens when 'dd' (or 'cp' or
> 'cat') copies a file, so the resulting file can be larger than the
> original.
>
> Note that 'rsync --sparse' will preserve holes when it can.
>
> Note also that if you're not careful you can back up a file or even a
> filesystem that you can't restore because it's too big, especially if
> copying it to some medium (e.g. a tape drive or non-UNIX disk system)
> that can't handle sparse files.
>
> Hope this helps.
>
> poc

Hi Patrick,

Really appreciate the detailed explanation. It's a real eye-opener. Can
you point me to any docs that I could read around this subject?

Thanks
Dan

--
fedora-list mailing list
fedora-list@xxxxxxxxxx
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
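
For anyone who wants to try points 2) to 4) from the quoted explanation,
here is a minimal Python sketch. It is only an illustration under some
assumptions: the function names (make_sparse_file, sizes, naive_copy) and
the file names are made up for this example, it expects a Unix-like
system where st_blocks is counted in 512-byte units (as on Linux), and
whether a hole is really left unallocated depends on the filesystem.

```python
import os
import tempfile


def make_sparse_file(path, hole_bytes, payload=b"end"):
    """Seek past the end of an empty file and write, which may leave a hole."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.lseek(fd, hole_bytes, os.SEEK_SET)  # move beyond the current end
        os.write(fd, payload)                  # only this data needs blocks
    finally:
        os.close(fd)


def sizes(path):
    """Return (logical size, allocated bytes), roughly 'ls -l' versus 'du'."""
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units, as on Linux.
    return st.st_size, st.st_blocks * 512


def naive_copy(src, dst, chunk=1 << 20):
    """Plain read/write copy: nulls read from a hole are written as real data."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            fout.write(buf)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "sparse.img")  # hypothetical file names
        dst = os.path.join(tmp, "copy.img")
        make_sparse_file(src, 8 * 1024 * 1024)
        naive_copy(src, dst)
        for name in (src, dst):
            logical, allocated = sizes(name)
            print(os.path.basename(name), "logical:", logical,
                  "allocated:", allocated)
```

On a filesystem that supports holes, the first file should report far
fewer allocated bytes than its logical size, while the plain copy should
be allocated more or less in full. The exact numbers vary by filesystem,
so none are given here.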