Re: size discrepancy after tarring a dir

On 09/03/2010 04:40 PM, Cameron Simpson wrote:
> On 03Sep2010 14:21, JD<jd1008@xxxxxxxxx>  wrote:
> |   I have two mounted disks, both ext3, mounted as
> | /sdb1
> | /sdc1
> |
> | On /sdb1 I have a directory, let's call it dirx.
> |
> | 1. rm -rf /sdc1/dirx
> |
> | 2. cd /sdb1
> | 3. tar cf - dirx | tar -C /sdc1 -xpf -
> |
> | Neither dir (/sdb1 nor /sdc1) is accessed by any program other
> | than tar (and, of course, /sdb1 is the shell's CWD).
> | The shell's history file is in my home dir.
> |
> | After tar:
> |
> | 4. du -sk dirx  /sdc1/dirx
> | 2904536    /sdc1/dirx
> | 2802124    dirx
> |
> | So, why this size inflation of about 100MiB (102412 KiB)?
> |
> | I repeated the process twice. Same difference.
> |
> | Other dirs tarred in this way from sdb1 to sdc1 do not show this
> | discrepancy.
>
> There are three possible sources of discrepancies that I can think of:
>    - different filesystem types
>    - different directory packing
>    - file fragmentation
>
> I presume we can discount the first one.
>
> Directory packing normally is _better_ in a new directory; older directories
> can accumulate holes from file deletions. So the second one seems unlikely too.
> The way to check is to walk the trees with find and tally the directory
> sizes with awk:
>
>    find /sdb1/dirx -type d -ls | awk '{sum += $7} END { print sum }'
>    find /sdc1/dirx -type d -ls | awk '{sum += $7} END { print sum }'
>
> The size difference seems too large for this anyway.
>
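The two find commands above can be wrapped in a small helper that prints both directory totals and their difference. This is a sketch that assumes GNU find, whose -ls output puts the size in bytes in field 7; the function names dir_bytes and compare_dirs are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; assume GNU find, whose -ls output puts the
# size in bytes in field 7.

# Sum the sizes of all directories (not regular files) under a tree.
# printf "%.0f" avoids exponent notation for large sums in some awks.
dir_bytes() {
    find "$1" -type d -ls | awk '{ sum += $7 } END { printf "%.0f\n", sum }'
}

# Print the directory totals for two trees and the difference
# (second tree minus first tree).
compare_dirs() {
    first=$(dir_bytes "$1")
    second=$(dir_bytes "$2")
    echo "$1: $first bytes in directories"
    echo "$2: $second bytes in directories"
    echo "difference: $((second - first)) bytes"
}
```

Run it as `compare_dirs /sdb1/dirx /sdc1/dirx`. If the reported difference is tiny next to the ~100 MiB gap from du, directory packing cannot account for it.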
> That leaves file fragmentation. Does sdc1 have a lot of other data?
> Maybe complete MP3s won't fit into the gaps, and must be broken up more.
> Again, like new directories, there is normally less fragmentation in
> copied files, not more. And MP3s tend to be written in one go anyway, so
> the source files are probably not fragmented either.
>
> None of these choices seem likely to me.
>
> There is a final option which should not apply because these are different
> filesystems and also because your files are definitely copies: hard link
> counting. du notices hard links and correctly does not count the second
> name twice. If you do this:
>
>    du -sk dir1 dir2
>
> and dir1 and dir2 have some files hard linked between them, then du will
> not count the hard-linked files again when it reaches them under dir2, and
> you would then see "dir2" have a lower count than you might otherwise expect.
>
> The way to check this one is to run two dus:
>
>    du -sk dir1
>    du -sk dir2
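The hard-link effect described above is easy to reproduce in a scratch directory. A minimal sketch, assuming a Linux shell with mktemp, dd, ln and GNU du; the file name song.mp3 is only a placeholder, and the 1 MiB file is filled from /dev/urandom so that filesystem compression cannot shrink it.

```shell
#!/bin/sh
# Show that a single du invocation counts a hard-linked file only once.
# Everything happens inside a throwaway directory created by mktemp.
set -e
tmp=$(mktemp -d)
mkdir "$tmp/dir1" "$tmp/dir2"

# Write a 1 MiB file into dir1 (placeholder name) ...
dd if=/dev/urandom of="$tmp/dir1/song.mp3" bs=1024 count=1024 2>/dev/null
# ... and give it a second name in dir2: a hard link, not a copy.
ln "$tmp/dir1/song.mp3" "$tmp/dir2/song.mp3"

echo "one du invocation (dir2 looks nearly empty):"
du -sk "$tmp/dir1" "$tmp/dir2"

echo "separate invocations (dir2 shows the full file):"
du -sk "$tmp/dir1"
du -sk "$tmp/dir2"

rm -rf "$tmp"
```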
>
> You can also scour your tree for hard links:
>
>    find /sdc1/dirx -type f -links +1 -ls
>
> though tar should preserve the hard links in your copy, and thus not
> change the totals.
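If that find does turn up hard-linked files, it can help to see which names share an inode. A sketch assuming GNU find, whose -printf directives %i, %n and %p print the inode number, link count and path; sorting on the inode column puts the names of each file next to each other. The helper name list_hardlinks is hypothetical.

```shell
#!/bin/sh
# List hard-linked regular files under a tree, grouped by inode number.
# Assumes GNU find (-printf is a GNU extension).
# Output columns: inode, link count, path.
list_hardlinks() {
    find "$1" -type f -links +1 -printf '%i %n %p\n' | sort -n
}
```

Usage: `list_hardlinks /sdc1/dirx`. Lines that share the first column are names for the same file.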
>
> In short, several things are listed above that can produce different "on
> disc" sizes for copied data, and I don't really think any of them
> explain your results. But do some of the checks I suggest - if nothing
> else they may reveal more clues.
>
> | Dirx contains mp3's.
>
> "MP3s", please. There are no apostrophes in plurals!
>
> Cheers,

I believe it must be directory packing. sdc1 is 83% full and sdb1 is
81% full. Free space on both sdb1 and sdc1 is also highly fragmented.
Thanx for the info!!!!
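One further check may help separate allocation effects from a real difference in content: GNU du's --apparent-size option sums file lengths rather than allocated blocks. If both trees report the same apparent size, they hold the same data and the ~100 MiB gap lies in how the blocks were allocated. A minimal sketch; usage_report is a hypothetical helper, and it runs du once per tree, following the advice above, so that hard links shared between the trees cannot skew the numbers.

```shell
#!/bin/sh
# Compare allocated usage with apparent size for each tree given.
# Assumes GNU du (--apparent-size is a GNU extension). Each tree is
# measured by its own du invocation.
usage_report() {
    for d in "$@"; do
        alloc=$(du -sk "$d" | awk '{ print $1 }')
        appar=$(du -sk --apparent-size "$d" | awk '{ print $1 }')
        echo "$d: allocated=${alloc}K apparent=${appar}K overhead=$((alloc - appar))K"
    done
}
```

Usage: `usage_report dirx /sdc1/dirx` from /sdb1.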


-- 
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines

