Strange LVM2/DM data corruption with 2.6.11.12

Hi !

We are developing (GPLed) disk cloning software similar to partimage: it's anintelligent 'dd' which backups only used sectors.

Recently I added LVM1/2 support to it, and sometimes we saw LVM restorationsfailing randomly (Disk images are not corrupted, but the result of therestoration can be lead to a corrupted filesystem). If a restoration fails, justtry another one and it will work...

How the restoration program works:
- I restore the LVM2 administrative data (384 sectors, most of the time),
- I 'vgscan', 'vgchange',
- open for writing the '/dev/dm-xxxx',
- read a compressed file over NFS,

- and put the sectors in place, so it's a succession of '_llseek()' and'write()' to the DM device.

But, *sometimes*, for example, the current seek position is at 9GB, and somedata is written to sector 0 ! It happens randomly.

Here is a typical strace of a restoration:

write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
_llseek(5, 20963328, [37604032512], SEEK_CUR) = 0
_llseek(5, 0, [37604032512], SEEK_CUR)  = 0
_llseek(5, 2097152, [37606129664], SEEK_CUR) = 0
write(5, "\1\335E\0\f\0\1\2.\0\0\0\2\0\0\0\364\17\2\2..\0\0\0\0\0"..., 512) = 51
2
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
write(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
_llseek(5, 88076288, [37694210048], SEEK_CUR) = 0
_llseek(5, 0, [37694210048], SEEK_CUR)  = 0
_llseek(5, 20971520, [37715181568], SEEK_CUR) = 0
write(5, "\377\377\377\377\377\377\377\377\377\377\377\377\377\377"..., 512) = 5
12
....
....

As you can see, there are no seeks to sector 0, but something randomly writesome data to sector 0 !

I could reproduce these random problems on different kind of PCs.

But, the strace above comes from an improved version, which aggregates'_llseek's. A previous version, which did *many* 512 bytes seeks had much moreproblems. Aggregating seeks made the corruption to appears very rarely... And Imore likely to happen, for a 40GB restoration than for a 10GB one.

So less system calls to the DM/LVM2 layer seems to give less corruption probability.


Any ideas ? Newer kernel releases could have fixed such a problem ?


--
Ludovic DROLEZ                              Linbox / Free&ALter Soft
http://lrs.linbox.org - Linbox Rescue Server GPL edition
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: Strange LVM2/DM data corruption with 2.6.11.12
  - From: Alexander Nyberg <alexn@telia.com>

Prev by Date: Re: [PATCH 1/3] dynticks - implement no idle hz for x86
Next by Date: Re: [SCSI] qla1280: endianess annotations
Previous by thread: SKGE Kconfig help text typo fix
Next by thread: Re: Strange LVM2/DM data corruption with 2.6.11.12
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind]