Re: OCFS2 Filesystem inconsistency across nodes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Friday 10 February 2006 06:46, Mark Fasheh wrote:
> Great. We'll keep things simple at first. If I could get a copy of the
> /etc/ocfs2/cluster.conf files from each node that'd be great. A full log of
> the OCFS2 messages you see on each node, starting from mount to unmount
> would also help. That includes any dlm_* messages - in particular the ones
> printed when a node mounts and unmounts. If you're using any mount options
> it'd be helpful to know those too.

 Hi again,

This is my /etc/ocfs2/cluster.conf on every node:

cluster:
        node_count = 3
        name = oratest

node:
        ip_port = 7777
        ip_address = 192.168.100.30
        number = 0
        name = iscsi-teste
        cluster = oratest

node:
        ip_port = 7777
        ip_address = 192.168.100.31
        number = 1
        name = orateste1
        cluster = oratest

node:
        ip_port = 7777
        ip_address = 192.168.100.32
        number = 2
        name = orateste2
        cluster = oratest

------------------------------------------

 So today I rebooted the machines and started over and formatted the volume 
again from node 0 with 

iscsi-teste:~# mkfs.ocfs2 -b 4K -C 64K -N 4 -L OCFSTest1 /dev/sda
mkfs.ocfs2 1.1.5
Filesystem label=OCFSTest1
Block size=4096 (bits=12)
Cluster size=65536 (bits=16)
Volume size=2489995755520 (37994320 clusters) (607909120 blocks)
1178 cluster groups (tail covers 29008 clusters, rest cover 32256 clusters)
Journal size=33554432
Initial number of node slots: 4
WARNING: bitmap is very large, consider using a larger cluster size and/or
a smaller volume
Creating bitmaps: done
Initializing superblock: done
Writing system files: done
Writing superblock: done
Writing lost+found: done
mkfs.ocfs2 successful


 (By the way, I didn't use a bigger cluster size because we plan to use this 
fs to store mainly lots of small files)

 I then mounted the fs (with default options:rw,_netdev,heartbeat=local) on 
node 0 and node 2 and started the usual tests by creating, writing and 
copying files. 
 Later I also mounted the volume on node 1. By this time, node 0 and node 2 
showed completely different files on the same directory. After mounting, I 
could see on node 1 the files I had created from node 0 but not the ones 
created from node 2. After some more file tests I unmounted on all nodes and 
then remounted on all of them.

 I can get an interesting result by creating several large files from *node 0* 
and md5sum them; I list the same directory on node 2, the files are not 
visible there; so I also create several large files from *node 2*; then I 
md5sum the files I created from node 1 again and the md5sums are changed.
 So I think that node 2 was allocating and writing to space that was already 
allocated by node 0, overwriting files that were already there.
 I can reproduce this easily with any pair of nodes. Kernel messages follow 
for each node, though for most of the time there are no special error 
messages. In the end there are some on node 0, when I was copying files and 
extracting tarballs on all three nodes concurrently. I put the logs online to 
avoid upsetting lkml.

dmesg from iscsi-teste (node 0):
http://coyote.ist.utl.pt/ocfs2/dmesg_iscsi-teste.txt

dmesg from orateste1 (node 1):
http://coyote.ist.utl.pt/ocfs2/dmesg_orateste1.txt

dmesg from orateste2 (node 2):
http://coyote.ist.utl.pt/ocfs2/dmesg_orateste2.txt

 These are the logs from yesterday's tests, also with some interesting error 
messages on node 2 (the only difference yesterday was that I had the SCTP 
module loaded, but that doesn't seem to change the results):

http://coyote.ist.utl.pt/ocfs2/kern.log-iscsi-teste.txt
http://coyote.ist.utl.pt/ocfs2/kern.log-orateste1.txt
http://coyote.ist.utl.pt/ocfs2/kern.log-orateste2.txt

 And the kernel config files:

http://coyote.ist.utl.pt/ocfs2/config-2.6.16-rc2-git3-XEON.txt
http://coyote.ist.utl.pt/ocfs2/config-2.6.16-rc2-git3-AMD.txt

 If you need the files emailed to you just tell me. 
 Let me know if you need more tests/info.

 Thanks for the help.

Best regards

Claudio

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux