[ Thank you very much for your response ]

"Bill Rugolsky Jr." wrote:
> There are fundamental differences between what a NetApp filer is
> doing, and what LVM2 snapshots provide.

Yeah, it's hard to beat WAFL's well-integrated design.

> In particular, when using LVM2 snapshots, kcopyd has to constantly
> move blocks from your filesystem LV to the snapshot LV. Device Mapper
> is much more sensible and efficient at this than LVM1,

So I don't even want to look at LVM1, good.

> but it is still non-trivial overhead, and ends up generating a lot
> of mixed read/write traffic.

That's what I figured.

> We are currently using NFS/Ext3/LVM2/MD on a 2.6.8-rc1 kernel as our
> backup NFS server,

That's going to be my usage as well: a backup NFS server behind a
_real_ NetApp filer. It's more for Windows users than UNIX clients,
but I'll still need some production NFS support.

> and initial testing with snapshots under load uncovered some
> performance problems that I need to track down.

What kind of memory and I/O do you have in your system? I'm hoping to
do this with 1GB of RAM, but it's not my primary NFS server.

> [Snapshots and mirroring were only recently added to the Device Mapper
> code in the Linus kernel tree.]

Yep, I saw that. I also noticed the Red Hat 2.6.7 development kernels
are now patching them in (or are they 2.6.8-rc-based?).

> Either grab the most recent kernel from kernel.org, or an FC3
> development kernel, and test extensively.

I can deal with performance issues. If they get bad enough, I'll just
not use snapshots, and enable them later once the quirks are worked
out.

> The NetApp WAFL filesystem encapsulates all meta-data in a tree
> structure, and uses persistent copy-on-write multi-rooted trees.
> When writing, it places data wherever it is convenient (i.e., in the
> free space), and then adjusts block pointers up toward the root of
> the tree. Every few seconds it checkpoints its state (i.e., takes a
> snapshot).

Yep.
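To make the kcopyd discussion above concrete, the LVM2 snapshot cycle
on such a backup server looks roughly like the following. This is only
a sketch: the volume group and LV names (vg0, home, home-snap), the 2G
copy-on-write size, and the mount point are hypothetical. The commands
are echoed rather than executed, so the sequence is visible without
root or a real volume group; redefine run() to execute them for real.

```shell
#!/bin/sh
# Sketch of an LVM2 snapshot backup cycle (hypothetical names).
run() { echo "+ $*"; }    # for real use:  run() { "$@"; }

# Create a 2G copy-on-write snapshot of /dev/vg0/home.  From this
# point on, kcopyd copies each origin chunk into the snapshot LV
# before its first overwrite -- the mixed read/write traffic above.
run lvcreate --size 2G --snapshot --name home-snap /dev/vg0/home

# Mount it read-only, back it up, then remove it to stop COW traffic.
run mount -o ro /dev/vg0/home-snap /mnt/snap
# ... run the backup against /mnt/snap here ...
run umount /mnt/snap
run lvremove -f /dev/vg0/home-snap
```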
WAFL doesn't use volume management separate from the filesystem; it is
an "all-in-one" design, which makes it very efficient.

> [The NetApp also uses NVRAM to hold state that hasn't been flushed
> to disk.]

I've done something similar with 1GB PCI NVRAM boards, using one as an
off-device, full-data Ext3 journal. It makes NFS v3 sync performance
far better.

> When one wants to save a snapshot, the filesystem tags it and
> maintains its allocation data, instead of releasing stale blocks
> back into the free pool.

Right.

> Based on what I've read of Reiser4, the design should allow a
> similar level of functionality to be incorporated at some point.
> Unfortunately, it is not done yet.

I've seen ReiserFS v4 promise a lot, but compatibility always seems to
be an issue. I'll stick with XFS.

> To summarize: LVM2 will do what you want (modulo some tuning and
> perhaps bug fixes), but it is not a NetApp.

Yeah, it's not WAFL. But if it works, that's what I want. I'm only
concerned about data integrity, not performance, since it is my backup
NFS server.

> IIRC, XFS does not do data journaling. So while it may be much
> faster than Ext3, you need to consider data integrity.

I use Ext3 in meta-data journaling mode (ordered writes), so I don't
see that much difference. I was just mentioning XFS in case it is
considered a better option, especially if SGI has GPL support for it
with LVM2 on Linux. But I assume not.

> I haven't been following EVMS development, but you might want
> to look into the current state of affairs to find out if there
> is any functionality there that you need (e.g., badblock handling).

I _always_ use hardware RAID, so badblock handling is done by the
intelligent controller. In this case, it's going to be a 3Ware
Escalade 9000 series.

> LVM2 installs work fine.

Good. That's my #1 issue. I can do snapshots later if need be, or
limit their use to select filesystems.

> Some things you might want to do:
> 1. Script some infrastructure to monitor snapshot space usage.
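Suggestion 1 might be sketched along these lines. Everything here is
an assumption for illustration, not from the original mail: the LV
name vg0/home-snap, the 90% threshold, and reading the copy-on-write
allocation with `lvs -o snap_percent`.

```shell
#!/bin/sh
# Sketch: warn when a snapshot LV's copy-on-write area is nearly
# full.  A snapshot that runs out of COW space is invalidated, so a
# cron job like this catches it before the backup silently breaks.

snap_usage_warn() {
    # $1 = LV name, $2 = allocated percent (integer), $3 = threshold
    if [ "$2" -ge "$3" ]; then
        echo "WARNING: snapshot $1 is ${2}% full"
    fi
}

# On a real system, read the percentage from LVM2, e.g.:
#   pct=$(lvs --noheadings -o snap_percent vg0/home-snap \
#         | tr -d ' ' | cut -d. -f1)
#   snap_usage_warn vg0/home-snap "$pct" 90
```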
I do that anyway for disk usage, so not much more to add there.

> 2. Cron a job to snapshot and fsck the filesystem, so any
> filesystem problems are revealed early.

Why do I need to fsck the filesystem?

> 3. If using Ext3 with data journaling, specify a large journal when
> creating the filesystem (e.g., mke2fs -j -J size=400 ...).

So you recommend Ext3 with full data journaling? I used to do that
back in the 2.2 days with the VA Linux kernel, and I might again if I
use a PCI NVRAM board. But I've found Ext3 with ordered writes in 2.4
to be 100% reliable. Is it not for LVM2/snapshots? I would _not_ use
Ext3 with writeback, though; it's not worth the potential data loss
for a small performance gain.

> 4. Tune the filesystem and VM variables: flush time, readahead, etc.

Is there a good reference based on CPU, I/O, memory, etc.?

> 5. Test whether an external journal in the form of an NVRAM card
> or additional disks would improve performance. (You can try with
> a ramdisk for test purposes.)

I'd love to throw such a board in the system, but that only adds cost.
I'm hoping Ext3 with ordered writes (meta-data journaling) and NFS v3
async operation will work fine. Do you see any issues?

--
Linux Enthusiasts call me anti-Linux. Windows Enthusiasts call me
anti-Microsoft. They both must be correct, because I have over a
decade of experience with both in mission-critical environments,
resulting in a bigotry dedicated to mitigating risk and focusing on
technologies ... not products or vendors.

--------------------------------------------------
Bryan J. Smith, E.I.  b.j.smith@xxxxxxxx