>>>>> "Aaron" == Aaron Wiebe <[email protected]> writes:
More details on which kernel and distro you're using would be
helpful. Also, more details on your application and the reasons why
you're writing bunches of small files would help as well.
Aaron> Greetings all. I'm not on this list, so I apologize if this subject
Aaron> has been covered before. (Also, please cc me in the response.)
Aaron> I've spent the last several months trying to work around the lack of a
Aaron> decent disk AIO interface. I'm starting to wonder if one exists
Aaron> anywhere. The short version:
Aaron> I have written a daemon that needs to open several thousand
Aaron> files a minute and write a small amount of data to each file.
How large are these files? Are they all in a single directory? How
many files are in the directory?
Ugh. Why don't you just write to a DB instead? It sounds like you're
writing small records, with one record per file. That can work, but
when you're doing thousands of them per minute, the open/close
overhead starts to dominate. Can you amortize that overhead across a
bunch of writes instead, by writing to a single file that is
structured for your needs?
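As a rough illustration of what I mean (only a sketch -- the file
name, record format and helper names below are made up, not anything
your app actually has):

/* Sketch: pay the open() cost once, then append length-prefixed
 * records to a single long-lived fd, instead of doing an
 * open()/write()/close() per record. */
#include <fcntl.h>
#include <stdint.h>
#include <sys/uio.h>
#include <unistd.h>

static int log_fd = -1;

int open_record_log(const char *path)
{
    log_fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    return log_fd < 0 ? -1 : 0;
}

int write_record(const void *buf, uint32_t len)
{
    /* One write() per record: a 4-byte length prefix followed by
     * the payload, so the file stays easy to scan later. */
    struct iovec iov[2] = {
        { .iov_base = &len,        .iov_len = sizeof(len) },
        { .iov_base = (void *)buf, .iov_len = len },
    };
    ssize_t want = (ssize_t)(sizeof(len) + len);
    return writev(log_fd, iov, 2) == want ? 0 : -1;
}

The point is simply that the expensive metadata work happens once,
and every record after that is a plain write to an fd you already
hold.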
Aaron> After extensive research, I ended up going with the kludgy
Aaron> pthreads-based POSIX AIO wrapper in glibc to handle my writes,
Aaron> due to the time constraints of writing my own pthreads handler
Aaron> into the application.
Aaron> The problem with this approach is that opens, closes and
Aaron> non-read/write operations (fchmod, fcntl, etc.) have no
Aaron> interface in POSIX AIO. Now, I was under the assumption that,
Aaron> given that open and close operations are comparatively less
Aaron> common than the write operations, this wouldn't be a huge
Aaron> problem. My tests seemed to reflect that.
Aaron> I went to production with this yesterday, only to discover that
Aaron> under production load our filesystems (NFS on NetApps) were
Aaron> substantially slower than I was expecting. open() calls are
Aaron> taking upwards of 2 seconds on occasion, and usually ~20ms.
NetApps usually scream for NFS writes and such, so it sounds to me
like you've blown out the NVRAM cache on the box. Can you elaborate
more on your hardware, network and NetApp setup?
Of course, you could also be using a sucky NFS configuration, so we
need to see your mount options as well. You are using TCP and NFSv3,
right? And large wsize/rsize values too?
Have you also checked your NetApp to make sure you have the following
options turned OFF:
nfs.per_client_stats.enable
nfs.mountd_trace
Seeing your exports file and output of 'options nfs' would help.
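For comparison, the sort of client-side mount I'd expect to see looks
something like this (illustrative only -- the server name, export
path and sizes are made up, so adjust to your environment):

netapp1:/vol/vol1/spool  /mnt/spool  nfs \
    rw,hard,intr,tcp,nfsvers=3,rsize=32768,wsize=32768  0 0

If your setup shows udp or vers=2 instead, that's the first thing I'd
change.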
Aaron> Now, Netapp speed aside, O_NONBLOCK and O_DIRECT seem to make
Aaron> zero difference to my open times. Example:
Aaron> open("/somefile", O_WRONLY|O_NONBLOCK|O_CREAT, 0644) = 1621 <0.415147>
Aaron> Now, I'm a userspace guy so I can be pretty dense, but
Aaron> shouldn't a call with a nonblocking flag return EAGAIN if it's
Aaron> going to take anywhere near 415ms? Is there a way I can force
Aaron> opens to EAGAIN if they take more than 10ms?
The problem is that O_NONBLOCK on a regular file open doesn't make
sense. You either open it, or you don't. How long it takes to
complete isn't part of the spec.
But in this case, I think you're doing something hokey with your data
design. You should be opening just a handful of files and then
streaming your writes to those files. You'll get much better
performance.
Also, have you tried writing to a local disk instead of via NFS to see
how local disk speed is?
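For instance (hypothetical invocation -- substitute your daemon's
name; the -T flag is what produced the <0.415147> timing in your
trace above):

    strace -T -f -e trace=open,write -p $(pidof yourdaemon)

Run that once while the daemon is writing to the NetApp mount and
once while it's pointed at a local disk, and compare the open()
times.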
Aaron> (P.S. Having come from the socket side of the fence, it's
Aaron> incredibly frustrating to be unable to poll() or epoll regular
Aaron> file FDs -- especially knowing that the kernel is translating
Aaron> them into a TCP socket to do NFS anyway. Please add regular
Aaron> files to epoll and give me a way to do the opens in the same
Aaron> fashion as connects!)
epoll isn't going to help you much here, it's the open which is
causing the delay, not the writing to the file itself.
Maybe you need to be caching more of your writes into memory on the
client side, and then streaming them to the NetApp later on when you
know you can write a bunch of data at once.
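A rough sketch of the kind of client-side caching I mean (again only
an illustration -- the batch size and function names are invented,
and this assumes you can afford to hold records in memory briefly):

/* Sketch: queue records in memory, then push the whole batch to an
 * already-open fd in one writev() instead of one syscall per record. */
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

#define MAX_BATCH 256

static struct iovec batch[MAX_BATCH];
static int batch_count;

int queue_record(const void *buf, size_t len)
{
    void *copy;

    if (batch_count == MAX_BATCH)
        return -1;              /* caller forgot to flush */
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, buf, len);
    batch[batch_count].iov_base = copy;
    batch[batch_count].iov_len  = len;
    batch_count++;
    return batch_count == MAX_BATCH;  /* caller should flush when full */
}

int flush_batch(int fd)
{
    /* One trip to the filer for the whole batch. */
    ssize_t n = writev(fd, batch, batch_count);
    for (int i = 0; i < batch_count; i++)
        free(batch[i].iov_base);
    batch_count = 0;
    return n < 0 ? -1 : 0;
}

You'd call flush_batch() on a timer or whenever queue_record() says
the batch is full; either way the NetApp sees a few big writes
instead of thousands of little ones.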
But honestly, I think you've done a bad job architecting your
application's backend data store, and you really need to rethink it.
Heck, I'm not even much of a programmer; I'm a sysadmin who runs
NetApps and talks users into saner ways of getting better
performance out of their applications. *grin*.
John