Re: O_DIRECT question

Linus Torvalds wrote:
> 
> On Sat, 13 Jan 2007, Michael Tokarev wrote:
>>> At that point, O_DIRECT would be a way of saying "we're going to do 
>>> uncached accesses to this pre-allocated file". Which is a half-way 
>>> sensible thing to do.
>> Half-way?
> 
> I suspect a lot of people actually have other reasons to avoid caches. 
> 
> For example, the reason to do O_DIRECT may well not be that you want to 
> avoid caching per se, but simply because you want to limit page cache 
> activity. In which case O_DIRECT "works", but it's really the wrong thing 
> to do. We could export other ways to do what people ACTUALLY want, that 
> doesn't have the downsides.
> 
> For example, the page cache is absolutely required if you want to mmap. 
> There's no way you can do O_DIRECT and mmap at the same time and expect 
> any kind of sane behaviour. It may not be what a DB wants to use, but it's 
> an example of where O_DIRECT really falls down.

That holds only when the two cover the same part of a file.  If they don't,
and the file is "divided" on a proper boundary (sector/page/whatever-aligned),
there are no issues, at least not if all the blocks of the file have been
allocated (no gaps, that is).

What I was referring to in my last email (and called a corner case) is:
mmap() the start of a file, say the first megabyte, where some index/bitmap
is located, and use direct I/O on the rest.  So the two don't overlap.

Still problematic?
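
For concreteness, a minimal sketch of that layout in C.  It only shows
the split (mmap()ed head, direct I/O on the tail); it does not settle
whether the kernel keeps the two coherent.  The names INDEX_LEN,
IO_ALIGN, open_direct(), direct_range_ok(), map_index() and read_data()
are made up for illustration, and the 4096-byte alignment is an
assumption; the real requirement depends on the device and filesystem.

```c
#define _GNU_SOURCE             /* for O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define INDEX_LEN (1024 * 1024) /* mmap()ed head: first megabyte (illustrative) */
#define IO_ALIGN  4096          /* assumed direct-I/O alignment */

/* Second descriptor for the same file, used only for the data region. */
int open_direct(const char *path)
{
    return open(path, O_RDWR | O_DIRECT);
}

/* 1 if [off, off+len) starts at or past the mapped head and is aligned. */
int direct_range_ok(off_t off, size_t len)
{
    if (off < INDEX_LEN)
        return 0;               /* would overlap the mmap()ed index */
    return off % IO_ALIGN == 0 && len % IO_ALIGN == 0 && len > 0;
}

/* Map the index/bitmap through the page cache (fd opened without O_DIRECT). */
void *map_index(int cache_fd)
{
    assert(cache_fd >= 0);
    return mmap(NULL, INDEX_LEN, PROT_READ | PROT_WRITE, MAP_SHARED,
                cache_fd, 0);
}

/* Read from the data region; buf must itself be IO_ALIGN-aligned. */
ssize_t read_data(int direct_fd, void *buf, size_t len, off_t off)
{
    if (!direct_range_ok(off, len) || (uintptr_t)buf % IO_ALIGN != 0)
        return -1;              /* refuse anything touching the head */
    return pread(direct_fd, buf, len, off);
}
```

The range check is the only thing here that enforces "the two don't
overlap"; nothing in the kernel interface does that for the caller.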

>>> But what O_DIRECT does right now is _not_ really sensible, and the 
>>> O_DIRECT propeller-heads seem to have some problem even admitting that 
>>> there _is_ a problem, because they don't care. 
>> Well.  In fact, there's NO problems to admit.
>>
>> Yes, yes, yes yes - when you think about it from a general point of
>> view, and think how non-O_DIRECT and O_DIRECT access fits together,
>> it's a complete mess, and you're 100% right it's a mess.
> 
> You can't admit that even O_DIRECT _without_ any non-O_DIRECT actually 
> fails in many ways right now.
> 
> I've already mentioned ftruncate and block allocation. You don't seem to 
> understand that those are ALSO a problem.

I do understand this.  And this, too, is solved right now in userspace.
For example, when Oracle allocates a file for its data, or when it extends
the file, it writes something to every block of the new space (using O_DIRECT
while at it, but that's a different story).  The thing is: while it is doing
that, no other process tries to do anything with that (part of the) file (not
counting some external processes run by evil hackers ;)  So there are still
no races or fundamental brokenness *in usage*.

It uses ftruncate() to create or extend a file, *and* does O_DIRECT writes
to force block allocation.  That's probably not right, and that alone is
probably difficult to implement in the kernel (I just don't know; what I know
for sure is that this way is very slow on ext3).  Maybe it's because there's
no way to tell the kernel something like "set the file size to this and
actually *allocate* space for it" (if it doesn't write some structure to the
file).
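
A rough sketch of that extend-then-write pattern in C, under stated
assumptions: extend_and_fill() and BLK are hypothetical names, 4096 is
an assumed block size, and the descriptor is expected to have been
opened by the caller (with O_DIRECT if the writes are to bypass the page
cache; the aligned buffer and block-sized writes are there for that
case).  This is not a claim about what Oracle's code actually looks like.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLK 4096    /* assumed block size and O_DIRECT alignment */

/*
 * Grow fd from old_size to new_size, then write zeros to every new
 * block so the filesystem has to allocate it (no gaps left behind).
 * Both sizes must be multiples of BLK.  Returns 0 on success, -1 on error.
 */
int extend_and_fill(int fd, off_t old_size, off_t new_size)
{
    void *buf;
    off_t off;

    assert(old_size >= 0 && new_size >= old_size);
    assert(old_size % BLK == 0 && new_size % BLK == 0);

    /* Step 1: set the new file size; this alone may leave a hole. */
    if (ftruncate(fd, new_size) != 0)
        return -1;

    /* Step 2: an aligned, zero-filled buffer usable for direct I/O. */
    if (posix_memalign(&buf, BLK, BLK) != 0)
        return -1;
    memset(buf, 0, BLK);

    /* Step 3: touch every new block to force its allocation. */
    for (off = old_size; off < new_size; off += BLK) {
        if (pwrite(fd, buf, BLK, off) != BLK) {
            free(buf);
            return -1;
        }
    }

    free(buf);
    return 0;
}
```

Writing one block per call keeps the sketch short; it is also the slow
way to do it, which fits the ext3 remark above.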

What I dislike very much is half-solutions.  And the current O_DIRECT indeed
looks like half a solution, because sometimes it works, and sometimes, in a
*wrong* usage scenario, it doesn't, or it's racy, etc., and the kernel
*allows* such a wrong scenario.  Software should either work correctly, or
disallow usage where it can't guarantee correctness.  Currently, the kernel
allows incorrect usage, and that, plus all the ugly things done in the code
in an attempt to fix that, suxx.

But the whole thing is not (fundamentally) broken.

/mjt