4k at a time only works well for an OS that does less per iteration than Linux does

To sum up for akpm what was said previously in this thread:

Linux needs to take the lesson learned from bios, namely that going up and
down through the various software layers once per 4k is too expensive, and
generalize it so that, in all of the layers, we operate on many pages at a
time.  It would be nice for code simplicity if the same struct, with perhaps
some optional attachments to it, were used for the whole trip (rather than
pagevecs in one place and bios in another).
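As a rough illustration of the "same struct for the whole trip" idea, here is
a minimal userspace sketch; the container and layer names below are
hypothetical illustrations, not existing kernel interfaces:

/*
 * A minimal userspace sketch, not kernel code: the struct and the layer
 * names (page_batch, fs_layer_write, block_layer_submit) are hypothetical.
 * The point is that one container of many pages is handed through every
 * layer by reference, with nothing repacked along the way.
 */
#include <stdio.h>
#include <stddef.h>

#define PAGE_SIZE 4096
#define BATCH_MAX 64

struct page_batch {
        void  *pages[BATCH_MAX];        /* pointers to 4k buffers */
        size_t nr;                      /* pages in this batch    */
};

/* Lowest layer: one call per batch, not one call per page. */
static void block_layer_submit(struct page_batch *b)
{
        printf("block layer: %zu pages in one submission\n", b->nr);
}

/* Middle layer: works on the same struct and passes it straight down. */
static void fs_layer_write(struct page_batch *b)
{
        block_layer_submit(b);
}

int main(void)
{
        static char bufs[BATCH_MAX][PAGE_SIZE];
        struct page_batch b = { .nr = 0 };
        size_t i;

        for (i = 0; i < BATCH_MAX; i++)
                b.pages[b.nr++] = bufs[i];

        fs_layer_write(&b);             /* the whole trip costs one descent */
        return 0;
}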

For kernel newbies on lkml, let me summarize it this way: for every trip made
through these layers, a whole lot more than 4k of code gets traversed, all of
it seeming to have some legitimate purpose ;-), so handling only 4k per trip
is more expensive than you might guess.

Now, for the question that was raised:

Andreas Schäfer wrote:

>On 10:29 Wed 15 Mar, Hans Reiser wrote:
>  
>
>>Tell the mosix guys we would be willing to cooperate with them regarding
>>their problem.
>>    
>>
>
>If it was that easy... The problem for openMosix is that most devices
>fetch data in 4k blocks via copy_from_user(). For migrated processes,
>openMosix intercepts these calls and forwards them to the node which
>currently hosts the process. This forwarding yields a high latency
>penalty.
>
>Obviously there are two ways to get rid of this problem: 
>
>* modify _every_ Linux device driver to use a
>  _a_lot_more_than_4k_at_a_time_ approach or
>  
>
I suspect that if the code benefited more than just mosix, and I think it
would, then akpm might not be opposed to the first approach above, provided
someone wrote it.  There is a rumor that he already understands this is a
problem.  Maybe he can comment on the rumor. ;-)
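
To make the cost argument concrete, here is a hedged userspace sketch;
memcpy() stands in for copy_from_user(), and fixed_overhead() stands in for
the code that every trip through the layers executes regardless of transfer
size.  Handling the request per 4k pays that fixed cost once per page;
handling it in one batch pays it once:

/*
 * Userspace sketch only; memcpy() stands in for copy_from_user() and
 * fixed_overhead() for the per-trip layer traversal cost.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

static unsigned long trips;

static void fixed_overhead(void)
{
        trips++;                /* cost paid on every trip through the layers */
}

/* One trip per 4k page. */
static void copy_per_page(char *dst, const char *src, size_t len)
{
        size_t off;

        for (off = 0; off < len; off += PAGE_SIZE) {
                fixed_overhead();
                memcpy(dst + off, src + off, PAGE_SIZE);
        }
}

/* One trip for the whole request. */
static void copy_batched(char *dst, const char *src, size_t len)
{
        fixed_overhead();
        memcpy(dst, src, len);
}

int main(void)
{
        size_t len = 256 * PAGE_SIZE;           /* a 1 MB request */
        char *src = calloc(1, len);
        char *dst = calloc(1, len);

        trips = 0;
        copy_per_page(dst, src, len);
        printf("per-page: %lu trips\n", trips); /* 256 */

        trips = 0;
        copy_batched(dst, src, len);
        printf("batched:  %lu trips\n", trips); /* 1 */

        free(src);
        free(dst);
        return 0;
}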

>* implement a second "read ahead" buffer which fetches large blocks via
>  the network in the background and answers calls to copy_from_user()
>  directly from the local buffer
>
>In my _very_ humble opinion the first approach would be much nicer,
>but after you guys had so much trouble with just your filesystem, I
>don't see that one coming, not at all.
>
>So I think the long-term strategy for oM will be the second, double
>buffering approach.  At least I couldn't think of any other realistic,
>feasible way.
>
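
A minimal userspace sketch of that second, double-buffering idea (the names
below are hypothetical, not actual openMosix code): one expensive remote
fetch fills a large local buffer, and the subsequent 4k reads are answered
from it.

/*
 * Hypothetical userspace sketch of the read-ahead buffer, not openMosix
 * code.  fetch_remote() stands in for the expensive network round-trip.
 */
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096
#define READAHEAD (64 * PAGE_SIZE)      /* fetch 256k per round-trip */

static unsigned long remote_fetches;

/* Expensive: one network round-trip per call. */
static void fetch_remote(char *buf, size_t off, size_t len)
{
        remote_fetches++;
        memset(buf, 0, len);            /* pretend the data arrived */
        (void)off;
}

struct ra_cache {
        char   buf[READAHEAD];
        size_t start;                   /* offset covered by buf[0] */
        size_t valid;                   /* bytes currently cached   */
};

/* Serve a 4k read, refilling the cache in large chunks when needed. */
static void read_4k(struct ra_cache *c, char *dst, size_t off)
{
        if (off < c->start || off + PAGE_SIZE > c->start + c->valid) {
                fetch_remote(c->buf, off, READAHEAD);
                c->start = off;
                c->valid = READAHEAD;
        }
        memcpy(dst, c->buf + (off - c->start), PAGE_SIZE);
}

int main(void)
{
        static struct ra_cache cache;
        char page[PAGE_SIZE];
        size_t off;

        for (off = 0; off < 256 * PAGE_SIZE; off += PAGE_SIZE)
                read_4k(&cache, page, off);

        /* 256 reads of 4k, but only 4 remote round-trips. */
        printf("remote fetches: %lu\n", remote_fetches);
        return 0;
}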
>BTW: how are you guys planning to solve this 4k issue? Will you revert
>to small blocks 
>
Not sure what that means.  You mean surrender?  Not yet. ;-)  First we
fix our code so that reiser4 really does what I am arguing it needs to
do; then we will argue it is the right approach.

>or will you "pretend" to perform 4k transfers and
>assemble those in the background to, again, process large chunks at
>once? 
>
Oh, is that ugly....  No.  Dynamically sized pagevecs (I call them
pagezams) are better.

We will process things in large chunks all the way down to the io layer,
and then wait for Nate or others to fix the io layer someday.
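
To show what is meant by a dynamically sized pagevec, here is a hypothetical
userspace sketch (not reiser4 or kernel code): the vector grows to whatever
the request needs and is handed to the io layer once.

/*
 * Hypothetical userspace sketch of a dynamically sized page vector,
 * not reiser4 or kernel code.  Unlike a fixed-size pagevec it grows
 * to fit the request and is submitted to the io layer in one call.
 */
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 4096

struct pagezam {
        void   **pages;
        size_t   nr;
        size_t   cap;
};

/* Append a page, doubling the array when it fills up. */
static int pagezam_add(struct pagezam *z, void *page)
{
        if (z->nr == z->cap) {
                size_t ncap = z->cap ? z->cap * 2 : 16;
                void **np = realloc(z->pages, ncap * sizeof(*np));

                if (!np)
                        return -1;
                z->pages = np;
                z->cap = ncap;
        }
        z->pages[z->nr++] = page;
        return 0;
}

/* Stand-in for the io layer boundary: one call, however big the request. */
static void io_submit_zam(struct pagezam *z)
{
        printf("io layer sees %zu pages at once\n", z->nr);
}

int main(void)
{
        struct pagezam z = { NULL, 0, 0 };
        size_t i;

        for (i = 0; i < 1000; i++)
                pagezam_add(&z, malloc(PAGE_SIZE));

        io_submit_zam(&z);

        for (i = 0; i < z.nr; i++)
                free(z.pages[i]);
        free(z.pages);
        return 0;
}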

>If yes, wouldn't this seriously increase CPU usage due to
>(most likely) unnecessary data duplication?
>
>Regards
>-Andreas

