Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 07, 2006 at 10:43:59AM +0100, Ingo Oeser ([email protected]) wrote:
> Evgeniy Polyakov wrote:
> > On Mon, Mar 06, 2006 at 06:44:07PM +0100, Ingo Oeser ([email protected]) wrote:
> > > Hmm, so I should resurrect my user page table walker abstraction?
> > > 
> > > There I would hand each page to a "recording" function, which
> > > can drop the page from the collection or coalesce it in the collector
> > > if your scatter gather implementation allows it.
> > 
> > It depends on where performance growth is stopped.
> > From the first glance it does not look like find_extend_vma(),
> > probably follow_page() fault and thus __handle_mm_fault().
> > I can not say actually, but if it is true and performance growth is
> > stopped due to increased number of faults and it's processing, 
> > your approach will hit this problem too, doesn't it?
> 
> My approach reduced the number of loops performed and number
> of memory needed at the expense of doing more work in the main
> loop of get_user_pages. 
> 
> This was mitigated for the common case of getting just one page by 
> providing a get_one_user_page() function.
> 
> The whole problem, why we need such multiple loops is that we have
> no common container object for "IO vector + additional data".
> 
> So we always do a loop working over the vector returned by 
> get_user_pages() all the time. The bigger that vector, 
> the bigger the impact.
> 
> Maybe sth. as simple as providing get_user_pages() with some offset_of 
> and container_of hackery will work these days without the disadvantages 
> my old get_user_pages() work had.
> 
> The idea is, that you'll provide a vector (like arguments to calloc) and two 
> offsets: One for the page to store within the offset and one for the vma 
> to store.
> 
> If the offset has a special value (e.g MAX_LONG) you don't store there at all.

You still need to find VMA in one loop, and run through it's(mm_structu) pages in
second loop.

> But if the performance problem really is get_user_pages() itself 
> (and not its callers), then my approach won't help at all.

It looks so.
My test pseudocode is following:
fget_light();
igrab();
kzalloc(number_of_pages * sizeof(void *));
get_user_pages(number_of_pages);
... undo ...

I've attached two graphs of performance with and without
get_user_pages(), it is get_user_pages.png and kmalloc.png.

Vertical axis is number of Mbytes per second thrown through above code,
horizontal one is number of pages in each run.
 
> Regards
> 
> Ingo Oeser

-- 
	Evgeniy Polyakov

Attachment: get_user_pages.png
Description: PNG image

Attachment: kmalloc.png
Description: PNG image


[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux