Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)

On Tue, Mar 07, 2006 at 10:43:59AM +0100, Ingo Oeser (netdev@axxeo.de) wrote:
> Evgeniy Polyakov wrote:
> > On Mon, Mar 06, 2006 at 06:44:07PM +0100, Ingo Oeser (netdev@axxeo.de) wrote:
> > > Hmm, so I should resurrect my user page table walker abstraction?
> > > 
> > > There I would hand each page to a "recording" function, which
> > > can drop the page from the collection or coalesce it in the collector
> > > if your scatter gather implementation allows it.
> > 
> > It depends on where performance growth is stopped.
> > From the first glance it does not look like find_extend_vma(),
> > probably follow_page() fault and thus __handle_mm_fault().
> > I can not say actually, but if it is true and performance growth is
> > stopped due to increased number of faults and it's processing, 
> > your approach will hit this problem too, doesn't it?
> 
> My approach reduced the number of loops performed and number
> of memory needed at the expense of doing more work in the main
> loop of get_user_pages. 
> 
> This was mitigated for the common case of getting just one page by 
> providing a get_one_user_page() function.
> 
> The whole problem, why we need such multiple loops is that we have
> no common container object for "IO vector + additional data".
> 
> So we always do a loop working over the vector returned by 
> get_user_pages() all the time. The bigger that vector, 
> the bigger the impact.
> 
> Maybe sth. as simple as providing get_user_pages() with some offset_of 
> and container_of hackery will work these days without the disadvantages 
> my old get_user_pages() work had.
> 
> The idea is, that you'll provide a vector (like arguments to calloc) and two 
> offsets: One for the page to store within the offset and one for the vma 
> to store.
> 
> If the offset has a special value (e.g MAX_LONG) you don't store there at all.

You still need to find VMA in one loop, and run through it's(mm_structu) pages in
second loop.

> But if the performance problem really is get_user_pages() itself 
> (and not its callers), then my approach won't help at all.

It looks so.
My test pseudocode is following:
fget_light();
igrab();
kzalloc(number_of_pages * sizeof(void *));
get_user_pages(number_of_pages);
... undo ...

I've attached two graphs of performance with and without
get_user_pages(), it is get_user_pages.png and kmalloc.png.

Vertical axis is number of Mbytes per second thrown through above code,
horizontal one is number of pages in each run.
 
> Regards
> 
> Ingo Oeser

-- 
	Evgeniy Polyakov

Attachment: get_user_pages.png
Description: PNG image

Attachment: kmalloc.png
Description: PNG image

References:
- [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
  - From: Chris Leech <christopher.leech@intel.com>
- Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
  - From: Ingo Oeser <netdev@axxeo.de>
- Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
  - From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
- Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
  - From: Ingo Oeser <netdev@axxeo.de>

Prev by Date: Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
Next by Date: 2.6.16-rc5-mm3
Previous by thread: Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
Next by thread: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]