On Tue, Mar 07, 2006 at 10:43:59AM +0100, Ingo Oeser ([email protected]) wrote: > Evgeniy Polyakov wrote: > > On Mon, Mar 06, 2006 at 06:44:07PM +0100, Ingo Oeser ([email protected]) wrote: > > > Hmm, so I should resurrect my user page table walker abstraction? > > > > > > There I would hand each page to a "recording" function, which > > > can drop the page from the collection or coalesce it in the collector > > > if your scatter gather implementation allows it. > > > > It depends on where performance growth is stopped. > > From the first glance it does not look like find_extend_vma(), > > probably follow_page() fault and thus __handle_mm_fault(). > > I can not say actually, but if it is true and performance growth is > > stopped due to increased number of faults and it's processing, > > your approach will hit this problem too, doesn't it? > > My approach reduced the number of loops performed and number > of memory needed at the expense of doing more work in the main > loop of get_user_pages. > > This was mitigated for the common case of getting just one page by > providing a get_one_user_page() function. > > The whole problem, why we need such multiple loops is that we have > no common container object for "IO vector + additional data". > > So we always do a loop working over the vector returned by > get_user_pages() all the time. The bigger that vector, > the bigger the impact. > > Maybe sth. as simple as providing get_user_pages() with some offset_of > and container_of hackery will work these days without the disadvantages > my old get_user_pages() work had. > > The idea is, that you'll provide a vector (like arguments to calloc) and two > offsets: One for the page to store within the offset and one for the vma > to store. > > If the offset has a special value (e.g MAX_LONG) you don't store there at all. You still need to find VMA in one loop, and run through it's(mm_structu) pages in second loop. > But if the performance problem really is get_user_pages() itself > (and not its callers), then my approach won't help at all. It looks so. My test pseudocode is following: fget_light(); igrab(); kzalloc(number_of_pages * sizeof(void *)); get_user_pages(number_of_pages); ... undo ... I've attached two graphs of performance with and without get_user_pages(), it is get_user_pages.png and kmalloc.png. Vertical axis is number of Mbytes per second thrown through above code, horizontal one is number of pages in each run. > Regards > > Ingo Oeser -- Evgeniy Polyakov
Attachment:
get_user_pages.png
Description: PNG image
Attachment:
kmalloc.png
Description: PNG image
- References:
- [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
- From: Chris Leech <[email protected]>
- Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
- From: Ingo Oeser <[email protected]>
- Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
- From: Evgeniy Polyakov <[email protected]>
- Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
- From: Ingo Oeser <[email protected]>
- [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
- Prev by Date: Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
- Next by Date: 2.6.16-rc5-mm3
- Previous by thread: Re: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
- Next by thread: [PATCH 0/8] Intel I/O Acceleration Technology (I/OAT)
- Index(es):