Re: Huge pages and small pages. . .

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1



Helge Hafting wrote:
> John Richard Moser wrote:
> 
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>>
>>
>> linux-os (Dick Johnson) wrote:
>>  
>>
>>> On Tue, 17 Jan 2006, John Richard Moser wrote:
>>>
>>>
>>> Is there anything in the kernel that shifts the physical pages for 1024
>>> physically allocated and contiguous virtual pages together physically
>>> and remaps them as one huge page?  This would probably work well for the
>>>
>>>
>>>   
>>>
>>>> A page is something that is defined by the CPU. Perhaps you mean
>>>> "order"? When acquiring pages for DMA, they need to be contiguous if
>>>> you are going to access more than one page at a time. Therefore, one
>>>> can attempt to get two or more pages, i.e., the order or pages.
>>>>     
>>>> Since the CPU uses virtual memory always, there is no advantage to
>>>> having contiguous pages. You just map anything that's free into
>>>> what looks like contiguous memory and away you go.
>>>>     
>>
>>
>> Well, pages are typically 4KiB seen by the MMU.  If you fault across
>> them, you need to have them cached in the TLB; if the TLB runs out of
>> room, you invalidate entries; then when you hit entries not in the TLB,
>> the TLB has to searhc for the page mapping in the PTE chain.
>>
>> There are 4MiB pages, called "huge pages," that if you clump 1024
>> contiguous 4KiB pages together and draw a PTE entry up for can correlate
>> to a single TLB entry.  In this way, there's no page faulting until you
>> cross boundaries spaced 4MiB apart from eachother, and you use 1 TLB
>> entry where you would normally use 1024.
>>
>>  
>>
>>> low end of the heap, until someone figures out a way to tell the system
>>> to free intermittent pages in a big mapping (if the heap has an
>>> allocation up high, it can have huge, unused areas that are allocated).
>>>
>>>
>>>   
>>>
>>>> The actual allocation only occurs when an access happens. You can
>>>> allocate all the virtual memory in the machine and never use any
>>>> of it. When you allocate memory, the kernel just marks a promised
>>>> page 'not present'. When you attempt to access it, a page-fault
>>>> occurs and the kernel tries to find a free page to map into your
>>>> address space.
>>>>     
>>
>>
>> Yes.  The heap manager brk()s up the heap to allocate more space, all
>> mapped to the zero page; then the application writes to these pages,
>> causing them to be COW'd to real memory.  They will stay forever
>> allocated until the highest pages of the heap are unused by the program,
>> in which case the heap manager brk()s down the heap and frees them to
>> the system.
>>
>> Currently the heap manager can't seem to tell the system that a page
>> somewhere in the middle of the heap isn't really needed anymore, and can
>> be freed and mapped back to the zero page to await COW again.  So in
>> effect, you'll eventually wind out with a heap that's 20, 50, 100, or
>> 200 megs wide and probably all actually mapped to real, usable memory;
>> at this point, you can probably replace most of those entries with huge
>> pages to save on TLB entries and page faults.
>>  
>>
> This would be a nice performance win _if_ the program is
> actually using all those pages regularly. And I mean _all_.
> 
> The idea fails if we get any memory pressure, and some
> of the little-used pages gets swapped out.  You can't swap
> out part of a huge page, so you either have to break it up,
> or suffer the possibly large performance loss of having
> one megabyte less of virtual memory.   (That'll be even worse
> if several processes do this, of course.)
> 
>> When the program would try to free up "pages" in a huge page, the kernel
>> would have to recognize that the application is working in terms of 4KiB
>> small pages, and take appropriate action shattering the huge page back
>> into 1024 small pages first.
>>  
>>
> To see whether this is worthwile, I suggest instrumenting a kernel
> to detect:
> (1) When a process gets opportunity to have a huge page
>      (I.e. it has a sufficiently large and aligned memory block,
>       none of it is swapped, and rearranging into contigous
>       physical memory is possible.)
> (2) When a huge page would need to be broken up.
>      (I.e. parts of a previously identified huge page gets swapped
>       out, deallocated, shared, memprotected and so on.)
> 
> 
> Both of these detectors will be necessary anyway if automatic huge
> pages gets deployed.  In the meantime, you can check to see
> if such occations arise and for how long the huge pages remain
> viable. I believe the swapper will kill them fast once you have
> any memory pressure, and that fragmentation normally will prevent
> such pages from forming.  But I could be wrong of course.
> 

You're probably right.  On a system with optimum memory, this would be
optimal; aside from that it could get painful.

> For this to be fesible, the reduction in TLB misses would have to
> outweigh the initial cost of copying all that memory into a contigous
> region, the cost of shattering the mapping once the swapper gets
> aggressive, and of course the ongoing cost of identifying mergeable blocks.
> 
> Helge Hafting
> 

- --
All content of all messages exchanged herein are left in the
Public Domain, unless otherwise explicitly stated.

    Creative brains are a valuable, limited resource. They shouldn't be
    wasted on re-inventing the wheel when there are so many fascinating
    new problems waiting out there.
                                                 -- Eric Steven Raymond
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFDzpLjhDd4aOud5P8RAmrAAJ0bTjc/SIrTSUECLIIEG8xCNjVnBACdF/N3
Cl/3+2m2nNm7IsGhJcFsWUs=
=Vrrl
-----END PGP SIGNATURE-----
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

References:
- Huge pages and small pages. . .
  - From: John Richard Moser <nigelenki@comcast.net>
- Re: Huge pages and small pages. . .
  - From: "linux-os \(Dick Johnson\)" <linux-os@analogic.com>
- Re: Huge pages and small pages. . .
  - From: John Richard Moser <nigelenki@comcast.net>
- Re: Huge pages and small pages. . .
  - From: Helge Hafting <helge.hafting@aitel.hist.no>

Prev by Date: Re: [PATCH] e1000 C style badness
Next by Date: Re: [RFC] Moving snd-bt87x and btaudio to drivers/media
Previous by thread: Re: Huge pages and small pages. . .
Next by thread: Vaiggra and Cailiis Offfers f5O1uP
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]