Re: Why preallocate pmd in x86 32-bit PAE?

On Fri, 16 Nov 2007, Jeremy Fitzhardinge wrote:
> >
> > IIRC, the present bit is ignored in the magic 4-entry PGD.  All entries 
> > have to be present.
> 
> Hm, do you recall what processors that might affect?  As far as I know,
> current processors will ignore non-present top-level entries.

Are you sure?

Anyway, this is not worth making a distinction for. Just pre-allocate all 
of them. There really are just 4 PGD entries, and it really *is* different 
from having a full three-level page table, and of the four PGD entries:

 - one is used for the kernel mapping (assuming the regular 1:3 layout)
 - AT LEAST two are required by user space anyway

so pre-allocating is never going to waste more than one page.

And you may feel that pre-allocating is a special case, but it's an 
*easier* special case than the one that you are apparently thinking about 
(which is to special-case according to CPU version).

So don't do it. Just preallocate for the magic 4-entry PGD. You can make 
the special case just be something like

	/* Preallocate for small PGD's */
	#if PTRS_PER_PGD == 4
		for (i = 0; i < USER_PTRS_PER_PGD; i++) {
			/* pmd_alloc() is shorthand for "get a zeroed pmd page" */
			pmd_t *pmd = pmd_alloc();
			set_pgd(pgd+i, __pgd(_PAGE_PRESENT | __pa(pmd)));
		}
	#endif

or similar. 
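
For anyone who wants to play with the idea outside the kernel, here is a 
minimal user-space sketch of the same preallocation, with the allocation 
failure path that the snippet above leaves out. Everything named toy_* / 
TOY_* is hypothetical and made up for illustration: it is not kernel API, 
it stores plain pointers instead of physical addresses, and it assumes the 
caller passes in a zeroed 4-entry array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* All names below are illustrative stand-ins, not kernel definitions. */
#define TOY_PAGE_SIZE         4096u
#define TOY_PTRS_PER_PGD      4       /* PAE top level: four entries */
#define TOY_USER_PTRS_PER_PGD 3       /* 1:3 split: entries 0..2 are user */
#define TOY_PRESENT           0x1ull

typedef uint64_t toy_pgd_t;

/* Free every preallocated user pmd page and clear its top-level entry. */
static void toy_pgd_free_user(toy_pgd_t *pgd)
{
	for (int i = 0; i < TOY_USER_PTRS_PER_PGD; i++) {
		if (pgd[i] & TOY_PRESENT)
			free((void *)(uintptr_t)(pgd[i] & ~(uint64_t)(TOY_PAGE_SIZE - 1)));
		pgd[i] = 0;
	}
}

/*
 * Preallocate one zeroed pmd page per user entry of a zero-initialised pgd.
 * Returns 0 on success; on failure, undoes any partial work and returns -1.
 */
static int toy_pgd_prealloc(toy_pgd_t *pgd)
{
	for (int i = 0; i < TOY_USER_PTRS_PER_PGD; i++) {
		void *pmd = aligned_alloc(TOY_PAGE_SIZE, TOY_PAGE_SIZE);

		if (!pmd) {
			toy_pgd_free_user(pgd);
			return -1;
		}
		/* Page alignment keeps the low bits free for flags. */
		assert(((uintptr_t)pmd & (TOY_PAGE_SIZE - 1)) == 0);
		memset(pmd, 0, TOY_PAGE_SIZE);
		pgd[i] = (uint64_t)(uintptr_t)pmd | TOY_PRESENT;
	}
	return 0;
}
```

After a successful call the three user entries are all present from the 
start, so nothing ever has to flip a top-level entry from non-present to 
present later on.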

There is absolutely *zero* reason not to do this, and there is also zero 
reason to make this be a "32-bit vs 64-bit" issue. The code can be there 
in both, and the #if could even be all in C code: there may be reasons 
to prefer writing it as

	/* The old-style PAE PGD needs to be preallocated */
	if (USER_PTRS_PER_PGD <= 4) {
		...
	}

and the compiler should even compile it away entirely for all practical 
cases even without using the preprocessor.
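
A small stand-alone sketch of why the plain C "if" costs nothing: the 
condition is a compile-time constant, so an optimizing compiler can 
normally drop the dead branch. TOY_USER_PTRS_PER_PGD and 
toy_pmds_to_preallocate() are hypothetical names used only for this 
illustration; the value 3 assumes the regular 1:3 layout mentioned above.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel constant; override with -D to experiment. */
#ifndef TOY_USER_PTRS_PER_PGD
#define TOY_USER_PTRS_PER_PGD 3
#endif

/*
 * How many pmd pages to preallocate for one pgd.
 * The condition is a constant expression, so the compiler can usually
 * resolve it at build time and keep only one of the two returns.
 */
static int toy_pmds_to_preallocate(void)
{
	/* The old-style PAE PGD needs to be preallocated */
	if (TOY_USER_PTRS_PER_PGD <= 4)
		return TOY_USER_PTRS_PER_PGD;

	/* Large PGD: populate lazily instead. */
	return 0;
}
```

Building the same file with something like -DTOY_USER_PTRS_PER_PGD=256 
flips the result to 0 without touching the source, and both branches 
still get syntax- and type-checked either way, which is the usual 
argument for preferring this over the preprocessor.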

> Anyway, we can point them not present to empty_zero_page, so testing the 
> present bit will still be sufficient to tell if we need to allocate a 
> new pmd, but if the hardware decides to follow the page reference 
> there's no harm done.  (Hm, unless the hardware decides it wants to set 
> A or D bits in empty_zero_page for some reason...)

x86 page table walking never sets A/D bits on non-present entries.

That said, there's still a huge difference. 

For "real" page table walking, you can always just insert entries without 
flushing the TLB if those entries weren't there before (because the TLB 
is supposed to not cache negative entries). 

Again, because of the way the magic 4-entry PGD works, that isn't true for 
it. It caches the entries regardless, so if you change it from non-present 
to present, you have to flush the TLB (well, "reload %cr3", which is the 
same thing in practice, although it's for a different *reason*).
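
To make the difference concrete, here is a deliberately simplified toy 
model of that behaviour. It assumes the CPU snapshots all four top-level 
entries when %cr3 is loaded and consults only that snapshot afterwards; 
real hardware has more detail than this, and struct toy_cpu, 
toy_load_cr3() and toy_cpu_sees_present() are hypothetical names, not 
anything from the kernel or the architecture manuals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NR_PDPTE      4
#define TOY_PDPTE_PRESENT 0x1ull

/* Toy CPU state: private copies of the four top-level entries. */
struct toy_cpu {
	uint64_t pdpte_regs[TOY_NR_PDPTE];
};

/* Model of reloading %cr3: snapshot all four in-memory entries at once. */
static void toy_load_cr3(struct toy_cpu *cpu, const uint64_t *pgd_in_memory)
{
	memcpy(cpu->pdpte_regs, pgd_in_memory, sizeof(cpu->pdpte_regs));
}

/* The toy walker looks only at the snapshot, never at the in-memory table. */
static bool toy_cpu_sees_present(const struct toy_cpu *cpu, unsigned int idx)
{
	assert(idx < TOY_NR_PDPTE);
	return (cpu->pdpte_regs[idx] & TOY_PDPTE_PRESENT) != 0;
}
```

In this model, setting the present bit in the in-memory array after 
toy_load_cr3() changes nothing that toy_cpu_sees_present() reports until 
toy_load_cr3() is called again. That is exactly the case a lazily 
populated 4-entry PGD would have to handle and a preallocated one never 
hits.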

> That just means we need to reload cr3 after populating the pgd with a
> new pmd, right?

BUT ONLY FOR THIS CASE!

And if you preallocate it, you make *that* special case go away. 

So you're going to have special cases regardless. Do the simple and 
really straightforward one, please! Nothing subtle.

		Linus