Re: [Fastboot] [PATCH 03/03] kexec: Avoid overwriting the current pgd (V2, x86_64)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/27/06, Vivek Goyal <[email protected]> wrote:
On Fri, May 26, 2006 at 07:42:53PM +0900, Magnus Damm wrote:
> On 5/26/06, Vivek Goyal <[email protected]> wrote:
> >On Thu, May 25, 2006 at 05:26:56PM +0900, Magnus Damm wrote:
> >> So, to answer your question regarding two page table copies. You may be
> >> right that it can be made work with just one page table, but I feel my
> >> two table approach is nice because it will always work - regardless of
> >> the memory map used.
> >
> >So you seem to be suggesting that code can be made to work (even with Xen)
> >with single identity mapped page table as used currently but it would fail
> >in certain circumstances (different memory map used). Can you please
> >explain
> >a little more how?
>
> Sorry about the delay, your question needed some thinking.
>
> I do not think that vanilla kexec x86_64 has any memory map related
> problems, apart for the page table overwriting.

So we are trying to solve two problems here.

- Page table overwriting during kexec.
- Creating framework in advance to ensure kexec compatibility with Xen.

Correct. I'm actually supposed to only work on porting kexec to Linux
under Xen, but the page table overwriting issue annoyed me so I
thought that I would try to solve both problems at once.

Definitely first one need to be solved now. But given the fact Xen code
is not mainline yet, I feel that second one should be solved once
Xen code is mainline.

That makes sense. My goal here is trying to write Xen-friendly code
which does not give too much penalty when used without Xen and will
not generate too much many conflicts when merging with Xen.

> And the page table
> overwriting is not a bug that will make kdump fail, it just makes the
> memory image less accurate. I do however think the accuracy is quite
> important given that kdump mainly is used for memory image collection.
>
> The main reason for using two sets of page tables is simplicity. The
> page_table_a code for allocation and page table setup is more or less
> identical for i386 and x86_64. The code was written to be generic, but
> during the testing I realized that the architecture-specific header
> files defined things differently so I needed to add some
> architecture-specific macros as workarounds. It should be possible to
> share the code in a common file.
>
> Simon and I have tried to make the Xen port of kexec as simple as
> possible. One design decision that I know Eric dislikes is the array
> of pages for page_table_a. The reason behind this array is simplicity,
> especially for our Xen port.
>
> Our Xen port makes the dom0 kernel responsible for allocating pages.
> These pages are then used by the hypervisor. Some of these pages are
> page_table_a pages, and these pages are overwritten by the hypervisor
> with mappings that fit the memory map used by Xen. If we would pass
> the root pointer instead of an array of pages then the hypervisor
> would have to be extended to include code to parse page tables,
> extract pages and then fill in the new page table. That would be all
> but simple.
>

Again I am trying to understand the need for two page tables. So your
concern is that if current user space/kernel space memory split changes
in x86_64, it will break down the kexec and that's why the need of a
page_table_a which will help jump to identity mapped address in control
page and then you will switch to page_table_b which will help copying the
pages to destination.

If we omit the Xen case I'm pretty sure it can be done with just one
page table. No concerns about memory split on x86_64.

I don't see the problem with two page tables though. The two page
table approach makes the x86_64 port very similar to i386 and it makes
it possible to share code. And on top of that it will make the Xen
porting work much much simpler.

Can't we create an additional entry in identity mapped page table
(page_table_b) to map the control page at the same virtual address
as current page table and then jump to control page (using virtual address)
and then swith to identity mapped page table (page_table_b).

This would make sure that existing page tables are not overwritten as
well there is no dependency on user space/kernel space memory split being
used by the current page tables.

I hope I understood the problem right. :-)

Your reasoning seems right to me. =)
An additional entry in page_table_b will most likely do.

But, I think that the mix of large size and 4k pages (page_table_b is
mapped with large pages, and page_table_a with 4k) might introduce
some corner cases. I'm thinking about the case when the virtual
address for the control page (the only 4k mapping) happens to be in
the identity mapping range. In that case you will have to break up the
large page mapping and add several 4k mappings. Breaking up the large
page will require more pages to be allocated for the page table.

And if we build the Xen port on top of that (without an array of
pages) we will have to parse that mess in the hypervisor. And Xen
might have a different memory map which leads to a page table that has
a different virtual address for the 4k mapping. And this might
introduce a need to allocate extra pages from within the hypervisor,
which is something I definitely want to avoid.

To summarize, the single page table solution will work for most cases,
but it might be complicated if we happen to have the control page
mapped at a low address in the kernel/hypervisor. Mapping the page at
a low address forces us to allocate more pages for the page table. In
other words, the number of pages required to build the page table
varies depending on the virtual address where the control page is
mapped in the kernel/hypervisor.

The problematic case above might be very unlikely to happen with the
current x86_64 memory map, but I still think that the two page table
solution is a small price to pay for something that is guaranteed to
work in all cases.

Thank you for your comments!

/ magnus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux