Re: A proposal - binary

Chris Wright wrote:

* Greg KH ([email protected]) wrote:

Who said that? Please smack them on the head with a broom. We are all actively working on implementing Rusty's paravirt-ops proposal. It makes the API vs ABI discussion moot, as it allows for both.
So everyone is still skirting the issue, oh great :)


No, we are working closely together on Rusty's paravirt ops proposal.
Given the number of questions I've fielded in the last 24 hrs, I really
don't think people understand this.

We are actively developing paravirt ops, we have a patch series that
begins to implement it (although it's still in its nascent stage).  If
anybody is interested in our work it is done in public.  The working
tree is here: http://ozlabs.org/~rusty/paravirt/ (Mercurial patch queue;
just be forewarned that it's still quite early to be playing with it, and
it doesn't do much yet).  We are using the virtualization mailing list for
discussions https://lists.osdl.org/mailman/listinfo/virtualization if
you are interested.

Zach (please correct me if I'm wrong here) is working on plugging the
VMI into the paravirt_ops interface.  So his discussion of binary
interface issues is as a consumer of the paravirt_ops interface.

To be completely clear, I am creating a set of paravirt_ops for ESX. This set of paravirt ops will still go through a binary indirection layer. Hence, it is important for me to educate everyone on that layer and find out the opinions people have on what an acceptable license / source policy is for that layer. We need the layer for exactly the same reason the vsyscall page is important. We use it to indirect hypervisor calls so that they can be future compatible, instead of forcing a particular hypervisor interface. When running on Intel vs. AMD hardware, that interface may be different. When running inside HVM hardware, VT or Pacifica, that interface _will_ be different. We must allow for the possibility of alternative implementations. This layer is very much like a PAL code layer that allows system-level instructions to have alternative implementations, and also, most importantly, means we are free to change the structural layout of information which is shared between the hypervisor and the kernel. This shared information will grow and need to change as it evolves over time. But we can't break compatibility with precompiled Linux kernels. So the layer needs to be there and needs to be separate from the kernel, and I need to do that in such a way that doesn't violate the licensing model of Linux or any other operating system, while also making sure it doesn't conflict with our corporate licensing policies. This is not a trivial problem.
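To make the indirection-layer argument concrete, here is a minimal user-space C sketch. It is not the VMI interface, and every name in it (enum layer_call, struct indirection_layer, the shared_info_v1/v2 layouts, the bt_/vt_ backends) is hypothetical. It only illustrates the two properties described above: the kernel binary is compiled against fixed call numbers, so the implementation behind them can differ (here a binary-translation path versus a VT path), and the shared structure only grows by appending fields, so a kernel built against the old layout can still read it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical call numbers the kernel binary is compiled against. They
 * never change meaning, so an old kernel keeps working on a new layer. */
enum layer_call { CALL_SET_PTE = 0, CALL_FLUSH_TLB = 1, CALL_NR = 2 };

/* Hypothetical shared-information layouts. A new version only appends
 * fields after the old ones, and 'size' records how much is valid. */
struct shared_info_v1 {
	uint32_t size;
	uint32_t version;
	uint64_t tsc_khz;
};

struct shared_info_v2 {
	struct shared_info_v1 v1;	/* old layout kept as a prefix */
	uint64_t stolen_ns;		/* field appended in v2 */
};

/* Supplied with the hypervisor, not the kernel: an entry table plus a
 * pointer to the shared data in whatever layout this release uses. */
struct indirection_layer {
	long (*entry[CALL_NR])(unsigned long arg);
	const void *shared;
};

/* Kernel side: dispatch by call number; an empty slot means unsupported. */
static long kernel_call(const struct indirection_layer *l,
			enum layer_call c, unsigned long arg)
{
	if ((unsigned int)c >= CALL_NR || l->entry[c] == NULL)
		return -1;
	return l->entry[c](arg);
}

/* Kernel side: a kernel built against v1 only ever reads the v1 prefix. */
static uint64_t kernel_read_tsc_khz(const struct indirection_layer *l)
{
	const struct shared_info_v1 *si = l->shared;

	assert(si->size >= sizeof(*si));
	return si->tsc_khz;
}

/* Two hypothetical layer builds, e.g. binary translation vs. VT. The
 * internals differ, but both honour the same call numbers. */
static unsigned long bt_calls, vt_calls;

static long bt_set_pte(unsigned long pte) { bt_calls++; return (long)pte; }
static long vt_set_pte(unsigned long pte) { vt_calls++; return (long)pte; }
static long vt_flush_tlb(unsigned long unused)
{
	(void)unused;
	vt_calls++;
	return 0;
}

static const struct shared_info_v1 bt_shared = {
	.size = sizeof(struct shared_info_v1), .version = 1, .tsc_khz = 2000000,
};

static const struct shared_info_v2 vt_shared = {
	.v1 = { .size = sizeof(struct shared_info_v2), .version = 2,
		.tsc_khz = 2400000 },
	.stolen_ns = 0,
};

static const struct indirection_layer bt_layer = {
	.entry = { [CALL_SET_PTE] = bt_set_pte },	/* no TLB-flush entry */
	.shared = &bt_shared,
};

static const struct indirection_layer vt_layer = {
	.entry = { [CALL_SET_PTE] = vt_set_pte, [CALL_FLUSH_TLB] = vt_flush_tlb },
	.shared = &vt_shared,
};
```

A kernel built only against the v1 layout gets the same answers from either layer; the layer that lacks a TLB-flush entry just reports -1, so the kernel can fall back instead of breaking.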

So, in case it's not clear, we are all working together to get
paravirt_ops upstream.  My personal intention is to do everything I can
to help get things in shape to queue for 2.6.19 inclusion, and having
confusion over our direction does not help with that aggressive timeline.

Paravirt_ops has long-term benefits for the i386 (and x86_64) architectures. This is in fact independent of whether Xen and VMware want to use the same ABI to talk to the hypervisor or not. From my point of view, it is a cleaner way to implement the kernel backend to both VMI and Xen, since it removes the requirement that we create an entirely new sub-architecture for each hypervisor. In the Xen case, they may want to run a dom-0 hypervisor which is compiled for an actual hardware sub-arch, such as Summit or ES7000. Using a sub-arch for the hypervisor means you would need some kind of nested sub-architecture support. This is ludicrous. Instead, what paravirt-ops promises long term is a way to get rid of the sub-architecture layer altogether. Sub-arches like Voyager and Visual Workstation have some strange initialization requirements, interrupt controllers, and SMP handling: exactly the kind of thing which paravirt_ops is being designed to indirect for hypervisors. In the end, there is no reason it can't be expanded to a more general-purpose interface that removes the requirement to build separate kernels and maintain separate sub-architectures for each weird new tweak of i386. As i386 moves into more embedded systems, I would expect to see these new sub-architectures begin to grow like a rash. It's ugly, and hard to maintain. I've broken SGI Visual Workstation and Voyager support more than I'd care to admit because it is really hard to compile and test all of these different variations of i386. In the end, it will finally be possible to compile and run a single i386 kernel binary that is actually capable of running on the full set of supported hardware. This makes every distro's and maintainer's life a lot simpler.
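A minimal user-space C sketch of the idea: generic code calls through one table of function pointers, and the table is filled in once at boot instead of choosing a sub-architecture at compile time. Every name here (struct pv_ops_sketch, pv_ops_init, the native_/hv_ backends, and the counters standing in for hardware state) is hypothetical and much simpler than the real paravirt_ops structure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified ops table; not the real paravirt_ops. */
struct pv_ops_sketch {
	const char *name;
	void (*irq_disable)(void);
	void (*irq_enable)(void);
	unsigned long (*read_cr3)(void);
};

/* Simulated machine state so the sketch runs in user space. */
static bool irqs_on = true;
static unsigned long fake_cr3 = 0x1000;
static unsigned long hypercalls;

/* Native backend: on real hardware these would be cli, sti, mov %cr3. */
static void native_irq_disable(void) { irqs_on = false; }
static void native_irq_enable(void) { irqs_on = true; }
static unsigned long native_read_cr3(void) { return fake_cr3; }

/* Hypothetical hypervisor backend: every operation counts as a hypercall. */
static void hv_irq_disable(void) { hypercalls++; irqs_on = false; }
static void hv_irq_enable(void) { hypercalls++; irqs_on = true; }
static unsigned long hv_read_cr3(void) { hypercalls++; return fake_cr3; }

static const struct pv_ops_sketch native_ops = {
	.name = "native",
	.irq_disable = native_irq_disable,
	.irq_enable = native_irq_enable,
	.read_cr3 = native_read_cr3,
};

static const struct pv_ops_sketch hv_ops = {
	.name = "hypervisor",
	.irq_disable = hv_irq_disable,
	.irq_enable = hv_irq_enable,
	.read_cr3 = hv_read_cr3,
};

/* The one table generic code calls through. */
static struct pv_ops_sketch pv_ops;

/* Filled in once at boot from a (hypothetical) detection result, rather
 * than by building a separate sub-architecture kernel. */
static void pv_ops_init(bool on_hypervisor)
{
	pv_ops = on_hypervisor ? hv_ops : native_ops;
	assert(pv_ops.irq_disable && pv_ops.irq_enable && pv_ops.read_cr3);
}

/* Generic code: the same source and binary whichever backend is live. */
static unsigned long read_cr3_irqs_off(void)
{
	pv_ops.irq_disable();
	unsigned long cr3 = pv_ops.read_cr3();
	pv_ops.irq_enable();
	return cr3;
}
```

Calling pv_ops_init(false) and then pv_ops_init(true) runs read_cr3_irqs_off() unchanged both times; only the second path goes through the hypervisor functions, which is the "one kernel binary, many platforms" property in miniature.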

The same approach can be used on x86_64 for paravirtualization, but also to abstract out vendor differences between platforms. Opteron and EM64T hardware are quite different, and the plethora of non-standard motherboards and uses has already intruded into the kernel. Having a clean interface to encapsulate these changes is also desirable here, and once we've nailed down a final approach to achieving this for i386, it makes sense to do x86_64 as well.

I'm now talking light-years into the future, but when the i386 and x86_64 trees merge together, this layer will be almost identical for the two, allowing sharing of tricky pieces of code, like the APIC and IO-APIC, NMI handling, system profiling, and power management. If the interface evolves in a nicely packaged and compartmentalized way from that, then perhaps someday it can grow to become a true cross-architecture way to handle machine abstraction and virtualization. Then you can compile a single kernel which gets assembled to code proto-fragments that are dynamically linked together during the boot sequence, using a cross-machine translation unit that allows a single kernel to run on every current and future processor architecture that mimics some combined set of machine characteristics (N-tiered cache coloring, multiway hardware page tables, hypercubic interrupt routing, dynamically morphed GPUs, quantum hypervisor isolation). Of course, it will still require a PCI bus.

So absolutely we should go in that direction now, and I'm fully committed to working on it. That is why I wanted feedback on what we have to do to make sure our ESX implementation is done in a way that is acceptable to the community. I too would like to push for an interface in 2.6.19, and we can't have confusion on this issue be a last-minute stopper.

Maybe someday Xen and VMware can share the same ABI and both use a VMI-like layer. But that really is a separate and completely orthogonal question. Paravirt-ops makes any approach to integrating hypervisor awareness into the kernel cleaner by providing an appropriate abstract interface for it.


Zach