Anthony Liguori wrote:
Avi Kivity wrote:
Anthony Liguori wrote:
Avi Kivity wrote:
Anthony Liguori wrote:
This is a PCI device that implements a transport for virtio. It allows virtio devices to be used by QEMU based VMMs like KVM or Xen.
+
+/* the notify function used when creating a virt queue */
+static void vp_notify(struct virtqueue *vq)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
+	struct virtio_pci_vq_info *info = vq->priv;
+
+	/* we write the queue's selector into the notification register to
+	 * signal the other end */
+	iowrite16(info->queue_index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
+}
This means we can't kick multiple queues with one exit.
There is currently no interface in virtio to batch multiple queue notifications, so the only way one could do this AFAICT is to use a timer to delay the notifications. Were you thinking of something else?
No. We can change virtio though, so let's have a flexible ABI.
Well please propose the virtio API first and then I'll adjust the PCI
ABI. I don't want to build things into the ABI that we never actually
end up using in virtio :-)
Move ->kick() to virtio_driver.
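
[For concreteness, a minimal sketch of what a driver-level kick with deferred per-queue notification could look like. The pending_kicks bitmap, vp_kick_all(), and the VIRTIO_PCI_QUEUE_NOTIFY_ALL value are assumptions for illustration, not part of the posted patch.]

/* Hypothetical sketch only -- not from the posted patch.  Instead of
 * notifying the host on every kick, the transport records which queues
 * are pending and signals once. */
static void vp_notify_deferred(struct virtqueue *vq)
{
	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
	struct virtio_pci_vq_info *info = vq->priv;

	/* remember that this queue needs servicing; no exit yet */
	set_bit(info->queue_index, vp_dev->pending_kicks);
}

static void vp_kick_all(struct virtio_pci_device *vp_dev)
{
	/* one pio write covers every pending queue; the host side would
	 * scan the rings it owns (VIRTIO_PCI_QUEUE_NOTIFY_ALL is an
	 * invented value) */
	iowrite16(VIRTIO_PCI_QUEUE_NOTIFY_ALL,
		  vp_dev->ioaddr + VIRTIO_PCI_QUEUE_NOTIFY);
}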
I believe Xen networking uses the same event channel for both rx and tx, so in effect they're using this model. It's been a long time since I looked, though. I'd also like to see a hypercall-capable version of this (but that can wait).
That can be a different device.
That means the user has to select which device to expose. With feature bits, the hypervisor advertises both pio and hypercalls, and the guest picks whatever it wants.
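
[To make the feature-bit idea concrete, a hedged sketch; the feature-bit name, the notify hook on vp_dev, and vp_notify_hypercall() are invented for illustration and are defined nowhere in the posted patch.]

/* Hypothetical sketch only.  If the device advertised both notification
 * methods, the guest could pick at probe time. */
static void vp_choose_notify(struct virtio_pci_device *vp_dev,
			     u32 host_features)
{
	if (host_features & (1 << VIRTIO_PCI_F_NOTIFY_HYPERCALL))
		vp_dev->notify = vp_notify_hypercall;	/* vmcall per kick */
	else
		vp_dev->notify = vp_notify;	/* pio write, as in the patch */
}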
I was thinking more along the lines that a hypercall-based device would
certainly be implemented in-kernel whereas the current device is
naturally implemented in userspace. We can simply use a different
device for in-kernel drivers than for userspace drivers.
Where the device is implemented is an implementation detail that should
be hidden from the guest, isn't that one of the strengths of
virtualization? Two examples: a file-based block device implemented in
qemu gives you fancy file formats with encryption and compression, while
the same device implemented in the kernel gives you a low-overhead path
directly to a zillion-disk SAN volume. Or a user-level network device
capable of running with the slirp stack and no permissions vs. the
kernel device running copyless most of the time and using a dma engine
for the rest but requiring you to be good friends with the admin.
The user should expect zero reconfigurations moving a VM from one model
to the other.
There's no point at all in doing a hypercall-based userspace device IMHO.
We abstract this away by having a "channel signalled" API (both in the kernel for kernel devices and as a kvm.h exit reason / libkvm callback). Again, somewhat like Xen's event channels, though asymmetric.
I don't think so. A vmexit is required to lower the IRQ line. It
may be possible to do something clever like set a shared memory value
that's checked on every vmexit. I think it's very unlikely that it's
worth it though.
Why so unlikely? Not all workloads will have good batching.
It's pretty invasive. I think a more paravirt device that expected an edge-triggered interrupt would be a better solution for those types of devices.
I was thinking it could be useful mostly in the context of a paravirt
irqchip, where we can lower the cost of level-triggered interrupts.
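
[One hedged illustration of how that might look from the guest side, assuming a shared page the host sets and the guest clears; the isr_page field and its pending word are invented, and vring_interrupt() is used here only as a stand-in for the existing ring handler.]

/* Hypothetical sketch only.  The host raises the line and sets a word in
 * shared memory; the guest acks by clearing it, so no dedicated exit is
 * needed just to lower a level-triggered line. */
static irqreturn_t vp_interrupt_shared(int irq, void *opaque)
{
	struct virtio_pci_device *vp_dev = opaque;

	/* ack via shared memory instead of a pio access */
	if (!xchg(&vp_dev->isr_page->pending, 0))
		return IRQ_NONE;	/* not ours (shared line) */

	return vring_interrupt(irq, vp_dev->vq);
}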
+
+	/* Select the queue we're interested in */
+	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);
I would really like to see this implemented as pci config space,
with no tricks like multiplexing several virtqueues on one register.
Something like the PCI BARs where you have all the register numbers
allocated statically to queues.
My first implementation did that. I switched to using a selector
because it reduces the amount of PCI config space used and does not
limit the number of queues defined by the ABI as much.
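
[For reference, the selector pattern in a nutshell: one write picks the queue, and the per-queue registers that follow all refer to the selection. VIRTIO_PCI_QUEUE_SEL comes from the quoted patch; VIRTIO_PCI_QUEUE_PFN is an assumed register name used only for this sketch.]

/* Sketch of the selector pattern: the same register pair serves every
 * queue, so the io region stays small no matter how many queues exist. */
static void vp_setup_queue(struct virtio_pci_device *vp_dev,
			   u16 index, u32 pfn)
{
	/* select the queue we're interested in... */
	iowrite16(index, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_SEL);

	/* ...subsequent accesses now apply to that queue */
	iowrite32(pfn, vp_dev->ioaddr + VIRTIO_PCI_QUEUE_PFN);
}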
But... it's tricky, and it's nonstandard. With pci config, you can do
live migration by shipping the pci config space to the other side.
With the special iospace, you need to encode/decode it.
None of the PCI devices currently work like that in QEMU. It would be very hard to make a device that worked this way because the order in which values are written matters a whole lot. For instance, if you wrote the status register before the queue information, the driver could get into a funky state.
I assume you're talking about restore? Isn't that atomic?
We'll still need save/restore routines for virtio devices. I don't
really see this as a problem since we do this for every other device.
Yeah.
Not much of an argument, I know.
wrt. number of queues, 8 queues will consume 32 bytes of pci space if
all you store is the ring pfn.
You also at least need a num argument, which takes you to 48 or 64 bytes depending on whether you care about strange formatting. 8 queues may not be enough either. Eric and I have discussed whether the 9p virtio device should support multiple mounts per virtio device and, if so, whether each one should have its own queue. Any device that supports this sort of multiplexing will very quickly start using a lot of queues.
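
[Roughly the arithmetic behind those numbers; illustrative layout only, not taken from the patch.]

/* 8 queues x 4-byte ring pfn = 32 bytes.  Adding a 16-bit "num" per queue
 * gives 8 x 6 = 48 bytes packed, or 8 x 8 = 64 bytes with padding to a
 * regular stride. */
struct vq_cfg_packed {
	u32 pfn;		/* ring page frame number */
	u16 num;		/* ring size */
} __attribute__((packed));	/* 6 bytes per queue -> 48 total */

struct vq_cfg_padded {
	u32 pfn;
	u16 num;
	u16 reserved;		/* pad to an 8-byte stride */
};				/* 8 bytes per queue -> 64 total */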
Make it appear as a pci function? (though my feeling is that multiple
mounts should be different devices; we can then hotplug mountpoints).
I think most types of hardware have some notion of a selector or mode.
Take a look at the LSI adapter or even VGA.
True. They aren't fun to use, though.
--
Any sufficiently difficult bug is indistinguishable from a feature.