Anthony Liguori wrote:
You miss my point I think. Using ioctls *requires* a thread
per-vcpu in userspace. This is unnecessary since you could simply
provide a char-device based read/write interface. You could then
multiplex events and poll.
Yes, ioctl()s require userspace threads, but that's okay, because
they're free for us, since we need a kernel thread for each vcpu.
On the other hand, a single device model thread polling the vcpus is
guaranteed to be on the wrong physical cpu for half of the time
(assuming 2 cpus and 2 vcpus), requiring IPIs and suspending a vcpu
in order to run.
And your previously proposed solution of having one big lock would do
the same thing except require additional round trips to the kernel :-)
No, with no contention locks stay in userspace. And if there is
contention, we fine-grain the locks.
Moreover, you could get clever and use mmap() to expose a ring queue
if you're really concerned about SMP.
Really though, it comes down to one simple thing: blocking ioctl()s
are a real ugly interface.
I don't think they can be termed "blocking".
Most (all?) blocking calls offload work to some other device, like a
disk or a network card, and sleep if that device has to do any
processing. They follow the same basic procedure:
- if data (or bufferspace) is available, read (or write) it
- otherwise, sleep
But in this case the "other device" is the processor, so the that model
doesn't fit very well, as it *forces* a context switch.
Moreover, we need to both read and write, which ioctls() allow, but
read()/write() require two system calls.
If for nothing else, you have to be able to run timers in userspace
and interrupt the kernel execution (to signal DMA completion for
instance). Even in the UP case, this gets ugly quickly.
The timers aren't pretty (we use signals), yes. But avoiding the
extra thread is critical for performance IMO.
We've had a lot of problems in QEMU with timers and kqemu. Forcing
the guest to return to userspace to allow periodic timers to run
(which may simply be the VGA refresh which the guest doesn't care
about) is at best a hack.
You can also have an additional thread to the periodic stuff.
Being able to poll an FD would make this so much nicer...
I've posted some patches on qemu-devel attempting to deal with these
issues (look for threads on optimizing char device performance). None
of them are very pretty.
Xen is different since you already have a context switch by going to
domain 0.
--
error compiling committee.c: too many arguments to function
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]