On Sun, 2007-01-28 at 21:25 -0800, David Miller wrote:
> From: [email protected] (Eric W. Biederman)
> Date: Sun, 28 Jan 2007 22:18:59 -0700
>
> > Regardless of my opinion on the sanity of the hypervisor architects.
> > I have not seen anything that indicates it will be hard to support
> > the hypervisor doing everything or most of everything for us, so
> > I see no valid technical objection to it. Nor have I ever.
> >
> > So I have no problem with additional patches in that direction.
>
> Ok, that's great to hear.
>
> I know your bi-directional approach isn't exactly what Ben
> wants but he can support his machines with it. Maybe after
> some time we can agree to move from that more towards the
> totally abstracted scheme.
It can support my machines without HV with trivial changes I reckon: I
need an ops struct to indirect eric's 2 remaining arch hooks
(setup/teardown) but that can be done inline within asm-powerpc. I need
to double check of course and probably actually port the MPIC backend
and possibly go write the Cell Axon one while at it to verify everything
is allright, but the base design seems sound enough.
For the ones with HV (RTAS stuff), we still need to agree on how to
approach it. We can either:
Option 1
--------
Do a hook -above- Eric stuff, by having the toplevel APIs themselves be
arch hooks that can either go toward the RTAS implementation or toward
Eric's code. That is, eric code would define those (pick better names if
you are good at it):
pci_generic_enable_msi
pci_generic_disable_msi
pci_generic_enable_msix
pci_generic_disable_msix
pci_generic_save_msi_state
pci_generic_restore_msi_state
Then we can have asm-i386/msi.h & friends do something like
#define pci_enable_msi pci_generic_enable_msi
#define pci_disable_msi pci_generic_disable_msi
etc...
And we can have asm-powerpc/msi.h hook then via ppc_md:
static inline int pci_enable_msi(xxx...)
{
return ppc_md.pci_enable_msi(xxx...);
}
etc...
(ppc_md is our per-platform global hook structure filled at boot when we
discover on what machine type we are running on) so that pSeries can use
it's own RTAS callbacks, and others can just re-hook those to Eric's
code.
Option 2
--------
That is to make Eric's code itself cope with the HV case. I'm a bit at
loss right now as how precisely to do it. I need to spend more time
staring at the code after Eric latest patches rather than the patches
themselves I suppose :-) (Eric, they don't apply out of the box on
current git, they are against -mm ?).
Some of the main issues here, more/less following the order in which
Eric code calls things:
- The number of vectors for MSI-X is obtained from config space (at
least for sanity checking the requested argument). On RTAS, it should
come from an OF property (we are really not supposed to go read the
config space even if we can). I -suppose- we can survive for now with
just reading it, but we might well run into trouble with some "special"
devices shared accross partitions or if the IBM magic bridges themselves
ever start sending MSI-X on their own (unlikely but who knows...).
Michael's code handled that by having a callback ->check() do the sanity
checking of the nvec, and then just use the nvec passed in as an
argument once it's sane.
So for that I would propose adding an arch_check_msi(pdev, type, nvec)
or something like that. Note the biggest issue at this point anyway.
- The real big one: For MSI-X, Eric's code tries to "hide" the fact
that those are MSI-X by allocating the msi-x entry array, then iterating
through them calling arch_setup_msi_irq() for each of them.
For that to work for us, it would need to be different, possibly
pre-allocating the array, and having -one- call taking an array and a
nvec. That's one of the reasons why I liked Michael's approach as
instead of making MSI-X look like MSI, it made MSI look like MSI-X by
passing a 1 entry array in the MSI case. Both approaches can probably be
made to handle multiple MSIs if we ever want to handle them.
The same issue is present for teardown of course.
- We need HV hooks for suspend/resume at one point. Nothing urgent
there as our HV machines don't do suspend/resume just yet :-) But if we
ever implement something like suspend-to-disk, they'll definitely need
something as we are likely to get different vectors back from the
firmware so we need to re-map them to the same linux IRQ numbers.
I need to have a second look at Eric's code after I manage to find the
right combination of kernel for his patches to apply to check if I
missed anything important.
Cheers,
Ben.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]