* Andi Kleen ([email protected]) wrote:
>
> > Ewwwwwwwwwww.... you plan to run this in SMP ? So you actually go byte
> > by byte changing pieces of instructions non atomically and doing
> > non-Intel's errata friendly XMC. You are really looking for trouble
> > there :) Two distinct errors can occur:
>
> In this case it is ok because this only happens when transitioning
> from 1 CPU to 2 CPUs or vice versa and in both cases the other CPUs
> are essentially stopped.
>
I agree that it's ok with SMP, but another problem arises: it's not only
a matter of being protected from SMP access, but also a matter of
reentrancy wrt interrupt handlers.
i.e.: if, as we are patching nops non atomically, we have a non maskable
interrupt coming which calls get_cycles_sync() which uses the
alternatives for cpuid when the NOP instructions are not in a correct
state, we can end up doing an illegal instruction.
I see that IRQs are disabled in alternative_instructions(), but it does
not protect against NMIs, which could come at a very inappropriate
moment. MCE and SMIs would potentially cause the same kind of trouble.
So unless you can guarantee that any code from NMI handler won't call
basic things such as get_cycles() (nor MCE, nor SMIs), you can't insure
it won't execute an illegal instruction. Also, the option of temporarily
disabling the NMI for the duration of the update simply adds unwanted
latency to the NMI handler which could be unacceptable in some setups.
Another potential problem comes from preemption: if a thread is
preempted in the middle of the instructions you are modifying, let's say
we originally have 4 * 1 byte instructions that we replace with 1 * 4
byte instruction, a thread could iret in the middle of the new
instruction when it will be scheduled again, thus executing an illegal
instruction. This problem could be overcomed by putting the threads in
the freezer. See the djprobe project for details about this.
In the end, I think the safest solution would be to limit ourselves to
one of these 2 solutions:
- If the update can be done atomically and leaves the site in a coherent
state after each atomic modification, while keeping the same
instruction size, it could be done on the spot.
- If not, care should be taken to first do a copy of the code to modify
into a "bypass", making sure that instructions can be relocated, so
that any context that would try to execute the site during the
modification would execute a breakpoint which would execute the bypass
while we modify the instructions. Care should be taken to make sure
that threads are in the freezer while we do this. This is basically
what djprobes does. You case is only a bit simpler because you don't
worry about SMP.
> All the other manipulations currently are single byte.
>
For breakpoint, yes, this one is easy.
> I suppose for your immediate value patches something stronger is needed,
> but we can worry about that post .23.
>
> > What I don't like about this particular implementation is that it does
> > not support "poking" more than 1 byte. In order to support this, you
> > would have to deal with the case where the address range spans over more
> > than one page.
>
> I considered it, but the function would have been at least twice as big
> to handle all the corner cases. And for the current callers it's all fine.
>
> > Also, doing the copy in the same interface seems a bit awkward.
>
> Splitting it would also seem quite awkward.
>
Because of the tricks that must be done to do code modification on a
live system (as explained above), I don't think it makes sense to
provide a falsely-safe infrastructure that would allow code modification
in a non-atomic manner.
> >
> > I would much prefer something like:
> >
> > void *map_shadow_write(void *addr, size_t len);
> > (returns a pointer to the shadow writable pages, at the same page offset
> > as "addr")
> >
> > int unmap_shadow_write(void *shadow_addr, size_t len);
> > (unmap the shadow pages)
> >
> > Then, the in-kernel user is free to modify their pages as they like.
> > Since we cannot foresee each modification pattern, I think that leaving
> > this kind of flexibility is useful.
>
> You could as well call vmap directly then; it's not that much
> more complicated. I don't really see much value in complicating
> it right now.
>
Right about the direct vmap call, it is quite simple and elegant, I
guess I'll use it in my next version of kernel text lock.
Mathieu
> -Andi
>
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]