Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hugh Dickins wrote:
David (clear_refs_pte_range) is only using ptep_test_and_clear_young,
though he did change the ptep_test_and_clear_dirty definition to be
consistent with it.  old/young is never so serious as clean/dirty, so
it may be that there's very little problem with what's in there now;
it just becomes a shame if the wrong decision gets made too often e.g.
if the misflushing is such that his clear_youngs never really take
effect.  I simply cannot tell whether or not that's the case myself.

If the logic is too difficult to reason about without entering ganda bherundasana, then it is simpler to just drop the _defer suffix. Old / young actually is just as serious, not because A-bit is critical (although misvirtualizing it will do strange things to the page clocking), but because the entire PTE update is delayed - and thus an intervening PTE update which remaps the physical page can come in before the A-bit update is flushed. The A-bit update carries the PPN of the original physical page, destroying the remap.

But once the pte_update_defers get moved away from their flush_tlb_pages,
as is the case now, it feels like we're on thin ice.

Yes, that is why I think I did the original hack of having #defines without implementations - to ensure the flush stayed coupled with the use of pte_update_defer.

Actually, I don't really get pte_update_defer at all: I can understand
wanting to defer a call down, but it appears to be a call down to say
we're deferring the shadow update?  Why not just do it with pte_update?
I'd be happier without it until this could be restructured more safely
(but my incomprehension is not necessarily the best guiding principle).

So pte_update does the update immediately; it is always safe. For select cases, where there is an immediate pte_update + flush combination, it is beneficial to combine the two into one hypercall, saving the several thousand cycle round trip into the hypervisor from happening twice. This is why pte_update_defer was used close to the flushes. I thought David was just defining the ptep_test_and_clear_xxx functions properly; I didn't realize he was actually using them to decouple the flush across a spinlock or logically complex code region. If that is the case, I agree pte_update instead of defer is absolutely the right thing to do.

I'm a little worried about the TLB consistency here, on real hardware. If you drop the spinlock, you could end up preempted. If the preemption runs a kernel thread, which accesses the same memory for which you just modified the page table mappings, the TLB could still have a stale entry. So, a clear of a dirty bit might happen in a writeback thread, but get preempted before the flush, then the hardware TLB still has a dirty entry. This means writes through the same virtual address will skip the A/D bit update in the pte. Now, for regular user memory access, that might not be a problem, but if it is the kernel touching user memory through kmap, the kernel virtual address might well .. I'm off in the weeds. But still slightly concerned about leaving that TLB entry stale across a possible preemption.

As an aside, in the tree I'm looking at, I don't see any users of ptep_clear_flush_dirty at all.. doesn't that mean writeback which undirties pages will leave the PTEs with latent dirty bits, causing them to get written back again when the process exits? Or does rmap take care of this. Ahh, I just discovered page_mkclean_one. Never mind. But still no users of ptep_clear_flush_dirty.

Zach
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux