Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}

On Fri, 13 Apr 2007, Zachary Amsden wrote:
> Hugh Dickins wrote:
> > Zach, while looking at your recent patches, I ran across the comment
> > on pte_update_defer, and where it was being used, and now think that
> > David's patch is actually incorrect.  Previously pte_update_defer
> > was being used where a flush_tlb_page followed immediately after
> > within the same macro; with David's patch, mm's clear_refs_pte_range
> > is calling ptep_test_and_clear_young (including pte_update_defer) on
> > several ptes, then unlocking the page table, and later flushing TLB.
> > That's exactly wrong for pte_update_defer, isn't it?
> >   
> 
> Ok, disregard most of my last e-mail.

Phew!  That's a lot quicker than digesting it ;)
But thanks for going to so much trouble.

> It is fine to decouple the flush from
> the update, as long as they stay close enough that you can reason they happen
> together.  I guess I hadn't seen the other parts of the patch which release
> the page table spinlock in between the two, and somehow missed it again when
> responding to the above as I got too excited explaining why the decoupling is
> ok.  It is not ok to release the spinlock when using shadow page tables on
> SMP.  There are some rather complex races that can result.  Here's one case:
> 
> CPU-0                    CPU-1
> -----------------------  ---------------------------
> test_and_clear_dirty(x)
> spin_unlock(ptl)
>                         write address mapped by X
>                         (harware updates dirty bit)
>                         spin_lock(ptl)
>                         set_pte_wrprotect(x)
>                         flush
> flush
> 
> Now, the write protected pte which maps a dirty page gets broken in two ways;
> it is unclear if dirty bit or entiry PTE from CPU-0 is deferred until flush,
> so either write protected PTE for modified page loses the dirty bit (BAD!), or
> write protected PTE loses both dirty and write protect bits (VERY BAD!).
> 
> To prevent this, we need a flush before dropping the spinlock.  If that gets
> too complicated, we can drop the defer logic and just use pte_update instead,
> which notifies the hypervisor immediately of the mapping change.

David (clear_refs_pte_range) is only using ptep_test_and_clear_young,
though he did change the ptep_test_and_clear_dirty definition to be
consistent with it.  old/young is never so serious as clean/dirty, so
it may be that there's very little problem with what's in there now;
it just becomes a shame if the wrong decision gets made too often e.g.
if the misflushing is such that his clear_youngs never really take
effect.  I simply cannot tell whether or not that's the case myself.

But once the pte_update_defers get moved away from their flush_tlb_pages,
as is the case now, it feels like we're on thin ice.

Actually, I don't really get pte_update_defer at all: I can understand
wanting to defer a call down, but it appears to be a call down to say
we're deferring the shadow update?  Why not just do it with pte_update?
I'd be happier without it until this could be restructured more safely
(but my incomprehension is not necessarily the best guiding principle).

Hugh
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}
  - From: Zachary Amsden <[email protected]>

References:
- Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}
  - From: Hugh Dickins <[email protected]>
- Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}
  - From: Zachary Amsden <[email protected]>

Prev by Date: [Announce] [patch] Modular Scheduler Core and Completely Fair Scheduler [CFS]
Next by Date: Re: Crash while mapping memory in pagetable_init() (Was: Re: .config)
Previous by thread: Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}
Next by thread: Re: [patch -mm] i386: use pte_update_defer in ptep_test_and_clear_{dirty,young}
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]