Re: [PATCH 0/6] Use two zonelists per node instead of multiple zonelists v11r2

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 2007-12-13 at 09:23 +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 12 Dec 2007 16:32:51 -0500
> Lee Schermerhorn <[email protected]> wrote:
> 
> > Just this afternoon, I hit a null pointer deref in
> > __mem_cgroup_remove_list() [called from mem_cgroup_uncharge() if I can
> > trust the stack trace] attempting to unmap a page for migration.  I'm
> > just starting to investigate this.
> > 
> > I'll replace the series I have [~V10] with V11r2 and continue testing in
> > anticipation of the day that we can get this into -mm.
> > 
> Hi, Lee-san.
> 
> Could you know what is the caller of page migration ?
> system call ? hot removal ? or some new thing ?

Kame-san:

I was testing with my out-of-tree automatic-lazy migration patches.  See
http://mirror.linux.org.au/pub/linux.conf.au/2007/video/talks/197.pdf
for an overview.  These patches arrange to unmap [remove ptes] for all
anon pages in a task's vmas with default/local policy [and with mapcount
below a tunable threshold] when the load balancer moves the task to a
different node.  Then, the pages will migrate on next touch/fault, if
they are misplaced relative to the policy--which is likely for many of
the pages, as the task is executing on a different node.  With respect
to file backed pages, automatic migration will only unmap the current
task's pte so that the task will take a fault on next touch and Nick
Piggin's pagecache replication patches will make, if necessary, or use
an existing local copy of the page.

The stack trace was:

try_to_unmap_one -> page_remove_rmap -> mem_cgroup_uncharge, with the
faulting instruction apparently in __mem_cgroup_remove_list().

> 
> Note: 2.6.24-rc4-mm1's cgroup/migration logic.
> 
> In 2.6.24-rc4-mm1, in page migration, mem_cgroup_prepare_migration() increments
> page_cgroup's refcnt before calling try_to_unmap(). This extra refcnt guarantees 
> the page_cgroup's refcnt will not drop to 0 in sequence of
> unmap_and_move() -> try_to_unmap() -> page_remove_rmap() -> mem_cgroup_unchage(). 

Yes, I've seen that code.  I'm working on page migration/replication in
the background, keeping the patches up to date and tested, so I haven't
had a lot of time to investigate.

I do have a heavy-handed instrumentation patch to try to trap any null
pointers or stale {page|mem}_cgroup pointers.  [I can send, if you're
interested.]  I restarted the stress test with that patch.  The test ran
quite a bit longer and then hit a different bug.  So, I have a race
somewhere.   If I can definitely pin it on the memory controller or an
iteraction of the memory controller with page migration/replication,
I'll let you know--and try to come up with a patch.  Otherwise, you
probably don't need to worry.

Lee
> 
> Thanks,
> -Kame
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux