Re: 2.6.14-rc1-git-now still dying in mm/slab - this time line 1849

Andrew Morton wrote:
Christoph Lameter <[email protected]> wrote:

On Mon, 19 Sep 2005, Andrew Morton wrote:


	list_for_each(walk, &cache_chain) {
		kmem_cache_t *searchp;
		struct list_head* p;
		int tofree;
		struct slab *slabp;

		searchp = list_entry(walk, kmem_cache_t, next);

		if (searchp->flags & SLAB_NO_REAP)
			goto next;

		check_irq_on();

		l3 = searchp->nodelists[numa_node_id()];
		if (l3->alien)
			drain_alien_cache(searchp, l3);
->preempt here
		spin_lock_irq(&l3->list_lock);

		drain_array_locked(searchp, ac_data(searchp), 0,
				numa_node_id());
->oops, wrong node.
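
The window marked in the loop above can be modeled outside the kernel. What follows is only a minimal user-space sketch, not kernel code: fake_numa_node_id(), fake_migrate_to() and struct nodes_used are hypothetical stand-ins invented for illustration. It shows why two separate reads of the node id can disagree if the task moves in between, and why a single cached read cannot. Whether such a move can happen to keventd at all is the question the rest of this thread argues about.

```c
#include <stdbool.h>

/* Hypothetical stand-in for numa_node_id(): node of the CPU we run on. */
static int fake_current_node = 0;

static int fake_numa_node_id(void)
{
    return fake_current_node;
}

/* Hypothetical: models the task being moved to a CPU on another node. */
static void fake_migrate_to(int node)
{
    fake_current_node = node;
}

/* Records which node's lock was taken and which node was drained. */
struct nodes_used {
    int lock_node;
    int drain_node;
};

/* Two independent reads, mirroring the shape of the quoted loop. */
static struct nodes_used reap_two_reads(bool migrate_in_between)
{
    struct nodes_used u;

    u.lock_node = fake_numa_node_id();   /* node whose list_lock is taken */
    if (migrate_in_between)
        fake_migrate_to(1);              /* the "->preempt here" window */
    u.drain_node = fake_numa_node_id();  /* node passed to the drain call */
    return u;
}

/* One read, reused for both the lock and the drain. */
static struct nodes_used reap_one_read(bool migrate_in_between)
{
    struct nodes_used u;
    int node = fake_numa_node_id();

    u.lock_node = node;
    if (migrate_in_between)
        fake_migrate_to(1);
    u.drain_node = node;                 /* same node as the lock */
    return u;
}
```

In this model reap_two_reads(true) ends up with the lock taken on node 0 and the drain aimed at node 1, while reap_one_read(true) keeps both on node 0.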

This is called from keventd which exists per processor. Hmmm... This looks as if it can change processors after all


Well no, it would be a big bug if a keventd thread were to change CPUs.

It's OK to rely upon the pinnedness of keventd I guess - a comment would be
nice.


but the slab allocator depends on it running on the right processor. So does the page allocator. sigh. What is the point of having per processor workqueues if they do not stay on the assigned processor?


They do. I don't believe that preemption is the source of this BUG. (Petr, does CONFIG_PREEMPT=n fix it?)

No, it does not.  I've even added printks here and there to show node number,
and everything works as it should.  Maybe there are some problems with
numa_node_id() and migrating between processors when memory gets released,
I do not know.

The only thing I know is that if I add the WARN_ON below to free_block(), it
triggers...

@free_block
  slabp = GET_PAGE_SLAB(virt_to_page(objp));
  nodeid = slabp->nodeid;
+  WARN_ON(nodeid != numa_node_id());             <<<<<
  l3 = cachep->nodelist[nodeid];
  list_del(&slabp->list);
  objnr = (objp - slabp->s_mem) / cachep->objsize;
  check_spinlock_acquired_node(cachep, nodeid);
  check_slabp(cachep, slabp);

... saying that keventd/0 tries to operate on a slab belonging to node #1
while holding the lock for the cachep lists of node #0.  Because of this,
check_spinlock_acquired_node(cachep, nodeid) fails
(check_spinlock_acquired_node(cachep, 0) would succeed).
								Petr
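
The mismatch reported above can be reproduced in a toy model. This is a hedged user-space sketch, not the kernel implementation: struct cache, struct node_lists, lock_node(), lock_held_for_node() and free_block_check() are hypothetical names, and the spinlock is reduced to a boolean. It only illustrates that the check is keyed on the slab's own node, so holding the lock for node 0 does not satisfy it for a slab that lives on node 1.

```c
#include <stdbool.h>

#define MAX_NODES 2   /* hypothetical: two NUMA nodes, as in the report */

/* Toy stand-in for the per-node slab lists and their list_lock. */
struct node_lists {
    bool locked;      /* true while the modeled list_lock is held */
};

/* Toy stand-in for a cache: one set of lists per node. */
struct cache {
    struct node_lists nodelists[MAX_NODES];
};

/* Toy stand-in for a slab: remembers the node it belongs to. */
struct slab {
    int nodeid;
};

static void lock_node(struct cache *c, int node)
{
    c->nodelists[node].locked = true;
}

static void unlock_node(struct cache *c, int node)
{
    c->nodelists[node].locked = false;
}

/* Models the idea of check_spinlock_acquired_node(): is this node's lock held? */
static bool lock_held_for_node(const struct cache *c, int node)
{
    return c->nodelists[node].locked;
}

/*
 * Models the check at the top of the free_block() excerpt: the lock that
 * must be held is selected by the slab's node, not by the caller's node.
 */
static bool free_block_check(const struct cache *c, const struct slab *s)
{
    return lock_held_for_node(c, s->nodeid);
}
```

With the node 0 lock held, free_block_check() passes for a slab whose nodeid is 0 and fails for one whose nodeid is 1, which is the same shape as the failure described in the message.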
