Subject: [PATCH] mm/mempolicy.c linux-2.6.12.3

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Andrea,

I believe that we are seeing a problem with one change from the patch

  2005/01/03 20:15:21-08:00 [email protected]
  [PATCH] mempolicy optimisation

in the mpol_free_shared_policy() routine in mm/mempolicy.c, where a
rb_erase() line was removed.

The corresponding portion of the original patch is shown below:

@@ -1086,11 +1084,11 @@
        while (next) {
                n = rb_entry(next, struct sp_node, nd);
                next = rb_next(&n->nd);
-               rb_erase(&n->nd, &p->root);
                mpol_free(n->policy);
                kmem_cache_free(sn_cache, n);
        }
        spin_unlock(&p->lock);
+       p->root = RB_ROOT;
 }


When we build a 2.6.11.4 debug kernel on a 2 cpu NUMA-enabled opteron
system, and run the following set of commands:

echo "1" > /tmp/numatest
numactl --length=0x4000 --shm /tmp/numatest --localalloc
numactl --length=0x2000 --offset=0 --shm /tmp/numatest --membind=0
numactl --length=0x2000 --offset=0x2000 --shm /tmp/numatest --membind=1
ipcs
ipcrm -M "the_key_value_of_this_shm_area"


On the ipcrm call above, the system will oops:

general protection fault: 0000 [1] PREEMPT SMP

Entering kdb (current=0xffff81008161a0f0, pid 3102) on processor 1 Oops: <NULL>
due to oops @ 0xffffffff802e51f3
     r15 = 0x0000000000000100      r14 = 0xffff81007d7a5470
     r13 = 0xffff810001eea460      r12 = 0xffff81007c4324f8
     rbp = 0xffff81007c4324f8      rbx = 0xffff81007c4324f8
     r11 = 0x0000000000000202      r10 = 0x00002aaaaabc3db0
      r9 = 0xffff81007c4324a8       r8 = 0x0000000000000000
     rax = 0x000000000000000f      rcx = 0x0000000000000000
     rdx = 0x0000000000000000      rsi = 0x000000000000006b
     rdi = 0x6b6b6b6b6b6b6b6b orig_rax = 0xffffffffffffffff
     rip = 0xffffffff802e51f3       cs = 0x0000000000000010
  eflags = 0x0000000000010202      rsp = 0xffff81007d06bc40
      ss = 0xffff81007d06a000 &regs = 0xffff81007d06bba8
[1]kdb> bt
Stack traceback for pid 3102
0xffff81008161a0f0     3102     2761  1    1   R  0xffff81008161a480 *ipcrm
RSP           RIP                Function (args)
0xffff81007d06bc40 0xffffffff802e51f3 rb_next+0xb (0xffff810001eea7d8, 0xffff810001eea518, 0xffff810001eea518, 0x0, 0xffff810001eea518)
0xffff81007d06bc58 0xffffffff80189c74 mpol_free_shared_policy+0x3a
0xffff81007d06bc88 0xffffffff8018d306 shmem_destroy_inode+0x1f
0xffff81007d06bc98 0xffffffff801ab0e8 destroy_inode+0x31 (0xffff81007d7a5484)
0xffff81007d06bca8 0xffffffff801ac5a6 generic_delete_inode+0x10c
0xffff81007d06bcc8 0xffffffff801ac796 iput+0x77 (0xffff8100815673c8)
0xffff81007d06bcd8 0xffffffff801a915c dput+0x1c0 (0xffff81007d3e8408, 0x10000, 0x0, 0x0, 0xffff81007d3e8408)
0xffff81007d06bcf8 0xffffffff80190a9a __fput+0x128 (0xffff81007d3e8408)
0xffff81007d06bd28 0xffffffff8019096d fput+0x14
0xffff81007d06bd38 0xffffffff802d1a8c shm_destroy+0x4d
0xffff81007d06bd48 0xffffffff802d26d9 sys_shmctl+0x689


The rdi = 0x6b6b6b6b6b6b6b6b above (and additional debugging that I did)
shows that we called rb_next(&n->nd) in the while loop and we got back
a rb_node that we already just deallocated via kmem_cache_free() in that
same while loop execution.

As a result, on the debug kernel, the already deallocated rb_node has
0x6b6b6b6b6b6b6b6b in its memory locations and we oops when we try to
use this value as a pointer.

When I put back the rb_erase() line, the above example test, and lots
of others like it, start working again w/out oops-ing.

I'm no rb tree expert, but it seems that we still need to have the
rb_erase() line in the while loop:

diff -u linux-2.6.12.3/mm/mempolicy.c new/mm/mempolicy.c
--- linux-2.6.12.3/mm/mempolicy.c	2005-07-15 17:18:57.000000000 -0400
+++ new/mm/mempolicy.c	2005-07-22 11:12:56.000000000 -0400
@@ -1104,6 +1104,7 @@
 	while (next) {
 		n = rb_entry(next, struct sp_node, nd);
 		next = rb_next(&n->nd);
+		rb_erase(&n->nd, &p->root);
 		mpol_free(n->policy);
 		kmem_cache_free(sn_cache, n);
 	}



Thank you for your time and considerations.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]
  Powered by Linux