Re: 2.6.16-rc6-mm2 — Linux Kernel

On Sat, 10 Jun 2006, Michal Piotrowski wrote:
> [michal@ltg01-fedora linux-mm]$ cat ~/page_alloc.patch | patch -p1 --dry-run
> patching file mm/page_alloc.c
> Hunk #1 FAILED at 1583.
> Hunk #2 succeeded at 1604 (offset 1 line).
> 1 out of 3 hunks FAILED -- saving rejects to file mm/page_alloc.c.rej
> patching file include/linux/page-flags.h
> 
> PITA for people that aren't kernel hackers.

Sorry, that patch was still against mm1. Here is a fixed-up version that
applies cleanly against mm2:

light weight counters: race free through local_t

One way of making the light weight counters race free on x86_64 and
i386 is to use local_t. With that, those two platforms are fine.

The other platforms, however, fall back to atomic operations.

Maybe we could deal with that on a per-platform basis? Some platforms
may want to switch away from atomics to regular integers if preemption
is not configured. Most commercial Linux distros ship with preemption
off, so this would preserve the speed of the light weight counters there.

Some of the potential races with plain integers are nasty, though.

For example, an integer may be loaded via the per cpu area and the task
may then be switched to another processor. At that point the per cpu
area changes, so we may increment the count loaded from the old
processor and store it to the counter of the new processor. Ick.
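
To make that race concrete, here is a small user-space sketch. It is not
kernel code: the counter array, racy_inc(), total() and reset_counters()
are hypothetical names made up for illustration, and the "migration" is
simulated by passing different CPU indices for the load and the store.

```c
#include <assert.h>

#define NCPUS 2

/* Hypothetical stand-in for the per cpu area: one plain counter per CPU. */
static unsigned long counter[NCPUS];

/* Helper for the illustration: put the two counters into a known state. */
static void reset_counters(unsigned long c0, unsigned long c1)
{
	counter[0] = c0;
	counter[1] = c1;
}

/*
 * A non-atomic increment split into its load and store halves. The task
 * loads on load_cpu, is (conceptually) preempted and migrated, and then
 * stores on store_cpu.
 */
static void racy_inc(int load_cpu, int store_cpu)
{
	unsigned long tmp;

	assert(load_cpu >= 0 && load_cpu < NCPUS);
	assert(store_cpu >= 0 && store_cpu < NCPUS);

	tmp = counter[load_cpu];	/* load via the old CPU's area */
	/* ... preemption and migration would happen here ... */
	counter[store_cpu] = tmp + 1;	/* store into the new CPU's area */
}

/* Sum over all CPUs, as a reader of the statistics would do. */
static unsigned long total(void)
{
	unsigned long sum = 0;
	int cpu;

	for (cpu = 0; cpu < NCPUS; cpu++)
		sum += counter[cpu];
	return sum;
}
```

Starting from reset_counters(5, 100), a call to racy_inc(0, 1) leaves
CPU 0 at 5 and overwrites CPU 1 with 6, so the total drops from 105 to
11 instead of rising to 106. The damage is not a lost increment but a
clobbered counter.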

Signed-off-by: Christoph Lameter <[email protected]>

Index: linux-2.6.17-rc6-mm2/mm/page_alloc.c
===================================================================
--- linux-2.6.17-rc6-mm2.orig/mm/page_alloc.c	2006-06-10 11:11:58.068665310 -0700
+++ linux-2.6.17-rc6-mm2/mm/page_alloc.c	2006-06-10 11:13:54.679609667 -0700
@@ -1583,7 +1583,7 @@ static void show_node(struct zone *zone)
 #endif
 
 #ifdef CONFIG_VM_EVENT_COUNTERS
-DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
+DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{LOCAL_INIT(0)}};
 EXPORT_PER_CPU_SYMBOL(vm_event_states);
 
 static void sum_vm_events(unsigned long *ret, cpumask_t *cpumask)
@@ -1604,7 +1604,7 @@ static void sum_vm_events(unsigned long 
 
 
 		for (i=0; i< NR_VM_EVENT_ITEMS; i++)
-			ret[i] += this->event[i];
+			ret[i] += local_read(&this->event[i]);
 	}
 }
 
@@ -2882,7 +2882,7 @@ static void vm_events_fold_cpu(int cpu)
 
 	for (i = 0; i < NR_VM_EVENT_ITEMS; i++) {
 		count_vm_events(i, fold_state->event[i]);
-		fold_state->event[i] = 0;
+		local_set(&fold_state->event[i], 0);
 	}
 }
 
Index: linux-2.6.17-rc6-mm2/include/linux/page-flags.h
===================================================================
--- linux-2.6.17-rc6-mm2.orig/include/linux/page-flags.h	2006-06-10 11:11:57.173212937 -0700
+++ linux-2.6.17-rc6-mm2/include/linux/page-flags.h	2006-06-10 11:14:23.596764624 -0700
@@ -8,7 +8,7 @@
 #include <linux/percpu.h>
 #include <linux/cache.h>
 #include <linux/types.h>
-
+#include <asm/local.h>
 #include <asm/pgtable.h>
 
 /*
@@ -108,10 +108,6 @@
 /*
  * Light weight per cpu counter implementation.
  *
- * Note that these can race. We do not bother to enable preemption
- * or care about interrupt races. All we care about is to have some
- * approximate count of events.
- *
  * Counters should only be incremented and no critical kernel component
  * should rely on the counter values.
  *
@@ -134,24 +130,24 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
 };
 
 struct vm_event_state {
-	unsigned long event[NR_VM_EVENT_ITEMS];
+	local_t event[NR_VM_EVENT_ITEMS];
 };
 
 DECLARE_PER_CPU(struct vm_event_state, vm_event_states);
 
 static inline unsigned long get_cpu_vm_events(enum vm_event_item item)
 {
-	return __get_cpu_var(vm_event_states).event[item];
+	return cpu_local_read(vm_event_states.event[item]);
 }
 
 static inline void count_vm_event(enum vm_event_item item)
 {
-	__get_cpu_var(vm_event_states).event[item]++;
+	cpu_local_inc(vm_event_states.event[item]);
 }
 
 static inline void count_vm_events(enum vm_event_item item, long delta)
 {
-	__get_cpu_var(vm_event_states).event[item] += delta;
+	cpu_local_add(delta, vm_event_states.event[item]);
 }
 
 extern void all_vm_events(unsigned long *);
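
For readers who want to try the access pattern outside the kernel, the
following is a rough user-space sketch of the counter layout the patch
moves to. Everything here is an assumption for illustration: fake_local_t
and the fake_local_*() helpers only mimic the shape of the local_t
accessors (they take a pointer to the counter) and give none of its
atomicity guarantees, and the CPU is passed explicitly instead of being
taken from the running processor.

```c
#include <assert.h>
#include <string.h>

#define NCPUS 4

enum event_item { EV_PGPGIN, EV_PGPGOUT, NR_EVENT_ITEMS };

/* Hypothetical stand-in for local_t: an opaque wrapper around a long. */
typedef struct { long counter; } fake_local_t;

struct event_state {
	fake_local_t event[NR_EVENT_ITEMS];
};

/* One state per simulated CPU, zero-initialized. */
static struct event_state states[NCPUS];

/* Accessors operate on a pointer to the counter, never on its value. */
static long fake_local_read(const fake_local_t *l) { return l->counter; }
static void fake_local_set(fake_local_t *l, long v) { l->counter = v; }
static void fake_local_add(long delta, fake_local_t *l) { l->counter += delta; }

/* Add delta to one event counter of the given simulated CPU. */
static void count_events(int cpu, enum event_item item, long delta)
{
	assert(cpu >= 0 && cpu < NCPUS);
	fake_local_add(delta, &states[cpu].event[item]);
}

/* Sum every event over all simulated CPUs into ret[]. */
static void sum_events(unsigned long *ret)
{
	int cpu, i;

	memset(ret, 0, NR_EVENT_ITEMS * sizeof(*ret));
	for (cpu = 0; cpu < NCPUS; cpu++)
		for (i = 0; i < NR_EVENT_ITEMS; i++)
			ret[i] += fake_local_read(&states[cpu].event[i]);
}

/* Move all counts of CPU "from" over to CPU "to", then zero the source. */
static void fold_cpu(int from, int to)
{
	int i;

	for (i = 0; i < NR_EVENT_ITEMS; i++) {
		count_events(to, i, fake_local_read(&states[from].event[i]));
		fake_local_set(&states[from].event[i], 0);
	}
}
```

Folding one CPU into another leaves the sums returned by sum_events()
unchanged, which is the property the fold path has to preserve when a
CPU goes away.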
