Let's put it another way:
Do the statistics need to be absolutely precise?
I guess they do not.
By the time you can display them, they have already changed.

Let's take an example:

zone_statistics(struct zonelist *zonelist, struct zone *z)
//	local_irq_save(flags);		// No IRQ lock out
	cpu = smp_processor_id();	// Can become another CPU
	p = &z->pageset[cpu];		// Can count for someone else
	if (pg == orig) {
		stat_incr(&z->pageset[cpu].numa_hit);	// Unsafe
	} else {
//	local_irq_restore(flags);

Here "stat_incr()" is an architecture-dependent and possibly unsafe routine.
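
To illustrate the "can count for someone else" comment, here is a minimal
user-space sketch, not kernel code. The NR_CPUS value, struct zone_stats,
count_hit() and total_hits() are hypothetical names made up for this
illustration, and stat_incr() is only a plain C stand-in for the
arch. dependent routine. It shows that a hit booked in the "wrong" CPU's slot
skews the per-CPU breakdown but not the zone total:

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4	/* hypothetical CPU count for this sketch */

struct pageset_stats {
	int numa_hit;
};

struct zone_stats {
	struct pageset_stats pageset[NR_CPUS];
};

/* Plain, non-atomic increment: stand-in for the unsafe arch. routine */
static inline void stat_incr(int *addr)
{
	*addr = *addr + 1;
}

/*
 * Book a hit in the slot of "cpu". If the task migrated after "cpu" was
 * sampled, the hit simply lands in another CPU's slot.
 */
static void count_hit(struct zone_stats *z, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	stat_incr(&z->pageset[cpu].numa_hit);
}

/* Reader side: the zone total is the sum over all per-CPU slots */
static long total_hits(const struct zone_stats *z)
{
	long sum = 0;

	for (size_t i = 0; i < NR_CPUS; i++)
		sum += z->pageset[i].numa_hit;
	return sum;
}
```

What can still be lost is an update when two increments of the same slot
interleave, which is the imprecision being argued as acceptable here.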

For IA64:

// Unsafe statistics
static inline void stat_incr(int *addr)
{
	int tmp;

	// Obtain immediately the cache line exclusivity, do not touch L1
	asm volatile ("ld4.bias.nt1 %0=[%1]" : "=r"(tmp) : "r" (addr));
	tmp++;		// Not atomic with the load and the store
	asm volatile ("st4 [%1] = %0" :: "r"(tmp), "r"(addr) : "memory");
}

It takes 10 clock cycles.
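
For readers without an IA64 at hand, the trade-off can be sketched in portable
C11. This is only an approximation under my own assumptions: the names
stat_incr_unsafe() and stat_incr_atomic() are hypothetical, and the plain C
version does not reproduce the cache hints of the asm above. The first routine
is a separate load, increment and store, so a concurrent update can be lost;
the second is a single atomic read-modify-write that never loses one, but it
is typically more expensive:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Unsafe: load, increment and store are three separate steps */
static inline void stat_incr_unsafe(volatile int *addr)
{
	int tmp;

	assert(addr != NULL);
	tmp = *addr;	/* another CPU may update *addr after this load */
	tmp++;
	*addr = tmp;	/* ... and that update is overwritten here */
}

/* Exact: one atomic read-modify-write, no ordering guarantees needed */
static inline void stat_incr_atomic(atomic_int *addr)
{
	assert(addr != NULL);
	atomic_fetch_add_explicit(addr, 1, memory_order_relaxed);
}
```

With a single writer both give the same count; they only differ when updates
to the same counter race.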


