Christoph Lameter wrote:
> Could you do a clock cycle comparison of an
>
>	atomic_inc(__get_per_cpu(var))
>
> (the fallback of local_t on ia64)
> vs.
>
>	local_irq_save(flags)
>	__get_per_cpu(var)++
>	local_irq_restore(flags)
>
> (ZVC-like implementation)
> vs.
>
>	get_per_cpu(var)++
>	put_cpu()
>
> (current lightweight counters)
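For reference, here is a minimal kernel-C sketch of the three variants as I read them (a sketch against the 2.6-era per-cpu accessors __get_cpu_var() / get_cpu_var() / put_cpu_var(), with illustrative counter names, not code from any actual patch):

#include <linux/percpu.h>
#include <asm/atomic.h>
#include <asm/system.h>			/* local_irq_save() / local_irq_restore() */

static DEFINE_PER_CPU(atomic_t, avar);	/* case 1: local_t fallback on ia64 */
static DEFINE_PER_CPU(int, cnt);	/* cases 2 and 3                    */

static void inc_case1(void)		/* atomic per-cpu counter           */
{
	atomic_inc(&__get_cpu_var(avar));	/* fetchadd4.rel on ia64    */
}

static void inc_case2(void)		/* ZVC-like implementation          */
{
	unsigned long flags;

	local_irq_save(flags);
	__get_cpu_var(cnt)++;
	local_irq_restore(flags);
}

static void inc_case3(void)		/* current lightweight counters     */
{
	get_cpu_var(cnt)++;
	put_cpu_var(cnt);
}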
The only thing I have at hand is a small test for the 1st case:
#include <stdio.h>
#include <asm/atomic.h>

/* Read the ia64 interval time counter (ar.itc). */
#define GET_ITC() \
({ \
	unsigned long ia64_intri_res; \
 \
	asm volatile ("mov %0=ar.itc" : "=r"(ia64_intri_res)); \
	ia64_intri_res; \
})

#define N (1000 * 1000 * 100L)

atomic_t data;

int main(int c, char *v[])
{
	unsigned long cycles;
	int i;

	cycles = GET_ITC();
	for (i = 0; i < N; i++)
		ia64_fetchadd4_rel(&data, 1);
	cycles = GET_ITC() - cycles;
	printf("%ld %d\n", cycles / N, atomic_read(&data));
	return 0;
}
It gives 11 clock cycles per increment.
(The loop-control instructions are "absorbed": they issue in parallel with the fetchadd and do not add to the cycle count.)
"atomic_inc(__get_per_cpu(var))" compiles into:
mov rx = 0xffffffffffffxxxx // &__get_per_cpu(var)
;;
fetchadd4.rel ry = [rx], 1
It _should_ take 11 clock cycles, too. (Assuming it is in L2.)
For the 2nd case:

With a bit of modification, I can measure what "__get_per_cpu(var)++"
costs: 7 or 10 clock cycles, depending on whether the chance of finding
the counter in L1 is 100% or 0%:
/* Reuses GET_ITC() and N from the test above. */
int data;

static inline void store(int *addr, int data)
{
	asm volatile ("st4 [%1] = %0" :: "r"(data), "r"(addr) : "memory");
}

static inline int load_nt1(int *addr)
{
	int tmp;

	asm volatile ("ld4.nt1 %0=[%1]" : "=r"(tmp) : "r" (addr));
	return tmp;
}

int main(int c, char *v[])
{
	unsigned long cycles;
	int i, d;

	cycles = GET_ITC();
	for (i = 0; i < N; i++)
		store(&data, data + 1);		/* avoid optimizing out the "st4" */
	cycles = GET_ITC() - cycles;
	printf("%ld %d\n", cycles / N, data);

	cycles = GET_ITC();
	for (i = 0; i < N; i++) {
		d = load_nt1(&data);		/* ld4.nt1: do not use L1 */
		store(&data, d + 1);
	}
	cycles = GET_ITC() - cycles;
	printf("%ld %d\n", cycles / N, data);
	return 0;
}
"local_irq_save(flags)" compiles into:
mov rx = psr ;; // 13 clock cycles
rsm 0x4000 ;; // 5 clock cycles
"local_irq_restore(flags)" compiles into (at least):
ssm 0x4000 // 5 clock cycles
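Naively summing these per-instruction costs (and ignoring any overlap in the pipeline), a ZVC-like increment comes to roughly 13 + 5 + (7..10) + 5 = 30..33 clock cycles, versus 11 clock cycles for the fetchadd-based atomic counter.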
For the 3rd case:

If CONFIG_PREEMPT is enabled, you need to add 2 * 7 clock cycles for
inc_preempt_count() / dec_preempt_count(), plus some more for
preempt_check_resched().
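For illustration, this is roughly what the 3rd case expands to under CONFIG_PREEMPT (a sketch based on the generic preempt_disable() / preempt_enable() definitions, reusing the hypothetical per-cpu counter "cnt" from the sketch above; the exact code is arch- and version-dependent):

/* Sketch only: case 3 with CONFIG_PREEMPT spelled out. */
static void inc_case3_expanded(void)
{
	preempt_disable();		/* inc_preempt_count() + barrier()  */
	__get_cpu_var(cnt)++;
	preempt_enable();		/* barrier() + dec_preempt_count(), */
					/* then preempt_check_resched()     */
}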
My conclusion: let's stick to atomic counters.
Regards,
Zoltan