Re: [RFC, patch] i386: vgetcpu(), take 2

On Wednesday 21 June 2006 14:24, Chuck Ebbert wrote:
> In-Reply-To: <[email protected]>
> 
> On Wed, 21 Jun 2006 10:15:39 +0200, Ingo Molnar wrote:
> 
> > * Chuck Ebbert <[email protected]> wrote:
> > 
> > > Use a GDT entry's limit field to store per-cpu data for fast access 
> > > from userspace, and provide a vsyscall to access the current CPU 
> > > number stored there.
> > 
> > very nice idea! I thought of doing sys_get_cpu() too, but my idea was to 
> > use the scheduler to keep a writable [and permanently pinned, 
> > per-thread] VDSO data page uptodate with the current CPU# [and other 
> > interesting data]. Btw., do we know how fast LSL is on modern CPUs?
> 
> Now that the GDT is a full page for each CPU there's plenty of space
> for all kinds of per-cpu data, even if we waste 75% of it.  LSL seems
> pretty fast; I got 13 clocks for the whole lsl/jnz/and sequence on K8

My measurements show something different: I get 60+ cycles on K8 and
150+ cycles on P4. That is with a full vsyscall around it. It is still
far better than CPUID, but slower than RDTSCP on those CPUs that support it.

I changed the CPUID fallback path to use LSL on x86-64.

> and 21 clocks on PII.  Maybe you can test P4?
> 
> /* test how fast lsl/jnz/and runs.
>  */
> #define _GNU_SOURCE
> #include <stdio.h>
> #include <stdlib.h>
> 
> #define rdtscll(t)	asm volatile ("rdtsc" : "=A" (t))
> 
> #ifndef ITERS
> #define ITERS	1000000
> #endif
> 
> int main(int argc, char * const argv[])
> {
> 	unsigned long long tsc1, tsc2;
> 	int count, cpu, junk;
> 
> 	rdtscll(tsc1);
> 	asm (
> 		"	pushl %%ds		\n"
> 		"	popl %2			\n"
> 		"1:				\n"
> #ifdef DO_TEST
> 		"	lsl %2,%0		\n"
> 		"	jnz 2f			\n"
> 		"	and $0xff,%0		\n"
> #endif
> 		"	dec %1			\n"
> 		"	jnz 1b			\n"
> 		"2:				\n"
> 		: "=&r" (cpu), "=&r" (count), "=&r" (junk)
> 		: "1" (ITERS), "0" (-1)
> 	);
> 	rdtscll(tsc2);

Measuring this way is a bad idea because you get far too much
noise from the RDTSCs. Usually you need to put a loop of a few thousand
iterations between the RDTSCs and divide the result by the loop count.
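
Roughly what I mean, as a minimal sketch rather than a finished benchmark.
Each timer pair brackets a few thousand runs, the total is divided by the
loop count, and the smallest of several batches is kept. The names
read_ticks(), op_under_test(), INNER and the batch count are made up for
illustration; op_under_test() is an empty placeholder where the real
lsl/jnz/and sequence would go. The non-x86 branch is only an assumption so
that the sketch builds elsewhere; it counts nanoseconds, not cycles.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
/* Read the time stamp counter; one tick is one TSC cycle. */
static inline uint64_t read_ticks(void)
{
	return __rdtsc();
}
#else
/* Assumed fallback for non-x86 builds; one tick is one nanosecond. */
static inline uint64_t read_ticks(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical loop count: "a few thousand" runs per timer pair. */
#define INNER	4096

/* Placeholder for the sequence being measured. The empty asm statement
 * only stops the compiler from deleting the loop around it. */
static inline void op_under_test(void)
{
	__asm__ __volatile__ ("" ::: "memory");
}

/* Time INNER back-to-back runs with a single pair of timer reads.
 * Repeat that `runs` times and return the smallest batch total, which
 * discards batches disturbed by interrupts or cache misses. */
static uint64_t min_ticks_per_batch(int runs)
{
	uint64_t best = UINT64_MAX;
	int r, i;

	for (r = 0; r < runs; r++) {
		uint64_t t1, t2;

		t1 = read_ticks();
		for (i = 0; i < INNER; i++)
			op_under_test();
		t2 = read_ticks();
		if (t2 - t1 < best)
			best = t2 - t1;
	}
	return best;
}

/* Average cost of one run, loop overhead included. */
static double ticks_per_op(int runs)
{
	return (double)min_ticks_per_batch(runs) / INNER;
}
```

Calling ticks_per_op(16) once with the placeholder and once with the real
sequence, then subtracting, gives an estimate with the loop overhead and
the cost of the timer reads mostly cancelled out.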

-Andi

> 