Re: [PATCH] i386-pda UP optimization

Ingo Molnar wrote:
> what point would there be in using it? It's not like the kernel could 
> make use of the thread keyword anytime soon (it would need /all/ 
> architectures to support it) ...

The plan was to implement the x86 arch-specific percpu stuff to use it,
since it allows gcc better optimisation opportunities.

>  and the kernel doesn't mind how the 
> current per_cpu() primitives are implemented, via assembly or via C. In 
> any case, it very much matters to see the precise cost of having the pda 
> selector value in %gs versus %fs.
>   

Hm, well, unfortunately for me, there is a small but distinct advantage
to using %fs rather than %gs: mostly 0-5ns per iteration, and up to 9ns
in the worst case (Pentium 4 with an LDT selector).  The notable
exception is the "AMD-K6(tm) 3D+ Processor", where %gs is about 25%
(15ns) faster.
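
The per-CPU differences can be checked directly against the results
listed at the end of this mail. A minimal sketch of that arithmetic
follows; the struct and function names are made up for illustration and
are not part of the test program that produced the numbers.

```c
#include <assert.h>
#include <stdio.h>

/* One fs/gs pair in ns/iteration, copied by hand from the results
 * in this mail. Hypothetical type, for illustration only. */
struct sample {
    const char *cpu;
    int fs_ns;
    int gs_ns;
};

/* Positive: %fs is faster by that many ns. Negative: %gs is faster. */
static int fs_advantage_ns(struct sample s)
{
    return s.gs_ns - s.fs_ns;
}

/* Gain of the faster register, as a percentage of the slower one's time. */
static double advantage_percent(struct sample s)
{
    int slow = s.fs_ns > s.gs_ns ? s.fs_ns : s.gs_ns;
    int fast = s.fs_ns > s.gs_ns ? s.gs_ns : s.fs_ns;

    assert(slow > 0);  /* avoid dividing by zero on empty data */
    return 100.0 * (slow - fast) / slow;
}

/* Print one summary line for a sample. */
static void print_sample(struct sample s)
{
    printf("%s: fs advantage %dns (%.1f%%)\n",
           s.cpu, fs_advantage_ns(s), advantage_percent(s));
}
```

Passing the K6 row (fs=59, gs=44) to fs_advantage_ns() gives -15, i.e.
%gs is ahead by 15ns, and advantage_percent() comes out at roughly 25%,
matching the figures quoted above.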

I'll revise the patches to use %fs and resubmit.

    J

"Genuine Intel(R) CPU           T2400  @ 1.83GHz" @1000Mhz (6,14,8):
ds=7b fs=0 gs=33 ldt=f gdt=3b CPUTIME 
   <none> with data selector: 0ns/iteration
   fs with data selector: 26ns/iteration
   gs with data selector: 30ns/iteration

   <none> with LDT selector: 0ns/iteration
   fs with LDT selector: 26ns/iteration
   gs with LDT selector: 26ns/iteration

   <none> with GDT selector: 0ns/iteration
   fs with GDT selector: 26ns/iteration
   gs with GDT selector: 30ns/iteration

"Intel(R) Pentium(R) 4 CPU 1.80GHz" @1817.9Mhz (15,2,4):
ds=7b fs=0 gs=33 ldt=f gdt=3b CPUTIME 
   <none> with data selector: 0ns/iteration
   fs with data selector: 33ns/iteration
   gs with data selector: 34ns/iteration

   <none> with LDT selector: 0ns/iteration
   fs with LDT selector: 43ns/iteration
   gs with LDT selector: 52ns/iteration

   <none> with GDT selector: 0ns/iteration
   fs with GDT selector: 33ns/iteration
   gs with GDT selector: 34ns/iteration

"Intel(R) Celeron(R) CPU 2.40GHz" @2394.47Mhz (15,2,9):
ds=7b fs=0 gs=33 ldt=f gdt=3b CPUTIME 
   <none> with data selector: 0ns/iteration
   fs with data selector: 20ns/iteration
   gs with data selector: 24ns/iteration

   <none> with LDT selector: 0ns/iteration
   fs with LDT selector: 21ns/iteration
   gs with LDT selector: 26ns/iteration

   <none> with GDT selector: 0ns/iteration
   fs with GDT selector: 21ns/iteration
   gs with GDT selector: 26ns/iteration

"Pentium 75 - 200" @166.206Mhz (5,2,12):
ds=7b fs=0 gs=33 ldt=f gdt=3b GTOD
   <none> with data selector: 1ns/iteration
   fs with data selector: 74ns/iteration
   gs with data selector: 75ns/iteration

   <none> with LDT selector: 1ns/iteration
   fs with LDT selector: 74ns/iteration
   gs with LDT selector: 75ns/iteration

   <none> with GDT selector: 1ns/iteration
   fs with GDT selector: 74ns/iteration
   gs with GDT selector: 74ns/iteration

"AMD-K6(tm) 3D+ Processor" @451.105Mhz (5,9,1):
ds=7b fs=0 gs=33 ldt=f gdt=3b GTOD
   <none> with data selector: 0ns/iteration
   fs with data selector: 59ns/iteration
   gs with data selector: 44ns/iteration

   <none> with LDT selector: 0ns/iteration
   fs with LDT selector: 59ns/iteration
   gs with LDT selector: 44ns/iteration

   <none> with GDT selector: 0ns/iteration
   fs with GDT selector: 59ns/iteration
   gs with GDT selector: 44ns/iteration

"AMD Athlon(tm) XP 3000+" @2162.74Mhz (6,10,0):
ds=7b fs=0 gs=33 ldt=f gdt=3b CPUTIME
   <none> with data selector: 0ns/iteration
   fs with data selector: 10ns/iteration
   gs with data selector: 11ns/iteration

   <none> with LDT selector: 0ns/iteration
   fs with LDT selector: 11ns/iteration
   gs with LDT selector: 11ns/iteration

   <none> with GDT selector: 0ns/iteration
   fs with GDT selector: 11ns/iteration
   gs with GDT selector: 11ns/iteration


"AMD Athlon(tm) 64 Processor 3500+" @2210.23Mhz (15,31,0):
ds=2b fs=0 gs=63 ldt=f gdt=6b GTOD
   <none> with data selector: 0ns/iteration
   fs with data selector: 11ns/iteration
   gs with data selector: 11ns/iteration

   <none> with LDT selector: 0ns/iteration
   fs with LDT selector: 10ns/iteration
   gs with LDT selector: 11ns/iteration

   <none> with GDT selector: 0ns/iteration
   fs with GDT selector: 10ns/iteration
   gs with GDT selector: 11ns/iteration

"Pentium III (Coppermine)" @700Mhz (6,8,6):
ds=7b fs=0 gs=33 ldt=f gdt=3b CPUTIME
   <none> with data selector: 0ns/iteration
   fs with data selector: 38ns/iteration
   gs with data selector: 45ns/iteration

   <none> with LDT selector: 0ns/iteration
   fs with LDT selector: 39ns/iteration
   gs with LDT selector: 41ns/iteration

   <none> with GDT selector: 0ns/iteration
   fs with GDT selector: 39ns/iteration
   gs with GDT selector: 44ns/iteration
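
The test program behind these numbers is not included in this mail. For
anyone wanting to produce comparable ns/iteration figures, a timing loop
might look roughly like the sketch below. It assumes the "CPUTIME" tag
refers to a process CPU-time clock such as CLOCK_PROCESS_CPUTIME_ID,
which is a guess. All names here are hypothetical, and the segment
register load itself is deliberately left out, since it needs
arch-specific inline asm and a valid selector.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Body to be timed; a real test would load %fs or %gs here. */
typedef void (*bench_fn)(void);

/* Process CPU time in nanoseconds (assumed clock source, see above). */
static uint64_t cpu_time_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average cost of one call to fn, in whole nanoseconds. */
static uint64_t ns_per_iteration(bench_fn fn, uint64_t iterations)
{
    uint64_t start, elapsed, i;

    assert(fn != NULL);
    if (iterations == 0)
        return 0;

    start = cpu_time_ns();
    for (i = 0; i < iterations; i++)
        fn();
    elapsed = cpu_time_ns() - start;

    return elapsed / iterations;
}

/* Baseline body, comparable to the "<none>" rows: does no work, but
 * the compiler barrier keeps the body from being optimised away. */
static void empty_body(void)
{
    __asm__ volatile("" ::: "memory");
}
```

Timing empty_body() first gives a baseline for the loop and call
overhead, which can then be compared against a body that performs the
segment register load.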
