Re: [patch 10/10] Scheduler profiling - Use immediate values

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 03, 2007 at 02:57:48PM -0400, Mathieu Desnoyers wrote:
> * Alexey Dobriyan ([email protected]) wrote:
> > On Tue, Jul 03, 2007 at 12:40:56PM -0400, Mathieu Desnoyers wrote:
> > > Use immediate values with lower d-cache hit in optimized version as a
> > > condition for scheduler profiling call.
> > 
> > How much difference in performance do you see?
> > 
> 
> Hi Alexey,
> 
> Please have a look at Documentation/immediate.txt for that information.
> Also note that the main advantage of the load immediate is to free a
> cache line. Therefore, I guess the best way to quantify the improvement
> it brings at one single site is not in terms of cycles, but in terms of
> number of cache lines used by the scheduler code. Since memory bandwidth
> seems to be an increasing bottleneck (CPU frequency increases faster
> than the available memory bandwidth), it makes sense to free as much
> cache lines as we can.
> 
> Measuring the overall impact on the system of this single modification
> results in the difference brought by one site within the standard
> deviation of the normal samples. It will become significant when the
> number of immediate values used instead of global variables at hot
> kernel paths (need to ponder with the frequency at which the data is
> accessed) will start to be significant compared to the L1 data cache
> size. We could characterize this in memory to L1 cache transfers per
> seconds.
> 
> On 3GHz P4:
> 
> memory read: ~48 cycles
> 
> So we can definitely say that 48*HZ (approximation of the frequency at
> which the scheduler is called) won't make much difference, but as it
> grows, it will.
> 
> On a 1000HZ system, it results in:
> 
> 48000 cycles/second, or 16µs/second, or 0.000016% speedup.
> 
> However, if we place this in code called much more often, such as
> do_page_fault, we get, with an hypotetical scenario of approximation
> of 100000 page faults per second:
> 
> 4800000 cycles/s, 1.6ms/second or 0.0016% speedup.
> 
> So as the number of immediate values used increase, the overall memory
> bandwidth required by the kernel will go down.

Might make a nice scientific paper, but even according to your own 
optimistic numbers it's not realistic that you will ever achieve any 
visible improvement even if you'd find 100 places in hotpaths you could 
mark this way.

And a better direction for hotpaths seems to be Andi's __cold/COLD in 
-mm without adding an own framework for doing such things.

> Mathieu

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux