Re: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]

Three more observations.

 1) The slowest measure_one() calls are, not surprisingly, those for the
    largest sizes.  At least on my test system of the moment, the plot
    of cost versus size has one major maximum (a one hump camel, not two).
    
    Seems like if we computed from smallest size upward, instead of largest
    downward, and stopped whenever two consecutive measurements were less
    than say 70% of the max seen so far, then we could save a nice chunk
    of the time.

    Of course, if two hump systems exist, this is not reliable on them.

 2) Trivial warning fix for printf format mismatch:

=================================== begin ===================================
--- 2.6.12-rc1-mm4.orig/kernel/sched.c  2005-04-03 06:32:34.000000000 -0700
+++ 2.6.12-rc1-mm4/kernel/sched.c       2005-04-03 06:34:07.000000000 -0700
@@ -5211,7 +5211,7 @@ void __devinit calibrate_migration_costs
 #ifdef CONFIG_X86
                        cpu_khz/1000
 #else
-                       -1
+                       -1L
 #endif
                );
        printk("---------------------\n");
==================================== end ====================================


 3) I was noticing that my test system was only showing a couple of distinct
    values for cpu_distance, even though it has 4 distinct distances for
    values of node_distance.  So I coded up a variant of cpu_distance that
    converts the problem to a node_distance problem, and got the following
    cost matrix:

=================================== begin ===================================
Total of 8 processors activated (15515.64 BogoMIPS).
---------------------
migration cost matrix (max_cache_size: 0, cpu: -1 MHz):
---------------------
          [00]    [01]    [02]    [03]    [04]    [05]    [06]    [07]
[00]:     -     4.0(0) 21.7(1) 21.7(1) 25.2(2) 25.2(2) 25.3(3) 25.3(3)
[01]:   4.0(0)    -    21.7(1) 21.7(1) 25.2(2) 25.2(2) 25.3(3) 25.3(3)
[02]:  21.7(1) 21.7(1)    -     4.0(0) 25.3(3) 25.3(3) 25.2(2) 25.2(2)
[03]:  21.7(1) 21.7(1)  4.0(0)    -    25.3(3) 25.3(3) 25.2(2) 25.2(2)
[04]:  25.2(2) 25.2(2) 25.3(3) 25.3(3)    -     4.0(0) 21.7(1) 21.7(1)
[05]:  25.2(2) 25.2(2) 25.3(3) 25.3(3)  4.0(0)    -    21.7(1) 21.7(1)
[06]:  25.3(3) 25.3(3) 25.2(2) 25.2(2) 21.7(1) 21.7(1)    -     4.0(0)
[07]:  25.3(3) 25.3(3) 25.2(2) 25.2(2) 21.7(1) 21.7(1)  4.0(0)    -
---------------------
cacheflush times [4]: 4.0 (4080540) 21.7 (21781380) 25.2 (25259428) 25.3 (25372682)
---------------------
==================================== end ====================================


    The code (below) is twice as complicated, the runtime twice as long,
    and it's less intuitive - sched_domains seems more appropriate as
    the basis for migration costs than the node distances in SLIT tables.
    Finally, I don't know if distinguishing between costs of 21.7 and
    25.3 is worth much.

    So the case for switching to this node_distance base is less than
    persuasive, to put it politely.

    Perhaps it's only real value is in highlighting that perhaps the
    code to setup the sched_domain topology on our ia64 SN2 Altix systems
    is too coarse, given that it only found two distance values, not four.

    If that's the case, I will have to call in someone else to examine
    whether it's appropriate to refine the sched_domains setup for this
    kind of system.  I'm not competent to determine that, nor to code it.

    Here's the code that bases cpu_distance on node_distance:

=================================== begin ===================================
__init static int cmpint(const void *a, const void *b)
{
        return *(int *)a - *(int *)b;
}

/*
 * Estimate distance of two CPUs based on their node_distance,
 * mapping to sequential integers 0, 1, ... N-1, for the N
 * distinct values of distances (closest CPUs are distance 0,
 * farthest CPUs are distance N-1).  If there are more than
 * MAX_DOMAIN_DISTANCE distinct different distance values,
 * collapse the larger distances to one value.
 */

__init static unsigned long cpu_distance(int cpu1, int cpu2)
{
	static int num_dist_vals;
	static int node_distances[MAX_DOMAIN_DISTANCE];
	int dist = node_distance(cpu_to_node(cpu1), cpu_to_node(cpu2));
	int v;

	if (num_dist_vals == 0) {
		int i, j, k;

		/*
		 * For each dist not already in node_distances[], if there's
		 * room or it's less than an existing 'luser' entry, add it.
		 */
		for_each_online_node(i) {
			for_each_online_node(j) {
				int dist = node_distance(i, j);
				int luser = -1;

				for (k = 0; k < num_dist_vals; k++) {
					if (node_distances[k] == dist)
						break;
					if (dist < node_distances[k])
						luser = k;
				}
				if (node_distances[k] == dist)
					continue;
				else if (num_dist_vals < MAX_DOMAIN_DISTANCE)
					node_distances[num_dist_vals++] = dist;
				else if (luser >= 0)
					node_distances[luser] = dist;
			}
		}
		/*
		 * Now node_distances[] has the smallest num_dist_vals
		 * distinct node distance values.  Lets sort them.
		 */
		sort(node_distances, num_dist_vals, sizeof(int), cmpint, NULL);

		if (migration_debug) {
			printk("Sorted node_distances used for cpu distances\n");
			for(v = 0; v < num_dist_vals; v++)
				printk("  node_distances[%d] = %d\n",
						v, node_distances[v]);
		}
	}

	/* The "- 1" maps all larger sizes onto one */
	for (v = 0; v < num_dist_vals - 1; v++)
		if (dist == node_distances[v])
			break;
	return v;
}
==================================== end ====================================


-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <[email protected]> 1.650.933.1373, 1.925.600.0401
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]
  - From: Ingo Molnar <[email protected]>
- Re: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]
  - From: Ingo Molnar <[email protected]>

References:
- RE: Industry db benchmark result on recent 2.6 kernels
  - From: "Chen, Kenneth W" <[email protected]>
- [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]
  - From: Ingo Molnar <[email protected]>
- Re: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]
  - From: Paul Jackson <[email protected]>
- Re: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]
  - From: Ingo Molnar <[email protected]>
- Re: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]
  - From: Paul Jackson <[email protected]>

Prev by Date: Re: Can't use SYSFS for "Proprietry" driver modules !!!.
Next by Date: Re: [RFC,PATCH 2/4] Deprecate synchronize_kernel, GPL replacement
Previous by thread: Re: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]
Next by thread: Re: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]