Re: [patch] sched: auto-tune migration costs [was: Re: Industry db benchmark result on recent 2.6 kernels]

Ingo wrote:
> i've attached the latest snapshot.

I ran your latest snapshot on a 64 CPU (well, 62 - one node wasn't
working) system.  I made one change - chopping the matrix lines at 8
terms.  It's a hack - don't know if it's a good idea.  But the long lines
were hard to read (and would only get worse on a 512 CPU system).  And I
had a fear, probably unfounded, that the long lines could slow things down.

It built and ran fine, exactly as provided, against 2.6.12-rc1-mm4. I
probably have the unchopped matrix output in my screenlog file, if you
want it.  Though, given that the matrix is more or less symmetric, I
wasn't seeing much value in the part I chopped.

It took 24 seconds - a little painful, but booting this system takes
a few minutes, so 24 seconds is not fatal - just painful.

The maximum finding code, which stops scanning once the max has been
passed, works fine.  If it had been (impossibly) perfect, stopping right
at the max, it would have been perhaps 30% faster, so there is not a
huge amount to be gained from trying to fine tune the scan termination
logic.
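
To make the termination behaviour concrete, here is a minimal userspace
sketch that replays the [0][32] costs from the log output further down.
The function name and the threshold rule (stop once a sample drops below
a fraction num/den of the running max) are assumptions made for
illustration; the actual rule in the patch is not shown in this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Costs from the [0][32] scan in the log output, in scan order. */
static const unsigned long long cost_0_32[] = {
    17848927, 18811236, 19779337, 20811634, 21919806, 23075814,
    24267691, 25546809, 26886375, 28291601, 29587239, 30669228,
    30969069, 30353322, 29381521, 27459958, 26403308, 23967782,
    19483305,
};
#define N_0_32 (sizeof(cost_0_32) / sizeof(cost_0_32[0]))

/*
 * Hypothetical termination rule: walk the samples in order, track the
 * running maximum, and stop as soon as a sample falls below num/den of
 * that maximum.  Returns the index of the maximum seen; *probes is set
 * to the number of samples consumed before stopping.
 */
size_t scan_until_past_max(const unsigned long long *cost, size_t n,
                           unsigned num, unsigned den, size_t *probes)
{
    size_t i, best = 0;

    assert(n > 0 && den > 0 && num < den);

    for (i = 0; i < n; i++) {
        if (cost[i] > cost[best]) {
            best = i;                       /* still climbing */
        } else if (cost[i] * den < cost[best] * num) {
            i++;                            /* count this sample too */
            break;                          /* well past the max */
        }
    }
    *probes = i;
    return best;
}
```

With num/den = 2/3 this replay consumes all 19 samples and reports
index 12 (size 5821531), so a 2/3 threshold is at least consistent with
where that scan stopped.  The peak is the 13th of the 19 samples, which
gives a feel for how much a perfect stop could save.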

I can imagine that one could trim this time by doing a couple of scans,
the first one at lower density (perhaps just one out of four sizes
considered), then the second scan at full density, around the maximum
found by the first.  However this would be less robust, and yet more
logic.
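
A rough sketch of that two-pass idea, again as a userspace replay of
recorded costs (here the [0][2] scan from the log output below) rather
than real measurements.  The function name and the stride parameter are
made up for illustration, and the sketch leaves out any early
termination in the coarse pass.

```c
#include <assert.h>
#include <stddef.h>

/* Costs from the [0][2] scan in the log output, in scan order. */
static const unsigned long long cost_0_2[] = {
    12361880, 13175591, 13718647, 14356800, 15522156, 16487934,
    17356154, 18144452, 18934638, 19965884, 21067441, 22303727,
    23453867, 23406625, 23586123, 23519823, 22619385, 21998024,
    20705771, 17244361, 14644331,
};
#define N_0_2 (sizeof(cost_0_2) / sizeof(cost_0_2[0]))

/*
 * Pass 1 looks at every stride-th sample.  Pass 2 looks at every sample
 * strictly between the coarse neighbours of the coarse maximum.
 * Each cost[i] read stands in for one (expensive) measurement, and
 * *probes counts how many of those would be needed.
 * Returns the index of the best sample found.
 */
size_t two_pass_max(const unsigned long long *cost, size_t n,
                    size_t stride, size_t *probes)
{
    size_t i, coarse = 0, best, lo, hi;

    assert(n > 0 && stride > 0);
    *probes = 0;

    /* Pass 1: low density scan over the whole range. */
    for (i = 0; i < n; i += stride) {
        (*probes)++;
        if (cost[i] > cost[coarse])
            coarse = i;
    }

    /* Pass 2: full density scan around the coarse maximum. */
    lo = coarse >= stride ? coarse - stride + 1 : 0;
    hi = coarse + stride - 1 < n ? coarse + stride - 1 : n - 1;
    best = coarse;
    for (i = lo; i <= hi; i++) {
        if (i == coarse)
            continue;                       /* already measured */
        (*probes)++;
        if (cost[i] > cost[best])
            best = i;
    }
    return best;
}
```

On this data a stride of 4 needs 12 probes instead of 21 and still lands
on index 14 (size 6450449), the same answer as the full scan.  Note the
small dip at index 13 in the recorded costs: noise of that kind is where
a sparser first pass could be misled, which is the robustness concern.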

Or perhaps, long shot, one could get fancy with some parameterized curve
fitting.  If some equation is a reasonable fit for the function being
sampled here, then just a low density scan through the max could be used
to estimate the coefficients of whatever the equation was, and the
equation used to find the maximum, instead of the samples.  This would
be fun to play with, but I can't now - other duties are calling.
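
As one very simple instance of the curve fitting idea, the sketch below
fits a parabola through three equally spaced samples and returns the
position of its vertex.  A parabola is only an assumed model here, not
something known to fit the cost function well, and the function name is
invented.  It uses floating point, so it is a userspace illustration
rather than kernel code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fit y = a*x^2 + b*x + c through (x_mid - h, y_left), (x_mid, y_mid)
 * and (x_mid + h, y_right), and store the x position of the vertex in
 * *x_peak.  Returns 0 on success, or -1 if the middle sample is not the
 * largest or the three points are not concave (no maximum bracketed).
 */
int parabola_peak(double x_mid, double h,
                  double y_left, double y_mid, double y_right,
                  double *x_peak)
{
    /* Proportional to the second difference; negative means concave. */
    double curvature = y_left - 2.0 * y_mid + y_right;

    assert(x_peak != NULL);

    if (h <= 0.0 || curvature >= 0.0)
        return -1;
    if (y_mid < y_left || y_mid < y_right)
        return -1;

    *x_peak = x_mid + 0.5 * h * (y_left - y_right) / curvature;
    return 0;
}
```

Feeding it the 9th, 13th and 17th samples of the [0][2] scan in the log
output below (costs 18934638, 23453867 and 22619385, taken as x = 8, 12
and 16 in sample index units) puts the estimated peak a bit past x = 13,
that is between the 6127927 and 6450449 sizes.  The full scan picked
6450449.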

The one change:

diff -Naurp auto-tune_migration_costs/kernel/sched.c auto-tune_migration_costs_chopped/kernel/sched.c
--- auto-tune_migration_costs/kernel/sched.c	2005-04-04 09:11:43.000000000 -0700
+++ auto-tune_migration_costs_chopped/kernel/sched.c	2005-04-04 09:11:22.000000000 -0700
@@ -5287,6 +5287,7 @@ void __devinit calibrate_migration_costs
 			distance = domain_distance(cpu1, cpu2);
 			max_distance = max(max_distance, distance);
 			cost = migration_cost[distance];
+			if (cpu2 < 8)
 			printk(" %2ld.%ld(%ld)", (long)cost / 1000000,
 				((long)cost / 100000) % 10, distance);
 		}

With this change, the output was:

Memory: 243350592k/244270096k available (7182k code, 921216k reserved, 3776k data, 368k init)
McKinley Errata 9 workaround not needed; disabling it
Dentry cache hash table entries: 33554432 (order: 14, 268435456 bytes)
Inode-cache hash table entries: 16777216 (order: 13, 134217728 bytes)
Mount-cache hash table entries: 1024
Boot processor id 0x0/0x40
Brought up 62 CPUs
Total of 62 processors activated (138340.68 BogoMIPS).
-> [0][2][3145728]  12.3 [ 12.3] (1): (12361880  6180940)
-> [0][2][3311292]  13.1 [ 13.1] (1): (13175591  3497325)
-> [0][2][3485570]  13.7 [ 13.7] (1): (13718647  2020190)
-> [0][2][3669021]  14.3 [ 14.3] (1): (14356800  1329171)
-> [0][2][3862127]  15.5 [ 15.5] (1): (15522156  1247263)
-> [0][2][4065396]  16.4 [ 16.4] (1): (16487934  1106520)
-> [0][2][4279364]  17.3 [ 17.3] (1): (17356154   987370)
-> [0][2][4504593]  18.1 [ 18.1] (1): (18144452   887834)
-> [0][2][4741676]  18.9 [ 18.9] (1): (18934638   839010)
-> [0][2][4991237]  19.9 [ 19.9] (1): (19965884   935128)
-> [0][2][5253933]  21.0 [ 21.0] (1): (21067441  1018342)
-> [0][2][5530455]  22.3 [ 22.3] (1): (22303727  1127314)
-> [0][2][5821531]  23.4 [ 23.4] (1): (23453867  1138727)
-> [0][2][6127927]  23.4 [ 23.4] (1): (23406625   592984)
-> [0][2][6450449]  23.5 [ 23.5] (1): (23586123   386241)
-> [0][2][6789946]  23.5 [ 23.5] (1): (23519823   226270)
-> [0][2][7147311]  22.6 [ 23.5] (1): (22619385   563354)
-> [0][2][7523485]  21.9 [ 23.5] (1): (21998024   592357)
-> [0][2][7919457]  20.7 [ 23.5] (1): (20705771   942305)
-> [0][2][8336270]  17.2 [ 23.5] (1): (17244361  2201857)
-> [0][2][8775021]  14.6 [ 23.5] (1): (14644331  2400943)
-> found max.
[0][2] working set size found: 6450449, cost: 23586123
-> [0][32][3145728]  17.8 [ 17.8] (2): (17848927  8924463)
-> [0][32][3311292]  18.8 [ 18.8] (2): (18811236  4943386)
-> [0][32][3485570]  19.7 [ 19.7] (2): (19779337  2955743)
-> [0][32][3669021]  20.8 [ 20.8] (2): (20811634  1994020)
-> [0][32][3862127]  21.9 [ 21.9] (2): (21919806  1551096)
-> [0][32][4065396]  23.0 [ 23.0] (2): (23075814  1353552)
-> [0][32][4279364]  24.2 [ 24.2] (2): (24267691  1272714)
-> [0][32][4504593]  25.5 [ 25.5] (2): (25546809  1275916)
-> [0][32][4741676]  26.8 [ 26.8] (2): (26886375  1307741)
-> [0][32][4991237]  28.2 [ 28.2] (2): (28291601  1356483)
-> [0][32][5253933]  29.5 [ 29.5] (2): (29587239  1326060)
-> [0][32][5530455]  30.6 [ 30.6] (2): (30669228  1204024)
-> [0][32][5821531]  30.9 [ 30.9] (2): (30969069   751932)
-> [0][32][6127927]  30.3 [ 30.9] (2): (30353322   683839)
-> [0][32][6450449]  29.3 [ 30.9] (2): (29381521   827820)
-> [0][32][6789946]  27.4 [ 30.9] (2): (27459958  1374691)
-> [0][32][7147311]  26.4 [ 30.9] (2): (26403308  1215670)
-> [0][32][7523485]  23.9 [ 30.9] (2): (23967782  1825598)
-> [0][32][7919457]  19.4 [ 30.9] (2): (19483305  3155037)
-> found max.
[0][32] working set size found: 5821531, cost: 30969069
---------------------
| migration cost matrix (max_cache_size: 6291456, cpu: -1 MHz):
---------------------
          [00]    [01]    [02]    [03]    [04]    [05]    [06]    [07]    [08]    [09]    [10]    [11]    [12]    [13]    [14]    [15]    [16]    [17]    [18]    [19]    [20]    [21]    [22]    [23]    [24]    [25]    [26]    [27]    [28]    [29]    [30]    [31]    [32]    [33]    [34]    [35]    [36]    [37]    [38]    [39]    [40]    [41]    [42]    [43]    [44]    [45]    [46]    [47]    [48]    [49]    [50]    [51]    [52]    [53]    [54]    [55]    [56]    [57]    [58]    [59]    [60]    [61]
[00]:     -     0.0(0) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)
[01]:   0.0(0)    -    47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)
[02]:  47.1(1) 47.1(1)    -     0.0(0) 47.1(1) 47.1(1) 47.1(1) 47.1(1)
[03]:  47.1(1) 47.1(1)  0.0(0)    -    47.1(1) 47.1(1) 47.1(1) 47.1(1)
[04]:  47.1(1) 47.1(1) 47.1(1) 47.1(1)    -     0.0(0) 47.1(1) 47.1(1)
[05]:  47.1(1) 47.1(1) 47.1(1) 47.1(1)  0.0(0)    -    47.1(1) 47.1(1)
[06]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -     0.0(0)
[07]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)  0.0(0)    -
[08]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[09]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[10]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[11]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[12]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[13]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[14]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[15]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[16]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[17]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[18]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[19]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[20]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[21]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[22]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[23]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[24]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[25]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[26]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[27]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[28]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[29]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[30]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[31]:  47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1) 47.1(1)    -
[32]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[33]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[34]:  61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1)    -
[35]:  61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1)    -
[36]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[37]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[38]:  61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1)    -
[39]:  61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1)    -
[40]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[41]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[42]:  61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1)    -
[43]:  61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1)    -
[44]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[45]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[46]:  61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1)    -
[47]:  61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 47.1(1) 47.1(1)    -
[48]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[49]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[50]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[51]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[52]:  61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[53]:  61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[54]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[55]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[56]:  61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[57]:  61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[58]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[59]:  47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[60]:  61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
[61]:  61.9(2) 61.9(2) 47.1(1) 47.1(1) 61.9(2) 61.9(2) 61.9(2) 61.9(2)    -
--------------------------------
| cacheflush times [3]: 0.0 (-1) 47.1 (47172246) 61.9 (61938138)
| calibration delay: 24 seconds
--------------------------------

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <[email protected]> 1.650.933.1373, 1.925.600.0401