> as a summary: i think your numbers demonstrate it nicely that the
> shorter 'timeslice length' that both CFS and SD utilizes does not have
> a measurable negative impact on your workload.

That's my impression as well. In particular, I think that for this type
of workload it is pretty much irrelevant which scheduler I'm using :-)

> To measure the total impact of 'timeslicing' you might want to try the
> exact same workload with a much higher 'timeslice length' of say
> 400 msecs, via:
>
>    echo 400000000 > /proc/sys/kernel/sched_granularity_ns   # on CFS
>    echo 400 > /proc/sys/kernel/rr_interval                  # on SD

I just finished building cfs and sd against 2.6.21 and will try these
settings over the next few days.

> your existing numbers are a bit hard to analyze because the 3 workloads
> were started at the same time and they overlapped differently and
> utilized the system differently.

Right. However, the differences with respect to which job finished when,
in particular the change in who "came in" 2nd and 3rd between sd and
cfs, have been consistent with other similar jobs I ran over the last
couple of days. In general sd tends to finish all three such jobs at
roughly the same time, while cfs clearly "favors" the LTMM-type jobs
(which admittedly involve the least computation). I don't really know
why that is. However, the jobs do some minimal I/O after each
successfully finished run, and sd and cfs might handle that differently.
Not really knowing what either scheduler does internally, I'm in no
position to discuss that with either you or Con :)

> i think the primary number that makes sense to look at (which is
> perhaps the least sensitive to the 'overlap effect') is the 'combined
> user times of all 3 workloads' (in order of performance):
>
> >  2.6.21-rc7:                            20589.423  100.00%
> >  2.6.21-rc7-cfs-v6-rc2 (X @ nice -10):  20613.845   99.88%
> >  2.6.21-rc7-sd046:                      20617.945   99.86%
> >  2.6.21-rc7-cfs-v6-rc2 (X @ nice 0):    20743.564   99.25%
>
> to me this gives the impression that it's all "within noise".

I think so too. But then I didn't seriously expect anything different.

> In particular the two CFS results suggest that there's at least a ~100
> seconds noise in these results, because the renicing of X should have
> no impact on the result (the workloads are pure number-crunchers, and
> all use up the CPUs 100%, correct?)

Correct. The differences could just as well result from small
fluctuations in my work on the machine during the test (like reading
mail, editing sources and similar stuff).

> another (perhaps less reliable) number is the total wall-clock runtime
> of all 3 jobs. Provided i did not make any mistakes in my calculations,
> here are the results:
>
> >  2.6.21-rc7-sd046:                      10512.611 seconds
> >  2.6.21-rc7-cfs-v6-rc2 (X @ nice -10):  10605.946 seconds
> >  2.6.21-rc7:                            10650.535 seconds
> >  2.6.21-rc7-cfs-v6-rc2 (X @ nice 0):    10788.125 seconds
>
> (the numbers are lower than the first numbers because this is a 2 CPU
> system)
>
> both SD and CFS-nice-10 was faster than mainline, but i'd say this too
> is noise - especially because this result highly depends on the way the
> workloads overlap in general, which seems to be different for SD.

I don't think that's noise. As I wrote above, this is the one difference
I personally consider significant. While the tests were running I could
watch the intermediate output of all 3 jobs intermixed on a console.
With sd the jobs were -within reason- head to head, while with cfs and
mainline LTMM quickly got ahead and remained there.
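As a rough illustration of the suggested experiment, here is a minimal
shell sketch of one way to bump the 'timeslice length' and run the three
number-crunchers concurrently while collecting per-job user/system/
wall-clock times. The job names (job1/job2/job3), the log directory and
the use of GNU time are assumptions, not the original test harness; the
two /proc knobs are the ones quoted above, and each exists only on the
corresponding CFS or SD kernel:

    #!/bin/sh
    # Sketch only: set a ~400 msec 'timeslice length' as suggested above,
    # then start the three jobs concurrently and record per-job times.
    # job1/job2/job3 and /tmp/sched-test are placeholders.
    set -e

    if [ -e /proc/sys/kernel/sched_granularity_ns ]; then
        echo 400000000 > /proc/sys/kernel/sched_granularity_ns   # CFS kernel
    elif [ -e /proc/sys/kernel/rr_interval ]; then
        echo 400 > /proc/sys/kernel/rr_interval                  # SD kernel
    fi

    mkdir -p /tmp/sched-test
    for job in job1 job2 job3; do
        /usr/bin/time -o /tmp/sched-test/$job.time \
            -f "$job user=%U sys=%S wall=%e" \
            ./$job > /tmp/sched-test/$job.log 2>&1 &
    done
    wait    # all three overlap, as in the runs discussed above

    cat /tmp/sched-test/*.time

With nothing else running on the box, summing the user= and sys= fields
of the three .time files would give the 'combined user time' and
'combined system+user time' figures discussed above.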
As I speculated above, this (IMO real) difference in the way sd and cfs
(and mainline) schedule the jobs might be related to the way the
schedulers react to I/O, but I don't know.

> system time is interesting too:
>
> >  2.6.21-rc7:                            35.379
> >  2.6.21-rc7-cfs-v6-rc2 (X @ nice -10):  40.399
> >  2.6.21-rc7-cfs-v6-rc2 (X @ nice 0):    44.239
> >  2.6.21-rc7-sd046:                      45.515

Over the weekend I might try to repeat these tests while doing nothing
else on the machine (i.e. no editing or reading mail).

> combined system+user time:
>
> >  2.6.21-rc7:                            20624.802
> >  2.6.21-rc7-cfs-v6-rc2 (X @ nice 0):    20658.084
> >  2.6.21-rc7-sd046:                      20663.460
> >  2.6.21-rc7-cfs-v6-rc2 (X @ nice -10):  20783.963
>
> perhaps it might make more sense to run the workloads serialized, to
> have better comparability of the individual workloads. (on a real
> system you'd naturally want to overlap these workloads to utilize the
> CPUs, so the numbers you did are very relevant too.)

Yes. In fact the workloads used to be run serialized; only recently did
I try a simple way of running them on multicore systems. The whole set
of such jobs consists of 312 tasks, and the idea is therefore to get a
couple of PS3s and have them run there... (once I find the time to port
the software).

> The vmstat suggested there is occasional idle time in the system - is
> the workload IO-bound (or memory bound) in those cases?

The jobs write out stats after each run (i.e. every 5-8 sec), both to
the PostgreSQL DB and to the console. I could get rid of the console I/O
(i.e. make it conditional) and for testing purposes could switch off the
DB writes as well if that would be helpful.

Best,
Michael
--
Technosis GmbH, Geschäftsführer: Michael Gerdau, Tobias Dittmar
Sitz Hamburg; HRB 89145 Amtsgericht Hamburg
Vote against SPAM - see http://www.politik-digital.de/spam/
Michael Gerdau       email: [email protected]
GPG-keys available on request or at public keyserver
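For the serialized variant suggested above, a minimal sketch under the
same assumptions as the previous script (GNU time, placeholder job
names) could look like this; each job runs to completion before the next
one starts, so its user/system/wall-clock numbers are free of the
'overlap effect':

    #!/bin/sh
    # Sketch only: run the same jobs one after another instead of
    # concurrently, timing each in isolation.
    # job1/job2/job3 and /tmp/sched-test-serial are placeholders.
    set -e
    mkdir -p /tmp/sched-test-serial

    for job in job1 job2 job3; do
        # no '&' and no 'wait' -- each job finishes before the next starts
        /usr/bin/time -o /tmp/sched-test-serial/$job.time \
            -f "$job user=%U sys=%S wall=%e" \
            ./$job > /tmp/sched-test-serial/$job.log 2>&1
    done

    cat /tmp/sched-test-serial/*.time

The wall-clock column then measures each workload on its own, at the
cost of leaving the second CPU idle for the duration of the run.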