Re: [SCHED] Totally WRONG prority calculation with specific test-case (since 2.6.10-bk12)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Paolo Ornati wrote:
On Wed, 28 Dec 2005 10:59:13 +1100
Peter Williams <[email protected]> wrote:


Any chance of you applying the PlugSched patches and seeing how the other schedulers that it contains handle this situation?

The patch at:

<http://prdownloads.sourceforge.net/cpuse/plugsched-6.1.6-for-2.6.15-rc5.patch?download>

should apply without problems to the 2.6.15-rc7 kernel.

Very Brief Documentation:

You can select a default scheduler at kernel build time.  If you wish to
boot with a scheduler other than the default it can be selected at boot
time by adding:

cpusched=<scheduler>

to the boot command line where <scheduler> is one of: ingosched,
nicksched, staircase, spa_no_frills, spa_ws, spa_svr or zaphod.  If you
don't change the default when you build the kernel the default scheduler
will be ingosched (which is the normal scheduler).



First of all, this is the "pstree" structure of transcode an friends:

     |-kdesktop---perl---sh---transcode-+-2*[sh-+-tccat]
     |                                  |       |-tcdecode]
     |                                  |       |-tcdemux]
     |                                  |       `-tcextract]
     |                                  `-transcode---5*[transcode]


Results with various schedulers:

First, thanks for doing this.


------------------------------------------------------------------------

	1) nicksched: perfect! This is the behaviour I want.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5562 paolo     40   0  115m  18m 2428 R 82.2  3.7   0:22.16 transcode
 5576 paolo     26   0 50348 4516 1912 S  9.5  0.9   0:02.43 tcdecode
 5566 paolo     23   0  115m  18m 2428 S  4.6  3.7   0:01.24 transcode
 5573 paolo     21   0  115m  18m 2428 S  0.9  3.7   0:00.22 transcode
 5577 paolo     27   0 20356 1140  920 S  0.9  0.2   0:00.21 tcdemux
 5295 root      20   0  167m  17m 3624 S  0.6  3.5   0:11.02 X
 5579 paolo     20   0 47308 2540 1996 S  0.5  0.5   0:00.14 tcdecode
 5574 paolo     20   0 20356 1144  920 S  0.4  0.2   0:00.11 tcdemux
...

transcode get recognized for what it is, and I/O bounded processes
don't even notice that it is running :)

Interesting. This one's more or less a dead scheduler and hasn't had any development work done on it for some time. I just keep porting the original version to new kernels.



	2) staircase: bad, as you can see:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5582 paolo     26   0  115m  18m 2428 R 82.7  3.7   0:47.63 transcode
 5599 paolo     39   0 50352 4516 1912 R  9.6  0.9   0:05.21 tcdecode
 5586 paolo      0   0  115m  18m 2428 S  4.5  3.7   0:02.61 transcode
 5622 paolo     39   0  4948 1520  412 R  1.1  0.3   0:00.15 dd
 5591 paolo      0   0  115m  18m 2428 S  0.6  3.7   0:00.36 transcode
 5575 paolo      0   0 98476  37m 9392 S  0.4  7.5   0:01.44 perl
 5597 paolo     27   0 20356 1144  920 S  0.4  0.2   0:00.21 tcdemux
 5475 paolo      0   0 86556  22m  15m S  0.2  4.5   0:01.24 konsole
 5388 root       0   0  167m  17m 3208 S  0.1  3.4   0:03.16 X
 5587 paolo      0   0  115m  18m 2428 S  0.1  3.7   0:00.03 transcode
 5595 paolo     20   0 47312 2540 1996 S  0.1  0.5   0:00.14 tcdecode
 5596 paolo     26   0 22672 1268 1020 S  0.1  0.2   0:00.03 tccat
 5598 paolo     28   0 22364 1436  932 S  0.1  0.3   0:00.04 tcextract


And "DD" is affected badly:

paolo@tux /mnt $ mount space/; sync; sleep 1; time dd if=space/bigfile
of=/dev/null bs=1M count=128; umount space/ 128+0 records in
128+0 records out

real    0m6.341s
user    0m0.002s
sys     0m0.229s

While transcoding:

paolo@tux /mnt $ mount space/; sync; sleep 1; time dd if=space/bigfile
of=/dev/null bs=1M count=256; umount space/ 256+0 records in
256+0 records out

real    0m15.793s
user    0m0.001s
sys     0m0.374s


	3) spa_no_frills: bad, but this is OK since it is Round Robin :)

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5356 paolo     20   0  115m  18m 2428 R 81.1  3.7   0:27.61 transcode
 5371 paolo     20   0 50348 4516 1912 R  8.9  0.9   0:02.97 tcdecode
 5360 paolo     20   0  115m  18m 2428 S  4.1  3.7   0:01.54 transcode
 5378 paolo     20   0  4948 1520  412 D  1.4  0.3   0:00.29 dd
 5364 paolo     20   0 20352 1144  920 S  0.9  0.2   0:00.20 tcdemux
 5373 paolo     20   0  115m  18m 2428 S  0.7  3.7   0:00.32 transcode
 5369 paolo     20   0 20356 1144  920 S  0.5  0.2   0:00.14 tcdemux
 5205 root      20   0  165m  15m 2584 R  0.2  3.2   0:01.86 X


Yes, no surprises there.


	4) spa_ws: bad

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5334 paolo     32   0  115m  18m 2428 R 82.7  3.7   0:18.77 transcode
 5349 paolo     32   0 50348 4516 1912 R  8.9  0.9   0:02.00 tcdecode
 5338 paolo     21   0  115m  18m 2428 S  4.6  3.7   0:01.08 transcode
 5356 paolo     32   0  4948 1520  412 D  1.1  0.3   0:00.12 dd
 5351 paolo     32   0  115m  18m 2428 S  1.0  3.7   0:00.20 transcode
 5199 root      21   0  165m  15m 2584 S  0.4  3.2   0:01.68 X
 5347 paolo     32   0 20356 1140  920 S  0.4  0.2   0:00.08 tcdemux
 5296 paolo     22   0 98472  37m 9392 S  0.2  7.5   0:01.47 perl
 5299 paolo     21   0 86556  22m  15m S  0.2  4.4   0:00.75 konsole
 5344 paolo     32   0 47308 2540 1996 S  0.2  0.5   0:00.07 tcdecode
 5339 paolo     21   0  115m  18m 2428 S  0.1  3.7   0:00.01 transcode

paolo@tux /mnt $ mount space/; sync; sleep 1; time dd if=space/bigfile
of=/dev/null bs=1M count=256; umount space/ 256+0 records in
256+0 records out

real    0m8.112s
user    0m0.001s
sys     0m0.444s

paolo@tux /mnt $ mount space/; sync; sleep 1; time dd if=space/bigfile
of=/dev/null bs=1M count=256; umount space/ 256+0 records in
256+0 records out

real    0m29.222s
user    0m0.000s
sys     0m0.400s

This one is aimed purely at good interactive responsiveness (i.e. keyboard, mouse, X server and media players such as rythmbox/xmms) so no real surprises here either.



	5) spa_svr: surprise, surprise! Not all that bad. At least DD
gets better priority than transcode... and DD real time is only a bit
affected (8s --> ~9s).


This will be the "throughput bonus" in action. It's overall aim is to reduce the time tasks spend on the runqueue waiting for CPU access a.k.a. delay. It does this by using the system load and the average amount of CPU time that the task uses each scheduling cycle to estimate the expected delay for the task and gives it a bonus if the actual average delays being experienced are bigger than this value.

It's intended for server systems not interactive systems as reducing overall delay isn't necessarily good for interactive systems where the aim is to quell the user's impatience by giving good latency to the interactive tasks. These aims aren't always compatible.


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5334 paolo     33   0  115m  18m 2428 R 78.1  3.7   0:22.70 transcode
 5349 paolo     28   0 50352 4516 1912 S  9.0  0.9   0:02.41 tcdecode
 5338 paolo     25   0  115m  18m 2428 S  4.7  3.7   0:01.29 transcode
 5363 paolo     27   0  4952 1520  412 R  4.7  0.3   0:00.25 dd
 5342 paolo     33   0 20352 1140  920 S  1.6  0.2   0:00.21 tcdemux
 5351 paolo     25   0  115m  18m 2428 S  0.8  3.7   0:00.23 transcode
 5144 root      22   0  166m  16m 3120 S  0.4  3.3   0:01.85 X
 5344 paolo     23   0 47308 2540 1996 S  0.4  0.5   0:00.13 tcdecode
 5347 paolo     27   0 20356 1144  920 S  0.4  0.2   0:00.10 tcdemux
 5231 paolo     22   0 86660  22m  15m S  0.2  4.5   0:00.95 konsole
 5271 paolo     25   0 98476  37m 9396 S  0.2  7.5   0:01.54 perl
 5341 paolo     23   0 22672 1268 1020 S  0.2  0.2   0:00.02 tccat


	6) zaphod: more or less like spa_svr

Zaphod includes the throughput bonus in its armoury which why it is similar in performance to spa_svr.


  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5308 paolo     34   0  115m  18m 2428 R 52.1  3.7   0:49.77 transcode
 5323 paolo     32   0 50352 4516 1912 S  6.0  0.9   0:05.61 tcdecode
 5356 paolo     28   0  4952 1520  412 D  3.5  0.3   0:00.28 dd
 5312 paolo     28   0  115m  18m 2428 S  2.6  3.7   0:02.71 transcode
 5325 paolo     31   0  115m  18m 2428 S  0.7  3.7   0:00.55 transcode
 5316 paolo     37   0 20352 1140  920 S  0.4  0.2   0:00.33 tcdemux
 5202 root      23   0  165m  15m 2584 S  0.2  3.1   0:01.57 X
 5318 paolo     31   0 47312 2540 1996 S  0.2  0.5   0:00.28 tcdecode
 5321 paolo     33   0 20356 1144  920 S  0.2  0.2   0:00.26 tcdemux
 4760 messageb  25   0 13248 1068  848 S  0.1  0.2   0:00.07
dbus-daemon-1 5264 paolo     24   0 93920  17m  10m S  0.1  3.5
0:00.38 kded 5282 paolo     23   0 92712  19m  12m S  0.1  3.9
0:00.36 kdesktop


	7) ingosched: bad, as already said in the original post

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5209 paolo     16   0  115m  18m 2428 R 72.0  3.7   0:22.13 transcode
 5224 paolo     22   0 50348 4516 1912 R  8.4  0.9   0:02.44 tcdecode
 5213 paolo     15   0  115m  18m 2428 S  4.2  3.7   0:01.24 transcode
 5243 paolo     18   0  4948 1520  412 R  1.8  0.3   0:00.14 dd
 5217 paolo     19   0 20356 1144  920 R  0.8  0.2   0:00.19 tcdemux
 5108 root      15   0  165m  15m 2584 S  0.6  3.1   0:01.44 X
 5226 paolo     15   0  115m  18m 2428 S  0.6  3.7   0:00.20 transcode
 5216 paolo     18   0 22676 1268 1020 S  0.4  0.2   0:00.03 tccat
 5219 paolo     18   0 47312 2540 1996 R  0.4  0.5   0:00.12 tcdecode
 5222 paolo     18   0 20356 1144  920 S  0.4  0.2   0:00.10 tcdemux
 5195 paolo     16   0 98488  37m 9392 S  0.2  7.5   0:01.41 perl
 5198 paolo     16   0 86552  22m  15m R  0.2  4.4   0:00.66 konsole

paolo@tux /mnt $ mount space/; sync; sleep 1; time dd if=space/bigfile of=/dev/null bs=1M count=256; umount space/
256+0 records in
256+0 records out

real    0m23.393s	(instead of 8s)
user    0m0.001s
sys     0m0.418s

------------------------------------------------------------------------


So the winner for manifest superiority is "nicksched", it looks to me
even better than 2.6.10-bk12 (ingosched) with
"remove_interactive_credit" reverted.

Thanks for this data. It will enable me to make some mods to the spa_xxx and zaphod schedulers.

Peter
--
Peter Williams                                   [email protected]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux