Re: [ckrm-tech] [RFC 0/4] sched: Add CPU rate caps (improved)

MAEDA Naoaki wrote:


I tried to run kernbench with a hard cap, and it spent a very
long time in the "Cleaning source tree..." phase. Because this phase
is not a CPU hog, I expected it to behave as it does without a cap.

That can be reproduced by just running "make clean" on top of a
kernel source tree with a hard cap.

% /usr/bin/time make clean
1.62user 0.29system 0:01.90elapsed 101%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+68539minor)pagefaults 0swaps

  # Without cap, it returns almost immediately

% ~/withcap.sh  -C 900 /usr/bin/time make clean
1.61user 0.29system 1:26.17elapsed 2%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+68537minor)pagefaults 0swaps

  # With 90% hard cap, it takes about 1.5 minutes.

% ~/withcap.sh  -C 100 /usr/bin/time make clean
1.64user 0.34system 3:31.48elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+68538minor)pagefaults 0swaps

  # It became worse with 10% hard cap.

% ~/withcap.sh  -c 900 /usr/bin/time make clean
1.63user 0.28system 0:01.89elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+68537minor)pagefaults 0swaps

  # It doesn't happen with soft cap.

This behaviour is caused by "make clean" being a short-lived, CPU-intensive task. It was made worse by two facts: my simplification of the sinbin duration calculation assumed a constant CPU burst size based on the time slice, and exiting tasks could still have caps enforced. (The simplification was done to avoid 64-bit divides.)

I've put in a more complex sinbin calculation (I don't think the 64-bit divides will matter too much, as they're on an infrequently travelled path). Exiting tasks are now excluded from having caps enforced, on the grounds that it's best for system performance to let them get out of the way as soon as possible. A patch is attached, and I would appreciate it if you could see whether it improves the situation you are observing.
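
To make the difference between the two calculations concrete, here is a minimal standalone sketch. It is not the kernel code. It assumes caps are expressed in parts per thousand (inferred from "-C 900" meaning 90%), and the names CAP_ONE, sinbin_simple and sinbin_measured are hypothetical. The first function mirrors the constant-burst simplification; the second solves cpu / (cycle + sleep) = cap for the sleep time from measured per-cycle averages, which is one plausible reading of the more complex calculation.

```c
#include <stdint.h>

/* Assumption: a cap of 1000 means 100% of one CPU. */
#define CAP_ONE 1000u

/*
 * Constant-burst simplification: scale one time slice by the fraction
 * of the CPU the task is NOT allowed to use. cap must be <= CAP_ONE.
 */
static uint64_t sinbin_simple(uint64_t timeslice_ticks, uint32_t cap)
{
	uint64_t res = timeslice_ticks * (CAP_ONE - cap) / CAP_ONE;

	return res ? res : 1;
}

/*
 * Measured version: given the average CPU time and the average total
 * length of a cycle (both in ns), return the extra sleep in ticks that
 * brings cpu / (cycle + sleep) down to cap / CAP_ONE.
 * cap must be in the range 1..CAP_ONE and ns_per_tick must be nonzero.
 */
static uint64_t sinbin_measured(uint64_t cpu_ns, uint64_t cycle_ns,
				uint32_t cap, uint32_t ns_per_tick)
{
	uint64_t lhs = cpu_ns * CAP_ONE;
	uint64_t rhs = cycle_ns * cap;
	uint64_t ticks;

	if (lhs <= rhs)
		return 0;	/* already at or under the cap */

	ticks = (lhs - rhs) / cap / ns_per_tick;
	return ticks ? ticks : 1;
}
```

For a pure CPU hog measured at 10 ms of CPU per 10 ms cycle with 1 ms ticks, sinbin_measured(10000000, 10000000, 100, 1000000) gives 90 ticks, i.e. 10 ms of running in every 100 ms.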

These changes don't completely get rid of the phenomenon, but I think that it's less severe. I've written a couple of scripts to test this behaviour using the wload program from:


<http://prdownloads.sourceforge.net/cpuse/simloads-0.1.1.tar.gz?download>

You run loops.sh with a single argument and it uses asps.sh. The test runs a number (specified by the argument to loops.sh) of instances of wload in series and uses time to get the stats for the series to complete. It does this for a number of different wload run durations between 0.001 and 10.0 seconds. Here's an example of the output from an uncapped run:


[peterw@heathwren ~]$ ./loops.sh 1
-d=0.001: user = 0.01 system = 0.00 elapsed = 0.00 rate = 133%
-d=0.005: user = 0.01 system = 0.00 elapsed = 0.01 rate = 84%
-d=0.01: user = 0.02 system = 0.00 elapsed = 0.01 rate = 105%
-d=0.05: user = 0.06 system = 0.00 elapsed = 0.05 rate = 103%
-d=0.1: user = 0.10 system = 0.00 elapsed = 0.11 rate = 98%
-d=0.5: user = 0.50 system = 0.00 elapsed = 0.50 rate = 100%
-d=1.0: user = 1.00 system = 0.00 elapsed = 1.01 rate = 99%
-d=5.0: user = 5.00 system = 0.00 elapsed = 5.01 rate = 99%
-d=10.0: user = 10.00 system = 0.00 elapsed = 10.01 rate = 99%

and with a cap of 90%:

[peterw@heathwren ~]$ withcap -C 900 ./loops.sh 1
-d=0.001: user = 0.00 system = 0.00 elapsed = 0.01 rate = 53%
-d=0.005: user = 0.01 system = 0.00 elapsed = 0.02 rate = 61%
-d=0.01: user = 0.01 system = 0.00 elapsed = 0.03 rate = 66%
-d=0.05: user = 0.06 system = 0.00 elapsed = 0.07 rate = 85%
-d=0.1: user = 0.10 system = 0.00 elapsed = 0.11 rate = 91%
-d=0.5: user = 0.50 system = 0.00 elapsed = 0.56 rate = 90%
-d=1.0: user = 1.00 system = 0.00 elapsed = 1.11 rate = 90%
-d=5.0: user = 5.00 system = 0.00 elapsed = 5.54 rate = 90%
-d=10.0: user = 10.00 system = 0.00 elapsed = 11.14 rate = 89%

Notice how the tasks' usage rates get closer to the cap the longer the task runs and never exceed the cap. With smaller caps the effect is different, e.g. for a 9% cap we get:


[peterw@heathwren ~]$ withcap -C 90 ./loops.sh 1
-d=0.001: user = 0.00 system = 0.00 elapsed = 0.01 rate = 109%
-d=0.005: user = 0.01 system = 0.00 elapsed = 0.02 rate = 59%
-d=0.01: user = 0.02 system = 0.00 elapsed = 0.05 rate = 35%
-d=0.05: user = 0.05 system = 0.00 elapsed = 0.14 rate = 42%
-d=0.1: user = 0.10 system = 0.00 elapsed = 0.25 rate = 43%
-d=0.5: user = 0.50 system = 0.00 elapsed = 1.87 rate = 27%
-d=1.0: user = 1.00 system = 0.00 elapsed = 5.37 rate = 18%
-d=5.0: user = 5.00 system = 0.00 elapsed = 48.61 rate = 10%
-d=10.0: user = 10.00 system = 0.00 elapsed = 102.22 rate = 9%

and short-lived tasks are being under-capped, i.e. their usage rates stay well above the 9% cap.
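
The rate column in these tables is consistent with CPU time divided by elapsed time. The attached scripts are not reproduced here, so the following is only a sketch of that arithmetic with a hypothetical function name. The scripts presumably work from higher-precision timings than the two decimals printed, which would explain how the -d=0.001 row can show 133% next to rounded values of 0.01 and 0.00.

```c
/*
 * Percentage of one CPU consumed over a run.
 * Returns 0 when no elapsed time was measured, to avoid dividing by zero.
 */
static double usage_rate_percent(double user_s, double system_s,
				 double elapsed_s)
{
	if (elapsed_s <= 0.0)
		return 0.0;

	return 100.0 * (user_s + system_s) / elapsed_s;
}
```

For the last row of the 9% run, 10.00 s of CPU over 102.22 s elapsed gives about 9.8%, which the table reports as 9%.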

Bearing in mind that -d=0.01 is the equivalent of a task running for just a single tick, and that that's about the shortest cycle length we're likely to see for CPU-intensive tasks (and then only when the capping enforcement kicks in), I think it is unrealistic to expect much better for tasks with a life shorter than that. Further, it takes several cycles to gather reasonable statistics on which to base capping enforcement, so I think that doing much better than this for short-lived tasks is unrealistic.

You could also try using a smaller value for CAP_STATS_OFFSET, as this will shorten the half-life of the Kalman filters and make the capping react more quickly to changes in usage rates (which is what a task's starting is). The downside is that it will be less smooth.
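
The effect of the offset on the half-life can be illustrated with a small standalone sketch. It assumes the filter is a shift-based, exponentially decaying average whose stored value is scaled by 2^offset; that is a guess suggested by the "lhs >>= CAP_STATS_OFFSET" line in the attached patch, not something the patch confirms. Both function names are hypothetical.

```c
#include <stdint.h>

/*
 * One filter update. avg is stored scaled by 2^offset, so in the
 * steady state (avg >> offset) equals the sample value.
 * offset must be less than 32.
 */
static uint64_t decay_avg_update(uint64_t avg, uint64_t sample,
				 unsigned int offset)
{
	return avg - (avg >> offset) + sample;
}

/*
 * Number of updates needed for the weight of an old sample to fall
 * to one half or less, computed in 32.32 fixed point.
 */
static unsigned int half_life_updates(unsigned int offset)
{
	uint64_t one = 1ULL << 32;
	uint64_t weight = one;
	unsigned int n = 0;

	while (weight > one / 2) {
		weight -= weight >> offset;
		n++;
	}
	return n;
}
```

Under these assumptions an offset of 3 halves the weight of old samples after 6 updates, and larger offsets take longer, which is the responsiveness versus smoothness trade-off described above.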


Peter
--
Peter Williams                                   [email protected]

"Learning, n. The kind of ignorance distinguishing the studious."
 -- Ambrose Bierce

Attachment: loops.sh
Description: application/shellscript

---
 kernel/sched.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

Index: MM-2.6.17-rc5-mm3/kernel/sched.c
===================================================================
--- MM-2.6.17-rc5-mm3.orig/kernel/sched.c	2006-06-06 11:29:51.000000000 +1000
+++ MM-2.6.17-rc5-mm3/kernel/sched.c	2006-06-08 14:28:10.000000000 +1000
@@ -216,7 +216,8 @@ static void sinbin_release_fn(unsigned l
 #define cap_load_weight(p) \
 	(max((int)((min_cpu_rate_cap(p) * SCHED_LOAD_SCALE) / CPU_CAP_ONE), 1))
 #define safe_to_enforce_cap(p) \
-	(!((p)->mutexes_held || (p)->flags & (PF_FREEZE | PF_UIWAKE)))
+	(!((p)->mutexes_held || \
+	   (p)->flags & (PF_FREEZE | PF_UIWAKE | PF_EXITING)))
 #define safe_to_sinbin(p) (safe_to_enforce_cap(p) && !signal_pending(p))
 
 static void init_cpu_rate_caps(task_t *p)
@@ -1235,13 +1236,16 @@ static unsigned long reqd_sinbin_ticks(c
 	unsigned long long rhs = p->avg_cycle_length * p->cpu_rate_hard_cap;
 
 	if (lhs > rhs) {
-		unsigned long res;
-
-		res = static_prio_timeslice(p->static_prio);
-		res *= (CPU_CAP_ONE - p->cpu_rate_hard_cap);
-		res /= CPU_CAP_ONE;
+		lhs -= p->avg_cpu_per_cycle;
+		lhs >>= CAP_STATS_OFFSET;
+		/* have to do two divisions because there's no guarantee
+		 * that p->cpu_rate_hard_cap * (1000000000 / HZ) would
+		 * not overflow a 32 bit unsigned integer
+		 */
+		(void)do_div(lhs, p->cpu_rate_hard_cap);
+		(void)do_div(lhs, (1000000000 / HZ));
 
-		return res ? : 1;
+		return lhs ? : 1;
 	}
 
 	return 0;
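
The comment in the patch about needing two divisions can be checked with a small standalone sketch (plain C, not kernel code; the helper names are hypothetical). The kernel's do_div() takes a 32-bit divisor, and for positive integers dividing by a and then by b with truncation gives the same result as dividing once by a * b, so splitting the division loses nothing.

```c
#include <stdint.h>

/* Returns 1 if a * b does not fit in a 32-bit unsigned integer. */
static int product_overflows_u32(uint32_t a, uint32_t b)
{
	return b != 0 && a > UINT32_MAX / b;
}

/* Divide n by a * b without ever forming the product. */
static uint64_t div_two_step(uint64_t n, uint32_t a, uint32_t b)
{
	return (n / a) / b;
}
```

Assuming a maximum cap value of 1000 and HZ=100, the product 1000 * (1000000000 / 100) is 10^10, which does not fit in 32 bits; with HZ=1000 it would fit, so the overflow depends on the kernel configuration.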

Attachment: asps.sh
Description: application/shellscript
