Re: rt20 scheduling latency testcase and failure data

On Monday 15 May 2006 01:13, Ingo Molnar wrote:
>
> have you tried to use the latency tracer to capture this latency? It is
> programmable to a high degree. (I've attached trace-it.c that shows how
> to use it)
> ...
> looking at the highlevel traces should give you a quick idea of what's
> going on, and then you can zoom into the time period that triggers the
> long latency. (but feel free to also send these traces to us, preferably
> in bzip2 -9 format.)

Following Ingo's example I have included the modified test case (please see 
the original mail for librt.h) that starts the trace before each sleep and 
disables it after we wake up.  If we have missed a period, we print the 
trace.

My initial trace (lt_01.log.bz2) contained entries like the following:

  <idle>-0     2...1   24us!: default_idle (cpu_idle)
  <idle>-0     2D..1 1609us : smp_apic_timer_interrupt (c0100c80 0 0)

  <idle>-0     2...1 1635us!: default_idle (cpu_idle)
  <idle>-0     2D..1 4756us : smp_apic_timer_interrupt (c0100c80 0 0)

I noticed the that there were long latencies (as high as 3ms) all related to 
irq handling in the idle loop.  I thought it might have something to do with 
the CPU going into some kind of a sleep state and taking a long time to wake 
up.  John Stultz suggested I boot with idle=poll.

The idle=poll trace (lt_02.log.bz2) had much smaller maximum latencies, like 
the following:

  <idle>-0     0D..1 1341us+: write_watchdog_counter (nmi_watchdog_tick)
  <idle>-0     0D..1 1441us : do_nmi (poll_idle)

  <idle>-0     0D..1 1445us+: write_watchdog_counter (nmi_watchdog_tick)
  <idle>-0     0D..1 1545us : do_nmi (poll_idle)

All the longest latencies were related to the nmi watchdogs.  Booting with 
idle=poll nmi_watchdog=0 unfortunately seemed to return to the original 
behavior - so perhaps the second test was just a fluke.  This trace 
(lt_03.log.bz2) had entries like the following:

  <idle>-0     1...1   19us!: poll_idle (cpu_idle)
  <idle>-0     1D..1 1089us : smp_apic_timer_interrupt (c0100ccf 0 0)

  <idle>-0     1...1 1113us!: poll_idle (cpu_idle)
  <idle>-0     1D..1 4765us : smp_apic_timer_interrupt (c0100ccf 0 0)

sched_la-2978  1D... 4798us!: init_fpu (math_state_restore)
sched_la-2978  1D... 5088us : smp_apic_timer_interrupt (4b3b98e8 0 0)

sched_la-2978  1D.h. 5103us!: irq_exit (smp_apic_timer_interrupt)
sched_la-2978  1.... 5719us > sys_clock_gettime (00000001 b7f65378 0000007b)

Finally I booted with idle=poll and nmi_watchdog=0.  The very first run 
failed, (I've noticed that the first iteration seems to always hit PERIOD 
MISSED! while the second usually passes fine).  It's still running a 1M 
iteration run with no more failures so far (100K so far).

The latency tracer is a very interesting tool.  I have a few 
questions/assumptions I'd like to run by you to make sure I understand the 
output of the latency trace:

o ! in the delay column means there is a long latency here?
o + in the delay column means there is a > 1us latency here?
o > means entering the kernel from a sys_call?
o < means returning from the sys_call?
o : is not < or >
o Why do I only see ~5ms of trace when it appears that I am seeing more like
  ~25ms between the time when I start and stop the trace?
o What are the maximum latencies that get printed to /var/log/messages?  What
  exacly do they measure? From what point to what point?
-- 
Darren Hart
IBM Linux Technology Center
Realtime Linux Team

Attachment: lt_01.log.bz2
Description: BZip2 compressed data

Attachment: lt_02.log.bz2
Description: BZip2 compressed data

Attachment: lt_03.log.bz2
Description: BZip2 compressed data

/*    Filename: sched_latency_lkml.c
 *      Author: Darren Hart <[email protected]>
 * Description: Measure the latency involved with periodic scheduling.
 * Compilation: gcc -O2 -g -D_GNU_SOURCE -I/usr/include/nptl -I -L/usr/lib/nptl -lpthread -lrt -lm sched_latency_lkml.c -o sched_latency_lkml
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
 *
 * Copyright (C) IBM Corporation, 2006
 *
 * 2006-May-10:	Initial version by Darren Hart <[email protected]>
 */

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "librt.h"

#define PRIO 98
//#define PERIOD 17*NS_PER_MS
//#define ITERATIONS 100
#define ITERATIONS 10000
#define PERIOD 5*NS_PER_MS
#define PASS_US 100

#define LATENCY_TRACE 1
#ifdef LATENCY_TRACE
#define latency_trace_enable() \
system("echo 0 > /proc/sys/kernel/trace_all_cpus"); \
system("echo 1 > /proc/sys/kernel/trace_enabled"); \
system("echo 0 > /proc/sys/kernel/trace_freerunning"); \
system("echo 0 > /proc/sys/kernel/trace_print_at_crash"); \
system("echo 1 > /proc/sys/kernel/trace_user_triggered"); \
system("echo 0 > /proc/sys/kernel/trace_verbose"); \
system("echo 0 > /proc/sys/kernel/preempt_max_latency"); \
system("echo 0 > /proc/sys/kernel/preempt_thresh"); \
system("[ -e /proc/sys/kernel/wakeup_timing ] && echo 1 > /proc/sys/kernel/wakeup_timing"); \
system("echo 1 > /proc/sys/kernel/mcount_enabled"); 
#define latency_trace_start() gettimeofday(0,1)
#define latency_trace_stop() gettimeofday(0,0)
#define latency_trace_print() system("cat /proc/latency_trace");
#else
#define latency_trace_enable() do { } while (0)
#define latency_trace_start() do { } while (0)
#define latency_trace_stop() do { } while (0)
#define latency_trace_print() do { } while (0)
#endif

nsec_t start;
int retval;

void *periodic_thread(void *arg)
{
	struct thread *t = (struct thread *)arg;
	int i;
	int delay, avg_delay = 0, start_delay, min_delay = 0x7FFFFFF, max_delay = 0;
	int failures = 0;
	nsec_t next=0, now=0, sched_delta=0, delta=0, prev=0;

	retval = 0;

	prev = start;
	next = start;
	now = rt_gettime();
	start_delay = (now - start)/NS_PER_US;

	printf("ITERATION DELAY(US) MAX_DELAY(US) FAILURES\n");
	printf("--------- --------- ------------- --------\n");
	for (i = 1; i <= ITERATIONS; i++) {
		/* wait for the period to start */
		next += PERIOD;
		prev = now;
		now = rt_gettime();

		if (i > 1) latency_trace_stop();

		if (next < now) {
			printf("\n\nPERIOD MISSED!\n");
			printf("     scheduled delta: %8llu us\n", sched_delta/1000);
			printf("        actual delta: %8llu us\n", delta/1000);
			printf("             latency: %8llu us\n", (delta-sched_delta)/1000);
			printf("---------------------------------------\n");
			printf("      previous start: %8llu us\n", (prev-start)/1000);
			printf("                 now: %8llu us\n", (now-start)/1000);
			printf("     scheduled start: %8llu us\n", (next-start)/1000);
			printf("next scheduled start is in the past!\n");
			retval = 1;
			latency_trace_print();
			break;
		}

		latency_trace_start();
		sched_delta = next - now; /* how long we should sleep */
		delta = 0;
		do {
			rt_nanosleep(next - now);
			delta += rt_gettime() - now; /* how long we did sleep */
			now = rt_gettime();
		} while (now < next);

		/* start of period */
		now = rt_gettime();
		delay = (now - start - (nsec_t)i*PERIOD)/NS_PER_US;
		if (delay < min_delay)
			min_delay = delay;
		if (delay > max_delay)
			max_delay = delay;
		if (delay > PASS_US)
			failures++;
		avg_delay += delay;
		

		/* continuous status ticker */
		//printf("%9i %9i %13i %8i\r", i, delay, max_delay, failures);
		//fflush(stdout);
		
		busy_work_ms(1);
	}

	avg_delay /= ITERATIONS;
	printf("\n\n");
	printf("Start Latency: %4d us: %s\n", start_delay, 
		start_delay < PASS_US ? "PASS" : "FAIL");
	printf("Min Latency:   %4d us: %s\n", min_delay,
		min_delay < PASS_US ? "PASS" : "FAIL");
	printf("Avg Latency:   %4d us: %s\n", avg_delay,
		avg_delay < PASS_US ? "PASS" : "FAIL");
	printf("Max Latency:   %4d us: %s\n", max_delay,
		max_delay < PASS_US ? "PASS" : "FAIL");
	printf("Failed Iterations: %d\n\n", failures);

	return NULL;
}

int main(int argc, char *argv[])
{
	int per_id, mas_id;

	printf("-------------------------------\n");
	printf("Scheduling Latency\n");
	printf("-------------------------------\n\n");
	printf("Running %d iterations with a period of %d ms\n", ITERATIONS, PERIOD/NS_PER_MS);
	printf("Expected running time: %d s\n\n", (nsec_t)ITERATIONS*PERIOD/NS_PER_SEC);

	latency_trace_enable();

	start = rt_gettime();
	per_id = create_fifo_thread(periodic_thread, (void*)0, PRIO);

	join_thread(per_id);
	join_threads();

	return retval;
}

Follow-Ups:
- Re: rt20 scheduling latency testcase and failure data
  - From: Sébastien Dugué <[email protected]>
- Re: rt20 scheduling latency testcase and failure data
  - From: Sébastien Dugué <[email protected]>

References:
- rt20 scheduling latency testcase and failure data
  - From: Darren Hart <[email protected]>
- Re: rt20 scheduling latency testcase and failure data
  - From: Darren Hart <[email protected]>
- Re: rt20 scheduling latency testcase and failure data
  - From: Ingo Molnar <[email protected]>

Prev by Date: RE: Regression seen for patch "sched:dont decrease idle sleep avg"
Next by Date: Re: [PATCH 5/6] Have ia64 use add_active_range() and free_area_init_nodes
Previous by thread: Re: rt20 scheduling latency testcase and failure data
Next by thread: Re: rt20 scheduling latency testcase and failure data
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]