On Monday 15 May 2006 01:13, Ingo Molnar wrote: > > have you tried to use the latency tracer to capture this latency? It is > programmable to a high degree. (I've attached trace-it.c that shows how > to use it) > ... > looking at the highlevel traces should give you a quick idea of what's > going on, and then you can zoom into the time period that triggers the > long latency. (but feel free to also send these traces to us, preferably > in bzip2 -9 format.) Following Ingo's example I have included the modified test case (please see the original mail for librt.h) that starts the trace before each sleep and disables it after we wake up. If we have missed a period, we print the trace. My initial trace (lt_01.log.bz2) contained entries like the following: <idle>-0 2...1 24us!: default_idle (cpu_idle) <idle>-0 2D..1 1609us : smp_apic_timer_interrupt (c0100c80 0 0) <idle>-0 2...1 1635us!: default_idle (cpu_idle) <idle>-0 2D..1 4756us : smp_apic_timer_interrupt (c0100c80 0 0) I noticed the that there were long latencies (as high as 3ms) all related to irq handling in the idle loop. I thought it might have something to do with the CPU going into some kind of a sleep state and taking a long time to wake up. John Stultz suggested I boot with idle=poll. The idle=poll trace (lt_02.log.bz2) had much smaller maximum latencies, like the following: <idle>-0 0D..1 1341us+: write_watchdog_counter (nmi_watchdog_tick) <idle>-0 0D..1 1441us : do_nmi (poll_idle) <idle>-0 0D..1 1445us+: write_watchdog_counter (nmi_watchdog_tick) <idle>-0 0D..1 1545us : do_nmi (poll_idle) All the longest latencies were related to the nmi watchdogs. Booting with idle=poll nmi_watchdog=0 unfortunately seemed to return to the original behavior - so perhaps the second test was just a fluke. This trace (lt_03.log.bz2) had entries like the following: <idle>-0 1...1 19us!: poll_idle (cpu_idle) <idle>-0 1D..1 1089us : smp_apic_timer_interrupt (c0100ccf 0 0) <idle>-0 1...1 1113us!: poll_idle (cpu_idle) <idle>-0 1D..1 4765us : smp_apic_timer_interrupt (c0100ccf 0 0) sched_la-2978 1D... 4798us!: init_fpu (math_state_restore) sched_la-2978 1D... 5088us : smp_apic_timer_interrupt (4b3b98e8 0 0) sched_la-2978 1D.h. 5103us!: irq_exit (smp_apic_timer_interrupt) sched_la-2978 1.... 5719us > sys_clock_gettime (00000001 b7f65378 0000007b) Finally I booted with idle=poll and nmi_watchdog=0. The very first run failed, (I've noticed that the first iteration seems to always hit PERIOD MISSED! while the second usually passes fine). It's still running a 1M iteration run with no more failures so far (100K so far). The latency tracer is a very interesting tool. I have a few questions/assumptions I'd like to run by you to make sure I understand the output of the latency trace: o ! in the delay column means there is a long latency here? o + in the delay column means there is a > 1us latency here? o > means entering the kernel from a sys_call? o < means returning from the sys_call? o : is not < or > o Why do I only see ~5ms of trace when it appears that I am seeing more like ~25ms between the time when I start and stop the trace? o What are the maximum latencies that get printed to /var/log/messages? What exacly do they measure? From what point to what point? -- Darren Hart IBM Linux Technology Center Realtime Linux Team
Attachment:
lt_01.log.bz2
Description: BZip2 compressed data
Attachment:
lt_02.log.bz2
Description: BZip2 compressed data
Attachment:
lt_03.log.bz2
Description: BZip2 compressed data
/* Filename: sched_latency_lkml.c * Author: Darren Hart <[email protected]> * Description: Measure the latency involved with periodic scheduling. * Compilation: gcc -O2 -g -D_GNU_SOURCE -I/usr/include/nptl -I -L/usr/lib/nptl -lpthread -lrt -lm sched_latency_lkml.c -o sched_latency_lkml * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * * Copyright (C) IBM Corporation, 2006 * * 2006-May-10: Initial version by Darren Hart <[email protected]> */ #include <stdio.h> #include <stdlib.h> #include <math.h> #include "librt.h" #define PRIO 98 //#define PERIOD 17*NS_PER_MS //#define ITERATIONS 100 #define ITERATIONS 10000 #define PERIOD 5*NS_PER_MS #define PASS_US 100 #define LATENCY_TRACE 1 #ifdef LATENCY_TRACE #define latency_trace_enable() \ system("echo 0 > /proc/sys/kernel/trace_all_cpus"); \ system("echo 1 > /proc/sys/kernel/trace_enabled"); \ system("echo 0 > /proc/sys/kernel/trace_freerunning"); \ system("echo 0 > /proc/sys/kernel/trace_print_at_crash"); \ system("echo 1 > /proc/sys/kernel/trace_user_triggered"); \ system("echo 0 > /proc/sys/kernel/trace_verbose"); \ system("echo 0 > /proc/sys/kernel/preempt_max_latency"); \ system("echo 0 > /proc/sys/kernel/preempt_thresh"); \ system("[ -e /proc/sys/kernel/wakeup_timing ] && echo 1 > /proc/sys/kernel/wakeup_timing"); \ system("echo 1 > /proc/sys/kernel/mcount_enabled"); #define latency_trace_start() gettimeofday(0,1) #define latency_trace_stop() gettimeofday(0,0) #define latency_trace_print() system("cat /proc/latency_trace"); #else #define latency_trace_enable() do { } while (0) #define latency_trace_start() do { } while (0) #define latency_trace_stop() do { } while (0) #define latency_trace_print() do { } while (0) #endif nsec_t start; int retval; void *periodic_thread(void *arg) { struct thread *t = (struct thread *)arg; int i; int delay, avg_delay = 0, start_delay, min_delay = 0x7FFFFFF, max_delay = 0; int failures = 0; nsec_t next=0, now=0, sched_delta=0, delta=0, prev=0; retval = 0; prev = start; next = start; now = rt_gettime(); start_delay = (now - start)/NS_PER_US; printf("ITERATION DELAY(US) MAX_DELAY(US) FAILURES\n"); printf("--------- --------- ------------- --------\n"); for (i = 1; i <= ITERATIONS; i++) { /* wait for the period to start */ next += PERIOD; prev = now; now = rt_gettime(); if (i > 1) latency_trace_stop(); if (next < now) { printf("\n\nPERIOD MISSED!\n"); printf(" scheduled delta: %8llu us\n", sched_delta/1000); printf(" actual delta: %8llu us\n", delta/1000); printf(" latency: %8llu us\n", (delta-sched_delta)/1000); printf("---------------------------------------\n"); printf(" previous start: %8llu us\n", (prev-start)/1000); printf(" now: %8llu us\n", (now-start)/1000); printf(" scheduled start: %8llu us\n", (next-start)/1000); printf("next scheduled start is in the past!\n"); retval = 1; latency_trace_print(); break; } latency_trace_start(); sched_delta = next - now; /* how long we should sleep */ delta = 0; do { rt_nanosleep(next - now); delta += rt_gettime() - now; /* how long we did sleep */ now = rt_gettime(); } while (now < next); /* start of period */ now = rt_gettime(); delay = (now - start - (nsec_t)i*PERIOD)/NS_PER_US; if (delay < min_delay) min_delay = delay; if (delay > max_delay) max_delay = delay; if (delay > PASS_US) failures++; avg_delay += delay; /* continuous status ticker */ //printf("%9i %9i %13i %8i\r", i, delay, max_delay, failures); //fflush(stdout); busy_work_ms(1); } avg_delay /= ITERATIONS; printf("\n\n"); printf("Start Latency: %4d us: %s\n", start_delay, start_delay < PASS_US ? "PASS" : "FAIL"); printf("Min Latency: %4d us: %s\n", min_delay, min_delay < PASS_US ? "PASS" : "FAIL"); printf("Avg Latency: %4d us: %s\n", avg_delay, avg_delay < PASS_US ? "PASS" : "FAIL"); printf("Max Latency: %4d us: %s\n", max_delay, max_delay < PASS_US ? "PASS" : "FAIL"); printf("Failed Iterations: %d\n\n", failures); return NULL; } int main(int argc, char *argv[]) { int per_id, mas_id; printf("-------------------------------\n"); printf("Scheduling Latency\n"); printf("-------------------------------\n\n"); printf("Running %d iterations with a period of %d ms\n", ITERATIONS, PERIOD/NS_PER_MS); printf("Expected running time: %d s\n\n", (nsec_t)ITERATIONS*PERIOD/NS_PER_SEC); latency_trace_enable(); start = rt_gettime(); per_id = create_fifo_thread(periodic_thread, (void*)0, PRIO); join_thread(per_id); join_threads(); return retval; }
- Follow-Ups:
- Re: rt20 scheduling latency testcase and failure data
- From: Sébastien Dugué <[email protected]>
- Re: rt20 scheduling latency testcase and failure data
- From: Sébastien Dugué <[email protected]>
- Re: rt20 scheduling latency testcase and failure data
- References:
- rt20 scheduling latency testcase and failure data
- From: Darren Hart <[email protected]>
- Re: rt20 scheduling latency testcase and failure data
- From: Darren Hart <[email protected]>
- Re: rt20 scheduling latency testcase and failure data
- From: Ingo Molnar <[email protected]>
- rt20 scheduling latency testcase and failure data
- Prev by Date: RE: Regression seen for patch "sched:dont decrease idle sleep avg"
- Next by Date: Re: [PATCH 5/6] Have ia64 use add_active_range() and free_area_init_nodes
- Previous by thread: Re: rt20 scheduling latency testcase and failure data
- Next by thread: Re: rt20 scheduling latency testcase and failure data
- Index(es):