Re: Network slowdown due to CFS

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* David Schwartz <[email protected]> wrote:

> > > I think the real fix would be for iperf to use blocking network IO 
> > > though, or maybe to use a POSIX mutex or POSIX semaphores.
> >
> > So it's definitely not a bug in the kernel, only in iperf?
> 
> Martin:
> 
> Actually, in this case I think iperf is doing the right thing (though not
> the best thing) and the kernel is doing the wrong thing. [...]

it's not doing the right thing at all. I had a quick look at the source 
code, and the reason for that weird yield usage was that there's a 
locking bug in iperf's "Reporter thread" abstraction and apparently 
instead of fixing the bug it was worked around via a horrible yield() 
based user-space lock.

the (small) patch below fixes the iperf locking bug and removes the 
yield() use. There are numerous immediate benefits of this patch:

 - iperf uses _much_ less CPU time. On my Core2Duo test system, before 
   the patch it used up 100% CPU time to saturate 1 gigabit of network 
   traffic to another box. With the patch applied it now uses 9% of 
   CPU time.

 - sys_sched_yield() is removed altogether

 - i was able to measure much higher bandwidth over localhost for 
   example. This is the case for over-the-network measurements as well.

 - the results are also more consistent and more deterministic, hence 
   more reliable as a benchmarking tool. (the reason for that is that
   more CPU time is spent on actually delivering packets, instead of
   mindlessly polling on the user-space "lock", so we actually max out
   the CPU, instead of relying on the random proportion the workload was
   able to make progress versus wasting CPU time on polling.)

sched_yield() is almost always the symptom of broken locking or other 
bug. In that sense CFS does the right thing by exposing such bugs =B-)
 
	Ingo

------------------------->
Subject: iperf: fix locking
From: Ingo Molnar <[email protected]>

fix iperf locking - it was burning CPU time while polling
unnecessarily, instead of using the proper wait primitives.

Signed-off-by: Ingo Molnar <[email protected]>
---
 compat/Thread.c |    3 ---
 src/Reporter.c  |   13 +++++++++----
 src/main.cpp    |    2 ++
 3 files changed, 11 insertions(+), 7 deletions(-)

Index: iperf-2.0.2/compat/Thread.c
===================================================================
--- iperf-2.0.2.orig/compat/Thread.c
+++ iperf-2.0.2/compat/Thread.c
@@ -405,9 +405,6 @@ int thread_numuserthreads( void ) {
 void thread_rest ( void ) {
 #if defined( HAVE_THREAD )
 #if defined( HAVE_POSIX_THREAD )
-    // TODO add checks for sched_yield or pthread_yield and call that
-    // if available
-    usleep( 0 );
 #else // Win32
     SwitchToThread( );
 #endif
Index: iperf-2.0.2/src/Reporter.c
===================================================================
--- iperf-2.0.2.orig/src/Reporter.c
+++ iperf-2.0.2/src/Reporter.c
@@ -111,6 +111,7 @@ report_statistics multiple_reports[kRepo
 char buffer[64]; // Buffer for printing
 ReportHeader *ReportRoot = NULL;
 extern Condition ReportCond;
+extern Condition ReportDoneCond;
 int reporter_process_report ( ReportHeader *report );
 void process_report ( ReportHeader *report );
 int reporter_handle_packet( ReportHeader *report );
@@ -338,7 +339,7 @@ void ReportPacket( ReportHeader* agent, 
             // item
             while ( index == 0 ) {
                 Condition_Signal( &ReportCond );
-                thread_rest();
+                Condition_Wait( &ReportDoneCond );
                 index = agent->reporterindex;
             }
             agent->agentindex = 0;
@@ -346,7 +347,7 @@ void ReportPacket( ReportHeader* agent, 
         // Need to make sure that reporter is not about to be "lapped"
         while ( index - 1 == agent->agentindex ) {
             Condition_Signal( &ReportCond );
-            thread_rest();
+            Condition_Wait( &ReportDoneCond );
             index = agent->reporterindex;
         }
         
@@ -553,6 +554,7 @@ void reporter_spawn( thread_Settings *th
         }
         Condition_Unlock ( ReportCond );
 
+again:
         if ( ReportRoot != NULL ) {
             ReportHeader *temp = ReportRoot;
             //Condition_Unlock ( ReportCond );
@@ -575,9 +577,12 @@ void reporter_spawn( thread_Settings *th
                 // finished with report so free it
                 free( temp );
                 Condition_Unlock ( ReportCond );
+            	Condition_Signal( &ReportDoneCond );
+		if (ReportRoot)
+			goto again;
             }
-            // yield control of CPU is another thread is waiting
-            thread_rest();
+            Condition_Signal( &ReportDoneCond );
+            usleep(10000);
         } else {
             //Condition_Unlock ( ReportCond );
         }
Index: iperf-2.0.2/src/main.cpp
===================================================================
--- iperf-2.0.2.orig/src/main.cpp
+++ iperf-2.0.2/src/main.cpp
@@ -96,6 +96,7 @@ extern "C" {
     // records being accessed in a report and also to
     // serialize modification of the report list
     Condition ReportCond;
+    Condition ReportDoneCond;
 }
 
 // global variables only accessed within this file
@@ -141,6 +142,7 @@ int main( int argc, char **argv ) {
 
     // Initialize global mutexes and conditions
     Condition_Initialize ( &ReportCond );
+    Condition_Initialize ( &ReportDoneCond );
     Mutex_Initialize( &groupCond );
     Mutex_Initialize( &clients_mutex );
 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux