Re: bad networking related lag in v2.6.22-rc2

* Anant Nitya <[email protected]> wrote:

> > could you also apply the fix for the softirq problem below, to make 
> > sure it does not interact?

> The above patch does solve the __ soft_irq_pending __ problem. I have 
> been running it on kernel 2.6.21.1 since yesterday, doing all kinds of 
> things, and haven't encountered any __ NOHZ: local_softirq_pending __. 
> But the network lag that I have been seeing since 2.6.22-rc1 is still 
> there even with this patch applied. If you need any more information, 
> please ask. Meanwhile I will do a git bisect, as suggested by Linus, to 
> find the specific commit that introduced this problem, and will report 
> back once I find it. It's good to see the system running without any __ 
> local_softirq_problem __ :)

thanks.

If you feel inclined to try the git-bisection then by all means please 
do it (it will certainly be helpful and educational), but it's optional: 
I don't think you should 'need' to go through extra debugging chores. My 
analysis based on the excellent trace you provided still holds, and 
whoever modified htb_dequeue()'s logic recently ought to be able to 
figure that out (or send you a debug patch to further narrow the problem 
down).

The trace shows a _clearly_ anomalous loop: for example there are 56396 
(!) calls to rb_first() in htb_dequeue() [without the kernel ever 
exiting that function]:

  earth4:~/s> grep rb_first trace-to-ingo.txt  | wc -l
  56396

and the set of rules you are using is quite simple, and the networking 
load you are generating is not large by any means. Here's the trace 
analysis again, below.

	Ingo

----------------------->

> http://cybertek.info/taitai/trace-to-ingo.txt.bz2

This trace indeed includes the smoking gun, htb_dequeue() and 
__qdisc_run():

   privoxy-12926 1.Ns1 1597us : rb_first (htb_dequeue)

this goes on, non-preemptible, for 160 milliseconds (!):

 privoxy-12926 1.Ns1 161568us : rb_first (htb_dequeue)
 privoxy-12926 1.Ns1 161568us : qdisc_watchdog_schedule (htb_dequeue)

and finally manages to escape the loop:

 privoxy-12926 1.Ns1 161597us : rb_first (htb_dequeue)
 privoxy-12926 1.Ns1 161597us : rb_first (htb_dequeue)
 privoxy-12926 1.Ns1 161599us : htb_safe_rb_erase (htb_dequeue)
 privoxy-12926 1.Ns1 161599us : rb_erase (htb_safe_rb_erase)
 privoxy-12926 1.Ns1 161600us : htb_change_class_mode (htb_dequeue)
 privoxy-12926 1.Ns1 161601us : htb_activate_prios (htb_change_class_mode)

and the system recovers.
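
For anyone who wants to repeat this kind of check on their own trace, here is a minimal sketch in Python. It assumes the line layout seen in the excerpts above (task-pid, CPU/flags field, timestamp in us, function, caller in parentheses); the summarize() helper and the regular expression are illustrative only and are not part of any kernel tool. It counts the rb_first() calls made from htb_dequeue() and reports the first and last timestamps, which gives the length of the stall.

```python
import re

# Assumed line layout, inferred from the trace excerpts in this mail:
#   <task>-<pid> <cpu/flags> <timestamp>us : <function> (<caller>)
LINE_RE = re.compile(
    r"^\s*(?P<task>\S+)-(?P<pid>\d+)\s+\S+\s+(?P<us>\d+)us\s*:\s*"
    r"(?P<func>\w+)\s+\((?P<caller>\w+)\)\s*$"
)


def summarize(lines, func="rb_first", caller="htb_dequeue"):
    """Return (count, first_us, last_us) for calls of func made from caller."""
    count, first, last = 0, None, None
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m["func"] != func or m["caller"] != caller:
            continue  # unparsable line, or a different function/caller
        us = int(m["us"])
        count += 1
        if first is None:
            first = us
        last = us
    return count, first, last


# A few lines copied from the excerpts above, used as sample input.
sample = [
    "   privoxy-12926 1.Ns1 1597us : rb_first (htb_dequeue)",
    " privoxy-12926 1.Ns1 161568us : rb_first (htb_dequeue)",
    " privoxy-12926 1.Ns1 161568us : qdisc_watchdog_schedule (htb_dequeue)",
    " privoxy-12926 1.Ns1 161597us : rb_first (htb_dequeue)",
    " privoxy-12926 1.Ns1 161597us : rb_first (htb_dequeue)",
    " privoxy-12926 1.Ns1 161599us : htb_safe_rb_erase (htb_dequeue)",
]

if __name__ == "__main__":
    count, first, last = summarize(sample)
    print("calls:", count, "span (ms):", (last - first) / 1000.0)
```

To run it on a full trace, pass an open file instead of the sample list, e.g. summarize(open("trace-to-ingo.txt")). Note that the grep above counts every line containing rb_first, while this sketch only counts calls whose caller is htb_dequeue, so the two numbers can differ slightly.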
