[resend. Did not reach mailing lists, most probably due
to KMail's unstoppable desire to use base64 encoding :)]
Hi folks,
I reported earlied that around linux-2.6.11-rc5 my home box sometimes
does not want to send anything over ethetnet. That report is repeated below
sig.
I finally managed to nail down where this happens.
I instrumented sch_generic.c to trace what happens with packets
to be sent over interface named "if".
On 'good' boot, I see
2005-07-12_17:26:29.72158 kern.info: qdisc_restart: start
2005-07-12_17:26:29.72164 kern.info: qdisc_restart: skb!=NULL
2005-07-12_17:26:29.72166 kern.info: qdisc_restart: if !netif_queue_stopped...
2005-07-12_17:26:29.72167 kern.info: qdisc_restart: ...hard_start_xmit
in the log, on 'bad' one only "qdisc_restart: start".
Below is first report and instrumented part of sch_generic.c.
--
vda
Subject: linux-2.6.11-rc5: mysterious loss of tx
My home box has onboard via-rhine NIC.
Several days ago my father called me and said that
it does not send anything (tcpdump shows only rx'ed pkts
despite pings being attempted etc). I did not investigate
then.
Yesterday I've seen it myself. I bumped up ethtool msglvl.
Looks like via-rhine's hard_start_xmit was not called at all
from network core code! (I did not see debug printks from
rhine's hard_stat_xmit routine)
Whatever I tried (ifconfig down/up, reinit IP config from scratch),
nothing helped. No tx whatsoever was attempted by kernel, it seems.
Reboot 'fixed' things.
It hever happened on the same hardware before I switched to rc5.
int qdisc_restart(struct net_device *dev)
{
struct Qdisc *q = dev->qdisc;
struct sk_buff *skb;
int track = (dev->name[0]=='i' && dev->name[1]=='f' && dev->name[2]=='\0');
//'via rhine bug':
//I see ONLY "qdisc_restart: start",
//but not any of below msgs.
//On 'good' boots, it looks like this:
//...
//2005-07-12_17:26:29.72158 kern.info: qdisc_restart: start
//2005-07-12_17:26:29.72164 kern.info: qdisc_restart: skb!=NULL
//2005-07-12_17:26:29.72166 kern.info: qdisc_restart: if !netif_queue_stopped...
//2005-07-12_17:26:29.72167 kern.info: qdisc_restart: ...hard_start_xmit
//...
if(track) { printk("qdisc_restart: start\n"); }
/* Dequeue packet */
if ((skb = q->dequeue(q)) != NULL) {
if(track) { printk("qdisc_restart: skb!=NULL\n"); }
unsigned nolock = (dev->features & NETIF_F_LLTX);
/*
* When the driver has LLTX set it does its own locking
* in start_xmit. No need to add additional overhead by
* locking again. These checks are worth it because
* even uncongested locks can be quite expensive.
* The driver can do trylock like here too, in case
* of lock congestion it should return -1 and the packet
* will be requeued.
*/
if (!nolock) {
if (!spin_trylock(&dev->xmit_lock)) {
collision:
if(track) { printk("qdisc_restart: collision\n"); }
/* So, someone grabbed the driver. */
/* It may be transient configuration error,
when hard_start_xmit() recurses. We detect
it by checking xmit owner and drop the
packet when deadloop is detected.
*/
if (dev->xmit_lock_owner == smp_processor_id()) {
kfree_skb(skb);
if (net_ratelimit())
printk(KERN_DEBUG "Dead loop on netdevice %s, fix it urgently!\n", dev->name);
return -1;
}
__get_cpu_var(netdev_rx_stat).cpu_collision++;
goto requeue;
}
/* Remember that the driver is grabbed by us. */
dev->xmit_lock_owner = smp_processor_id();
}
{
/* And release queue */
spin_unlock(&dev->queue_lock);
//vda
if(track) { printk("qdisc_restart: if !netif_queue_stopped...\n"); }
if (!netif_queue_stopped(dev)) {
int ret;
if (netdev_nit)
dev_queue_xmit_nit(skb, dev);
if(track) { printk("qdisc_restart: ...hard_start_xmit\n"); }
ret = dev->hard_start_xmit(skb, dev);
if (ret == NETDEV_TX_OK) {
if (!nolock) {
dev->xmit_lock_owner = -1;
spin_unlock(&dev->xmit_lock);
}
spin_lock(&dev->queue_lock);
return -1;
}
if (ret == NETDEV_TX_LOCKED && nolock) {
spin_lock(&dev->queue_lock);
goto collision;
}
}
/* NETDEV_TX_BUSY - we need to requeue */
/* Release the driver */
if (!nolock) {
dev->xmit_lock_owner = -1;
spin_unlock(&dev->xmit_lock);
}
spin_lock(&dev->queue_lock);
q = dev->qdisc;
}
/* Device kicked us out :(
This is possible in three cases:
0. driver is locked
1. fastroute is enabled
2. device cannot determine busy state
before start of transmission (f.e. dialout)
3. device is buggy (ppp)
*/
requeue:
q->ops->requeue(skb, q);
netif_schedule(dev);
return 1;
}
BUG_ON((int) q->q.qlen < 0);
return q->q.qlen;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
|
|