Re: [RFC][PATCH] VM deadlock prevention core -v3

> > > Tricky, but since you are using own allocator here, you could change it to
> > > be not so aggressive - i.e. do not round size to number of pages.
> > 
> > I'm not sure I follow you, I'm explicitly using
> > alloc_pages()/free_page(), if
> > I were to go smart here, I'd loose the whole reason for doing so.
> 
> You can use page to put there several skbs for example or at least add
> there a fclone (fast clone).

fclone support is there.

> > > > +struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
> > > > +		unsigned length, gfp_t gfp_mask)
> > > > +{
> > > > +	struct sk_buff *skb;
> > > > +
> > > > +	WARN_ON(gfp_mask & (__GFP_NOMEMALLOC | __GFP_MEMALLOC));
> > > > +	gfp_mask &= ~(__GFP_NOMEMALLOC | __GFP_MEMALLOC);
> > > > +
> > > > +	skb = ___netdev_alloc_skb(dev, length, gfp_mask | __GFP_NOMEMALLOC);
> > > > +	if (skb)
> > > > +		goto done;
> > > > +
> > > > +	if (atomic_read(&dev->rx_reserve_used) >=
> > > > +			dev->rx_reserve * dev->memalloc_socks)
> > > > +		goto out;
> > > > +
> > > > +	/*
> > > > +	 * pre-inc guards against a race with netdev_wait_memalloc()
> > > > +	 */
> > > > +	atomic_inc(&dev->rx_reserve_used);
> > > > +	skb = ___netdev_alloc_skb(dev, length, gfp_mask | __GFP_MEMALLOC);
> > > > +	if (unlikely(!skb)) {
> > > > +		atomic_dec(&dev->rx_reserve_used);
> > > > +		goto out;
> > > > +	}
> > > 
> > > Since you have added atomic operation in that path, you can use device's
> > > reference counter instead and do not care that it can dissapear.
> > 
> > Is that the sole reason taking a reference on the device is bad?
> 
> Taking a reference is bad due to performance reasons, since atomic
> increment is not that cheap. If you do it for one variable for the
> purpose of reference counting you can use device's refcnt istead, which
> will solve some races.

Yes, I understand you. However I'm not sure if performance is the only
reason not to take a refcount on the device. Anyway, I think I have just
been convinced to abandon the per device thing and go global.

> > > > @@ -434,6 +567,12 @@ struct sk_buff *skb_clone(struct sk_buff
> > > >  		n->fclone = SKB_FCLONE_CLONE;
> > > >  		atomic_inc(fclone_ref);
> > > >  	} else {
> > > > +		/*
> > > > +		 * should we special-case skb->memalloc cloning?
> > > > +		 * for now fudge it by forcing fast-clone alloc.
> > > > +		 */
> > > > +		BUG_ON(skb->memalloc);
> > > > +
> > > >  		n = kmem_cache_alloc(skbuff_head_cache, gfp_mask);
> > > >  		if (!n)
> > > >  			return NULL;
> > > 
> > > Ugh... cloning is a one of the shoulders of giant where Linux network
> > > stack is staying...
> > 
> > Yes, I'm aware of that, I have a plan to fix this, however I haven't had
> > time
> > to implement it. My immediate concern is the point wrt. the net_device
> > mapping.
> > 
> > My idea was: instead of the order, store the size, and allocate clone 
> > skbuffs in the available room at the end of the page(s), allocating
> > extra pages
> > if needed.
> 
> You can check if requested skb with fclone fits allocated pages, and if
> so use fclone magic, otherwise postpone clone allocation until it is
> required.

Yes the fclone magic works, however that will only let you have one
clone.
I'm just not confident no receive path will ever exceed that.

> Sockets can live without network devices at all, I expect it is enough
> to clean up in socket destructor, since packets can come from
> different devices into the same socket.

You are right if the reserve wasn't device bound - which I will abandon 
because you are right that with multi-path routing, bridge device and 
other advanced goodies this scheme is broken in that there is no
unambiguous
mapping from sockets to devices.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: [RFC][PATCH] VM deadlock prevention core -v3
  - From: "Indan Zupancic" <[email protected]>

References:
- [RFC][PATCH] VM deadlock prevention core -v3
  - From: Peter Zijlstra <[email protected]>
- Re: [RFC][PATCH] VM deadlock prevention core -v3
  - From: Evgeniy Polyakov <[email protected]>
- Re: [RFC][PATCH] VM deadlock prevention core -v3
  - From: Peter Zijlstra <[email protected]>
- Re: [RFC][PATCH] VM deadlock prevention core -v3
  - From: Evgeniy Polyakov <[email protected]>

Prev by Date: Re: [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem
Next by Date: Re: Urgent help needed on an NFS question, please help!!!
Previous by thread: Re: [RFC][PATCH] VM deadlock prevention core -v3
Next by thread: Re: [RFC][PATCH] VM deadlock prevention core -v3
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]