Re: [openib-general] [PATCH v2 1/2] iWARP Connection Manager.

On Tue, 2006-06-13 at 16:46 -0500, Steve Wise wrote:
> On Tue, 2006-06-13 at 14:36 -0700, Sean Hefty wrote:
> > >> Er...no. It will lose this event. Depending on the event...the carnage
> > >> varies. We'll take a look at this.
> > >>
> > >
> > >This behavior is consistent with the Infiniband CM (see
> > >drivers/infiniband/core/cm.c function cm_recv_handler()).  But I think
> > >we should at least log an error because a lost event will usually stall
> > >the rdma connection.
> > 
> > I believe that there's a difference here.  For the Infiniband CM, an allocation
> > error behaves the same as if the received MAD were lost or dropped.  Since MADs
> > are unreliable anyway, it's not so much that an IB CM event gets lost, as it
> > doesn't ever occur.  A remote CM should retry the send, which hopefully allows
> > the connection to make forward progress.
> > 
> 
> hmm.  Ok.  I see.  I misunderstood the code in cm_recv_handler().
> 
> Tom and I have been talking about what we can do to not drop the event.
> Stay tuned.

Here's a simple solution that solves the problem:  

For any given cm_id, there are a finite (and small) number of
outstanding CM events that can be posted.  So we just pre-allocate them
when the cm_id is created and keep them on a free list hanging off of
the cm_id struct.  Then the event handler function will pull from this
free list.  

The only case where there is any non-finite issue is on the passive
listening cm_id.  Each incoming connection request will consume a work
struct.  So based on client connects, we could run out of work structs.
However, the CMA has the concept of a backlog, which is defined as the
max number of pending unaccepted connection requests.  So we allocate
these work structs based on that number (or a computation based on that
number), and if we run out, we simply drop the incoming connection
request due to backlog overflow (I suggest we log the drop event too).
When a MPA connection request is dropped, the (IETF conforming) MPA
client will eventually time out the connection and the consumer can
retry.

Comments?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

References:
- RE: [openib-general] [PATCH v2 1/2] iWARP Connection Manager.
  - From: "Sean Hefty" <[email protected]>
- RE: [openib-general] [PATCH v2 1/2] iWARP Connection Manager.
  - From: Steve Wise <[email protected]>

Prev by Date: Re: light weight counters: race free through local_t?
Next by Date: [PATCH] fix clockevents compile error
Previous by thread: RE: [openib-general] [PATCH v2 1/2] iWARP Connection Manager.
Next by thread: [BUG -rt] softlockup via do_page_fault/kunmap_high
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]