parallel networking (was Re: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock)

jamal wrote:

On Sun, 2007-07-10 at 21:51 -0700, David Miller wrote:

For these high performance 10Gbit cards it's a load balancing
function, really, as all of the transmit queues go out to the same
physical port so you could:

1) Load balance on CPU number.
2) Load balance on "flow"
3) Load balance on destination MAC

etc. etc. etc.

The brain-block i am having is the parallelization aspect of it.
Whatever scheme it is - it needs to ensure the scheduler works as
expected. For example, if it was a strict prio scheduler i would expect
that whatever goes out is always high priority first and never ever
allow a low prio packet out at any time theres something high prio
needing to go out. If i have the two priorities running on two cpus,
then i cant guarantee that effect.

Any chance the NIC hardware could provide that guarantee?

8139cp, for example, has two TX DMA rings, with hardcodedcharacteristics: one is a high prio q, and one a low prio q. The logicis pretty simple: empty the high prio q first (potentially starvinglow prio q, in worst case).

In terms of overall parallelization, both for TX as well as RX, my gutfeeling is that we want to move towards an MSI-X, multi-core friendlymodel where packets are LIKELY to be sent and received by the same setof [cpus | cores | packages | nodes] that the [userland] processesdealing with the data.

There are already some primitive NUMA bits in skbuff allocation, butwith modern MSI-X and RX/TX flow hashing we could do a whole lot more,along the lines of better CPU scheduling decisions, directing flows toclusters of cpus, and generally doing a better job of maximizing cacheefficiency in a modern multi-thread environment.

IMO the current model where each NIC's TX completion and RX processesare both locked to the same CPU is outmoded in a multi-core world withmodern NICs. :)

But I readily admit general ignorance about the kernel processscheduling stuff, so my only idea about a starting point was to see howfar to go with the concept of "skb affinity" -- a mask in sk_buff thatis a hint about which cpu(s) on which the NIC should attempt to send andreceive packets. When going through bonding or netfilter, it is trivialto 'or' together affinity masks. All the various layers of net stackshould attempt to honor the skb affinity, where feasible (requiresinteraction with CFS scheduler?).

Or maybe skb affinity is a dumb idea. I wanted to get people thinkingon the bigger picture. Parallelization starts at the user process.

	Jeff


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: parallel networking
  - From: David Miller <davem@davemloft.net>
- Re: parallel networking (was Re: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock)
  - From: jamal <hadi@cyberus.ca>

Prev by Date: Re: irqbalance considered obsolete by powertop
Next by Date: Re: [PATCH 0/2] [RFC] RT: Optionally allow IRQF_NODELAY on serial console
Previous by thread: [PATCH] ehea: use kernel event queue
Next by thread: Re: parallel networking (was Re: [PATCH 1/4] [NET_SCHED] explict hold dev tx lock)
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]