Re: [PATCH] dst numa: Avoid dst counter cacheline bouncing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Christoph Lameter <[email protected]>
Date: Thu, 23 Jun 2005 20:10:06 -0700 (PDT)

> AIM7 results (tcp_test and udp_test) on i386 (IBM x445 8p 16G):
> 
> No patch	94788.2
> w/patch		97739.2 (+3.11%)
> 
> The numbers will be much higher on larger NUMA machines.

How much higher?  I don't believe it.  And %3 doesn't justify the
complexity (both in time and memory usage) added by this patch.

Performance of our stack's routing cache is _DEEPLY_ tied to the
size of the routing cache entries, of which dst entry is a member.
Every single byte counts.

You've exploded this to be (NR_NODES * sizeof(void *)) larger.  That
is totally unacceptable, even for NUMA.

Secondly, inlining the "for_each_online_node()" loops isn't very nice
either.

Consider making a per-node routing cache instead, just like the flow
cache is per-cpu, or make socket dst entries have a per-node array of
object pointers.  Fill the per-node array in lazily, just as you do
for the dst.  The first time you try to clone a dst on a cpu for a
socket, create the per-cpu entry slot.

We don't need to make them per-node system wide, only per-socket is
this really needed.

This way you do per-node walking when you detach the dst from the
socket at close() time, not at every dst_release() call, and thus for
every packet in the system.

In light of that, I don't see what the advantage is.  Instead of
atomic inc/dec on every packet sent in the system, you walk the whole
array of counters for every packet sent in the system.  If you really
get contention amongst nodes for a DST entry, this walk should result
in ReadToShare transactions, and thus cacheline movement, between the
NUMA nodes, on every __kfree_skb() call.

Essentially you're trading 1 atomic inc (ReadToOwn) and 1 atomic dec
(ReadToOwn) per packet for significant extra memory, much bigger code,
and 1 ReadToShare transaction per packet.

And since you're still using atomic_inc/atomic_dec you'll still hit
ReadToOwn transactions within the node.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux