Re: [RFC] NUMA futex hashing

>>>>> "Ravikiran" == Ravikiran G Thirumalai <[email protected]> writes:

Ravikiran> Current futex hash scheme is not the best for NUMA.  The
Ravikiran> futex hash table is an array of struct futex_hash_bucket,
Ravikiran> which is just a spinlock and a list_head -- this means
Ravikiran> multiple spinlocks on the same cacheline and on NUMA
Ravikiran> machines, on the same internode cacheline.  If futexes of
Ravikiran> two unrelated threads running on two different nodes happen
Ravikiran> to hash onto adjacent hash buckets, or buckets on the same
Ravikiran> internode cacheline, then we have the internode cacheline
Ravikiran> bouncing between nodes.

Ravikiran,

Using that argument, all you need to do is to add the alignment
____cacheline_aligned_in_smp to the definition of
struct futex_hash_bucket and the problem is solved, given that the
internode cacheline in a NUMA system is defined to be the same as the
SMP cacheline size.

Ravikiran> Here is a simple scheme which maintains per-node hash
Ravikiran> tables for futexes.

Ravikiran> In this scheme, a private futex is assigned to the node id
Ravikiran> of the futex's KVA.  The reasoning is, the futex KVA is
Ravikiran> allocated from the node as indicated by memory policy set
Ravikiran> by the process, and that should be a good 'home node' for
Ravikiran> that futex.  Of course this helps workloads where all the
Ravikiran> threads of a process are bound to the same node, but it
Ravikiran> seems reasonable to run all threads of a process on the
Ravikiran> same node.

You can't make that assumption at all. In many NUMA workloads it is
not common to have all threads of a process run on the same node. You
often see a case where one thread spawns a number of threads that are
then grouped onto the various nodes.

If we want to make the futexes really NUMA aware, having them
explicitly allocated on a given node would be more useful or
alternatively have them allocated on a first touch basis.

But to be honest, I doubt it matters too much since the futex
cacheline is most likely to end up in cache on the node where it's
being used and as long as the other nodes don't try and touch the same
futex this becomes a non-issue with the proper alignment.

I don't think your patch is harmful, but it looks awfully complex for
something that could be solved just as well by a simple alignment
statement.

Cheers,
Jes
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: [RFC] NUMA futex hashing
  - From: Andi Kleen <[email protected]>

References:
- [RFC] NUMA futex hashing
  - From: Ravikiran G Thirumalai <[email protected]>

Prev by Date: Re: Is XFS trustworthy in the latest 2.6.16
Next by Date: modifying degraded raid 1 then re-adding other members is bad
Previous by thread: Re: [RFC] NUMA futex hashing
Next by thread: Re: [RFC] NUMA futex hashing
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]