Re: [NFS] [PATCH 010 of 11] knfsd: make rpc threads pools numa aware

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 31 Jul 2006 15:54:57 +1000
Greg Banks <[email protected]> wrote:

> On Mon, 2006-07-31 at 14:42, Greg Banks wrote:
> > On Mon, 2006-07-31 at 14:36, Neil Brown wrote:
> > > On Sunday July 30, [email protected] wrote:
> > > > On Mon, 31 Jul 2006 10:42:34 +1000
> > > > NeilBrown <[email protected]> wrote:
> > > > 
> > > > > +static int
> > > > > +svc_pool_map_init_percpu(struct svc_pool_map *m)
> > > > > +{
> > > > > +	unsigned int maxpools = num_possible_cpus();
> > > > > +	unsigned int pidx = 0;
> > > > > +	unsigned int cpu;
> > > > > +	int err;
> > > > > +
> > > > > +
> > > > 
> > > > That isn't right - it assumes that cpu_possible_map is not sparse.  If it
> > > > is sparse, we allocate undersized pools and then overindex them.
> > 
> > Umm, I think Andrew's right, num_possible_cpus() should be NR_CPUS.
> 
> How about this version of the patch?  It replaces num_possible_cpus()
> with highest_possible_processor_id()+1 and similarly for nodes.
> --
> 
> knfsd: Actually implement multiple pools.  On NUMA machines, allocate
> a svc_pool per NUMA node; on SMP a svc_pool per CPU; otherwise a single
> global pool.  Enqueue sockets on the svc_pool corresponding to the CPU
> on which the socket bh is run (i.e. the NIC interrupt CPU).  Threads
> have their cpu mask set to limit them to the CPUs in the svc_pool that
> owns them.
> 
> This is the patch that allows an Altix to scale NFS traffic linearly
> beyond 4 CPUs and 4 NICs.
> 
> Incorporates changes and feedback from Neil Brown, Trond Myklebust,
> Christoph Hellwig and Andrew Morton.
> 

Something has gone rather wrong here.

> -	serv = __svc_create(prog, bufsize, shutdown, /*npools*/1);
> +	serv = __svc_create(prog, bufsize, shutdown, npools);

__svc_create() is:

__svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
	   void (*shutdown)(struct svc_serv *serv))

so heaven knows what tree you're patching.

Incremental patches really are preferred.  So we can see what people are
monkeying with ;)

After fixing the rejects and cleaning a few things up, your proposed change
amounts to:

--- a/net/sunrpc/svc.c~knfsd-make-rpc-threads-pools-numa-aware-fix
+++ a/net/sunrpc/svc.c
@@ -116,7 +116,7 @@ fail:
 static int
 svc_pool_map_init_percpu(struct svc_pool_map *m)
 {
-	unsigned int maxpools = num_possible_cpus();
+	unsigned int maxpools = highest_possible_processor_id() + 1;
 	unsigned int pidx = 0;
 	unsigned int cpu;
 	int err;
@@ -136,6 +136,18 @@ svc_pool_map_init_percpu(struct svc_pool
 	return pidx;
 };
 
+static int
+highest_possible_node_id(void)
+{
+	unsigned int node;
+	unsigned int highest = 0;
+
+	for_each_node(node)
+		highest = node;
+
+	return highest;
+}
+
 
 /*
  * Initialise the pool map for SVC_POOL_PERNODE mode.
@@ -144,7 +156,7 @@ svc_pool_map_init_percpu(struct svc_pool
 static int
 svc_pool_map_init_pernode(struct svc_pool_map *m)
 {
-	unsigned int maxpools = num_possible_nodes();
+	unsigned int maxpools = highest_possible_node_id() + 1;
 	unsigned int pidx = 0;
 	unsigned int node;
 	int err;
_


Which shouldn't have compiled, due to the missing forward declaration.  And
I'd be surprised if it worked very well with CONFIG_NUMA=n.  And it's
naughty to be sneaking general library functions into the sunrpc code
anyway.

Please,

- Write a standalone patch which adds highest_possible_node_id() to
  lib/cpumask.c(?)

  Make sure it's inside #ifdef CONFIG_NUMA

  Remember to export it to modules.

  Provide a !CONFIG_NUMA version in include/linux/nodemask.h which just
  returns constant zero.

  Consider doing something more efficient than the for_each_node() loop. 
  Although I'm not sure what that would be, given that we don't have
  find_last_bit().

- Provide an incremental patch against
  knfsd-make-rpc-threads-pools-numa-aware.patch which utilises
  highest_possible_node_id().

  A replacement patch will be grudgingly accepted, but I'll only go and
  turn it into an incremental one, so you can't hide ;)

- Test it real good.  Modular, non-modular, NUMA, non-NUMA, !SMP.

Thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux