Re: [Patch][RFC] Disabling per-tgid stats on task exit in taskstats

Shailabh Nagar wrote:

Andrew Morton wrote:
On Fri, 30 Jun 2006 23:37:10 -0400
Shailabh Nagar <[email protected]> wrote:
Set aside the implementation details and ask "what is a good design"?
A kernel-wide constant, whether determined at build-time or by a /proc poke,
isn't a nice design.
Can we permit userspace to send in a netlink message describing a cpumask? That's back-compatible.
Yes, that should be doable. And passing in a cpumask is much better since we no longer
have to maintain mappings.

So the strawman is:
Listener bind()s to genetlink using its real pid.
Sends a separate "registration" message with cpumask to listen to.
Kernel stores (real) pid and cpumask.
During task exit, kernel goes through each registered listener (small list) and decides which one needs to get this exit data and calls a genetlink_unicast to each one that does need it.
If number of listeners is small, the lookups should be swift enough. If it grows large, we can consider a fancier lookup (but there I go again, delving into implementation too early :-)
We'll need a map.

1024 CPUs, 1024 listeners, 1000 exits/sec/CPU and we're up to a million
operations per second per CPU.  Meltdown.
But it's a pretty simple map. A per-cpu array of pointers to the head of a
linked list.  One lock for each CPU's list.
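A minimal userspace sketch of that map, in Python rather than kernel C, may make the cost argument concrete. The class and method names (PerCpuListeners, register, listeners_for_exit) are hypothetical and are not taken from the patch; the point is only that an exit on one cpu touches that cpu's list and lock, not the full set of listeners.

```python
import threading


class PerCpuListeners:
    """Userspace model: one listener list and one lock per cpu."""

    def __init__(self, nr_cpus):
        self.lists = [[] for _ in range(nr_cpus)]
        self.locks = [threading.Lock() for _ in range(nr_cpus)]

    def register(self, pid, cpus):
        # Add the listener's pid to the list of every cpu it asked for.
        for cpu in cpus:
            with self.locks[cpu]:
                if pid not in self.lists[cpu]:
                    self.lists[cpu].append(pid)

    def listeners_for_exit(self, cpu):
        # Only the exiting cpu's lock is taken and only its list is read.
        with self.locks[cpu]:
            return list(self.lists[cpu])


table = PerCpuListeners(nr_cpus=4)
table.register(pid=100, cpus=[0, 1])
table.register(pid=200, cpus=[1])
print(table.listeners_for_exit(1))  # [100, 200]
print(table.listeners_for_exit(3))  # []
```

With this layout the work per exit scales with the number of listeners interested in that cpu, not with the total number of listeners in the system.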
Here's a patch that implements the above ideas.

A listener registers its interest by specifying a cpumask in the
cpulist format (comma separated ranges of cpus). The listener's pid
is entered into per-cpu lists for those cpus and exit events from those
cpus go to the listeners using netlink unicasts.
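For readers unfamiliar with the cpulist format, here is a rough userspace sketch of how a string of comma separated ranges expands into individual cpu numbers. It is written in Python for illustration; parse_cpulist and its nr_cpus bound are hypothetical names, and this is not the kernel's own parser.

```python
def parse_cpulist(text, nr_cpus):
    """Expand a cpulist such as "0-3,8,10-11" into a sorted list of cpus."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            raise ValueError("empty range in cpulist")
        lo, sep, hi = part.partition("-")
        first = int(lo)
        last = int(hi) if sep else first  # a single cpu is a one-element range
        if first < 0 or first > last or last >= nr_cpus:
            raise ValueError(f"bad cpu range: {part!r}")
        cpus.update(range(first, last + 1))
    return sorted(cpus)


print(parse_cpulist("0-3,8,10-11", nr_cpus=16))  # [0, 1, 2, 3, 8, 10, 11]
```

Each cpu in the result would then have the listener's pid added to its per-cpu list.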

Please comment.
Andrew, this is not being proposed for inclusion yet since there is at least one more issue that needs to be resolved:
What happens when a listener exits without doing deregistration
(or if the listener attempts to register another cpumask while a current
registration is still active)?

( Jamal, your thoughts on this problem would be appreciated)

Problem is that we have a listener task which has "registered" with taskstats and caused its pid to be stored in various per-cpu lists of listeners. Later, when some other task exits on a given cpu, its exit data is sent using genlmsg_unicast on each pid present on that cpu's list.

If the listener exits without doing a "deregister", its pid continues to be kept around, obviously not a good thing. So we need some way of detecting the situation (task is no longer listening on these cpus' events) that is efficient.

Two solutions come to mind:

1. During the exit of every task, check to see if it is already "registered" with taskstats. If so, do a cleanup of its pid on various per-cpu lists.

2. Before doing a genlmsg_unicast to a pid on one of the per-cpu lists, check that the listener is still there (or notice that genlmsg_unicast fails with -ECONNREFUSED, a result of netlink_lookup failing for that pid). If it is gone, just delete it from that cpu's list and continue.
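To illustrate solution 2, here is a small userspace model in Python of lazy cleanup on a failed send. The names deliver_exit and fake_send are hypothetical, and Python's ConnectionRefusedError merely stands in for genlmsg_unicast returning -ECONNREFUSED; none of this is kernel code.

```python
def deliver_exit(listeners, send):
    """Send exit data to each pid; drop pids whose send is refused."""
    delivered = []
    for pid in list(listeners):  # iterate over a copy, we may remove entries
        try:
            send(pid)
        except ConnectionRefusedError:
            listeners.remove(pid)  # stale listener: clean up this cpu's list
            continue
        delivered.append(pid)
    return delivered


live_pids = {100, 300}  # assumption: pid 200 exited without deregistering


def fake_send(pid):
    # Stand-in for the unicast; refuses pids with no open socket.
    if pid not in live_pids:
        raise ConnectionRefusedError(pid)


cpu0_listeners = [100, 200, 300]
delivered = deliver_exit(cpu0_listeners, fake_send)
print(delivered)       # [100, 300]
print(cpu0_listeners)  # [100, 300]
```

The stale pid is removed only from the list of the cpu where the failure was seen; other cpus clean up their own lists when they next try to send.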

1 is more desirable because it's the right place to catch this and happens relatively rarely (few listener exits compared to all exits). However, how can we check whether a task/pid has registered with taskstats earlier? Again, two possibilities:
- Maintain a list of registered listeners within taskstats and check that.

- Try to leverage netlink's nl_pid_hash which maintains the same kind of info for each protocol.

Thus a netlink_lookup of the pid would save a lot of work.

However, the netlink layer's hashtable appears to be for the entire NETLINK_GENERIC protocol and not just for the taskstats client of NETLINK_GENERIC. So even if a task has deregistered with taskstats, as long as it has some other NETLINK_GENERIC socket open, it will still show up as "connected" as far as netlink is concerned.

Jamal - is my interpretation correct? Do I need to essentially replicate the pidhash at the taskstats layer? Thoughts on whether there's any way genetlink can provide support for this or whether it's desirable etc. (we appear to be the second user of genetlink - this may not be a common need going forward).

1 has the disadvantage that if such a situation is detected, one has to iterate over all cpus in the system, deleting that pid from any per-cpu list it happens to be in.

One could store the cpumask that the listener originally used to optimize this search. Usual tradeoff of storage vs. time.
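A rough userspace sketch of that optimization, in Python: the registry remembers each listener's cpus so that cleanup on exit visits only those lists. ListenerRegistry and its methods are hypothetical names. Note one labeled assumption: here a second registration replaces the first, which is just one possible answer to the open question above and not something the patch decides.

```python
class ListenerRegistry:
    """Userspace model of solution 1 with the cpumask stored per listener."""

    def __init__(self, nr_cpus):
        self.per_cpu = [[] for _ in range(nr_cpus)]
        self.masks = {}  # pid -> set of cpus the listener registered for

    def register(self, pid, cpus):
        if pid in self.masks:
            self.unregister(pid)  # assumption: new registration replaces old
        self.masks[pid] = set(cpus)
        for cpu in self.masks[pid]:
            self.per_cpu[cpu].append(pid)

    def unregister(self, pid):
        # Visit only the cpus in the stored mask, not every cpu in the system.
        cpus = self.masks.pop(pid, set())
        for cpu in cpus:
            self.per_cpu[cpu].remove(pid)
        return len(cpus)  # number of per-cpu lists touched

    def on_task_exit(self, pid):
        # Called for every exiting task; cheap when the pid never registered.
        if pid in self.masks:
            return self.unregister(pid)
        return 0


registry = ListenerRegistry(nr_cpus=1024)
registry.register(100, [0, 5])
registry.register(200, [5])
visited = registry.on_task_exit(100)
print(visited)              # 2
print(registry.per_cpu[5])  # [200]
```

The extra storage is one mask per listener; the saving is that a listener's exit touches two lists here instead of 1024.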

2 avoids the problem just mentioned since it delegates the task of cleanup to each cpu at the cost of incurring an extra check for each listener for each exit on that cpu.

By storing the task_struct instead of the pid in the per-cpu lists, the check can be made quite cheap.

But one problem with 2 is the issue of recycled task_structs and pids. Since the stale task on the per-cpu listener list could have exited a while back, it's possible that its pid (or task_struct) has been recycled, so it looks alive at the time of the check and may even have registered with a different interest list! So it'll receive events it didn't register for. I guess this again calls for us to maintain the listener list within taskstats explicitly (solution 1) and explicitly catch the exit of the task/pid.
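The recycling hazard can be reproduced in a tiny userspace model, sketched here in Python. All names (per_cpu, alive, register, exit_without_deregister, recipients) are hypothetical, and the "alive" set is only a stand-in for whatever liveness check solution 2 would use.

```python
per_cpu = {0: [], 3: []}  # cpu -> list of listener pids
alive = set()             # pids that currently belong to a running task


def register(pid, cpus):
    alive.add(pid)
    for cpu in cpus:
        per_cpu[cpu].append(pid)


def exit_without_deregister(pid):
    # The task goes away but its entries stay on the per-cpu lists.
    alive.discard(pid)


def recipients(cpu):
    # Solution 2: lazily drop entries whose pid is no longer alive.
    per_cpu[cpu] = [pid for pid in per_cpu[cpu] if pid in alive]
    return list(per_cpu[cpu])


register(100, [3])            # original listener wants cpu 3
exit_without_deregister(100)  # stale entry for pid 100 remains on cpu 3
register(100, [0])            # unrelated task reuses pid 100, wants cpu 0 only

wrongly_notified = recipients(3)
print(wrongly_notified)  # [100]
```

The liveness check passes because pid 100 exists again, so the new task receives cpu 3's exit data even though it only asked for cpu 0.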

Thoughts ?

--Shailabh




