Re: [PATCH 2.6.17-rc6 7/9] Remove some of the kmemleak false positives

On 13/06/06, Ingo Molnar <[email protected]> wrote:

> * Pekka J Enberg <[email protected]> wrote:
>
> > Hi Ingo,
> >
> > On Mon, 12 Jun 2006, Ingo Molnar wrote:
> > > i don't know - i feel uneasy about the 'any pointer' method - it has a
> > > high potential for false negatives, especially for structures that
> > > contain strings (or other random data), etc.
> >
> > Is that a problem in practice?  Structures that contain data are
> > usually allocated from the slab.  There needs to be a link to that
> > struct from the gc roots to get a false negative.  Or am I missing
> > something here?
>
> you should think of this in terms of a 'graph of data', where each node
> is a block of memory. The edges between nodes are represented by
> pointers. The graph is rooted in .data/bss, but it may go indefinitely
> into dynamically allocated blocks as well - just think of a hash-list
> where the hash list table is in .data, but all the chain entries are in
> allocated blocks and the chaining can be arbitrarily deep.
> [...]

Nice description, I should add it to the kmemleak doc :-)

> Currently kmemleak does not track the per-block position of 'outgoing
> pointers': it assumes that all fields within a block may be an outgoing
> pointer. This is a source of false negatives. (fields that do not
> contain a real pointer might accidentally contain a value that is
> interpreted as a false edge - falsely connecting a leaked block to the
> graph.)

That's correct but it might not be a real issue in practice. Many
people use the Boehm GC and haven't complained about the number of
false negatives (AFAIK, there is even a proposal to include it in the
next C++ standard).
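
To make the 'graph of data' and the false-edge problem concrete, here is a
rough user-space sketch of such a conservative scan. It is not kmemleak
code: struct block, find_block(), scan_range() and count_unreachable() are
names made up for this illustration, and stack variables stand in for
allocated blocks and for a root pointer in .data.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a tracked allocation (one node of the graph). */
struct block {
    uintptr_t start;   /* address of the first byte */
    size_t size;       /* length in bytes */
    int reachable;     /* set once a scanned word points into the block */
};

/* Stand-in for an allocated object: one real pointer, one plain data word. */
struct node {
    struct node *next;
    uintptr_t payload;
};

/* Return the block containing addr, or NULL if addr is in no tracked block. */
static struct block *find_block(struct block *blocks, size_t n, uintptr_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= blocks[i].start && addr - blocks[i].start < blocks[i].size)
            return &blocks[i];
    return NULL;
}

/*
 * Treat every aligned word in [mem, mem + len) as a possible outgoing
 * pointer and follow it. Recursion keeps the sketch short; a real scanner
 * would use an explicit work list.
 */
static void scan_range(struct block *blocks, size_t n, const void *mem, size_t len)
{
    const unsigned char *p = mem;

    for (size_t off = 0; off + sizeof(uintptr_t) <= len; off += sizeof(uintptr_t)) {
        uintptr_t word;
        struct block *b;

        memcpy(&word, p + off, sizeof(word));
        b = find_block(blocks, n, word);
        if (b && !b->reachable) {
            b->reachable = 1;
            scan_range(blocks, n, (const void *)b->start, b->size);
        }
    }
}

/* Build root -> a -> b plus one unreferenced block; count reported leaks. */
int count_unreachable(int plant_false_edge)
{
    struct node a = {0}, b = {0}, leaked = {0};
    struct node *root = &a;             /* plays the role of a pointer in .data */
    int leaks = 0;

    a.next = &b;
    /* Non-pointer data that happens to hold the leaked block's address. */
    b.payload = plant_false_edge ? (uintptr_t)&leaked : 0;

    struct block blocks[] = {
        { (uintptr_t)&a, sizeof(a), 0 },
        { (uintptr_t)&b, sizeof(b), 0 },
        { (uintptr_t)&leaked, sizeof(leaked), 0 },
    };

    scan_range(blocks, 3, &root, sizeof(root));
    for (size_t i = 0; i < 3; i++)
        if (!blocks[i].reachable)
            leaks++;
    return leaks;
}
```

With plant_false_edge set to 0 the unreferenced block is reported; with 1
the data word in b acts as a false edge and the leak is hidden, which is
the false negative described above.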

> Kmemleak does recognize 'incoming pointers' via the offsetof tracking
> method, but it's limited in that it is not a type-accurate method
> either: it tracks per-size offsets, so two types that happen to have
> the same size get their 'possible incoming pointer offset' lists merged,
> which introduces false negatives. (a pointer may be considered an
> incoming edge while in reality the pointer is not validly pointing into
> this structure)

The number of collisions would need to be investigated. On my system,
there are 158 distinct sizeof values generated by container_of. Of
these, 90 have at least two aliases (incoming pointer offsets). I'm not
sure how many different structures are in the kernel but I can't find
an easy (gcc magic) way to get a unique id for a type (apart from
modifying all the container_of calls to add a 4th argument - maybe a
string with the name of the type).
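
The collision can be sketched as follows. This is only an illustration of
the per-size idea, not the kmemleak implementation: the table layout,
register_alias(), offset_accepted() and the REGISTER_CONTAINER() macro are
all hypothetical, and a small array stands in for the aliases tree.

```c
#include <stddef.h>

#define MAX_SIZES   16
#define MAX_OFFSETS 8

struct list_head { struct list_head *next, *prev; };

/* Two unrelated types that happen to have the same size on common ABIs. */
struct foo { long id; struct list_head link; };   /* list member not at offset 0 */
struct bar { struct list_head link; long id; };   /* list member at offset 0 */

/* One entry per distinct sizeof value, holding the merged offset list. */
struct size_aliases {
    size_t size;
    size_t offsets[MAX_OFFSETS];
    size_t count;
};

static struct size_aliases alias_table[MAX_SIZES];
static size_t alias_table_len;

static struct size_aliases *lookup_size(size_t size)
{
    for (size_t i = 0; i < alias_table_len; i++)
        if (alias_table[i].size == size)
            return &alias_table[i];
    return NULL;
}

/* Record that block start + offset is a valid target for blocks of this size. */
int register_alias(size_t size, size_t offset)
{
    struct size_aliases *e = lookup_size(size);

    if (!e) {
        if (alias_table_len == MAX_SIZES)
            return -1;
        e = &alias_table[alias_table_len++];
        e->size = size;
        e->count = 0;
    }
    for (size_t i = 0; i < e->count; i++)
        if (e->offsets[i] == offset)
            return 0;                   /* already recorded */
    if (e->count == MAX_OFFSETS)
        return -1;
    e->offsets[e->count++] = offset;
    return 0;
}

/* Would a pointer to this offset in a block of this size count as an edge? */
int offset_accepted(size_t size, size_t offset)
{
    const struct size_aliases *e = lookup_size(size);

    if (!e)
        return 0;
    for (size_t i = 0; i < e->count; i++)
        if (e->offsets[i] == offset)
            return 1;
    return 0;
}

/* What one container_of(ptr, type, member) use would contribute. */
#define REGISTER_CONTAINER(type, member) \
    register_alias(sizeof(type), offsetof(type, member))
```

After REGISTER_CONTAINER(struct foo, link), the call
offset_accepted(sizeof(struct bar), offsetof(struct foo, link)) also
succeeds, although nothing was registered for struct bar: the key is the
size, not the type. Keying on a type name instead (the 4th argument
mentioned above) would keep the two lists apart.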

> The full matching that was suggested before would further weaken the
> 'incoming pointers' logic and would introduce yet another source of
> false negatives: we'd match every block pointer against every possible
> target address that points to within another block.

That's correct.

> My suggestion would be to attempt to achieve perfect matches: annotate
> structures to figure out the offset of pointers, and thus to figure out
> the precise source addresses and a precise list of valid target
> addresses. This is a quite elaborate task to pull off though, and i'm
> not sure it's possible without intolerable maintenance overhead, but we
> should consider it nevertheless. It will also be _much_ faster, because
> per block we'd only have to scan a handful of outgoing pointers.

The problem would be simpler if we had a reliable typeid. If we
consider the sizeof method, a script could scan the kernel and
generate a list of sizeof-pointer_offset pairs which are added to a
radix tree (similar to the aliases tree we have) at boot time. But
this would assume adding the memleak_padding() call not only for
incoming pointers but also for outgoing ones. The method, however,
would eliminate the need for annotating each structure in the kernel.
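
A minimal sketch of what scanning with such a list could look like,
assuming the pairs already exist. Everything here is hypothetical: struct
size_offset, pointer_map[], outgoing_pointers() and demo_precise_scan()
are invented names, the single table entry is written by hand rather than
generated by a script, and a plain array stands in for the radix tree.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Example structure mixing string data with one real pointer. */
struct item {
    char name[16];
    struct item *next;
};

/* One sizeof-pointer_offset pair. */
struct size_offset {
    size_t size;
    size_t offset;
};

/* Hand-written here; the idea above is that a script would generate it. */
static const struct size_offset pointer_map[] = {
    { sizeof(struct item), offsetof(struct item, next) },
};

/*
 * Copy out only the words at offsets listed for this block size, instead
 * of treating every word in the block as a possible pointer.
 * Returns the number of values stored in out[].
 */
size_t outgoing_pointers(const void *block, size_t size,
                         uintptr_t *out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < sizeof(pointer_map) / sizeof(pointer_map[0]); i++) {
        if (pointer_map[i].size != size || n == max)
            continue;
        memcpy(&out[n++],
               (const unsigned char *)block + pointer_map[i].offset,
               sizeof(uintptr_t));
    }
    return n;
}

/* Returns 1 if only the real 'next' pointer is seen as an outgoing edge. */
int demo_precise_scan(void)
{
    struct item a, b, unrelated;
    uintptr_t decoy = (uintptr_t)&unrelated;
    uintptr_t found[4];
    size_t n;

    memset(&a, 0, sizeof(a));
    memset(&b, 0, sizeof(b));
    a.next = &b;
    /* String bytes that happen to look like a pointer to another block. */
    memcpy(a.name, &decoy, sizeof(decoy));

    n = outgoing_pointers(&a, sizeof(a), found, 4);
    return n == 1 && found[0] == (uintptr_t)&b;
}
```

The pointer-looking bytes in name[] are never read, so they cannot create
a false edge. Because the lookup is still keyed on size, two types of the
same size would share one offset list, which is why a reliable typeid
would make this simpler.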

> This also means that by default we'd have no false positives at all,

You can still have a scenario like this: a block is freed but its
address remains in a still-valid structure member; the block is later
re-allocated and leaked, yet the stale value in the earlier structure
still appears to reference it. I think the risk of this happening is
very low and it is not worth the hassle.

> but that there is a capable annotation method to reduce the amount of
> false negatives, in a gradual and manageable way - down to zero if
> everything is annotated.

I'm not sure this could be achieved in a maintainable way, at least
not without support from the compiler.

--
Catalin
