bind_zonelist() - are we definitely sizing this correctly?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]


I was looking closer at bind_zonelist() and it has the following snippet

        struct zonelist *zl;
        int num, max, nd;
        enum zone_type k;

        max = 1 + MAX_NR_ZONES * nodes_weight(*nodes);
        max++;                  /* space for zlcache_ptr (see mmzone.h) */
        zl = kmalloc(sizeof(struct zone *) * max, GFP_KERNEL);
        if (!zl)
                return ERR_PTR(-ENOMEM);

That set off alarm bells because we are allocating based on the size of a
zone, not the size of the zonelist.

This is the definition of struct zonelist

struct zonelist {
        struct zonelist_cache *zlcache_ptr;                  // NULL or &zlcache
        struct zone *zones[MAX_ZONES_PER_ZONELIST + 1];      // NULL delimited
        struct zonelist_cache zlcache;                       // optional ...

Important thing to note here is that zlcache is after *zones and it is
not a pointer. zlcache in turn is defined as

struct zonelist_cache {
        unsigned short z_to_n[MAX_ZONES_PER_ZONELIST];          /* zone->nid */
        DECLARE_BITMAP(fullzones, MAX_ZONES_PER_ZONELIST);      /* zone full? */
        unsigned long last_full_zap;            /* when last zap'd (jiffies) */

This is on NUMA only and it's a big structure.

The intention of bind_zonelist() appears to be that we only allocate enough
memory to hold all the zones in the active nodes. This was fine in 2.6.19
but now with zlcache after *zones[], I think we are in danger of allocating
too little memory and any reading of zlcache may be reading randomness when
MPOL_BIND is in use because it will be using the full offset within the
structure whether the memory is allocated or not.

At the risk of sounding stupid, what obvious thing am I missing that makes
this work?

If I'm right and this is broken and we still want to allocate as little memory
as possible, zlcache has to move before zones and the call to kmalloc needs
to take the size of zlcache into account.

Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at
Please read the FAQ at

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux