Re: [Suspend2][] [Suspend2] Dynamically allocated pageflags

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Tuesday 27 June 2006 01:17, Nigel Cunningham wrote:
> On Tuesday 27 June 2006 07:11, Rafael J. Wysocki wrote:
> > On Monday 26 June 2006 18:52, Nigel Cunningham wrote:
> > > Add support for dynamically allocated pageflags - bitmaps consisting of
> > > single pages merged together into per zone bitmaps that can then be used
> > > as temporary page flags.
> >
> > IIRC, this has already been discussed on LKML and some people argued it
> > would be overkill.  Could you please say why you think it's not so?
> 
> Some people did, but IIRC, Dave Jones liked the general idea. The main 
> rationale was that we have a limited number of pageflags, and this provides a 
> way that we can easily add new ones without expanding struct page. Obviously 
> it's better for usage patterns like suspend2.

Well, I think that might only be useful for "temporary" flags, because if such
a structure were to stay in memory permanently, it would be better to expand
struct page anyway.

> The other (suspend2 specific) advantage is that having them stored like this 
> makes storing the image data much simpler and while we're calculating the 
> contents of the image, there is zero impact on the size of this metadata 
> storage from changes in the pages that are in each pagedir.

The question is if there are any potential users of this, except for suspend2.
If not, it probably would make sense to use a more suspend2-specific and
maybe simpler data structure and give up the generality.

> > Well, having read the entire patch I think I'd do this in a different way.
> > Namely, we can always use pfn_to_page() to get the struct page
> > corresponding to given PFN, so we can enumerate pages from 0 to max_pfn and
> > create a simple bitmap with one or more bits per PFN.  The only difficulty
> > I see wrt this is to make sure 0-order allocations are used for the
> > bitmaps.
> 
> I used to do that, but it arm pfns don't start at zero and it would leave 
> gaps/wastage in the discontig mem case.

Would that be a big problem?  That is, how many bits we would waste eg. on ARM?

> > > Signed-off-by: Nigel Cunningham <[email protected]>
> > >
> > >  include/linux/dyn_pageflags.h |   66 ++++++++
> > >  lib/Kconfig                   |    3
> > >  lib/Makefile                  |    2
> > >  lib/dyn_pageflags.c           |  330
> > > +++++++++++++++++++++++++++++++++++++++++ 4 files changed, 401
> > > insertions(+), 0 deletions(-)
> > >
> > > diff --git a/include/linux/dyn_pageflags.h
> > > b/include/linux/dyn_pageflags.h new file mode 100644
> > > index 0000000..1a7b5d8
> > > --- /dev/null
> > > +++ b/include/linux/dyn_pageflags.h
> > > @@ -0,0 +1,66 @@
> > > +/*
> > > + * include/linux/dyn_pageflags.h
> > > + *
> > > + * Copyright (C) 2004-2006 Nigel Cunningham <[email protected]>
> > > + *
> > > + * This file is released under the GPLv2.
> > > + *
> > > + * It implements support for dynamically allocated bitmaps that are
> > > + * used for temporary or infrequently used pageflags, in lieu of
> > > + * bits in the struct page flags entry.
> > > + */
> > > +
> > > +#ifndef DYN_PAGEFLAGS_H
> > > +#define DYN_PAGEFLAGS_H
> > > +
> > > +#include <linux/mm.h>
> > > +
> > > +typedef unsigned long *** dyn_pageflags_t;
> >
> > Is this really necessary to define the dyn_pageflags_t here?
> 
> This will also be used in Suspend2's kernel/power/pageflags.c, which adds 
> support for saving the pageflags in the image header and reloading and 
> relocating them at resume time.

I would probably use unsigned long *** directly anyway.

> > > +
> > > +#define BITNUMBER(page) (page_to_pfn(page))
> >
> > I think it would be cleaner to use page_to_pfn(page) verbatim instead of
> > this.
> 
> I was going for readability.

Well, if I see BITNUMBER(page), I tend to check what it means, so I have to
browse the code just to find out it's page_to_pfn(page).  Ouch.

> > > +
> > > +#if BITS_PER_LONG == 32
> > > +#define UL_SHIFT 5
> > > +#else
> > > +#if BITS_PER_LONG == 64
> > > +#define UL_SHIFT 6
> > > +#else
> > > +#error Bits per long not 32 or 64?
> > > +#endif
> > > +#endif
> > > +
> > > +#define BIT_NUM_MASK (sizeof(unsigned long) * 8 - 1)
> > > +#define PAGE_NUM_MASK (~((1 << (PAGE_SHIFT + 3)) - 1))
> > > +#define UL_NUM_MASK (~(BIT_NUM_MASK | PAGE_NUM_MASK))
> > > +
> > > +#define BITS_PER_PAGE (PAGE_SIZE << 3)
> > > +#define PAGENUMBER(zone_offset) (zone_offset >> (PAGE_SHIFT + 3))
> > > +#define PAGEINDEX(zone_offset) ((zone_offset & UL_NUM_MASK) >> UL_SHIFT)
> > > +#define PAGEBIT(zone_offset) (zone_offset & BIT_NUM_MASK)
> > > +
> > > +#define PAGE_UL_PTR(bitmap, zone_num, zone_pfn) \
> > > +       ((bitmap[zone_num][PAGENUMBER(zone_pfn)])+PAGEINDEX(zone_pfn))
> >
> > All of the above definitions seem to be related in an opaque way.  Any
> > chance to make them a bit more clearer?  A comment, maybe?
> 
> Ok.
> 
> > > +
> > > +/* With the above macros defined, you can do...
> > > +
> > > +#define PagePageset1(page) (test_dynpageflag(&pageset1_map, page))
> > > +#define SetPagePageset1(page) (set_dynpageflag(&pageset1_map, page))
> > > +#define ClearPagePageset1(page) (clear_dynpageflag(&pageset1_map, page))
> > > +*/
> >
> > I'd choose different names for these.  Also the definitions shouldn't go
> > before test/set/clear_dynpageflag() are defined, IMO.
> 
> I was seeking to make them similar to the real pageflags semantics. Did you 
> notice that this is a comment?

I didn't, sorry.  Anyway it's a bit confusing.

> Will rearrange anyway.

OK

> > > +
> > > +#define BITMAP_FOR_EACH_SET(bitmap, counter) \
> > > +	for (counter = get_next_bit_on(bitmap, -1); counter > -1; \
> > > +		counter = get_next_bit_on(bitmap, counter))
> > > +
> > > +extern void clear_dyn_pageflags(dyn_pageflags_t pagemap);
> > > +extern int allocate_dyn_pageflags(dyn_pageflags_t *pagemap);
> > > +extern void free_dyn_pageflags(dyn_pageflags_t *pagemap);
> > > +extern int dyn_pageflags_pages_per_bitmap(void);
> > > +extern int get_next_bit_on(dyn_pageflags_t bitmap, int counter);
> > > +extern unsigned long *dyn_pageflags_ul_ptr(dyn_pageflags_t *bitmap,
> > > +		struct page *pg);
> > > +
> > > +extern int test_dynpageflag(dyn_pageflags_t *bitmap, struct page *page);
> > > +extern void set_dynpageflag(dyn_pageflags_t *bitmap, struct page *page);
> > > +extern void clear_dynpageflag(dyn_pageflags_t *bitmap, struct page
> > > *page); +#endif
> > > diff --git a/lib/Kconfig b/lib/Kconfig
> > > index 3de9335..5596bd8 100644
> > > --- a/lib/Kconfig
> > > +++ b/lib/Kconfig
> > > @@ -38,6 +38,9 @@ config LIBCRC32C
> > >  	  require M here.  See Castagnoli93.
> > >  	  Module will be libcrc32c.
> > >
> > > +config DYN_PAGEFLAGS
> > > +	bool
> > > +
> > >  #
> > >  # compression support is select'ed if needed
> > >  #
> > > diff --git a/lib/Makefile b/lib/Makefile
> > > index b830c9a..8290e9b 100644
> > > --- a/lib/Makefile
> > > +++ b/lib/Makefile
> > > @@ -31,6 +31,8 @@ ifneq ($(CONFIG_HAVE_DEC_LOCK),y)
> > >    lib-y += dec_and_lock.o
> > >  endif
> > >
> > > +obj-$(CONFIG_DYN_PAGEFLAGS) += dyn_pageflags.o
> > > +
> > >  obj-$(CONFIG_CRC_CCITT)	+= crc-ccitt.o
> > >  obj-$(CONFIG_CRC16)	+= crc16.o
> > >  obj-$(CONFIG_CRC32)	+= crc32.o
> > > diff --git a/lib/dyn_pageflags.c b/lib/dyn_pageflags.c
> > > new file mode 100644
> > > index 0000000..4da52f5
> > > --- /dev/null
> > > +++ b/lib/dyn_pageflags.c
> > > @@ -0,0 +1,330 @@
> > > +/*
> > > + * lib/dyn_pageflags.c
> > > + *
> > > + * Copyright (C) 2004-2006 Nigel Cunningham <[email protected]>
> > > + *
> > > + * This file is released under the GPLv2.
> > > + *
> > > + * Routines for dynamically allocating and releasing bitmaps
> > > + * used as pseudo-pageflags.
> > > + *
> > > + * Arrays are not contiguous. The first sizeof(void *) bytes are
> > > + * the pointer to the next page in the bitmap. This allows us to
> > > + * work under low memory conditions where order 0 might be all
> > > + * that's available. In their original use (suspend2), it also
> > > + * lets us save the pages at suspend time, reload and relocate them
> > > + * as necessary at resume time without much effort.
> > > + *
> > > + */
> > > +
> > > +#include <linux/module.h>
> > > +#include <linux/dyn_pageflags.h>
> > > +#include <linux/bootmem.h>
> > > +
> > > +#define page_to_zone_offset(pg) (page_to_pfn(pg) -
> > > page_zone(pg)->zone_start_pfn) +
> > > +/*
> > > + * num_zones
> > > + *
> > > + * How many zones are there?
> > > + *
> > > + */
> > > +
> > > +static int num_zones(void)
> > > +{
> > > +	int result = 0;
> > > +	struct zone *zone;
> > > +
> > > +	for_each_zone(zone)
> > > +		result++;
> > > +
> > > +	return result;
> > > +}
> >
> > Do we really need this?
> 
> Well, I need to know how many zones there are, and haven't seen a generic 
> function. If you know of one, please point me to it. (Or I could shift this 
> somewhere else and make it a generic function - I'm aware that I have 
> something similar in pageflags.c).

I don't know such a function, but if you need to call the above more than once
it's plain inefficient.  I'd define a variable num_zones instead.

> > > +
> > > +/*
> > > + * pages_for_zone(struct zone *zone)
> > > + *
> > > + * How many pages do we need for a bitmap for this zone?
> > > + *
> > > + */
> > > +
> > > +static int pages_for_zone(struct zone *zone)
> > > +{
> > > +	return (zone->spanned_pages + (PAGE_SIZE << 3) - 1) >>
> > > +			(PAGE_SHIFT + 3);
> > > +}
> >
> > Could you please explain the maths above?
> 
> Sure. Page_size << 3 is the number of bits in a page. We are calculating the 
> number of pages needed to store spanned_pages bits, which is (spanned_pages + 
> bits_per_page - 1) / (bits_per_page - 1). (Rounds up).

Any chance to use DIV_ROUND_UP here?

> > > +
> > > +/*
> > > + * page_zone_number(struct page *page)
> > > + *
> > > + * Which zone index does the page match?
> > > + *
> > > + */
> > > +
> > > +static int page_zone_number(struct page *page)
> > > +{
> > > +	struct zone *zone, *zone_sought = page_zone(page);
> > > +	int zone_num = 0;
> > > +
> > > +	for_each_zone(zone)
> > > +		if (zone == zone_sought)
> > > +			return zone_num;
> > > +		else
> > > +			zone_num++;
> > > +
> > > +	printk("Was looking for a zone for page %p.\n", page);
> > > +	BUG_ON(1);
> > > +
> > > +	return 0;
> > > +}
> >
> > Why can't we use page_zonenum() instead of this?
> 
> They will only be the same thing in the x86 case. If you have multiple zones 
> and some don't have any dma pages or highmem pages, this will give a more 
> compact numbering.

But this is _really_ inefficient, because you call it for every page and for how
many times?  I think it should be fixed.

> > > +
> > > +/*
> > > + * dyn_pageflags_pages_per_bitmap
> > > + *
> > > + * Number of pages needed for a bitmap covering all zones.
> > > + *
> > > + */
> > > +int dyn_pageflags_pages_per_bitmap(void)
> > > +{
> > > +	int total = 0;
> > > +	struct zone *zone;
> > > +
> > > +	for_each_zone(zone)
> > > +		total += pages_for_zone(zone);
> > > +
> > > +	return total;
> > > +}
> >
> > Why is a separate function needed for this?
> 
> Readability. Actually, answering this made me check again, and I couldn't find 
> a user of this function. I'll remove it.
> 
> > > +
> > > +/*
> > > + * clear_dyn_pageflags(dyn_pageflags_t pagemap)
> > > + *
> > > + * Clear an array used to store local page flags.
> > > + *
> > > + */
> > > +
> > > +void clear_dyn_pageflags(dyn_pageflags_t pagemap)
> > > +{
> > > +	int i = 0, zone_num = 0;
> > > +	struct zone *zone;
> > > +
> > > +	BUG_ON(!pagemap);
> > > +
> > > +	for_each_zone(zone) {
> > > +		for (i = 0; i < pages_for_zone(zone); i++)
> > > +			memset((pagemap[zone_num][i]), 0, PAGE_SIZE);
> > > +		zone_num++;
> > > +	}
> > > +}
> >
> > I think this could be done in a simpler way too.
> >
> > > +
> > > +/*
> > > + * allocate_dyn_pageflags(dyn_pageflags_t *pagemap)
> > > + *
> > > + * Allocate a bitmap for dynamic page flags.
> > > + *
> > > + */
> > > +int allocate_dyn_pageflags(dyn_pageflags_t *pagemap)
> > > +{
> > > +	int i, zone_num = 0;
> > > +	struct zone *zone;
> > > +
> > > +	BUG_ON(*pagemap);
> > > +
> > > +	*pagemap = kmalloc(sizeof(void *) * num_zones(), GFP_ATOMIC);
> >
> > By using kmalloc(), you use slab.  I'd rather allocate entire pages.
> 
> You might only be using 20 bytes out of the page. At resume time, I do use 
> entire pages if the kmalloc'd value needs relocating, but only if that's 
> needed.

Which is not exactly straightforward.  IMHO that should be simplified
somehow.

> 
> > > +
> > > +	if (!*pagemap)
> > > +		return -ENOMEM;
> > > +
> > > +	for_each_zone(zone) {
> > > +		int zone_pages = pages_for_zone(zone);
> > > +		(*pagemap)[zone_num] = kmalloc(sizeof(void *) * zone_pages,
> > > +					       GFP_ATOMIC);
> > > +
> > > +		if (!(*pagemap)[zone_num]) {
> > > +			kfree (*pagemap);
> >
> > This seems to leak memory, eg. if *pagemap[0] is allocated, but the
> > allocation of *pagemap[1] fails.  You should free *pagemap[j] up to
> > zone_num-1 and then *pagemap, IMO.
> 
> Yes, I should. Thanks.
> 
> > > +			return -ENOMEM;
> > > +		}
> > > +
> > > +		for (i = 0; i < zone_pages; i++) {
> > > +			unsigned long address = get_zeroed_page(GFP_ATOMIC);
> > > +			(*pagemap)[zone_num][i] = (unsigned long *) address;
> > > +			if (!(*pagemap)[zone_num][i]) {
> > > +				printk("Error. Unable to allocate memory for "
> > > +					"dynamic pageflags.");
> > > +				free_dyn_pageflags(pagemap);
> > > +				return -ENOMEM;
> > > +			}
> > > +		}
> > > +		zone_num++;
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/*
> > > + * free_dyn_pageflags(dyn_pageflags_t *pagemap)
> > > + *
> > > + * Free a dynamically allocated pageflags bitmap. For Suspend2 usage, we
> > > + * support data being relocated from slab to pages that don't conflict
> > > + * with the image that will be copied back. This is the reason for the
> > > + * PageSlab tests below.
> > > + *
> > > + */
> > > +void free_dyn_pageflags(dyn_pageflags_t *pagemap)
> > > +{
> > > +	int i = 0, zone_num = 0;
> > > +	struct zone *zone;
> > > +
> > > +	if (!*pagemap)
> > > +		return;
> > > +
> > > +	for_each_zone(zone) {
> > > +		int zone_pages = pages_for_zone(zone);
> > > +
> > > +		if (!((*pagemap)[zone_num]))
> > > +			continue;
> > > +		for (i = 0; i < zone_pages; i++)
> > > +			if ((*pagemap)[zone_num][i])
> > > +				free_page((unsigned long) (*pagemap)[zone_num][i]);
> > > +
> > > +		if (PageSlab(virt_to_page((*pagemap)[zone_num])))
> > > +			kfree((*pagemap)[zone_num]);
> > > +		else
> > > +			free_page((unsigned long) (*pagemap)[zone_num]);
> > > +
> > > +		zone_num++;
> > > +	}
> > > +
> > > +	if (PageSlab(virt_to_page((*pagemap))))
> > > +		kfree(*pagemap);
> > > +	else
> > > +		free_page((unsigned long) (*pagemap));
> >
> > Why so?
> 
> See above (Suspend2 relocation at resume time).

Oh well ...

> > > +
> > > +	*pagemap = NULL;
> > > +	return;
> > > +}
> > > +
> > > +/*
> > > + * dyn_pageflags_ul_ptr(dyn_pageflags_t *bitmap, struct page *pg)
> > > + *
> > > + * Get a pointer to the unsigned long containing the flag in the bitmap
> > > + * for the given page.
> > > + *
> > > + */
> > > +
> > > +unsigned long *dyn_pageflags_ul_ptr(dyn_pageflags_t *bitmap, struct page
> > > *pg) +{
> > > +	int zone_pfn = page_to_zone_offset(pg);
> > > +	int zone_num = page_zone_number(pg);
> > > +	int pagenum = PAGENUMBER(zone_pfn);
> > > +	int page_offset = PAGEINDEX(zone_pfn);
> > > +	return ((*bitmap)[zone_num][pagenum]) + page_offset;
> > > +}
> >
> > It doesn't look very efficient, does it? ;-)
> 
> Readability again. I'm assuming the compiler is semi intelligent :)

Well, how is it supposed to optimise out the call to page_zone_number()?
Also I wouldn't bet on the efficiency of the displacement computations.

Moreover, it would be nice to avoid calling page_to_zone_offset(page) for
the second time in test/set/clear_dynpageflag below.

> > > +
> > > +/*
> > > + * test_dynpageflag(dyn_pageflags_t *bitmap, struct page *page)
> > > + *
> > > + * Is the page flagged in the given bitmap?
> > > + *
> > > + */
> > > +
> > > +int test_dynpageflag(dyn_pageflags_t *bitmap, struct page *page)
> > > +{
> > > +	unsigned long *ul = dyn_pageflags_ul_ptr(bitmap, page);
> > > +	int zone_offset = page_to_zone_offset(page);
> > > +	int bit = PAGEBIT(zone_offset);
> > > +
> > > +	return test_bit(bit, ul);
> > > +}
> > > +
> > > +/*
> > > + * set_dynpageflag(dyn_pageflags_t *bitmap, struct page *page)
> > > + *
> > > + * Set the flag for the page in the given bitmap.
> > > + *
> > > + */
> > > +
> > > +void set_dynpageflag(dyn_pageflags_t *bitmap, struct page *page)
> > > +{
> > > +	unsigned long *ul = dyn_pageflags_ul_ptr(bitmap, page);
> > > +	int zone_offset = page_to_zone_offset(page);
> > > +	int bit = PAGEBIT(zone_offset);
> > > +	set_bit(bit, ul);
> > > +}
> > > +
> > > +/*
> > > + * clear_dynpageflags(dyn_pageflags_t *bitmap, struct page *page)
> > > + *
> > > + * Clear the flag for the page in the given bitmap.
> > > + *
> > > + */
> > > +
> > > +void clear_dynpageflag(dyn_pageflags_t *bitmap, struct page *page)
> > > +{
> > > +	unsigned long *ul = dyn_pageflags_ul_ptr(bitmap, page);
> > > +	int zone_offset = page_to_zone_offset(page);
> > > +	int bit = PAGEBIT(zone_offset);
> > > +	clear_bit(bit, ul);
> > > +}
> > > +
> > > +/*
> > > + * get_next_bit_on(dyn_pageflags_t bitmap, int counter)
> > > + *
> > > + * Given a pfn (possibly -1), find the next pfn in the bitmap that
> > > + * is set. If there are no more flags set, return -1.
> > > + *
> > > + */
> > > +
> > > +int get_next_bit_on(dyn_pageflags_t bitmap, int counter)
> > > +{
> > > +	struct page *page;
> > > +	struct zone *zone;
> > > +	unsigned long *ul = NULL;
> > > +	int zone_offset, pagebit, zone_num, first;
> > > +
> > > +	first = (counter == -1);
> > > +
> > > +	if (first)
> > > +		counter = first_online_pgdat()->node_zones->zone_start_pfn;
> > > +
> > > +	page = pfn_to_page(counter);
> > > +	zone = page_zone(page);
> > > +	zone_num = page_zone_number(page);
> > > +	zone_offset = counter - zone->zone_start_pfn;
> > > +
> > > +	if (first) {
> > > +		counter--;
> > > +		zone_offset--;
> > > +	} else
> > > +		ul = (bitmap[zone_num][PAGENUMBER(zone_offset)]) +
> > > PAGEINDEX(zone_offset); +
> > > +	do {
> > > +		counter++;
> > > +		zone_offset++;
> > > +
> > > +		if (zone_offset >= zone->spanned_pages) {
> > > +			do {
> > > +				zone = next_zone(zone);
> > > +				if (!zone)
> > > +					return -1;
> > > +				zone_num++;
> > > +			} while(!zone->spanned_pages);
> > > +
> > > +			counter = zone->zone_start_pfn;
> > > +			zone_offset = 0;
> > > +		}
> > > +
> > > +		/*
> > > +		 * This could be optimised, but there are more
> > > +		 * important things and the code is simple at
> > > +		 * the moment
> > > +		 */
> >
> > Please let me disagree with this comment.  I think it would be more
> > efficient if you searched the bitmap for the first zero bit and then find
> > the page, pfn, whatever corresponding to this bit.
> 
> Yes, it would be more efficient. I don't mind modifying it. This is a 
> different moment to the one when I wrote that :)
> 
> > > +		pagebit = PAGEBIT(zone_offset);
> > > +
> > > +		if (!pagebit)
> > > +			ul = (bitmap[zone_num][PAGENUMBER(zone_offset)]) +
> > > PAGEINDEX(zone_offset); +
> > > +	} while(!test_bit(pagebit, ul));
> > > +
> > > +	return counter;
> > > +}
> > > +
> 
> Once again,
> 
> Thanks for your feedback. It's good to get picked up on cleanups I've missed 
> and be made to remember again and provide the justification for decisions 
> I've made.

Well, unfortunately the patch doesn't look very good to me.  The overall idea
is quite reasonable, but IMHO the implementation needs fixing.

Greetings,
Rafael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux