Re: [PATCH] mm: Implement Swap Prefetching v23

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Friday 10 February 2006 14:25, Andrew Morton wrote:
> Con Kolivas <[email protected]> wrote:
> > Here's a respin with Nick's suggestions and a modification to not cost us
> > extra slab on non-numa.
>
> v23?  I'm sure we can do better than that.

:D

> > This patch implements swap prefetching when the vm is relatively idle and
> > there is free ram available.
>
> I think "free ram available" is the critical thing here.  If it doesn't
> evict anyhing else then OK, it basically uses unutilised disk bandwidth for
> free.
>
> But where does it put the pages?  If it was really "free", they'd go onto
> the tail of the inactive list.

It puts them in swapcache. This seems to work nicely as a nowhere-land place 
where they don't have much affect on anything until we need them or need more 
ram. This has worked well, but I'm open to other suggestions.

> And what about all those unpaged text pages which the app will need to
> fault back in?

Tracking these would make the perceived time to wakeup much much faster but 
also make the list a heck of a lot larger.. but more to the point I have no 
idea how to do it and get what we really want on the list.

> > Once pages have been added to the swapped list, a timer is started,
> > testing for conditions suitable to prefetch swap pages every 5 seconds.
> > Suitable conditions are defined as lack of swapping out or in any pages,
> > and no watermark tests failing. Significant amounts of dirtied ram and
> > changes in free ram representing disk writes or reads also prevent
> > prefetching.
>
> OK.   The has-the-disk-been-used-recently test still isn't there?

No, but you'd have to free up +/- SWAP_SCAN_MAX as many pages as you fill when 
reading from disk for this indirect method to not pick it up. It's easy 
enough to see that it's effective: You can sit and watch vmstat for many 
minutes after say an oom-kill in the hope you see it quiet enough before 
prefetch does anything. After allowing 'tail -f /dev/zero' to be oomkilled in 
a full gui environment it usually takes >5 mins before swap prefetch starts 
prefetching.

> > @@ -844,6 +844,7 @@ int zone_watermark_ok(struct zone *z, in
> >  		if (free_pages <= min)
> >  			return 0;
> >  	}
> > +
> >  	return 1;
> >  }
>
> ?

Legacy of v1->v22..

>
> > +	read_lock(&swapper_space.tree_lock);
>
> That's interesting.  We conventionally do read_lock_irq() on an
> address_space.tree_lock.  Because
> end_page_writeback()->rotate_reclaimable_page()->test_clear_page_writeback(
>) needs to take a write_lock from interrupt context.  end_swap_bio_write()
> calls end_page_writeback() so I don't immediately see why we don't have a
> deadlock here.

Wasn't sure of the semantics. I was lucky I guess.

> Ordinarily we'd just use find_get_page(), but
>
> > +	/* Entry may already exist */
> > +	page = radix_tree_lookup(&swapper_space.page_tree, entry.val);
> > +	read_unlock(&swapper_space.tree_lock);
> > +	if (page) {
> > +		remove_from_swapped_list(entry.val);
> > +		goto out;
> > +	}
>
> you don't take a ref on the page.
>
> Which makes one wonder what happens here if `page' got whipped out of
> swapcache between the lookup and the remove_from_swapped_list().  Probably
> nothing much.
>
> Did you consider borrowing swapper_space.tree_lock to provide the list and
> radix-tree locking throughout here?

I did. I was concerned that since most of the functions occur as we swap in or 
out that we'd end up with more contention of that lock.

> > +out:
> > +	if (mru) {
> > +		spin_lock(&swapped.lock);
> > +		swapped.list.next = mru;
> > +		spin_unlock(&swapped.lock);
> > +	}
>
> That looks strange.  What happens to swapped.list.prev?

Left dangling for future null dereferencing... Only would have been hit on 
numa.

> > +			schedule_timeout_interruptible(MAX_SCHEDULE_TIMEOUT);
>
> 	set_current_state(TASK_INTERRUPTIBLE);
> 	schedule();

Ack.

Thanks! I guess I will be working on v24 in the near future...

Cheers,
Con
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux