Re: [PATCH] mm: fix page_mkclean_one

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Thu, 28 Dec 2006, Andrew Morton wrote:
> 
> It would be interesting to convert your app to do fsync() before
> FADV_DONTNEED.  That would take WB_SYNC_NONE out of the picture as well
> (apart from pdflush activity).

I get corruption - but the whole point is that it's very much pdflush that 
should be writing these pages out.

Andrew - give my test-program a try. It can run in about 1 minute if you 
have a 256MB machine (I didn't, but "mem=256M" is my friend), and it seems 
to very consistently cause corruption.

What I do is:

	# Make sure we write back aggressively
	echo 5 > /proc/sys/vm/dirty_ratio

as root, and then just run the thing. Tons of corruption. But the 
corruption goes away if I just leave the default dirty ratio alone (but 
then I can increse the file size to trigger it, of course - but that 
also makes the test run a lot slower).

Now, with a pre-2.6.19 kernel, I bet you won't get the corruption as 
easily (at least with the "fsync()"), but that's less to do with anything 
new, and probably just because then you simply won't have any pdflushing 
going on - since the kernel won't even notice that you have tons of dirty 
pages ;)

It might also depend on the speed of your disk drive - the machine I test 
this on has a slow 4200 rpm laptop drive in it, and that probably makes 
things go south more easily. That's _especially_ true if this is related 
to any "bdi_write_congested()" logic.

Now, it could also be related to various code snippets like

	...
	if (wbc->sync_mode != WB_SYNC_NONE)
		wait_on_page_writeback(page);

	if (PageWriteback(page) ||
			!clear_page_dirty_for_io(page)) {
		unlock_page(page);
		continue;
	}
	...

where the WB_SYNC_NONE case will hit the "PageWriteback()" and just not do 
the writeback at all (but it also won't clear the dirty bit, so it's 
certainly not an *OBVIOUS* bug).

We also have code like this ("pageout()"):

	if (clear_page_dirty_for_io(page)) {
		int res;
		struct writeback_control wbc = {
			.sync_mode = WB_SYNC_NONE,
			..
		}
		...
		res = mapping->a_ops->writepage(page, &wbc);

and in this case, if the "WB_SYNC_NONE" means that the "writepage()" call 
won't do anything at all because of congestion, then that would be a _bad_ 
thing, and would certainly explain how something didn't get written out.

But that particular path should only trigger for the "shrink_page_list()" 
case, and it's not the case I seem to be testing with my "low dirty_ratio" 
testing.

		Linus
#include <sys/mman.h>
#include <sys/fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <time.h>

#define TARGETSIZE (22 << 20)
#define CHUNKSIZE (1460)
#define NRCHUNKS (TARGETSIZE / CHUNKSIZE)
#define SIZE (NRCHUNKS * CHUNKSIZE)

static void fillmem(void *start, int nr)
{
	memset(start, nr, CHUNKSIZE);
}

#define page_offset(buf, off) (unsigned)((unsigned long)(buf)+(off)-(unsigned long)(mapping))

static int chunkorder[NRCHUNKS];
static char *mapping;

static int order(int nr)
{
	int i;
	if (nr < 0 || nr >= NRCHUNKS)
		return -1;
	for (i = 0; i < NRCHUNKS; i++)
		if (chunkorder[i] == nr)
			return i;
	return -2;
}

static void checkmem(void *buf, int nr)
{
	unsigned int start = ~0u, end = 0;
	unsigned char c = nr, *p = buf, differs = 0;
	int i;
	for (i = 0; i < CHUNKSIZE; i++) {
		unsigned char got = *p++;
		if (got != c) {
			if (i < start)
				start = i;
			if (i > end)
				end = i;
			differs = got;
		}
	}
	if (start < end) {
		printf("Chunk %d corrupted (%u-%u)  (%x-%x)            \n", nr, start, end,
			page_offset(buf, start), page_offset(buf, end));
		printf("Expected %u, got %u\n", c, differs);
		printf("Written as (%d)%d(%d)\n", order(nr-1), order(nr), order(nr+1));
	}
}

static char *remap(int fd, char *mapping)
{
	if (mapping) {
		munmap(mapping, SIZE);
//		fsync(fd);
		posix_fadvise(fd, 0, SIZE, POSIX_FADV_DONTNEED);
	}
	return mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

int main(int argc, char **argv)
{
	int fd, i;

	/*
	 * Make some random ordering of writing the chunks to the
	 * memory map..
	 *
	 * Start with fully ordered..
	 */
	for (i = 0; i < NRCHUNKS; i++)
		chunkorder[i] = i;

	/* ..and then mix it up randomly */
	srandom(time(NULL));
	for (i = 0; i < NRCHUNKS; i++) {
		int index = (unsigned int) random() % NRCHUNKS;
		int nr = chunkorder[index];
		chunkorder[index] = chunkorder[i];
		chunkorder[i] = nr;
	}

	fd = open("mapfile", O_RDWR | O_TRUNC | O_CREAT, 0666);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, SIZE) < 0)
		return -1;
	mapping = remap(fd, NULL);
	if (-1 == (int)(long)mapping)
		return -1;

	for (i = 0; i < NRCHUNKS; i++) {
		int chunk = chunkorder[i];
		printf("Writing chunk %d/%d (%d%%)     \r", i, NRCHUNKS, 100*i/NRCHUNKS);
		fillmem(mapping + chunk * CHUNKSIZE, chunk);
	}
	printf("\n");

	/* Unmap, drop, and remap.. */
	mapping = remap(fd, mapping);

	/* .. and check */
	for (i = 0; i < NRCHUNKS; i++) {
		int chunk = i;
		printf("Checking chunk %d/%d (%d%%)     \r", i, NRCHUNKS, 100*i/NRCHUNKS);
		checkmem(mapping + chunk * CHUNKSIZE, chunk);
	}
	printf("\n");

	/* Clean up for next time */
	sleep(5);
	sync();
	sleep(5);
	munmap(mapping, SIZE);
	close(fd);
	unlink("mapfile");
	
	return 0;
}

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux