Re: /proc/$pid/pagemap troubles

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 31, 2007 at 03:43:10PM -0700, Dave Hansen wrote:
> On Tue, 2007-07-31 at 16:37 -0500, Matt Mackall wrote:
> > 
> > > So, a couple of questions.  Don't we need to support
> > non-sizeof(unsigned
> > > long)-aligned reads?
> > 
> > Why? We should obviously never return more data than we were asked for
> > (that's clearly a bug), but lots of things refuse to read or write
> > stuff that isn't well sized and aligned. 
> 
> For the 1-byte stuff at the beginning of the file, I think it makes
> perfect sense:
> 
> 	char __big_endian;
> 	int big_endian;
> 	read(fd, &__big_endian, 1);
> 	big_endian = __big_endian;

True.
 
> > > Do we _really_ need that header in each and every file?
> > 
> > Well there's either a header or there isn't.
> 
> I just mean that perhaps we should be putting some of that stuff in a
> global file.  I think there's some added simplicity both in the kernel
> and in the userspace programs if we just let "*ppos << PAGE_SHIFT ==
> vaddr".

Absolutely. It's the source of both of the long read bugs you
mentioned. But there are also advantages to having it attached.

> I was going to send you a patch, but since you're updating anyway, could
> you add some symbolic names like this:
> 
> #define PAGEMAP_ENTRY_SIZE_BYTES        sizeof(unsigned long)
> #define PAGEMAP_HEADER_SIZE_BYTES       sizeof(unsigned long)
> #define PAGEMAP_MAX_VIRT_PFN            (((~0UL) >> PAGE_SHIFT) + 1)
> #define PAGEMAP_BUFFER_SLOTS		(PAGE_SIZE/PAGEMAP_ENTRY_SIZE_BYTES)
> #define PAGEMAP_UNPOPULATED_PAGE	(-1UL)

I suppose.

> +       evpfn = min((src + count) / sizeof(unsigned long),
> +                   ((~0UL) >> PAGE_SHIFT) + 1);
> 
> Should that hunk of code be any different for 32-bit processes on a
> 64-bit kernel?  Do we want to use TASK_SIZE_OF() or restrict these to
> actual virtual address space (which is only 48 bits on x86_64).

No. A page for a 32-bit task can line anywhere in the >32-bit physical
address space.

> It also seems that we need some replacement for the CONFIG_HIGHPTE
> #ifdefs.  What is the basic reason for those?  The copy_to_user() called
> by add_to_pagemap()->flush_pagemap() conflicts with the kmap_atomic()
> from pte_offset_map()?

My not-yet-working code uses get_user_pages and does away with all the
double-buffering and a bunch of the locking issues. 

> As I think about it, do we really even need to walk the VMA list at all?
> We don't do anything differently if an address is inside a VMA.  We just
> save time by not walking the pagetables for areas that we know are
> zeroed.  But, we won't be able to walk down to the PTE level in most
> cases, anyway.  So, it probably won't waste _that_ much time.

Perhaps not.

> +struct pagemapread {
> +       struct mm_struct *mm;
> +       unsigned long next;
> +       unsigned long *buf;
> +       pte_t *ptebuf;
> +       unsigned long pos;
> +       size_t count;
> +       int index;
> +       char __user *out;
> +};
> 
> Just looking at the structure, it's _really_ hard to tell what fields go
> with other fields, and it took me a while to unravel it all.  There's
> also some redundancy in it, which I find a bit confusing.

Most of those go away with the get_user_pages approach.

-- 
Mathematics is the supreme nostalgia of our time.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux