On Tue, Jul 31, 2007 at 01:36:14PM -0700, Dave Hansen wrote:
> Since the pagemap code has a little header on it to help describe the
> format, I wrote a little C program to parse its output. I get some
> strange results. If I do this:
>
> fd = open("/proc/1/pagemap", O_RDONLY);
> count = read(fd, &endianness, 1);
>
> count will always be 4.
Known bug, fixed in my pending and not-currently-working update. It
ought to return 0 for short reads.
> hexdump gets similar, but even worse results:
>
> qemu:~# strace hexdump -C /proc/self/pagemap
> ...
> read(0, "\1\f\4\4\377\377\377\377\377\377\377\377\377\377\377\377"..., 16) = 20
> read(0, 0x804d39c, 4294967292) = -1 EFAULT (Bad address)
> --- SIGSEGV (Segmentation fault) @ 0 (0) ---
> +++ killed by SIGSEGV +++
>
> Note that the kernel returns 20 to the read request of 16. I think the
> kernel is actually copying over something important in hexdump's memory
> which is adjacent to the buffer and causing it to segfault.
Also fixed.
> The code is basically organized not to output the right thing for any
> unaligned access, and it apparently gets confused about exactly what
> userspace has asked for. I think this is largely due to its overwriting
> of "count" in pagemap_read().
>
> So, a couple of questions. Don't we need to support non-sizeof(unsigned
> long)-aligned reads?
Why? We should obviously never return more data than we were asked for
(that's clearly a bug), but lots of things refuse to read or write
stuff that isn't well sized and aligned.
> Do we _really_ need that header in each and every file?
Well, there's either a header or there isn't.
> > * first byte: 0 for big endian, 1 for little
>
> Do we ever have cases where userspace and kernel differ in their
> endianness? Or, are you hoping to dump these files raw on one
> architecture and parse them on another?
Potentially, yes.
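
For the cross-architecture case, a parser would have to decode entries
according to the endianness byte rather than the host's byte order. A
rough sketch, assuming a hypothetical helper pagemap_decode_entry() and
entries of 4 or 8 bytes as described in the header comment:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode one raw entry of 'entry_size' bytes into
 * a host-order integer, using the byte order declared by the dump
 * (little_endian != 0 means least significant byte first).
 */
static uint64_t pagemap_decode_entry(const uint8_t *p, size_t entry_size,
                                     int little_endian)
{
    uint64_t v = 0;
    size_t i;

    assert(p != NULL);
    assert(entry_size == 4 || entry_size == 8);

    /* Consume bytes from most significant to least significant. */
    for (i = 0; i < entry_size; i++) {
        size_t idx = little_endian ? entry_size - 1 - i : i;
        v = (v << 8) | p[idx];
    }
    return v;
}
```

Because it works byte by byte, the same code gives the same answer on
either kind of host.
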
> > * second byte: page shift (eg 12 for 4096 byte pages)
>
> This might actually (in theory) change on a per-process basis, so it
> makes sense. But, it seems more global to the process than just pagemap
> output. Would this always be the same as getpagesize()? Or, should it
> always map 1:1 with the amount of memory mapped by a kernel pte_t? I
> _think_ these can be slightly different because we have 64k PAGE_SIZE on
> ppc64, but allow mappings to happen in 4k.
>
> > * third byte: entry size in bytes (currently either 4 or 8)
>
> This one really boils down to "what is the kernel's sizeof(unsigned
> long)" because we'll always store pfns in those. It seems like we
> should have a better way to go fetch that.
>
> > * fourth byte: header size
>
> If we can get rid of the other three, this, of course, goes away.
True. But the variable-sized header lets us add other stuff later.
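
To make the four header bytes discussed above concrete, here is a
sketch of a userspace parser. The struct and function names are made up
for illustration, and the validity checks (entry size of 4 or 8, header
size of at least 4) are assumptions drawn from the format comment, not
from the code. A reader that honours the header-size byte can skip any
fields added later.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of the 4-byte pagemap header. */
struct pagemap_header {
    int      little_endian; /* byte 0: 0 = big endian, 1 = little */
    unsigned page_shift;    /* byte 1: e.g. 12 for 4096-byte pages */
    unsigned entry_size;    /* byte 2: 4 or 8 */
    unsigned header_size;   /* byte 3: offset of the first entry */
};

/* Returns 0 on success, -1 if the buffer is too short or implausible. */
static int parse_pagemap_header(const uint8_t *buf, size_t len,
                                struct pagemap_header *out)
{
    assert(buf != NULL && out != NULL);

    if (len < 4)
        return -1;
    if (buf[0] > 1)                   /* unknown endianness marker */
        return -1;
    if (buf[1] >= 64)                 /* shift would overflow 64 bits */
        return -1;
    if (buf[2] != 4 && buf[2] != 8)   /* assumed valid entry sizes */
        return -1;
    if (buf[3] < 4)                   /* header cannot be shorter than itself */
        return -1;

    out->little_endian = buf[0];
    out->page_shift = buf[1];
    out->entry_size = buf[2];
    out->header_size = buf[3];
    return 0;
}

/* Page size in bytes implied by the header. */
static uint64_t pagemap_page_size(const struct pagemap_header *h)
{
    return (uint64_t)1 << h->page_shift;
}
```

Fed the first four bytes from the strace output quoted earlier
("\1\f\4\4"), this reports little endian, a page shift of 12, 4-byte
entries and a 4-byte header.
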
--
Mathematics is the supreme nostalgia of our time.