Re: Memory Allocation

Brian D. McGrew wrote:
Good evening gents!

I need some help in allocating memory and understanding how the system
allocates memory with physical versus virtual page tables.  Please
consider the following snippet of code.  Please, no wisecracks about bad
code; it was written in 30 seconds in haste :-)

#include <iostream>

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>	// sleep()
#include <sys/types.h>	// u_long
#include <pthread.h>

const static u_long kMaxSize = (2048 * 2048 * 256);

void *msg(void *ptr);
static u_long threads_done	= 0;

int
main(int argc, char *argv[])
{
     pthread_t thread1;
     pthread_t thread2;

     const char *message1 = "Thread 1";
     const char *message2 = "Thread 2";

     int iret1;
     int iret2;

     iret1 = pthread_create(&thread1, NULL, msg, (void *) message1);
     iret2 = pthread_create(&thread2, NULL, msg, (void *) message2);

    //pthread_join(thread1, NULL);
    //pthread_join(thread2, NULL);
    while (threads_done < 2) {
	std::cout << "Threads complete: " << threads_done << std::endl;
	sleep(3);
    }

    exit(0);
}

void *
msg(void *ptr)
{
    char *message = (char *) ptr;

    //
    // One bank per thread of 256 4MP image buffers: 1GB each, 2GB total.
    //
    char *buffer = new char[kMaxSize];

    u_long max = kMaxSize;

    //
    // Init each buffer to 'something'.
    //
    for (u_long inx = 0; inx < max; inx++) {
	if (inx % 102400000 == 0) {
	    std::cout << message << ": Index: " << inx << std::endl;
	}

    	buffer[inx] = inx;
    }

    delete[] buffer;	// allocated with new[], so not free()
    threads_done++;
    return NULL;
}

My test machine is a Dell Precision 490 with dual 5140 processors and
3GB of RAM.  If I reduce kMaxSize to (2048 * 2048 * 236) it works.
However, I need to allocate an array of char that is (2048 * 2048 * 256)
and maybe even as large as (2048 * 2048 * 512).

Obviously I have enough physical memory in the box to do this.  However,
I suspect that I'm running out of page table entries.  Please correct
me if I'm wrong, but if I allocate (2048 * 2048 * 236) it works.  When I
increase to 256 or 512 it fails, and it is my suspicion that I just
don't have enough room in kernel memory to allocate this much memory in
user space.

Because of a piece of 3rd party hardware, I'm forced to run the kernel
in the 4GB memory model.  What I need to be able to do is allocate an
array of char (2048 * 2048 * (up to 512)) in user space *** AND *** I
need the addresses that I get back to be contiguous; that's just the way
my 3rd party hardware works.

I'm inclined to believe that this is not specifically a Linux problem
but maybe an architecture problem???  But maybe there is some kind of
workaround in the kernel for it???  I'd find it hard to believe that
I'm the first one that ever needed to use this much memory.

I ran this same code on two different Macs.  One of them was a PowerBook
G4 with 4GB of RAM and it was successful.  The other was a MacBook Pro
with 4GB of RAM and it failed.  Both were running OS 10.4.9.  And of
course it runs just lovely on my Sun workstation with Solaris.  Thus,
I'm thinking it's an Intel/X86 issue!

How the heck do I get past this problem in Linux on the X86 platform???

Thanks,

Hi Brian

Add these lines at the beginning of your msg() function:

char cmd[128];
sprintf(cmd, "cat /proc/%d/maps", getpid());
system(cmd);

You'll see:

08048000-08049000 r-xp 00000000 08:07 23         /tmp/test1
08049000-0804a000 rw-p 00000000 08:07 23         /tmp/test1
0804a000-0806b000 rw-p 0804a000 00:00 0
40000000-40015000 r-xp 00000000 08:02 31309      /lib/ld-2.3.6.so
40015000-40017000 rw-p 00014000 08:02 31309      /lib/ld-2.3.6.so
40017000-40019000 rw-p 40017000 00:00 0
4001d000-4002b000 r-xp 00000000 08:02 31349      /lib/tls/libpthread-2.3.6.so
4002b000-4002d000 rw-p 0000d000 08:02 31349      /lib/tls/libpthread-2.3.6.so
4002d000-4002f000 rw-p 4002d000 00:00 0
4002f000-40109000 r-xp 00000000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
40109000-4010c000 r--p 000d9000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
4010c000-4010e000 rw-p 000dc000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
4010e000-40114000 rw-p 4010e000 00:00 0
40114000-40137000 r-xp 00000000 08:02 31339      /lib/tls/libm-2.3.6.so
40137000-40139000 rw-p 00022000 08:02 31339      /lib/tls/libm-2.3.6.so
40139000-40143000 r-xp 00000000 08:02 31871      /lib/libgcc_s.so.1
40143000-40144000 rw-p 00009000 08:02 31871      /lib/libgcc_s.so.1
40144000-4026c000 r-xp 00000000 08:02 31335      /lib/tls/libc-2.3.6.so
4026c000-40271000 r--p 00127000 08:02 31335      /lib/tls/libc-2.3.6.so
40271000-40273000 rw-p 0012c000 08:02 31335      /lib/tls/libc-2.3.6.so
40273000-40278000 rw-p 40273000 00:00 0
40278000-40279000 ---p 40278000 00:00 0
40279000-40a78000 rw-p 40279000 00:00 0
bffff000-c0000000 rw-p bffff000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
Thread 1: Index: 0
08048000-08049000 r-xp 00000000 08:07 23         /tmp/test1
08049000-0804a000 rw-p 00000000 08:07 23         /tmp/test1
0804a000-0806b000 rw-p 0804a000 00:00 0
40000000-40015000 r-xp 00000000 08:02 31309      /lib/ld-2.3.6.so
40015000-40017000 rw-p 00014000 08:02 31309      /lib/ld-2.3.6.so
40017000-40019000 rw-p 40017000 00:00 0
4001d000-4002b000 r-xp 00000000 08:02 31349      /lib/tls/libpthread-2.3.6.so
4002b000-4002d000 rw-p 0000d000 08:02 31349      /lib/tls/libpthread-2.3.6.so
4002d000-4002f000 rw-p 4002d000 00:00 0
4002f000-40109000 r-xp 00000000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
40109000-4010c000 r--p 000d9000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
4010c000-4010e000 rw-p 000dc000 08:05 128152     /usr/lib/libstdc++.so.6.0.8
4010e000-40114000 rw-p 4010e000 00:00 0
40114000-40137000 r-xp 00000000 08:02 31339      /lib/tls/libm-2.3.6.so
40137000-40139000 rw-p 00022000 08:02 31339      /lib/tls/libm-2.3.6.so
40139000-40143000 r-xp 00000000 08:02 31871      /lib/libgcc_s.so.1
40143000-40144000 rw-p 00009000 08:02 31871      /lib/libgcc_s.so.1
40144000-4026c000 r-xp 00000000 08:02 31335      /lib/tls/libc-2.3.6.so
4026c000-40271000 r--p 00127000 08:02 31335      /lib/tls/libc-2.3.6.so
40271000-40273000 rw-p 0012c000 08:02 31335      /lib/tls/libc-2.3.6.so
40273000-40278000 rw-p 40273000 00:00 0
40278000-40279000 ---p 40278000 00:00 0
40279000-80a7a000 rw-p 40279000 00:00 0
80a7a000-80a7b000 ---p 80a7a000 00:00 0
80a7b000-8127a000 rw-p 80a7b000 00:00 0
bffff000-c0000000 rw-p bffff000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
terminate called after throwing an instance of 'std::bad_alloc'
  what():  St9bad_alloc
Aborted



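The problem is the dynamic libraries and thread stacks, which get mapped starting at 0x40000000. The zone available to a user program is 3GB, from 0x00000000 to 0xC0000000, but those mappings sit in the middle of it, so it is not contiguous. That is why your program cannot allocate 2GB (one 1GB buffer per thread): after the first buffer is mapped, no free hole of 1GB is left for the second one.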
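
To make the fragmentation concrete: each buffer is 2048 * 2048 * 256 = 1GB (0x40000000 bytes), while 2048 * 2048 * 236 is 944MB (0x3b000000 bytes). In the second listing above, the largest hole left below 0xC0000000 is 0xbffff000 - 0x8127a000 = 0x3ed85000 bytes, about 1005MB, which is enough for the 236 case but not for the 256 case. Here is a minimal sketch that does the same arithmetic on maps output. Note that largest_gap() is a hypothetical helper name, not part of the original program, and it assumes the lines are sorted by address, as /proc/<pid>/maps prints them.

```cpp
#include <cassert>
#include <cstdio>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: returns the size in bytes of the largest unmapped
// hole below `limit`, given text in the /proc/<pid>/maps format.
// Assumes the lines are sorted by start address.
unsigned long long largest_gap(std::istream &maps, unsigned long long limit)
{
    assert(limit > 0);

    unsigned long long prev_end = 0;   // end of the previous mapping
    unsigned long long best = 0;       // largest hole seen so far
    std::string line;

    while (std::getline(maps, line)) {
        unsigned long long start, end;

        // Each line begins with "start-end" in hexadecimal.
        if (std::sscanf(line.c_str(), "%llx-%llx", &start, &end) != 2)
            continue;
        if (start >= limit)
            break;                     // past the user address space

        if (start > prev_end && start - prev_end > best)
            best = start - prev_end;
        if (end > prev_end)
            prev_end = end;
    }

    // Hole between the last mapping and the limit, if there is one.
    if (limit > prev_end && limit - prev_end > best)
        best = limit - prev_end;

    return best;
}
```

On a live process you could pass a std::ifstream opened on /proc/self/maps and 0xC0000000ULL as the limit (the 3GB user/kernel split assumed in this thread), then compare the result with the size you want to allocate.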


Now if you compile your program with static libraries, it's a little bit better:

g++ -o test1 -static test1.c -lpthread
# ./test1
Threads complete: 0
08048000-08137000 r-xp 00000000 08:07 23         /tmp/test1
08137000-08139000 rw-p 000ee000 08:07 23         /tmp/test1
08139000-081a4000 rw-p 08139000 00:00 0
40000000-40001000 ---p 40000000 00:00 0
40001000-40800000 rwxp 40001000 00:00 0
40800000-40801000 ---p 40800000 00:00 0
40801000-41000000 rwxp 40801000 00:00 0
41000000-41001000 rw-p 41000000 00:00 0
bffff000-c0000000 rw-p bffff000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
Thread 1: Index: 0
08048000-08137000 r-xp 00000000 08:07 23         /tmp/test1
08137000-08139000 rw-p 000ee000 08:07 23         /tmp/test1
08139000-081a4000 rw-p 08139000 00:00 0
40000000-40001000 ---p 40000000 00:00 0
40001000-40800000 rwxp 40001000 00:00 0
40800000-40801000 ---p 40800000 00:00 0
40801000-41000000 rwxp 40801000 00:00 0
41000000-81002000 rw-p 41000000 00:00 0
bffff000-c0000000 rw-p bffff000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0
terminate called after throwing an instance of 'std::bad_alloc'
  what():  St9bad_alloc
Killed

Some mappings (the thread stacks) are still biting you.

If you want to use this much memory on a 32bit kernel, you might tune your program to:

- Avoid dynamic libraries
- Allocate thread stacks yourself, so that they won't be in the middle of your address space (using the malloc() zone, in the 08139000-08xxxxxx range)
...
- Use a smarter kernel that can map the other way around (from the top down) (check /proc/sys/vm/legacy_va_layout)

Of course, switching to a 64bit kernel just makes this problem nonexistent :)
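
For the "allocate thread stacks yourself" item, a minimal sketch using pthread_attr_setstack() could look like this. Note that run_with_own_stack() is a hypothetical helper, and it joins the thread before returning only to keep the sketch short. Whether the block really comes from the low malloc() zone depends on the allocator (glibc typically serves large requests with mmap()), so check /proc/<pid>/maps to see where it lands.

```cpp
#include <cassert>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

// Hypothetical helper: runs fn(arg) in a thread whose stack is a block we
// allocate ourselves instead of letting the thread library mmap() one.
// Returns 0 on success or an errno-style error code.
int run_with_own_stack(void *(*fn)(void *), void *arg, size_t stack_size)
{
    assert(fn != NULL);

    // The stack should be page aligned.
    void *stack = NULL;
    long page = sysconf(_SC_PAGESIZE);
    int rc = posix_memalign(&stack, (size_t) page, stack_size);
    if (rc != 0)
        return rc;

    pthread_attr_t attr;
    rc = pthread_attr_init(&attr);
    if (rc != 0) {
        free(stack);
        return rc;
    }

    // Tell the thread library to use our block as the stack.
    rc = pthread_attr_setstack(&attr, stack, stack_size);

    pthread_t tid;
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);

    // The stack may only be released once the thread has finished.
    if (rc == 0)
        rc = pthread_join(tid, NULL);

    free(stack);
    return rc;
}
```

The stack size has to be at least PTHREAD_STACK_MIN and large enough for whatever the thread function does; a call such as run_with_own_stack(worker, &data, 64 * 1024) is only an illustrative choice, with worker and data standing in for your own thread function and argument.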