Can those really be the only changes? Shouldn't the dynamic version be touching its dynamic libs early in the dump, and the static version not?
Yes, you are right. The main differences between the dynamic and static dumps are these extra lines at the beginning of the dynamic dump:

  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7faf000
  access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
  open("/etc/ld.so.cache", O_RDONLY) = 3
  fstat64(3, {st_mode=S_IFREG|0644, st_size=31426, ...}) = 0
  mmap2(NULL, 31426, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7fa7000
  close(3) = 0
  open("/lib/libm.so.6", O_RDONLY) = 3
  read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0`\263r\000"..., 512) = 512
  fstat64(3, {st_mode=S_IFREG|0755, st_size=199700, ...}) = 0
  mmap2(0x728000, 147584, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x728000
  mmap2(0x74b000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x22) = 0x74b000
  close(3) = 0
  open("/lib/libc.so.6", O_RDONLY) = 3
  read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0J(`\000"..., 512) = 512
  fstat64(3, {st_mode=S_IFREG|0755, st_size=1532536, ...}) = 0
  mmap2(0x5ed000, 1254780, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x5ed000
  mmap2(0x71a000, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x12d) = 0x71a000
  mmap2(0x71d000, 9596, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x71d000
  close(3) = 0
  mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fa6000
  set_thread_area({entry_number:-1 -> 6, base_addr:0xb7fa66c0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
  mprotect(0x71a000, 8192, PROT_READ) = 0
  mprotect(0x74b000, 4096, PROT_READ) = 0
  mprotect(0x5e9000, 4096, PROT_READ) = 0
  munmap(0xb7fa7000, 31426) = 0

Looking for information about "linux-gate.so.1", I found this webpage:

  http://www.trilithium.com/johan/2005/08/linux-gate/

which says:

"It turns out, though, that system calls invoked via interrupts are remarkably slow on the more recent members of the x86 processor family. An int 0x80 system call can be as much as an order of magnitude slower on a 2 GHz Pentium 4 than on an 850 MHz Pentium III. The impact on performance resulting from this could easily be significant, at least for applications that do a lot of system calls. Intel recognized this problem early on and introduced a more efficient system call interface in the form of sysenter and sysexit instructions. This fast system call feature first appeared in the Pentium Pro processor, but due to hardware bugs it's actually broken in most of the early CPUs. That's why you may see claims that sysenter was introduced with Pentium II or even Pentium III."

I think this could be the reason for the slowdown in my case, because the symptoms match. My program makes 10 million system calls in the first random test, and the slowdown happens only in the dynamic version. My CPU is an Intel Pentium IV. My glibc version is 2.4-8:

  # rpm -qa | grep glibc
  glibc-kernheaders-3.0-5.2
  glibc-common-2.4-8
  glibc-headers-2.4-8
  glibc-2.4-8
  glibc-devel-2.4-8

Do you think this could be the cause? How can I fix it? Thanks!
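
For anyone who wants to compare the two dumps mechanically rather than by eye, here is a rough sketch of how it could be done. It assumes both strace outputs were saved to files and that each line starts with the syscall name, as in the dump above. The function names and the command-line usage are my own invention for illustration, not part of strace.

```python
import re
import sys
from collections import Counter

# Syscall name at the start of an strace line, e.g. 'open("/lib/..' -> 'open'.
# An optional "[pid 123]" prefix (printed by strace -f) is skipped.
SYSCALL_RE = re.compile(r"^\s*(?:\[pid\s+\d+\]\s*)?(\w+)\(")

# open() of a path ending in ".so" or ".so.N"; this deliberately does not
# match /etc/ld.so.cache, which is the loader cache and not a library.
LIB_OPEN_RE = re.compile(r'^\s*open\("([^"]+\.so(?:\.\d+)*)"')


def count_syscalls(lines):
    """Return a Counter mapping syscall name to number of occurrences."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def extra_calls(dynamic_lines, static_lines):
    """Syscalls that occur more often in the dynamic dump than in the static one."""
    # Counter subtraction keeps only the positive differences.
    return count_syscalls(dynamic_lines) - count_syscalls(static_lines)


def opened_libraries(lines):
    """Paths of shared objects passed to open(), in order of appearance."""
    matches = (LIB_OPEN_RE.match(line) for line in lines)
    return [m.group(1) for m in matches if m]


# Hypothetical usage: python compare_dumps.py dynamic.txt static.txt
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], errors="replace") as f:
        dynamic = f.readlines()
    with open(sys.argv[2], errors="replace") as f:
        static = f.readlines()
    print("libraries opened by the dynamic version:", opened_libraries(dynamic))
    for name, extra in extra_calls(dynamic, static).most_common():
        print(f"{name}: {extra} more call(s) in the dynamic dump")
```

This only counts calls by name, so it shows what the loader adds at startup. It says nothing about how long each call takes; for that, strace -c or -T on both binaries would be a more direct measurement.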