On Thu, May 18, 2006 at 05:06:28PM +0200, BankHacker wrote:
> I think this could be the reason for the slowdown in my case because
> symptoms match. My program is invoking 10 million system calls for the
> first random test and it happens only in the dynamic version. My CPU
> is an Intel Pentium IV.

You are not invoking 10 million system calls, just 10 million calls to
library routines.  Now, rand() as well as random() are thread-safe
routines, so they need to take a lock to protect the shared seed (unlike,
say, random_r()).  On i?86, glibc inlines the lock code:

  14:   b9 01 00 00 00          mov    $0x1,%ecx
  19:   65 83 3d 0c 00 00 00    cmpl   $0x0,%gs:0xc
  20:   00
  21:   74 01                   je     24 <__random+0x24>
  23:   f0 0f b1 8b XX XX XX    lock cmpxchg %ecx,0xXXXXXXXX(%ebx)
  2a:   XX
  2b:   0f 85 8f 01 00 00       jne    1c0 <_L_mutex_lock_13>

and similarly for unlock.  %gs:0xc ought to be non-zero only once the
first pthread_create call in the program has been made.  While it is
zero, the je at 0x21 jumps to 0x24, past the f0 lock prefix byte at 0x23,
so the cmpxchg runs without the lock prefix.  On PIV, atomic instructions
are horribly expensive.  Either you have preloaded some library that
called pthread_create, or your CPU is unable to do the jump-around-lock-
prefix trick quickly.

I certainly don't see your testcase being slow on my AMD64, either as a
64-bit or as a 32-bit program.

First of all, try benchmarking random_r(), then check under a debugger
whether %gs:0xc is 0 or not.

	Jakub