You are not invoking 10 million system calls, just 10 million library routines.
OK, so that means the rand() function does not make a system call each time it is invoked, right?
Now, rand() as well as random() are thread-safe routines, so they need to use a lock to protect the seed (unlike, say, random_r()). On i?86, glibc inlines the lock code:

    14: b9 01 00 00 00          mov    $0x1,%ecx
    19: 65 83 3d 0c 00 00 00    cmpl   $0x0,%gs:0xc
    20: 00
    21: 74 01                   je     24 <__random+0x24>
    23: f0 0f b1 8b XX XX XX    lock cmpxchg %ecx,0xXXXXXXXX(%ebx)
    2a: XX
    2b: 0f 85 8f 01 00 00       jne    1c0 <_L_mutex_lock_13>

and similarly for unlock. Note that the je at 21 jumps to 24, i.e. just past the lock prefix byte at 23, so the cmpxchg runs without the lock prefix when %gs:0xc is zero. %gs:0xc ought to be non-zero only once the first pthread_create call in the program has been made. On a Pentium 4 (PIV), atomic instructions are horribly expensive. Either you have preloaded some library that called pthread_create, or your CPU is unable to do the jump-around-the-lock-prefix trick quickly.
This is beyond my expertise (assembler, threads, atomic instructions...), but I am trying to follow your explanation.
I certainly don't see your testcase being slow on my AMD64, either as a 64-bit or as a 32-bit program.
OK. The same goes for many other systems. Mine seems to be an odd exception.
First of all, try benchmarking random_r, then check under a debugger whether %gs:0xc is 0 or not.
Jakub, I would like to do that, but I don't know how. I tried to write some code to test the random_r() function, without success:

    start = clock();
    for(i=0; i<numero_ciclos; i++) {
        r = random_r();
    }
    end = clock();
    printf("%d M de random_r() en %.3f sec (example.: %d)\n", numero_ciclosM, (double)(end - start)/CLOCKS_PER_SEC, r);

It does not compile. GCC reports "too few arguments for function random_r()". I don't know which arguments to pass to random_r(). Can you help me? Thanks!
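For reference, here is a minimal sketch of how random_r() could be called for this kind of benchmark, assuming glibc. Unlike random(), it takes a caller-supplied state (a struct random_data set up with initstate_r()) and a pointer for the result, which is why the call with no arguments is rejected. The helper name time_random_r, the seed value 1, and the 128-byte state buffer are choices made for this illustration only; they do not come from the thread.

```c
#define _GNU_SOURCE          /* makes glibc declare random_r() and initstate_r() */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper (not from the thread): calls random_r() `cycles` times,
   stores the last value in *last, and returns the elapsed CPU time in seconds,
   or -1.0 if the generator state could not be initialised. */
double time_random_r(long cycles, int32_t *last)
{
    char statebuf[128];          /* state array; size is an arbitrary choice here */
    struct random_data buf;      /* per-caller generator state, no lock needed */
    int32_t r = 0;
    clock_t start, end;
    long i;

    /* glibc expects buf to be zeroed before the first initstate_r() call. */
    memset(&buf, 0, sizeof buf);
    if (initstate_r(1u, statebuf, sizeof statebuf, &buf) != 0)
        return -1.0;

    start = clock();
    for (i = 0; i < cycles; i++)
        random_r(&buf, &r);      /* result is written through the second argument */
    end = clock();

    *last = r;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_random_r(numero_ciclos, &r) from main() and printing the returned time next to the rand() figure should show whether the locking in rand() accounts for the difference.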