Ingo Molnar wrote:
> * Jie Chen <[email protected]> wrote:
> > > the moment you saturate the system a bit more, the numbers should
> > > improve even with such a ping-pong test.
> > You are right. If I manually do load balance (bind unrelated
> > processes on the other cores), my test code performs as well as it
> > did in kernel 2.6.21.
> so right now the results don't seem too bad to me - the higher
> overhead comes from two threads running on two different cores and
> incurring the overhead of cross-core communication. In a true
> spread-out workload that synchronizes occasionally you'd get the same
> kind of overhead, so in fact this behavior is more indicative of the
> real overhead, I guess. In 2.6.21 the two threads would stick to the
> same core and produce artificially low latency - which would only be
> true in a real spread-out workload if all tasks ran on the same core
> (which is hardly what you want with OpenMP).
I use the pthread_setaffinity_np call to bind each thread to its own
core. Unless kernel 2.6.21 does not honor the affinity, I see no
difference between the new kernel and the old one in running two
threads on two cores. My test code does not do any numerical
calculation, but it does spin-wait on shared/non-shared flags. The
reason I am using affinity is to measure the synchronization overhead
among different cores. On both the new and the old kernel, I see 200%
CPU usage when I run my test code with two threads. Does this mean the
two threads are running on two cores? I also verify that each thread
is indeed bound to its core by using pthread_getaffinity_np, as in the
sketch below.
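
For reference, a minimal sketch of this kind of binding and
verification, assuming Linux with glibc (this is not the original test
code; the core number and error handling are placeholders):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/*
 * Pin the calling thread to one core, then read the mask back to
 * confirm the binding actually took effect. Returns 0 on success.
 */
static int bind_self_to_core(int core)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(core, &set);
	if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set))
		return -1;

	CPU_ZERO(&set);
	if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set))
		return -1;
	return (CPU_ISSET(core, &set) && CPU_COUNT(&set) == 1) ? 0 : -1;
}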
> In any case, if I misinterpreted your numbers or if you just
> disagree, or if you have a workload/test that shows worse performance
> than it could/should, let me know.
>
> 	Ingo
Hi, Ingo:

Since I am using the affinity flag to bind each thread to a different
core, the synchronization overhead should increase as the number of
cores/threads increases. But what we observed on the new kernel is the
opposite: the barrier overhead is 8.93 microseconds for two threads
vs. 1.86 microseconds for 8 threads (on the old kernel it is 0.49 vs.
1.86). This will confuse most people who study the scalability of
synchronization/communication. I know my test code is not a real-world
computation, which would usually use up all cores. I hope I have
explained myself clearly. A stripped-down version of the timing loop
is sketched below. Thank you very much.
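
For illustration, a minimal sketch of this kind of barrier-overhead
measurement (build with gcc -pthread). It uses a pthread barrier in
place of the spin-wait flags of the real test; the thread count, the
iteration count, and the omission of the affinity binding shown above
are all simplifications:

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 2
#define ITERS    100000

static pthread_barrier_t barrier;

static double now_us(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
}

/*
 * Each thread crosses the barrier ITERS times; the elapsed time per
 * iteration approximates the per-barrier synchronization overhead.
 */
static void *worker(void *arg)
{
	double start;
	int i;

	pthread_barrier_wait(&barrier);	/* line up before timing */
	start = now_us();
	for (i = 0; i < ITERS; i++)
		pthread_barrier_wait(&barrier);
	if ((long)arg == 0)
		printf("%.2f us per barrier\n", (now_us() - start) / ITERS);
	return NULL;
}

int main(void)
{
	pthread_t tid[NTHREADS];
	long i;

	pthread_barrier_init(&barrier, NULL, NTHREADS);
	for (i = 0; i < NTHREADS; i++)
		pthread_create(&tid[i], NULL, worker, (void *)i);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(tid[i], NULL);
	pthread_barrier_destroy(&barrier);
	return 0;
}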
--
###############################################
Jie Chen
Scientific Computing Group
Thomas Jefferson National Accelerator Facility
12000, Jefferson Ave.
Newport News, VA 23606
(757)269-5046 (office) (757)269-6248 (fax)
[email protected]
###############################################