Re: Possible bug from kernel 2.6.22 and above, 2.6.24-rc4

Ingo Molnar wrote:
> * Jie Chen <[email protected]> wrote:
>
>>> the moment you saturate the system a bit more, the numbers should
>>> improve even with such a ping-pong test.
>>
>> You are right. If I manually do load balance (bind unrelated processes
>> on the other cores), my test code performs as well as it did in
>> kernel 2.6.21.
>
> so right now the results don't seem to be too bad to me - the higher
> overhead comes from two threads running on two different cores and
> incurring the overhead of cross-core communication. In a true
> spread-out workload that synchronizes occasionally you'd get the same
> kind of overhead, so in fact this behavior is more informative of the
> real overhead, i guess. In 2.6.21 the two threads would stick to the
> same core and produce artificially low latency - which would only be
> true in a real spread-out workload if all tasks ran on the same core
> (which is hardly what you want with OpenMP).

I use the pthread_setaffinity_np call to bind each thread to one core. Unless kernel 2.6.21 does not honor the affinity, I do not see how running two threads on two cores should differ between the new kernel and the old kernel. My test code does not do any numerical calculation, but it does spin-wait on shared/non-shared flags. The reason I am using affinity is to measure the synchronization overhead among different cores. With both the new and the old kernel I see 200% CPU usage when I run my test code with two threads. Does that mean the two threads are running on two cores? I also verify that each thread is indeed bound to its core by using pthread_getaffinity_np.
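
For clarity, the binding is done roughly like the following simplified sketch (not my actual test code; pin_self_to_core is just an illustrative name):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to a single core and read the mask back to
 * verify that the kernel honored the request. */
static int pin_self_to_core(int core)
{
    cpu_set_t set, check;

    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return -1;

    CPU_ZERO(&check);
    if (pthread_getaffinity_np(pthread_self(), sizeof(check), &check) != 0)
        return -1;

    return CPU_ISSET(core, &check) ? 0 : -1;
}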

> In any case, if i misinterpreted your numbers or if you just disagree,
> or if you have a workload/test that shows worse performance than it
> could/should, let me know.
>
> 	Ingo

Hi, Ingo:

Since I am using the affinity flag to bind each thread to a different core, the synchronization overhead should increase as the number of cores/threads increases. But what we observe with the new kernel is the opposite: the barrier overhead for two threads is 8.93 microseconds vs. 1.86 microseconds for 8 threads (with the old kernel it is 0.49 vs. 1.86). This will confuse most people who study synchronization/communication scalability. I know my test code is not a real-world computation, which would usually use up all the cores. I hope I have explained myself clearly. Thank you very much.
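
For reference, the overhead numbers come from timing many crossings of a spin-wait barrier between the pinned threads, along the lines of this simplified sketch (again, not my actual test code, just an illustration of the kind of measurement):

/* Each thread would first be pinned to its own core as in the sketch
 * above, then the threads cross a spin-wait barrier many times and the
 * average cost per crossing is printed. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS   2
#define ITERATIONS 100000

static atomic_int arrived;     /* threads that reached the barrier */
static atomic_int generation;  /* bumped by the last arriver       */

/* Minimal counter/generation spin barrier: the last thread to arrive
 * resets the counter and advances the generation; the others spin-wait
 * on the shared generation flag. */
static void spin_barrier(void)
{
    int gen = atomic_load(&generation);

    if (atomic_fetch_add(&arrived, 1) == NTHREADS - 1) {
        atomic_store(&arrived, 0);
        atomic_fetch_add(&generation, 1);
    } else {
        while (atomic_load(&generation) == gen)
            ;  /* spin until released */
    }
}

static void *worker(void *arg)
{
    long id = (long)arg;
    struct timespec t0, t1;

    /* pin_self_to_core((int)id);  -- binding as in the earlier sketch */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < ITERATIONS; i++)
        spin_barrier();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                (t1.tv_nsec - t0.tv_nsec) / 1e3;
    printf("thread %ld: %.2f microseconds per barrier\n",
           id, us / ITERATIONS);
    return NULL;
}

int main(void)
{
    pthread_t tid[NTHREADS];

    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}

With the two threads on two different cores, every crossing pays the cross-core communication cost you describe above, which is what I am trying to characterize.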

--
###############################################
Jie Chen
Scientific Computing Group
Thomas Jefferson National Accelerator Facility
12000, Jefferson Ave.
Newport News, VA 23606

(757)269-5046 (office) (757)269-6248 (fax)
[email protected]
###############################################

