Re: perfmon2 merge news

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Yes it is for everybody. I've been rather questioning if the slow
ways (complicated syscalls) to get the counter information are really
needed.

I suppose by complicated here, your referring to the gather semantics of the pfm_read/write_pmds/pmcs calls. Many processors may have 100's of registers
(IA64, BG/P, SiCortex), some of which have different access times. So a
naive syscall of 'give me all the registers you've got' isn't going to cut it. However, any additional simplicity (performance) we can squeeze out of this particular primitive is a huge win as it sits in the critical path of the user
tools (unless one is sampling).

referring to the concept of eventsets. Having multiplexing is
important.

Why is it important?


Performance and noise. See the earlier message about our user-land implementation versus kernel mode implementations. Any any useful granularity, you begin to seriously affect the counts with noise as well as dilate the run-time. But let's punt on this one until after we get the basics in. It's a non-essential feature at this point.

- Custom sample formats would be considered not often used in our
community, largely
because the tools run on all HPC/Linux architectures. PAPI uses the
default sample
format which has been sufficient for our needs. However, the lack of
custom sample
formats preclude the dev of the specialized tools that access the
sampling
hardware as found on the IA64, PPC64, the Barcelona and the SiCortex
node chip.
pfmon exports this functionality quite well, and it does get used.

What do you mean with custom sample formats exactly?  What information
do you want in there? And why?

By custom here, I mean the ability to have the kernel take samples containing more than just the IP, the PID and a bitmask of which registers overflowed at this point. Myself and others have worked hard to get effective address sampling into the hardware (there are registers that contain EA's of misses as well as branch mispredict data on the PPC, IA64, Barcelona and SiCortex) that are handled through the use of a format that gathers up that information at interrupt time for deposit into the sample buffer. We are not wedded to Perfmon2's implementation of these formats, we are however, wedded to having this information collected at interrupt time as the data may change by the time you get back to user-mode. This hardware is not obscure any more, it's the norm, as we've learned at thus simple aggregate counters, even those with precise
interrupt abilities, are not sufficient to satisfy all of our needs.

e.g. PEBS and so on pretty much fix the in memory sample format in hardware, so they only way to get a custom format would be to use a separate buffer.

I can think of one reason why the kernel should add more information
in a separate buffer (log the instruction bytes so that it can
be disassembled and a address histogram be generated using the PEBS
register values), but it is a relatively obscure one and definitely
not a essential feature. Unfortunately it is also hard to implement completely
race-free.


This is kind of comment that makes the Linux/HPC folks 'somber'. What
isn't useful, is being dismissive of an entire community that moves a
heck of a lot of Linux DVD's.

Sorry, but these kind of non technical BS arguments will just make
you be ignored in mainline Linux lands. They might work if you pay
a lot of money to specific Linux companies (do you?), but here
on linux-kernel you have to convince with purely technical arguments.

I love it when kernel folks refer to their own revenue streams
(and yes, we do, ask your VP of sales) and the needs of a user community as
"BS non-technical arguments".

But let's get back to basics here. We can sort that out over a beer sometime.
At this point, let's try and agree on the minimum set of
functionality acceptable for a first round of patches.

- per-CPU (system-wide) and per-thread 64-bit virtualized counters
- dispatch of interrupt on overflow via a signal
- first (self) and third-party (attach) semantics
- extensible to new lines within an architecture without repatching
  (By this I mean that through the use of modules that contain PMU
description tables, i.e. patches don't have to be issued for every new rev
   of HW that Intel releases)

To be considered later:
- Sample buffers and formats
- Multiplexing (by event threshold and time slicing)
- fast-read support if the hardware supports it (mmap + user rdpmc)

I think(?) we are all clear now why oprofile is not sufficient, i.e.
simultaneous usage by non-root users, each with different counter configurations, lack of read/write access etc. Oprofile is however, a very important tool and any initial set of functionality should allow for a very simple port to
each version of the infrastructure along the way. I'd happily port
incremental versions of PAPI to the patches, so the performance tools can be
accessible to the LKML community while testing/benchmarking the
patchset on a variety of architectures. If we can agree on the starting point, we can move the discussion of the API to the Perfmon2 mailing list and with your
input, finally 'get it right' in terms of acceptance.

If there's anything we do have in common at the moment, it's momentum.
We (speaking for the HPC community/vendors again) are not in favor of useless bloat, every TLB slot, mispredict, miss, timer-tick or pipeline bubble, we care about, kernel space or not. It's precisely why this type of infrastructure has become so
vital to us over the years.

-Phil
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux