Milton,
Thank you for your comments. Carl will reply to certain parts of your
posting where he's more knowledgeable than I. See my replies below.
-Maynard
Milton Miller wrote:
On Feb 6, 2007, at 5:02 PM, Carl Love wrote:
This is the first update to the patch previously posted by Maynard
Johnson as "PATCH 4/4. Add support to OProfile for profiling CELL".
[snip]
Data collected
The current patch starts tackling these translation issues for the
presently common case of a static self contained binary from a single
file, either single separate source file or embedded in the data of
the host application. When creating the trace entry for a SPU
context switch, it records the application owner, pid, tid, and
dcookie of the main executable. It addition, it looks up the
object-id as a virtual address and records the offset if it is non-zero,
or the dcookie of the object if it is zero. The code then creates
a data structure by reading the elf headers from the user process
(at the address given by the object-id) and building a list of
SPU address to elf object offsets, as specified by the ELF loader
headers. In addition to the elf loader section, it processes the
overlay headers and records the address, size, and magic number of
the overlay.
When the hardware trace entries are processed, each address is
looked up this structure and translated to the elf offset. If
it is an overlay region, the overlay identify word is read and
the list is searched for the matching overlay. The resulting
offset is sent to the oprofile system.
The current patch specifically identifies that only single
elf objects are handled. There is no code to handle dynamic
linked libraries or overlays. Nor is there any method to
Yes, we do handle overlays. (Note: I'm looking into a bug right now in
our overlay support.)
present samples that may have been collected during context
switch processing, they must be discarded.
My proposal is to change what is presented to user space. Instead
of trying to translate the SPU address to the backing file
as the samples are recorded, store the samples as the SPU
context and address. The context switch would record tid,
pid, object id as it does now. In addition, if this is a
new object-id, the kernel would read elf headers as it does
today. However, it would then proceed to provide accurate
dcookie information for each loader region and overlay. To
identify which overlays are active, (instead of the present
read on use and search the list to translate approach) the
kernel would record the location of the overlay identifiers
as it parsed the kernel, but would then read the identification
word and would record the present value as an sample from
a separate but related stream. The kernel could maintain
the last value for each overlay and only send profile events
for the deltas.
Discussions on this topic in the past have resulted in the current
implementation precisely because we're able to record the samples as
fileoffsets, just as the userspace tools expect. I haven't had time to
check out how much this would impact the userspace tools, but my gut
feel is that it would be quite significant. If we were developing this
module with a matching newly-created userspace tool, I would be more
inclined to agree that this makes sense. But you give no rationale for
your proposal that justifies the change. The current implementation
works, it has no impact on normal, non-profiling behavior, and the
overhead during profiling is not noticeable.
This approach trades translation lookup overhead for each
recorded sample for a burst of data on new context activation.
In addition it exposes the sample point of the overlay identifier
vs the address collection. This allows the ambiguity to be
exposed to user space. In addition, with the above proposed
kernel timer vs sample collection, user space could limit the
elapsed time between the address collection and the overlay
id check.
Yes, there is a window here where an overlay could occur before we
finish processing a group of samples that were actually taken from a
different overlay. The obvious way to prevent that is for the kernel
(or SPUFS) to be notified of the overlay and let OProfile know that we
need to drain (perhaps discard would be best) our sample trace buffer.
As you indicate above, your proposal faces the same issue, but would
just decrease the number of bogus samples. I contend that the relative
number of bogus samples will be quite low in either case. Ideally, we
should have a mechanism to eliminate them completely so as to avoid
confusion the user's part when they're looking at a report. Even a few
bogus samples in the wrong place can be troubling. Such a mechanism
will be a good future enhancement.
[snip]
milton
--
[email protected] Milton Miller
Speaking for myself only.
_______________________________________________
Linuxppc-dev mailing list
[email protected]
https://ozlabs.org/mailman/listinfo/linuxppc-dev
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]