Re: [Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch

Milton,

Thank you for your comments. Carl will reply to certain parts of your
posting where he's more knowledgeable than I. See my replies below.


-Maynard

Milton Miller wrote:

On Feb 6, 2007, at 5:02 PM, Carl Love wrote:

This is the first update to the patch previously posted by Maynard
Johnson as "PATCH 4/4. Add support to OProfile for profiling CELL".

[snip]


Data collected


The current patch starts tackling these translation issues for the
presently common case of a static self contained binary from a single
file, either single separate source file or embedded in the data of
the host application.   When creating the trace entry for a SPU
context switch, it records the application owner, pid, tid, and
dcookie of the main executable.   In addition, it looks up the
object-id as a virtual address and records the offset if it is non-zero,
or the dcookie of the object if it is zero.   The code then creates
a data structure by reading the elf headers from the user process
(at the address given by the object-id) and building a list of
SPU address to elf object offsets, as specified by the ELF loader
headers.   In addition to the elf loader section, it processes the
overlay headers and records the address, size, and magic number of
the overlay.

When the hardware trace entries are processed, each address is
looked up in this structure and translated to the elf offset.  If
it is an overlay region, the overlay identifier word is read and
the list is searched for the matching overlay.  The resulting
offset is sent to the oprofile system.
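
As a rough illustration of the lookup described above, here is a minimal
sketch in C. It is not taken from the patch: the struct layout, the
function names, the convention that a guard address of 0 means "not an
overlay", and the read_word callback (standing in for a read of SPU local
store) are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map entry; field names are illustrative, not from the patch. */
struct spu_map_entry {
    uint32_t spu_addr;    /* start of the region in SPU local store */
    uint32_t size;        /* length of the region in bytes */
    uint32_t file_offset; /* offset of the region within the ELF object */
    uint32_t guard_addr;  /* address of the overlay identifier word;
                             0 is assumed here to mean "not an overlay" */
    uint32_t guard_val;   /* identifier value when this overlay is resident */
};

#define SPU_NO_OFFSET UINT32_MAX

/* Stand-in for reading one word of SPU local store (assumption). */
typedef uint32_t (*read_word_fn)(uint32_t addr, void *ctx);

/* Example reader: treats ctx as an array of 32-bit local-store words. */
uint32_t read_from_array(uint32_t addr, void *ctx)
{
    const uint32_t *ls = ctx;
    return ls[addr / 4];
}

/* Translate an SPU address to an ELF file offset, or SPU_NO_OFFSET. */
uint32_t spu_addr_to_offset(const struct spu_map_entry *map, size_t n,
                            uint32_t addr, read_word_fn read_word, void *ctx)
{
    for (size_t i = 0; i < n; i++) {
        const struct spu_map_entry *e = &map[i];

        if (addr < e->spu_addr || addr - e->spu_addr >= e->size)
            continue; /* address is outside this region */

        /* For overlay regions, accept only the overlay that is resident. */
        if (e->guard_addr != 0 &&
            read_word(e->guard_addr, ctx) != e->guard_val)
            continue;

        return e->file_offset + (addr - e->spu_addr);
    }
    return SPU_NO_OFFSET;
}
```

Several overlays may share one address range in the map; the identifier
word read at lookup time decides which entry supplies the offset.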

The current patch specifically identifies that only single
elf objects are handled.  There is no code to handle dynamic
linked libraries or overlays.   Nor is there any method to

Yes, we do handle overlays. (Note: I'm looking into a bug right now in
our overlay support.)

present samples that may have been collected during context
switch processing, they must be discarded.


My proposal is to change what is presented to user space.  Instead
of trying to translate the SPU address to the backing file
as the samples are recorded, store the samples as the SPU
context and address.  The context switch would record tid,
pid, object id as it does now.   In addition, if this is a
new object-id, the kernel would read elf headers as it does
today.  However, it would then proceed to provide accurate
dcookie information for each loader region and overlay.  To
identify which overlays are active, (instead of the present
read on use and search the list to translate approach) the
kernel would record the location of the overlay identifiers
as it parsed the kernel, but would then read the identification
word and would record the present value as a sample from
a separate but related stream.   The kernel could maintain
the last value for each overlay and only send profile events
for the deltas.
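
The "only send profile events for the deltas" idea could look roughly like
the following C sketch. Everything here is hypothetical (the tracker and
event structs, the fixed region limit, the function name); it only shows
the bookkeeping of remembering the last reported identifier value per
overlay region and emitting an event when it changes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_OVERLAY_REGIONS 16 /* arbitrary limit for the sketch */

/* Last identifier value reported for each overlay region (hypothetical). */
struct overlay_tracker {
    uint32_t last_val[MAX_OVERLAY_REGIONS];
    bool     seen[MAX_OVERLAY_REGIONS];
};

/* One entry in the separate "overlay change" stream (hypothetical). */
struct overlay_event {
    uint32_t region;
    uint32_t value;
};

/*
 * Compare the identifier words just read (current[]) with the last values
 * reported, and write one event per region that is new or has changed.
 * out[] must have room for n_regions entries. Returns the event count.
 */
size_t overlay_collect_deltas(struct overlay_tracker *t,
                              const uint32_t *current, size_t n_regions,
                              struct overlay_event *out)
{
    size_t n_events = 0;

    if (n_regions > MAX_OVERLAY_REGIONS)
        n_regions = MAX_OVERLAY_REGIONS;

    for (size_t i = 0; i < n_regions; i++) {
        if (t->seen[i] && t->last_val[i] == current[i])
            continue; /* unchanged since the last report */

        t->seen[i] = true;
        t->last_val[i] = current[i];
        out[n_events].region = (uint32_t)i;
        out[n_events].value = current[i];
        n_events++;
    }
    return n_events;
}
```

A zero-initialized tracker reports every region once on first use, which
corresponds to the burst of data on new context activation.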

Discussions on this topic in the past have resulted in the current
implementation precisely because we're able to record the samples as
file offsets, just as the userspace tools expect. I haven't had time to
check out how much this would impact the userspace tools, but my gut
feel is that it would be quite significant. If we were developing this
module with a matching newly-created userspace tool, I would be more
inclined to agree that this makes sense. But you give no rationale for
your proposal that justifies the change. The current implementation
works, it has no impact on normal, non-profiling behavior, and the
overhead during profiling is not noticeable.

This approach trades translation lookup overhead for each
recorded sample for a burst of data on new context activation.
In addition it exposes the sample point of the overlay identifier
vs the address collection.  This allows the ambiguity to be
exposed to user space.   In addition, with the above proposed
kernel timer vs sample collection, user space could limit the
elapsed time between the address collection and the overlay
id check.

Yes, there is a window here where an overlay could occur before we
finish processing a group of samples that were actually taken from a
different overlay. The obvious way to prevent that is for the kernel
(or SPUFS) to be notified of the overlay and let OProfile know that we
need to drain (perhaps discard would be best) our sample trace buffer.
As you indicate above, your proposal faces the same issue, but would
just decrease the number of bogus samples. I contend that the relative
number of bogus samples will be quite low in either case. Ideally, we
should have a mechanism to eliminate them completely so as to avoid
confusion on the user's part when they're looking at a report. Even a few
bogus samples in the wrong place can be troubling. Such a mechanism
will be a good future enhancement.
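
To make the "discard on overlay switch" idea concrete, here is a small C
sketch. No such notification exists in the current code, so the hook name,
the buffer layout, and the buffer size are all assumptions for the sake of
illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define TRACE_BUF_SAMPLES 64 /* arbitrary size for the sketch */

/* Hypothetical buffer of raw SPU addresses not yet translated. */
struct trace_buf {
    uint32_t samples[TRACE_BUF_SAMPLES];
    size_t   count;
    uint64_t discarded; /* running total of samples dropped */
};

/* Queue one raw sample; returns -1 if the buffer is full. */
int trace_buf_add(struct trace_buf *b, uint32_t addr)
{
    if (b->count == TRACE_BUF_SAMPLES)
        return -1;
    b->samples[b->count++] = addr;
    return 0;
}

/*
 * Hypothetical hook, called when the kernel (or SPUFS) learns that an
 * overlay is being swapped in. Samples still in the buffer may belong to
 * the previous overlay, so they are dropped rather than translated.
 * Returns the number of samples dropped.
 */
size_t trace_buf_on_overlay_switch(struct trace_buf *b)
{
    size_t dropped = b->count;

    b->discarded += dropped;
    b->count = 0;
    return dropped;
}
```

Keeping a count of discarded samples would at least let a report say how
many samples were lost this way.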


[snip]

milton
--
[email protected]   Milton Miller
Speaking for myself only.


