Re: The emperor is naked: why *comprehensive* static markup belongs in mainline

Karim Yaghmour wrote:

Why, in fact, that's exactly Jose's point of view. Who's
Jose? Well, just in case you weren't aware of his work,
Jose maintains LKET. What's LKET? An ltt-equivalent

Small correction. Li GuangLei maintains LKET, I mostly oversee its
development and provide guidance to him and his team (and on occasions,
I like to cause trouble in mailing lists).

that uses SystemTap to get its events. And what does
Jose say? Well I couldn't say it better than him:
> I agree with you here, I think is silly to claim dynamic instrumentation
> as a fix for the "constant maintainace overhead" of static trace point.
> Working on LKET, one of the biggest burdens that we've had is mantainig
> the probe points when something in the kernel changes enough to cause a
> breakage of the dynamic instrumentation. The solution to this is having
> the SystemTap tapsets maintained by the subsystems maintainers so that
> changes in the code can be applied to the dynamic instrumentation as
> well. This of course means that the subsystem maintainer would need to
> maintain two pieces of code instead of one. There are a lot of
> advantages to dynamic vs static instrumentation, but I don't think
> maintainace overhead is one of them.
Well, well, well. Here's a guy doing *exactly* what I was
asked to do a couple of years back. And what does he say?
"I think is silly to claim dynamic instrumentation as a
fix for the "constant maintainace overhead" of static trace
point."

My point here was that someone still needs to maintain the tracepoints
regardless of where they are located. While I think that the challenges
of maintaining the tracepoints in kernel are less than maintaining them
out of kernel (either through dynamic or static tracepoints), the
maintenance overhead is still not zero for the subsystem maintainers.

And just in case you missed it the first time in his
paragraph, he repeats it *again* at the end:
" There are a lot of advantages to dynamic vs static
instrumentation, but I don't think maintainace overhead is
one of them."

One thing I would like to add, though, is that based on my experience
using event tracing tools, I would say that the benefits of dynamic
instrumentation far outweigh its drawbacks.

But not content with Jose and Frank's first-hand experience
and testimonials about the cost of outside maintenance of
dynamically-inserted tracepoint, and obviously outright
dismissing the feedback from such heretics as Roman, Martin,
Mathieu, Tim, Karim and others, we have a continued barrage of
criticism from, shall we say, very orthodox kernel developers
who insist that the collective experience of the previously
mentioned people is simply misguided and that, as experienced
kernel developers, *they* know better.

I think the problem here is that we haven't done a good job of educating
developers as to the value that event tracing the kernel has for
developers as well as sysadmins. For example, Frank has said to me in
the past that he does not see the value in just printing raw data out to
user-space the way LKET does. While he and the SystemTap folks have
not done anything specifically to block the inclusion of LKET into the
CVS tree, Frank's lack of vision of what I want to achieve with this tool
is partly a failure on my part.

Why the emperor is naked:
-------------------------

Truth be told:

There is no justification why Mathieu should continue
chasing kernels to allow his users utilize ltt on as
many kernel versions as possible.

There is no justification why the SystemTap team should
continue chasing kernels to make sure users can use
SystemTap on as many kernel versions as possible.

There is no justification why Jose should continue
chasing kernels to allow his users to use LKET on as
many kernel versions as possible.

There is, in fact, no justification why Jose, Frank,
and Mathieu aren't working on the same project.

In all honesty, I think it is time to kill LTTng, LKET and LKST and use
the experience gathered from these projects to create a new tool that
exploits all of the advantages of the previous tools. The attitude I
gathered from the OLS tracing BOF was that while there was interest in
making tool A work with tool B, there was absolutely no interest in
saying "fuck tools A and B and let's create tool C". I've always
advocated for this goal. I will be the first one to say "fuck my
tool, let's work on tool C". It is now up to Mathieu and Hiramatsu-san
to do the same. In my view, egos rather than technical issues are what
is slowing the adoption of an event tracer in Linux.

There is no justification to any of this but the continued
*FEAR* by kernel developers that somehow their maintenance
workload is going to become unmanageable should anybody
get his way of adding static instrumentation into the
kernel. And no matter what personal *and* financial cost
this fear has had on various development teams, actual
*experience* from even those who have applied the most
outrageous of kernel developers requirements is but
grudgingly and conditionally recognized. No value, of
course, being placed on the experience of those that
*didn't* follow the orthodox diktat -- say by pointing
out that ltt tracepoints did not vary on a 5 year timespan.

The fact that the tracepoints did not vary in a 5 year timespan just
proves that the users of LTTng are very few. The truth is that there is
no way to have a trace tool that will have all the tracepoints needed to
diagnose every problem. If a static instrumentation mechanism were to
be included into the kernel, every user that had a useful static
tracepoint for their environment would want to push it into the kernel
in order to have their tracepoint available in distribution X and avoid
having to patch and recompile a kernel. This seems like the fear that
has been discussed on the thread and I think it's well justified. I know
I would like to push my tracepoints if the tool was available in
mainline kernels.

One of the things that we tried to do with LKET was not to predict what
the user would use the tool for. For this reason, the trace format and
the conversion tool were designed to be very dynamic.

For the argument, as it is at this stage of the long
intertwined thread of this week, is that "dynamic tracing"
is superior to "static tracing" because, amongst other
things, "static tracing" requires more instrumentation
than "dynamic tracing". But that, as I said within said
thread, is a fallacy. The statement that "static tracing"
requires more instrumentation than "dynamic tracing" is
only true in as far as you ignore that there is a cost
for out-of-tree maintenance of scripts for use by probe
mechanisms. And as you've read earlier, those doing this
stuff tell us there *is* cost to this. Not only do they
say that, but they go as far as telling us that this
cost is *no different* than that involved in maintaining
static trace points. That, in itself, flies in the face
of all accepted orthodox principles on the topic of
mainlined static tracing.

Out-of-tree maintenance of scripts is something that needs to improve,
especially when you need to insert probes in the middle of a function.

And that is but the maintenance aspect, I won't even
start on the performance issue. Because the current party
line is that while the kprobes mechanism is slow: a) it's
fast enough for all applicable uses, b) there's this
great new mechanism we're working on called djprobes which
eliminates all of kprobes' performance limitations. Of
course you are asked to pay no attention to the man behind
the curtain: a) if there is justification to work on
djprobes, it's because kprobes is dog-slow, which even
those using it for systemtap readily acknowledge, b)
djprobes has been more or less "on its way" for a year or
two now, and that's for one single architecture.

I think that the performance issues should be better understood. Right
now, the thing that causes most of the slowdowns in LKET is not kprobes
but rather exporting the data. Gui Jian has done some measurements
using benchmarks that do real work and found that the overhead in most
cases is significantly less than 10%.

A better performance testing methodology needs to be defined in order to
justify your argument that kprobes are not suitable for the purposes
of event tracing. Something more elaborate than a simple "ping -f
localhost" would be useful.

Another thing that needs to be considered is how much overhead is
acceptable in order for the tool to be useful. I will argue that in
most cases, the overhead of kprobes will not inhibit the ability to
find problems.
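
To make the methodology point a bit more concrete, here is a rough
sketch of how the overhead figure could be computed: run the same
real-work benchmark several times with and without probes armed, and
compare medians. The function name and the timing numbers are made up
for illustration; they are not measurements from LKET or any other tool.

```python
import statistics


def overhead_percent(baseline_secs, traced_secs):
    """Relative slowdown of traced runs versus baseline runs, in percent.

    Medians are used so that a single noisy run does not skew the result.
    """
    base = statistics.median(baseline_secs)
    traced = statistics.median(traced_secs)
    return (traced - base) / base * 100.0


# Hypothetical wall-clock times in seconds for the same workload,
# without and with probes armed. Illustrative values only.
baseline = [120.0, 121.0, 119.0]
traced = [126.0, 127.0, 125.0]

print(f"{overhead_percent(baseline, traced):.1f}% overhead")
```

The number that comes out is only meaningful relative to the workload
that was run, which is why the choice of benchmark matters more than
the arithmetic.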



-JRS
