Andi,
On Wed, Aug 23, 2006 at 12:29:06PM +0200, Andi Kleen wrote:
> Stephane Eranian <[email protected]> writes:
>
> > Because accessing PMU registers is usually much more expensive
> > than accessing general registers, we take great care at minimizing
> > the number of register accesses using various lazy save/restore schemes
> > for both UP and SMP kernels.
>
> Can you perhaps add a big "strategy" comment somewhere about
> how those lazy schemes work?
>
Will do.
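In the meantime, here is the gist of it as a sketch (the field and
per-CPU variable names below are illustrative, not necessarily the
ones used in the patch):

/*
 * Lazy save/restore strategy (sketch):
 *
 * each context remembers on which CPU and under which PMU
 * "activation number" its state was last loaded. On switch-in,
 * if we are back on the same CPU and nobody else has touched
 * the PMU since (activation numbers match), the PMU registers
 * still hold our values and the reload can be skipped entirely.
 */
static int pfm_pmu_state_still_live(struct pfm_context *ctx)
{
	return ctx->last_cpu == smp_processor_id()
	       && ctx->last_act == __get_cpu_var(pmu_activation);
}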
> I suppose some of those functions must be marked __kprobes
>
Are there any guidelines as to which functions must be ignored
by kprobes? I assume the marker means they cannot be instrumented.
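If I understand correctly, the annotation itself would look like this
(hypothetical handler name); __kprobes places the function in a section
that kprobes refuses to put probes in, which matters for code that can
run on the breakpoint/NMI path:

static void __kprobes pfm_overflow_handler(struct pt_regs *regs)
{
	/* handle the PMU overflow interrupt; must not itself be probed */
}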
> > +/*
> > + * interrupts are masked
> > + */
> > +static void __pfm_ctxswin_thread(struct task_struct *task,
> > + struct pfm_context *ctx)
> > +{
> > + u64 cur_act, now;
> > + struct pfm_event_set *set;
> > + int reload_pmcs, reload_pmds;
> > +
> > + now = pfm_arch_get_itc();
>
> Isn't this sched_clock()?
>
Yes, I could use that one too. I will make the switch.
> > +
> > + BUG_ON(!task->pid);
> > +
> > + spin_lock(&ctx->lock);
>
> Why does it have an own lock? Shouldn't the caller protect it already.
> It must be because you don't prevent preemption for once.
>
> The locking in general needs a big comment somewhere I think.
>
This is an interesting question. The lock protects the context as a whole.
Keep in mind that a context is identified by a file descriptor, and any thread
with access to that file descriptor can issue commands on the context.
While a monitored thread is context switching, another thread holding the
descriptor and running on another CPU could access the context concurrently.
I don't think fget() does enough locking to protect against simultaneous
accesses; it simply keeps the file struct from disappearing via a reference
count.
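Concretely, a command issued through the fd would do something like
this (sketch; pfm_do_command is a made-up name, and the ctxsw path
takes the plain spin_lock because interrupts are already masked there):

static int pfm_do_command(struct pfm_context *ctx)
{
	unsigned long flags;
	int ret;

	/* serialize against the monitored thread switching on another CPU */
	spin_lock_irqsave(&ctx->lock, flags);
	ret = 0; /* operate on the context state (event sets, PMDs, ...) */
	spin_unlock_irqrestore(&ctx->lock, flags);

	return ret;
}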
>
> > +/*
> > + * come here when either prev or next has the TIF_PERFMON flag set.
> > + * Note that a task having TIF_PERFMON set does not imply it has a
> > + * context attached, e.g., in system-wide mode on certain architectures.
> > + */
> > +void __pfm_ctxsw(struct task_struct *prev, struct task_struct *next)
> > +{
> > + struct pfm_context *ctxp, *ctxn;
> > + u64 now;
> > +
> > + now = pfm_arch_get_itc();
>
> sched_clock(). And it can be expensive and you seem to do it redundantly.
> I would do it once and pass it down.
>
Done.
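The switch path now reads the clock once at the top and passes the
timestamp down, roughly like this (sketch; the pfm_context field and
the __pfm_ctxswout_thread name are illustrative):

void __pfm_ctxsw(struct task_struct *prev, struct task_struct *next)
{
	struct pfm_context *ctxp, *ctxn;
	u64 now;

	now = sched_clock();	/* read the clock once */

	ctxp = prev->pfm_context;
	ctxn = next->pfm_context;

	if (ctxp)
		__pfm_ctxswout_thread(prev, ctxp, now);
	if (ctxn)
		__pfm_ctxswin_thread(next, ctxn, now);

	/* both NULL: an active system-wide session owns this CPU */
	if (ctxp == ctxn)
		__pfm_ctxsw_sys(prev, next);
}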
>
> > + * given that prev and next can never be the same, this
> > + * test is checking that ctxp == ctxn == NULL which is
> > + * an indication we have an active system-wide session on
> > + * this CPU
> > + */
> > + if (ctxp == ctxn)
> > + __pfm_ctxsw_sys(prev, next);
> > +
> > + __get_cpu_var(pfm_stats).pfm_ctxsw_count++;
> > + __get_cpu_var(pfm_stats).pfm_ctxsw_cycles += pfm_arch_get_itc() - now;
>
> Is this really needed? On p4 you added hundreds of cycles now.
This is mostly for debugging. It will eventually go away.
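For now, it could be compiled out by default, e.g. (sketch;
CONFIG_PERFMON_DEBUG_STATS is a made-up config symbol):

#ifdef CONFIG_PERFMON_DEBUG_STATS
	__get_cpu_var(pfm_stats).pfm_ctxsw_count++;
	__get_cpu_var(pfm_stats).pfm_ctxsw_cycles += sched_clock() - now;
#endif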
--
-Stephane