On Mon, 2006-10-09 at 12:39 -0700, john stultz wrote:
> On Fri, 2006-10-06 at 11:54 -0700, Daniel Walker wrote:
> > plain text document attachment (clocksource_more_generic.patch)
> > Delete alot of remaining code in kernel/time/clocksource.c that
> > is replaced with this patch. Removed the deprecated "clock" kernel
> > parameter.
>
> Hmmm. This patch is a bit more confusing. From first glance it looks
> like a lot of code churn for not a whole lot of benefit. And as Thomas
> already mentioned, you should probably leave the "clock" bit alone for
> now.
The benefits is more in the separation of the two. I'll explain the
rational below.
> > Shifts some of the code around so that the time of day override
> > happens inside kernel/timer.c.
>
> Maybe could you explain your rational for this a bit more. I personally
> prefer the current breakdown, where the clocksource selection and
> override bits are in the clocksource code.
>
> Currently the layering is like this:
> [timekeeping core]
> [clocksource core]
> [clocksource drivers]
> Where timekeeping tries to have as little knowledge as possible of the
> details of the clocksource drivers (outside of basic knowledge of how to
> use them). To the timekeeping core, all clocksources are the same. It
> leaves the selection algorithm and its management up to the clocksource
> core.
>
> Now if I understand your intent you seem to be splitting it up a bit:
>
> [timekeeping core] [sched_clock code]
> [timekeeping clocksource selection] [sched_clock clocksource selection]
> [ clocksource core ]
> [ clocksource drivers ]
>
This is mostly how I see the layers, except after sched_clock I see
"..."
> Where the selection code is moved from the clocksource core into the
> timekeeping code. I can understand some of the rational, as timekeeping
> and sched_clock have differing selection criteria, so why not have
> separate logic and share the clocksource core.
>
> So, here's where I'm coming from on this issue: I feel sched_clock to be
> a unfortunately necessary hack. Ideally timekeeping reads should be fast
> enough to do from the scheduler, but that just is not the case (just on
> most arches, some don't have this problem). And since its just scheduler
> decisions, it does not need the correctness that timekeeping needs, so
> we have this arch specific hook that does whatever it needs. And since
> its sorta simple and stupid, the code duplication is pretty minor (no
> NTP adjustments, etc).
>
> In my mind this reduces the benefit gained from making a generic
> sched_clock. Currently the clocksource rating logic for timekeeping is
> pretty simple: "weigh correctness first, then consider performance".
>
> Now to have a generic sched_clock, we're going to have to introduce a
> new rating scale, which would select a more vague "speed over
> correctness, as long as its totally not insane" logic. To me, it seems a
> bit complicated for generic logic.
>
> Thus I prefer having the clocksource core keep the an understanding of
> the "correctness first, then performance".
>
> Now, I don't want to discourage your efforts here (BTW: I really
> appreciate them, and I think attempting new users for the clocksource
> infrastructure will make that infrastructure cleaner and better). It
> seems perfectly logical to use the clocksource infrastructure in
> sched_clock, but maybe, since sched_clock is the necessary hack that it
> is, we can do a more minor cleanup, with less impact to the clocksource
> infrastructure?
First, I've not done this clean up specifically for sched_clock. The
sched_clock changes are just there as an example of the interface
usage.
> Maybe an idea just to start, would be to have the arch specific
> sched_clock() code use __get_clock(char* name), with its internal
> selection logic based on the clocksource names. This will then have
> minor impact on the current timekeeping/clocksource core code, but still
> allow for some reduction in code duplication. Then as hardware
> clocksources are tested for viability (for example, HPET may be a good
> bet here, but ACPI PM would not), we can add that logic to the arch
> specific sched_clock code.
>
> Sound reasonable?
Yes. We could do this.. It keeps the performance level mostly flat while
removing all the cycles to nanosecond shift logic in every
architecture.
>
> Anyway, that was a bit long winded. I apologize. More specific comments
> below:
>
ok.
> >
> > /**
> > - * __get_clock - Finds a specific clocksource
> > - * @name: name of the clocksource to return
> > - *
> > - * Private function. Must hold clocksource_lock when called.
> > - *
> > - * Returns the clocksource if registered, zero otherwise.
> > - * If the @name is null the highest rated clock is returned.
> > - */
> > -static inline struct clocksource * __get_clock(char * name)
> > -{
> > -
> > - if (unlikely(list_empty(&clocksource_list)))
> > - return &clocksource_jiffies;
> > -
> > - if (!name)
> > - return list_entry(clocksource_list.next,
> > - struct clocksource, list);
> > -
> > - return __is_registered(name);
> > -}
> > -
>
> Errr.. Wasn't this function just added in the last patch? Can we reduce
> the churn here a bit?
I tried, but I also have to make each patch compile. So it gets a bit
tricky.
>
> > @@ -952,10 +1061,26 @@ static void update_wall_time(void)
> > clock->xtime_nsec -= (s64)xtime.tv_nsec << clock->shift;
> >
> > /* check to see if there is a new clocksource to use */
> > - if (change_clocksource()) {
> > + if (unlikely(atomic_read(&clock_check))) {
> > +
> > + /*
> > + * Switch to the new override clock, or the highest
> > + * rated clock.
> > + */
> > + if (*clock_override_name)
> > + change_clocksource(clock_override_name);
> > + else
> > + change_clocksource(NULL);
> > +
> > clock->error = 0;
> > clock->xtime_nsec = 0;
> > clocksource_calculate_interval(clock, tick_nsec);
> > +
> > + /*
> > + * Remove the change signal
> > + */
> > + atomic_dec(&clock_check);
> > +
> > }
> > }
> >
> I think this last chunk (changing the clocksource switching logic) has
> some potential. Mind breaking it out into a separate patch?
Maybe, but part of this fell out of reorganizing the code.
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]