Re: [patch 00/43] ktimer reworked

Hi,

Roman Zippel wrote:

On Thu, 1 Dec 2005, David Lang wrote:
In addition, once you remove the bulk of these uses from the picture (by
makeing them use a new timer type that's optimized for their useage pattern,
the 'unlikly to expire' case) the remainder of the timer users easily fall
into the catagory where the timer is expected to expire, so that code can
accept a performance hit for removing events prior to them going off that
would not be acceptable in a general case version.
Guys, before you continue spreading nonsense, please read carefully Ingosdescription of the timer wheel at http://lwn.net/Articles/156329/ .Let me also refine the statement I made in this mail: the _focus_ ondelivery is complete nonsense.

Must you start every email with inflammatory rhetoric? If you want toknow why you find it difficult to get people to see things your way, thekey is in the above paragraph. In everyday life you don't insult aperson on the street and then ask them for directions.

Yes, the expiry/non-expiry distinction is an approximation and perhapsan oversimplification. However, after insulting others for that, youcontinue with your own oversimplification of the algorithms involved.Following your words, I could say "Roman, before you continue spreadingnonsense, please go back and read your algorithms textbook". Thereality though is that both of you are approximately correct, andneither post deserves to be called nonsense.

The delivery is really not the important part, what is important is the_lifetime_ of the timer. As Ingo said we try to delay as much work aspossible into the future, so that all the work needed for short-livedtimer is basically:
	list_add() + list_del()
This is a constant operation and whether at the end is a callback isunimportant from the perspective of the timer system.

Timeout-style timers imply a short lifetime, independent of theirmaximum expiry time. Regular timers expected to expire can have theirlifetime predicted accurately by looking at their expiry time. Aninterface which gives a hint as to the type of timer allows us topredict the lifetime. Please tell me how this tight relation is nonsense?

You are right in that the lifetime is what is important, but the wholepoint of the ktimer distinction is that by knowing if something is atimer vs a timeout, *we can more accurately predict the lifetime*.

When the timer spends more time in the timer wheel, it has to be movedinto different slots over time, but this not a really expensive operationeither, so e.g. all the work needed with a single cascading step is:
	list_add() + list_del() + list_add() + list_del()
This is still quite cheap and with a single cascading step we cover 2^14jiffies (2^10 for small configurations), which is quite a lot of time andwhether in that time the timer is delivered or not doesn't change abovecost. Another important thing to realize is that this cost is independentof the amount of timers, the per timer cost depends only on the timeoutvalue.

On the other hand, there is a huge difference between *amortized*constant time, and constant time. The cascade falls into the formercategory, and affects latency a great deal. It's cheap *per timer*, butby batching so much work to be done at once, it's not cheap to executethe *cascade operation*. If you are putting important, latencysensitive timers in the same data structure as non-latency-sensitivetimeouts, it is going to hurt accuracy and timeliness. For timeouts wedon't care much, since they rarely cascade; Timers which expire *will*go through all the cascades based on their expiry time, and if there aremany of them, worst-case latency will suffer. In a mathematical sensethen, it's not O(1) and to call it so is incorrect. It's "amortizedO(log(l))", where "l" is the lifetime of the timer, and the minimumresolution is constant.

So let's look at the new timer which uses a rbtree. Its per timer costdoesn't depend on the expiry value, but on the size of the tree instead.All you have to do with the timer is:
	tree_insert() + tree_remove()
This is not a constant operation, with O(log(n)) it grows quite slowly,but in any case it's more expensive than a simple list_add/list_del, thismeans you have to do a number of list operations before it becomes moreexpensive than a single tree operation. The nonconstant cost also meansthe more timer start using the rbtree, the relatively cheaper it becomesto use the timer wheel again.

Thank you for a very good argument about why timeouts shouldn't userbtree, and should continue to use the timer wheel. Nobody disagrees onthis however, and adding ktimers will not force any existing users tochange to the new interface.

The break-even point may now be different on various machines, but I thinkit's safe to assume that two list add/del is at least as cheap and usuallycheaper then a tree add/del. This means timers which run for less than2^14 jiffies are better off using the timer wheel, unless they require thehigher resolution of the new timer system.

Again, putting timeouts on the timer wheel is ideal, since we know theytend to have short lifetimes. Same goes for low-resolution timers whichonly need jiffy accuracy. However, jiffy accuracy doesn't cut it for alot of applications. It is when we add high accuracy that the timerwheel falls down, and requires a different approach. So let's call "dt"the desired resolution of the timer in seconds. Then the timer wheelbecomes "amortized O(log(l/dt)) = O(log(l) + log(1/dt))". When youstart talking about resolutions where dt=25usec, then the timer wheelall the sudden becomes worse than a balanced tree, which is always O(n),independent of resolution.

And that's the whole *point* about how we got here. Let the lowresolution, low lifetime timeouts stay on the timer wheel, and make anew approach that specializes in handling longer lifetime, higherresolution timers. That's ktimers in a nutshell. You seem to bearguing for it rather than against it.

Moving timers away from the timer wheel will also not help with theproblem cases of the timer wheel. If you have a million network timer, acascading step for thousands of timer takes time, but it doesn't changethe cost per timer, we just have to do the work that we were too lazy todo before. In this case it would be better to look into solutions whichavoid generating millions of timer in first place.

Putting timers on an rbtree most definitely helps with the worst-caselatency of the timer wheel. That is an issue that some of us care verydeeply about.

You've brought up the fact that networking shouldn't use lots of timersseveral times in the overall discussion. If you know how to do this,I'm sure you can start sending patches to netdev and show them all howstupid they've been all along. However, more likely you'll just findout that just maybe the networking people really *have* thought aboutthe problem, and the solution they came up with is actually a prettygood one.

At any rate, while you fix up all those "timer-abusing" subsystemsthroughout the kernel, can we just try to improve the timer system inthe meantime?

So can we please stop this likely/unlikely expiry nonsense? It's great ifyou want to tell aunt Tillie about kernel hacking, but it's terribleadvice to kernel programmers. When it comes to choosing a timerimplementation, the delivery is completely and utterly unimportant.

Expected expiry is a simple predictor of expected lifetime. If we knewthe lifetime, we could use that, but expiry is one hint that is easierfor the developer to provide. Really, we want to know "E[l]/dt" (E[] isnotation for expected value), but that's unrealistic to estimate. Whatktimers says is: if it's a timeout (E[l] is low and dt is high), use thetimer wheel, and if its a timer (E[l] is high and dt is low), use anrbtree. In what way is that not a reasonable approach?

Jim Bruce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Follow-Ups:
- Re: [patch 00/43] ktimer reworked
  - From: Roman Zippel <zippel@linux-m68k.org>

References:
- Re: [patch 00/43] ktimer reworked
  - From: David Lang <david.lang@digitalinsight.com>
- Re: [patch 00/43] ktimer reworked
  - From: Roman Zippel <zippel@linux-m68k.org>

Prev by Date: Re: Missing break in timedia serial setup.
Next by Date: 2.4.32 Oops in scsi_dispatch_cmd
Previous by thread: Re: [patch 00/43] ktimer reworked
Next by thread: Re: [patch 00/43] ktimer reworked
Index(es):
- Date
- Thread

[Index of Archives] [Kernel Newbies] [Netfilter] [Bugtraq] [Photo] [Stuff] [Gimp] [Yosemite News] [MIPS Linux] [ARM Linux] [Linux Security] [Linux RAID] [Video 4 Linux] [Linux for the blind] [Linux Resources]