Re: [patch 00/43] ktimer reworked

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

Roman Zippel wrote:
On Thu, 1 Dec 2005, David Lang wrote:
In addition, once you remove the bulk of these uses from the picture (by
makeing them use a new timer type that's optimized for their useage pattern,
the 'unlikly to expire' case) the remainder of the timer users easily fall
into the catagory where the timer is expected to expire, so that code can
accept a performance hit for removing events prior to them going off that
would not be acceptable in a general case version.
Guys, before you continue spreading nonsense, please read carefully Ingos 
description of the timer wheel at http://lwn.net/Articles/156329/ .
Let me also refine the statement I made in this mail: the _focus_ on 
delivery is complete nonsense.
Must you start every email with inflammatory rhetoric?  If you want to 
know why you find it difficult to get people to see things your way, the 
key is in the above paragraph.  In everyday life you don't insult a 
person on the street and then ask them for directions.
Yes, the expiry/non-expiry distinction is an approximation and perhaps 
an oversimplification.  However, after insulting others for that, you 
continue with your own oversimplification of the algorithms involved. 
Following your words, I could say "Roman, before you continue spreading 
nonsense, please go back and read your algorithms textbook".  The 
reality though is that both of you are approximately correct, and 
neither post deserves to be called nonsense.
The delivery is really not the important part, what is important is the _lifetime_ of the timer. As Ingo said we try to delay as much work as possible into the future, so that all the work needed for short-lived timer is basically:
	list_add() + list_del()

This is a constant operation and whether at the end is a callback is unimportant from the perspective of the timer system.
Timeout-style timers imply a short lifetime, independent of their 
maximum expiry time.  Regular timers expected to expire can have their 
lifetime predicted accurately by looking at their expiry time.  An 
interface which gives a hint as to the type of timer allows us to 
predict the lifetime.  Please tell me how this tight relation is nonsense?
You are right in that the lifetime is what is important, but the whole 
point of the ktimer distinction is that by knowing if something is a 
timer vs a timeout, *we can more accurately predict the lifetime*.
When the timer spends more time in the timer wheel, it has to be moved into different slots over time, but this not a really expensive operation either, so e.g. all the work needed with a single cascading step is:
	list_add() + list_del() + list_add() + list_del()

This is still quite cheap and with a single cascading step we cover 2^14 jiffies (2^10 for small configurations), which is quite a lot of time and whether in that time the timer is delivered or not doesn't change above cost. Another important thing to realize is that this cost is independent of the amount of timers, the per timer cost depends only on the timeout value.
On the other hand, there is a huge difference between *amortized* 
constant time, and constant time.  The cascade falls into the former 
category, and affects latency a great deal.  It's cheap *per timer*, but 
by batching so much work to be done at once, it's not cheap to execute 
the *cascade operation*.  If you are putting important, latency 
sensitive timers in the same data structure as non-latency-sensitive 
timeouts, it is going to hurt accuracy and timeliness.  For timeouts we 
don't care much, since they rarely cascade; Timers which expire *will* 
go through all the cascades based on their expiry time, and if there are 
many of them, worst-case latency will suffer.  In a mathematical sense 
then, it's not O(1) and to call it so is incorrect.  It's "amortized 
O(log(l))", where "l" is the lifetime of the timer, and the minimum 
resolution is constant.
So let's look at the new timer which uses a rbtree. Its per timer cost doesn't depend on the expiry value, but on the size of the tree instead. All you have to do with the timer is:
	tree_insert() + tree_remove()

This is not a constant operation, with O(log(n)) it grows quite slowly, but in any case it's more expensive than a simple list_add/list_del, this means you have to do a number of list operations before it becomes more expensive than a single tree operation. The nonconstant cost also means the more timer start using the rbtree, the relatively cheaper it becomes to use the timer wheel again.
Thank you for a very good argument about why timeouts shouldn't use 
rbtree, and should continue to use the timer wheel.  Nobody disagrees on 
this however, and adding ktimers will not force any existing users to 
change to the new interface.
The break-even point may now be different on various machines, but I think it's safe to assume that two list add/del is at least as cheap and usually cheaper then a tree add/del. This means timers which run for less than 2^14 jiffies are better off using the timer wheel, unless they require the higher resolution of the new timer system.
Again, putting timeouts on the timer wheel is ideal, since we know they 
tend to have short lifetimes.  Same goes for low-resolution timers which 
only need jiffy accuracy.  However, jiffy accuracy doesn't cut it for a 
lot of applications.  It is when we add high accuracy that the timer 
wheel falls down, and requires a different approach.  So let's call "dt" 
the desired resolution of the timer in seconds.  Then the timer wheel 
becomes "amortized O(log(l/dt)) = O(log(l) + log(1/dt))".  When you 
start talking about resolutions where dt=25usec, then the timer wheel 
all the sudden becomes worse than a balanced tree, which is always O(n), 
independent of resolution.
And that's the whole *point* about how we got here.  Let the low 
resolution, low lifetime timeouts stay on the timer wheel, and make a 
new approach that specializes in handling longer lifetime, higher 
resolution timers.  That's ktimers in a nutshell.  You seem to be 
arguing for it rather than against it.
Moving timers away from the timer wheel will also not help with the problem cases of the timer wheel. If you have a million network timer, a cascading step for thousands of timer takes time, but it doesn't change the cost per timer, we just have to do the work that we were too lazy to do before. In this case it would be better to look into solutions which avoid generating millions of timer in first place.
Putting timers on an rbtree most definitely helps with the worst-case 
latency of the timer wheel.  That is an issue that some of us care very 
deeply about.
You've brought up the fact that networking shouldn't use lots of timers 
several times in the overall discussion.  If you know how to do this, 
I'm sure you can start sending patches to netdev and show them all how 
stupid they've been all along.  However, more likely you'll just find 
out that just maybe the networking people really *have* thought about 
the problem, and the solution they came up with is actually a pretty 
good one.
At any rate, while you fix up all those "timer-abusing" subsystems 
throughout the kernel, can we just try to improve the timer system in 
the meantime?
So can we please stop this likely/unlikely expiry nonsense? It's great if you want to tell aunt Tillie about kernel hacking, but it's terrible advice to kernel programmers. When it comes to choosing a timer implementation, the delivery is completely and utterly unimportant.
Expected expiry is a simple predictor of expected lifetime.  If we knew 
the lifetime, we could use that, but expiry is one hint that is easier 
for the developer to provide.  Really, we want to know "E[l]/dt" (E[] is 
notation for expected value), but that's unrealistic to estimate.  What 
ktimers says is: if it's a timeout (E[l] is low and dt is high), use the 
timer wheel, and if its a timer (E[l] is high and dt is low), use an 
rbtree.  In what way is that not a reasonable approach?
Jim Bruce
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[Index of Archives]     [Kernel Newbies]     [Netfilter]     [Bugtraq]     [Photo]     [Stuff]     [Gimp]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Video 4 Linux]     [Linux for the blind]     [Linux Resources]
  Powered by Linux