Hello!
> the context-switch argument i'll believe if i see numbers. You'll
> probably need in excess of tens of thousands of irqs/sec to even be able
> to measure its overhead. (workqueues are driven by nice kernel threads
> so there's no TLB overhead, etc.)
It was the authors of the patch who were supposed to give some numbers,
at least one or two, just to prove the concept. :-)
According to my measurements (which may be wrong) on a 2.5GHz P4, a tasklet
schedule-and-execute cycle eats ~300ns, while a workqueue eats ~4usec.
On my 1.8GHz PM notebook (UP kernel), the numbers are 170ns and 1.2usec.
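For reference, a minimal module of the kind that produces such numbers
(a sketch only, untested, all names made up; it uses the old
three-argument tasklet API of this era, and a real measurement would of
course average over many thousands of iterations instead of a single
shot):

/* Hypothetical throwaway measurement module -- see caveats above.
 * Note: kicking a tasklet from process context goes through ksoftirqd,
 * so a single shot like this is noisy. */
#include <linux/module.h>
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/workqueue.h>
#include <linux/ktime.h>
#include <linux/completion.h>

static ktime_t t_start;
static DECLARE_COMPLETION(tasklet_done);
static DECLARE_COMPLETION(work_done);

static void measure_tasklet_fn(unsigned long data)
{
	printk(KERN_INFO "tasklet: %lld ns\n",
	       (long long)ktime_to_ns(ktime_sub(ktime_get(), t_start)));
	complete(&tasklet_done);
}
static DECLARE_TASKLET(measure_tasklet, measure_tasklet_fn, 0);

static void measure_work_fn(struct work_struct *w)
{
	printk(KERN_INFO "workqueue: %lld ns\n",
	       (long long)ktime_to_ns(ktime_sub(ktime_get(), t_start)));
	complete(&work_done);
}
static DECLARE_WORK(measure_work, measure_work_fn);

static int __init measure_init(void)
{
	t_start = ktime_get();
	tasklet_schedule(&measure_tasklet);
	wait_for_completion(&tasklet_done);

	t_start = ktime_get();
	schedule_work(&measure_work);
	wait_for_completion(&work_done);
	return 0;
}

static void __exit measure_exit(void)
{
	tasklet_kill(&measure_tasklet);
	flush_scheduled_work();
}

module_init(measure_init);
module_exit(measure_exit);
MODULE_LICENSE("GPL");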
Although it looks awful formally, this result is actually positive: tasklets
are almost never used in hot paths. I am sure of only one such place: the
acenic driver uses a tasklet to refill the rx queue. This generates no more
than 3000 tasklet schedules per second. Even on the P4, a pure workqueue
schedule will eat only ~1% of bare cpu ticks.
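(For the record, the arithmetic behind that estimate: 3000 schedules/sec
* ~4usec per schedule ~= 12ms of cpu time per second, i.e. about 1.2%
of one cpu.)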
Anyway, all the uses of tasklets should be verified.
The most dubious place is the popular Neterion 10Gbit driver (s2io), which
uses a tasklet like acenic does. But at 10Gbit, multiply the acenic numbers
accordingly and panic. :-)
Also, there exists some hardware which uses tasklets even more heavily,
but I have no idea what the real frequencies are: e.g. sundance.
The case of acenic/s2io is quite special: normally, network drivers
refill their queues in the irq handler. It was Jes Sorensen's observation
that offloading the refill from the irq handler improves performance; I do
not remember the numbers. Probably switching to workqueues will not affect
performance at all, probably it will just collapse, no idea.
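For those unfamiliar with the pattern, roughly what it looks like (a
hypothetical skeleton, not the actual acenic code):

#include <linux/interrupt.h>

struct my_nic {
	struct tasklet_struct refill_tasklet;
	/* ... rx ring state ... */
};

/* Runs in softirq context: allocate skbs and post them to the rx ring.
 * This is the slow work moved out of the irq handler. */
static void my_nic_refill(unsigned long data)
{
	struct my_nic *nic = (struct my_nic *)data;

	/* ... refill the rx ring ... */
	(void)nic;
}

/* The irq handler only acks the event and defers the refill. */
static irqreturn_t my_nic_irq(int irq, void *dev_id)
{
	struct my_nic *nic = dev_id;

	/* ... ack hardware, harvest completed rx ... */
	tasklet_schedule(&nic->refill_tasklet);
	return IRQ_HANDLED;
}

static void my_nic_setup(struct my_nic *nic)
{
	tasklet_init(&nic->refill_tasklet, my_nic_refill,
		     (unsigned long)nic);
}

The workqueue conversion under discussion would replace that
tasklet_schedule() with a wakeup of a kernel thread, i.e. a full
context switch per refill.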
> ... workqueues are also possibly much more scalable
I cannot figure it out: scale in which direction? :-)
> (percpu workqueues
> are easy without changing anything in your code but the call where you
> create the workqueue).
I do not see how this is related to scalability, and the statement
does not even make sense. The patch already uses a per-cpu workqueue
for tasklets; otherwise it would be a disaster: guaranteed cpu non-locality.
A tasklet is single-threaded by definition and by purpose. Those few places
where people used tasklets to do per-cpu jobs (RCU, e.g.) exist only because
they had trouble allocating a new softirq. Workqueues make no difference
here: a tasklet converts not to a workqueue but to a work_struct, so you
will still have to allocate an array of per-cpu work structs; everything
remains the same. (See the sketch below.)
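To illustrate (hypothetical names; this assumes the schedule_work_on()
helper, where available):

#include <linux/workqueue.h>
#include <linux/percpu.h>
#include <linux/cpu.h>

static DEFINE_PER_CPU(struct work_struct, percpu_work);

/* The per-cpu job; it runs on whatever cpu it was queued on. */
static void percpu_work_fn(struct work_struct *w)
{
	/* ... per-cpu processing ... */
}

static void percpu_work_setup(void)
{
	int cpu;

	/* still one work_struct per cpu, exactly as with tasklets */
	for_each_possible_cpu(cpu)
		INIT_WORK(&per_cpu(percpu_work, cpu), percpu_work_fn);
}

static void percpu_work_kick_all(void)
{
	int cpu;

	for_each_online_cpu(cpu)
		schedule_work_on(cpu, &per_cpu(percpu_work, cpu));
}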
> the only remaining argument is latency:
You could set realtime priority by default, not a measly nice -5.
If some network adapter got killed just because I ran some task
with nice --22, that would be just ridiculous.
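Just to show what I mean, a trivial hypothetical userspace helper,
nothing to do with the patch itself: point it at the pid of the
workqueue thread, and no reniced task can starve it anymore.

#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	struct sched_param sp;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <pid> <fifo-priority>\n", argv[0]);
		return 1;
	}
	sp.sched_priority = atoi(argv[2]);

	/* SCHED_FIFO: a task reniced to -22 can no longer starve it */
	if (sched_setscheduler(atoi(argv[1]), SCHED_FIFO, &sp)) {
		perror("sched_setscheduler");
		return 1;
	}
	return 0;
}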
Alexey