Re: 2.6.13-rc3-mm1 (ckrm)

Mark Hahn wrote:

I suspect that the main problem is that this patch is not a mainstream
kernel feature that will gain multiple uses, but rather provides
support for a specific vendor middleware product used by that
vendor and a few closely allied vendors.  If it were smaller or
less intrusive, such as a driver, this would not be a big problem.
That's not the case.

yes, that's the crux. CKRM is all about resolving conflicting resource
demands in a multi-user, multi-server, multi-purpose machine. this is a
huge undertaking, and I'd argue that it's completely inappropriate for
*most* servers. that is, computers are generally so damn cheap that
the clear trend is towards dedicating a machine to a specific purpose,
rather than running eg, shell/MUA/MTA/FS/DB/etc all on a single machine.

The argument about scale-up vs. scale-out is nowhere close to being
resolved. To argue that any support for performance partitioning (which
CKRM does) is in support of a lost cause is premature to say the least.

this is *directly* in conflict with certain prominent products, such as
the Altix and various less-prominent Linux-based mainframes. they're all
about partitioning/virtualization - the big-iron aesthetic of splitting up
a single machine. note that it's not just about "big", since cluster-based
approaches can clearly scale far past big-iron, and are in effect statically
partitioned. yes, buying a hideously expensive single box, and then chopping
it into little pieces is more than a little bizarre, and is mainly based
on a couple assumptions:
- that clusters are hard. really, they aren't. they are not
  necessarily higher-maintenance, can be far more robust, usually
  do cost less. just about the only bad thing about clusters is
  that they tend to be somewhat larger in size.
- that partitioning actually makes sense. the appeal is that if
  you have a partition to yourself, you can only hurt yourself.
  but it also follows that burstiness in resource demand cannot be
  overlapped without either constantly tuning the partitions or
  infringing on the guarantee.

"constantly tuning the partitions" is effectively what's done by workload
managers. But our earlier presentations and papers have made the case
that this is not the only utility for performance isolation - a simple
need like isolating one user from another on a general purpose server
also cannot be met by any existing or proposed Linux kernel mechanism
today.

If partitioning made so little sense and the case for clusters was that
obvious, one would be hard put to explain why server consolidation is
being actively pursued by so many firms, why Solaris is bothering to
come up with Containers, and why Xen/VMware are getting all this
attention.

I don't think the concept of partitioning can be dismissed so easily.

Of course, it must be noted that CKRM only provides performance
isolation, not fault isolation. But there is a need for that. Whether
Linux chooses to let this need influence its design is another matter
(which I hope we'll also discuss besides the implementation issues).

CKRM is one of those things that could be done to Linux, and will benefit a
few, but which will almost certainly hurt *most* of the community.

let me say that the CKRM design is actually quite good. the issue is whether
the extensive hooks it requires can be done (at all) in a way which does
not disproportionately hurt maintainability or efficiency.

If there are suggestions on implementing this better, they'll certainly be
very welcome.

CKRM requires hooks into every resource-allocation decision fastpath:
	- if CKRM is not CONFIG, the only overhead is software maintenance.
	- if CKRM is CONFIG but not loaded, the overhead is a pointer check.
	- if CKRM is CONFIG and loaded, the overhead is a pointer check
	and a nontrivial callback.
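
The three cases above can be sketched in user-space C. This is only an illustration of the pointer-check-plus-callback pattern; every name in it (CONFIG_RES_HOOKS, res_hook, res_charge, demo_limit_hook) is a hypothetical stand-in and none is taken from the CKRM patch. It also ignores concurrency, which a real kernel hook would have to handle when the pointer is changed.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; none come from the CKRM patch. */
#ifndef CONFIG_RES_HOOKS
#define CONFIG_RES_HOOKS 1      /* 0 models "CKRM is not CONFIG" */
#endif

/* Callback type: return 0 to allow the request, nonzero to refuse it. */
typedef int (*res_hook_fn)(int class_id, size_t amount);

#if CONFIG_RES_HOOKS
/* NULL models "CONFIG but not loaded". */
static res_hook_fn res_hook;

static void res_hook_register(res_hook_fn fn)
{
    assert(fn != NULL);         /* use res_hook_unregister() to clear */
    res_hook = fn;
}

static void res_hook_unregister(void)
{
    res_hook = NULL;
}

/* Called on the allocation fast path. */
static int res_charge(int class_id, size_t amount)
{
    res_hook_fn fn = res_hook;  /* read once: this is the pointer check */

    if (fn == NULL)
        return 0;               /* not loaded: cost is a single test */
    return fn(class_id, amount);/* loaded: test plus the callback */
}
#else
/* Compiled out: every call site reduces to a constant 0. */
static void res_hook_register(res_hook_fn fn) { (void)fn; }
static void res_hook_unregister(void) { }
static int res_charge(int class_id, size_t amount)
{
    (void)class_id;
    (void)amount;
    return 0;
}
#endif

/* Example policy: refuse any single request above 4096 units. */
static int demo_limit_hook(int class_id, size_t amount)
{
    (void)class_id;
    return amount > 4096 ? -1 : 0;
}
```

With the hook unregistered, res_charge() always returns 0; after res_hook_register(demo_limit_hook), a request for 8192 units is refused while one for 100 units is allowed.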

but really, this is only for CKRM-enforced limits.  CKRM really wants to
change behavior in a more "weighted" way, not just causing an
allocation/fork/packet to fail. a really meaningful CKRM needs to
be tightly integrated into each resource manager - affecting each scheduler
(process, memory, IO, net). I don't really see how full-on CKRM can be
compiled out, unless these schedulers are made fully pluggable.

This is a valid point for the CPU, memory and network controllers (I/O
can be made pluggable quite easily). For the CPU controller in SuSE, the
CKRM CPU controller can be turned on and off dynamically at runtime.
A similar option for memory and network (incurring only a pointer
check) could be explored. Keeping the overhead close to zero for
kernel users not interested in CKRM is certainly one of our objectives.

finally, I observe that pluggable, class-based resource _limits_ could
probably be done without callbacks and potentially with low overhead.
but mere limits don't meet CKRM's goal of flexible, wide-spread resource
partitioning within a large, shared machine.
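
As a rough illustration of what callback-free, class-based limits might look like, the sketch below keeps a usage counter and a limit per class and checks them inline. The struct and function names (res_class, class_try_charge, class_uncharge) are hypothetical and not from any posted patch, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-class accounting record; not from the CKRM patch. */
struct res_class {
    size_t usage;   /* units currently charged to this class */
    size_t limit;   /* hard cap; 0 means "no limit" */
};

/*
 * Charge `amount` units to class `c`.
 * Returns 0 on success, -1 if the charge would exceed the limit.
 * The whole check is an inline compare: no function pointer, no callback.
 */
static int class_try_charge(struct res_class *c, size_t amount)
{
    assert(c != NULL);
    /* Written to avoid overflow in usage + amount. */
    if (c->limit != 0 &&
        (amount > c->limit || c->usage > c->limit - amount))
        return -1;
    c->usage += amount;
    return 0;
}

/* Return `amount` previously charged units to class `c`. */
static void class_uncharge(struct res_class *c, size_t amount)
{
    assert(c != NULL && amount <= c->usage);
    c->usage -= amount;
}
```

A class with a limit of 10 accepts a charge of 6, refuses a further charge of 5, and accepts a further charge of 4. Note that this only ever says yes or no; it does not express the weighted behavior discussed above.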

True, but limits alone are not as useful for general workload management.

regards, mark hahn.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Follow-Ups:
- Re: 2.6.13-rc3-mm1 (ckrm)
  - From: Gerrit Huizenga <gh@us.ibm.com>

References:
- Re: 2.6.13-rc3-mm1 (ckrm)
  - From: Paul Jackson <pj@sgi.com>
- Re: 2.6.13-rc3-mm1 (ckrm)
  - From: Mark Hahn <hahn@physics.mcmaster.ca>
