On Sun, Dec 17, 2006 at 10:23:54AM -0800, Randy Dunlap wrote:
> J.H. wrote:
...
> >The root cause boils down to with git, gitweb and the normal mirroring
> >on the frontend machines our basic working set no longer stays resident
> >in memory, which is forcing more and more to actively go to disk causing
> >a much higher I/O load. You have the added problem that one of the
> >frontend machines is getting hit harder than the other due to several
> >factors: various DNS servers not round robining, people explicitly
> >hitting [git|mirrors|www|etc]1 instead of 2 for whatever reason and
> >probably several other factors we aren't aware of. This has caused the
> >average load on that machine to hover around 150-200 and if for whatever
> >reason we have to take one of the machines down the load on the
> >remaining machine will skyrocket to 2000+.
Relaying on DNS and clients doing round-robin load-balancing is doomed.
You really, REALLY, need external L4 load-balancer switches.
(And installation help from somebody who really knows how to do this
kind of services on a cluster.)
Basic config features include, of course:
- number of parallel active connections with each protocol
- availability of each served protocol (e.g. one can shutdown rsync
at one server, and new rsync connections get pushed elsewere)
- running load-balance of each served protocol separately
- server load monitoring and letting it bias new connections to nodes
not so utterly loaded
- allowing direct access to each server in addition to the access
via cluster service
- some sort of connection persistence, only for HTTP access ?
(ftp and rsync can do nicely without)
> >Since it's apparent not everyone is aware of what we are doing, I'll
> >mention briefly some of the bigger points.
...
> >- We've cut back on the number of ftp and rsync users to the machines.
> >Basically we are cutting back where we can in an attempt to keep the
> >load from spiraling out of control, this helped a bit when we recently
> >had to take one of the machines down and instead of loads spiking into
> >the 2000+ range we peaked at about 500-600 I believe.
How about having filesystems mounted with "noatime" ?
Or do you already do that ?
> >So we know the problem is there, and we are working on it - we are
> >getting e-mails about it if not daily than every other day or so. If
> >there are suggestions we are willing to hear them - but the general
> >feeling with the admins is that we are probably hitting the biggest
> >problems already.
/Matti Aarnio
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[Index of Archives]
[Kernel Newbies]
[Netfilter]
[Bugtraq]
[Photo]
[Stuff]
[Gimp]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Video 4 Linux]
[Linux for the blind]
[Linux Resources]