Re: [KORG] Re: kernel.org lies about latest -mm kernel

On Sun, Dec 17, 2006 at 10:23:54AM -0800, Randy Dunlap wrote:
> J.H. wrote:
...
> >The root cause boils down to with git, gitweb and the normal mirroring
> >on the frontend machines our basic working set no longer stays resident
> >in memory, which is forcing more and more to actively go to disk causing
> >a much higher I/O load.  You have the added problem that one of the
> >frontend machines is getting hit harder than the other due to several
> >factors: various DNS servers not round robining, people explicitly
> >hitting [git|mirrors|www|etc]1 instead of 2 for whatever reason and
> >probably several other factors we aren't aware of.  This has caused the
> >average load on that machine to hover around 150-200 and if for whatever
> >reason we have to take one of the machines down the load on the
> >remaining machine will skyrocket to 2000+.  

Relying on DNS and on clients to do round-robin load balancing is doomed.

You really, REALLY, need external L4 load-balancer switches.
(And installation help from somebody who really knows how to run
this kind of service on a cluster.)

Basic config features include, of course:
 - number of parallel active connections with each protocol
 - availability of each served protocol  (e.g. one can shut down rsync
   on one server, and new rsync connections get pushed elsewhere)
 - running load-balance of each served protocol separately
 - server load monitoring, with the load biasing new connections
   toward nodes that are not so utterly loaded
 - allowing direct access to each server in addition to the access
   via cluster service
 - some sort of connection persistence, perhaps only for HTTP access?
   (ftp and rsync can do nicely without)
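
To make the first few items concrete, here is a minimal sketch of the
selection logic such a balancer applies for each new connection. It is
an illustration only, not how any particular switch implements it: the
names Backend and pick_backend, the load_weight parameter, and the
scoring formula are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """One frontend machine as seen by a hypothetical balancer."""
    name: str
    load: float = 0.0                            # last reported load average
    enabled: set = field(default_factory=set)    # protocols currently served
    active: dict = field(default_factory=dict)   # protocol -> open connections


def pick_backend(backends, protocol, load_weight=0.1):
    """Choose the backend for the next connection of `protocol`.

    Each protocol is balanced separately: only backends that currently
    serve it are considered, and only its own connection count is used.
    The reported load biases the choice away from busy nodes.
    load_weight is an arbitrary illustrative value.
    """
    candidates = [b for b in backends if protocol in b.enabled]
    if not candidates:
        raise LookupError(f"no backend currently serves {protocol}")

    def score(b):
        return b.active.get(protocol, 0) + load_weight * b.load

    best = min(candidates, key=score)
    best.active[protocol] = best.active.get(protocol, 0) + 1
    return best


# rsync has been shut down on www1, so new rsync connections go to www2.
www1 = Backend("www1", load=150.0, enabled={"http", "ftp"})
www2 = Backend("www2", load=20.0, enabled={"http", "ftp", "rsync"})
print(pick_backend([www1, www2], "rsync").name)
print(pick_backend([www1, www2], "http").name)
```

A real switch would also decrement the counters when connections close
and refresh the load figures from health checks; both are left out here.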

> >Since it's apparent not everyone is aware of what we are doing, I'll
> >mention briefly some of the bigger points.
...
> >- We've cut back on the number of ftp and rsync users to the machines.
> >Basically we are cutting back where we can in an attempt to keep the
> >load from spiraling out of control, this helped a bit when we recently
> >had to take one of the machines down and instead of loads spiking into
> >the 2000+ range we peaked at about 500-600 I believe.

How about mounting the filesystems with "noatime"?
Or do you already do that?
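
A quick way to check is to scan /proc/mounts for the option. The
sketch below assumes the usual six-field /proc/mounts layout (device,
mount point, fstype, options, dump, pass); the function name and the
default list of filesystem types are made up for illustration.

```python
def mounts_missing_noatime(mounts_text, fstypes=("ext3", "xfs")):
    """Return mount points of the given fstypes lacking 'noatime'.

    mounts_text is the content of /proc/mounts (or text in the same
    format). The fstypes default is an arbitrary example list.
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype in fstypes and "noatime" not in options.split(","):
            missing.append(mountpoint)
    return missing


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for mountpoint in mounts_missing_noatime(f.read()):
            print(mountpoint)
```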

> >So we know the problem is there, and we are working on it - we are
> >getting e-mails about it if not daily than every other day or so.  If
> >there are suggestions we are willing to hear them - but the general
> >feeling with the admins is that we are probably hitting the biggest
> >problems already.

/Matti Aarnio
