On 4/14/05, Bob Brennan <rbrennan96@xxxxxxxxx> wrote:
> On 4/14/05, replies-lists-redhat@xxxxxxxxxxxxxxxxxxxxx
> <replies-lists-redhat@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > i haven't been following this in great detail, so this may have been
> > mentioned already.
> >
> > if there are issues with high machine load http connections won't
> > close. when that happens you'll hit the maxclients level and your
> > http server will stop accepting connections.
> >
> > if you (as appears to be the case) aren't monitoring things like your
> > machine load (yet) you can look in your /var/log/maillog file for
> > high load hints during this incident. sendmail (but not postfix) will
> > stop accepting mail when the load gets above a certain point (default
> > is 12 i believe). when this happens it writes that to the maillog
> > file.
>
> I suspected this too since it is a somewhat ram-poor machine and I had
> just started up spamassassin which is a known resource-hog.
>
> I can see all emails that went through plus those that got
> spam-bucketed, not a lot (single digits per hour) and no problems
> recorded. Mail volume was down for those 20 minutes only because most
> users depend on squirrelmail which was obviously down at the time.
> Test emails from dnsCheck (instigated by me) did get recorded during
> the outage.
>
> If sendmail does in fact log a high load condition then that rules it
> out since there is no record.
>
> > that will give you info on whether the issue was high load. if it
> > was, then you should set up some scripts that do monitoring so that
> > you can pin point the underlying issue(s).
> >
> > for monitoring, vmstat, uptime (for the load numbers) and top (in
> > batchmode - kicking in only when the load hits some threshold) are
> > all very useful.
> >
> > - Rick
>

Update - my daily logwatch report shows a typical number of TCP/IP
packets for that day, around 800-900, which is an average load, and no
large counts from any single IP. So it is not likely to have been any
kind of DoS attack.

We've ruled out a lot of things here, but there is still no clue as to
what actually happened. I would lean towards a hardware failure, but the
machine has been on 24/7 for many months without a hiccup, other
services were running normally, and httpd recovered 100% after 20
minutes all by itself. Or did it never really go away and was just
unreachable? Could Firestarter be a possibility (it has been installed
for about 1 month)?

Thanks in advance for any more ideas,
bob
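
P.S. For anyone picking up Rick's monitoring suggestion quoted above, here is a rough sketch of a load-triggered snapshot in Python. It is only an illustration under some assumptions: the threshold value, the log path and the function names are all made up for the example, and it assumes a Linux box with procps `top` and `vmstat` plus Python 3.7 or later. It reads the 1-minute load average and, only when that is at or above the threshold, appends the output of uptime, vmstat and top (batch mode) to a log file.

```python
import os
import subprocess
import time

# Both values are assumptions for illustration; tune them for the machine.
LOAD_THRESHOLD = 4.0
LOG_PATH = "/tmp/loadwatch.log"


def load_is_high(threshold, load1=None):
    """Return True if the 1-minute load average is at or above threshold."""
    if load1 is None:
        load1 = os.getloadavg()[0]  # (1 min, 5 min, 15 min) averages
    return load1 >= threshold


def snapshot_commands():
    """Commands to capture: uptime, two vmstat samples, one batch-mode top."""
    return [["uptime"], ["vmstat", "1", "2"], ["top", "-b", "-n", "1"]]


def take_snapshot(log_path=LOG_PATH, commands=None):
    """Append a timestamped dump of each command's output to log_path."""
    cmds = snapshot_commands() if commands is None else commands
    with open(log_path, "a") as log:
        log.write("=== %s ===\n" % time.ctime())
        for cmd in cmds:
            try:
                out = subprocess.run(
                    cmd, capture_output=True, text=True, timeout=30
                ).stdout
            except (OSError, subprocess.SubprocessError) as exc:
                # A missing or hung tool should not stop the other captures.
                out = "failed to run %s: %s\n" % (cmd[0], exc)
            log.write("$ %s\n%s\n" % (" ".join(cmd), out))


def check_once(threshold=LOAD_THRESHOLD, log_path=LOG_PATH):
    """Take a snapshot only when the load is high; return whether it did."""
    if load_is_high(threshold):
        take_snapshot(log_path)
        return True
    return False
```

Calling check_once() from a cron job every minute or so would leave a record of what was running the next time the load climbs, which might help tell a load problem apart from a network or firewall one.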