> On 7/19/05, nodata <fedora@xxxxxxxxxxxx> wrote:
>> > On 7/19/05, nodata <fedora@xxxxxxxxxxxx> wrote:
>> >> > Hi Guys,
>> >> >
>> >> > Hope you experts can help me out here.
>> >> >
>> >> > Basically I have a server running at a very high load (2.44), although
>> >> > nothing is noticeably high when using top. There aren't any processes
>> >> > running on the box except the standard Linux OS tools. This box is
>> >> > used for backup, and only becomes active during the night.
>> >> >
>> >> > It's a Compaq DL380 with a RAID 5 configuration.
>> >> >
>> >> > Can anyone suggest what I can do to find out why the load is high?
>> >> >
>> >> > Thanks for your help in advance.
>> >> >
>> >> > Dan
>> >> >
>> >>
>> >> I bet you have hanging NFS mounts.
>> >> If the box is constantly at a load of around 2.44, and isn't sluggish, I
>> >> wouldn't worry.
>> >>
>> >> Look at iostat, sar, etc. to find out why the load is like that.
>> >>
>> >
>> > Hi
>> >
>> > I've looked at these but can't see anything. The server doesn't mount
>> > or export any filesystems using NFS or any other protocol.
>> > If it helps, here are the various outputs:
>> >
>> > uptime
>> >  14:45:49  up 62 days, 43 min,  2 users,  load average: 1.46, 1.57, 1.59
>> >
>> > sar 5 10
>> > Linux 2.4.21-27.0.4.ELsmp (orion.gs.moneyextra.com)   19/07/05
>> >
>> > 14:46:02    CPU     %user     %nice   %system     %idle
>> > 14:46:07    all      0.00      0.00      0.00    100.00
>> > 14:46:12    all      0.00      0.00      0.10     99.90
>> > 14:46:17    all      0.00      0.00      0.10     99.90
>> > 14:46:22    all      0.00      0.00      0.00    100.00
>> > 14:46:27    all      0.00      0.00      0.00    100.00
>> > 14:46:32    all      0.00      0.00      0.10     99.90
>> > 14:46:37    all      0.00      0.00      0.00    100.00
>> > 14:46:42    all      0.10      0.00      0.31     99.59
>> > 14:46:47    all      0.00      0.00      0.00    100.00
>> > 14:46:52    all      0.00      0.00      0.00    100.00
>> > Average:    all      0.01      0.00      0.06     99.93
>> >
>> > vmstat -a
>> > procs                      memory      swap          io     system      cpu
>> >  r  b   swpd   free  inact active   si   so    bi    bo   in    cs us sy wa id
>> >  0  0      0  15404 189668 202836    0    0     3     1    0     2  3  4  1  3
>> >
>> > free -m
>> >              total       used       free     shared    buffers     cached
>> > Mem:           498        483         15          0        128        301
>> > -/+ buffers/cache:         53        445
>> > Swap:         1027          0       1027
>> >
>> > iostat
>> > Linux 2.4.21-27.0.4.ELsmp (orion.gs.moneyextra.com)   19/07/05
>> >
>> > avg-cpu:  %user   %nice    %sys   %idle
>> >            3.11    0.00    3.72   93.17
>> >
>> > Device:            tps   Blk_read/s   Blk_wrtn/s    Blk_read    Blk_wrtn
>> > /dev/ida/c0d0    19.68       427.93       279.15  2147483647  1400883506
>> > /dev/ida/c0d0p1   0.00         0.22         0.00     1087144        8986
>> > /dev/ida/c0d0p2   0.65         3.72        10.24    18680778    51401528
>> > /dev/ida/c0d0p3   0.00         0.00         0.00         248           0
>> > /dev/ida/c0d0p4   0.00         0.00         0.00           0           0
>> > /dev/ida/c0d0p5   0.74         3.90         6.88    19570498    34517568
>> > /dev/ida/c0d0p6   0.00         0.00         0.00         168           0
>> > /dev/ida/c0d0p7   0.00         0.00         0.00         168           0
>> > /dev/ida/c0d0p8  18.29       427.93       262.03  2147483647  1314955424
>> >
>> > top
>> >  14:47:51  up 62 days, 45 min,  2 users,  load average: 1.73, 1.61, 1.59
>> > 61 processes: 60 sleeping, 1 running, 0 zombie, 0 stopped
>> > CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
>> >            total    0.4%    0.0%    0.0%   0.0%     0.0%    0.0%   99.5%
>> >            cpu00    0.9%    0.0%    0.0%   0.0%     0.0%    0.0%   99.0%
>> >            cpu01    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
>> > Mem:   510400k av,  495224k used,   15176k free,       0k shrd,  132000k buff
>> >        203040k actv,  182824k in_d,    6852k in_c
>> > Swap: 1052592k av,       0k used, 1052592k free                  308668k cached
>> >
>> >   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
>> > 13100 root      20   0  1092 1092   888 R     0.4  0.2   0:00   0 top
>> >     1 root      15   0   512  512   452 S     0.0  0.1   1:18   0 init
>> >     2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0
>> >     3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1 migration/1
>> >     4 root      15   0     0    0     0 SW    0.0  0.0   0:00   1 keventd
>> >     5 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0 ksoftirqd/0
>> >     6 root      34  19     0    0     0 SWN   0.0  0.0   0:00   1 ksoftirqd/1
>> >     9 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 bdflush
>> >     7 root      15   0     0    0     0 SW    0.0  0.0  70:21   0 kswapd
>> >     8 root      15   0     0    0     0 SW    0.0  0.0  23:07   1 kscand
>> >    10 root      15   0     0    0     0 SW    0.0  0.0   3:30   0 kupdated
>> >    11 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 mdrecoveryd
>> >    18 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 ahc_dv_0
>> >    19 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 scsi_eh_0
>> >    23 root      15   0     0    0     0 SW    0.0  0.0   2:30   1 kjournald
>> >   192 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kjournald
>> >   193 root      15   0     0    0     0 SW    0.0  0.0  13:57   1 kjournald
>> >   194 root      15   0     0    0     0 SW    0.0  0.0   4:18   0 kjournald
>> >   568 root      15   0   576  576   492 S     0.0  0.1   0:57   0 syslogd
>> >   572 root      15   0   472  472   408 S     0.0  0.0   0:00   1 klogd
>> >   582 root      15   0   452  452   388 S     0.0  0.0   5:33   1 irqbalance
>> >   599 rpc       15   0   600  600   524 S     0.0  0.1   0:22   0 portmap
>> >   618 rpcuser   25   0   720  720   644 S     0.0  0.1   0:00   0 rpc.statd
>> >   629 root      15   0   400  400   344 S     0.0  0.0   0:18   0 mdadm
>> >   712 root      15   0  3160 3160  2024 S     0.0  0.6   3:22   1 snmpd
>> >   713 root      25   0  3160 3160  2024 S     0.0  0.6   0:00   0 snmpd
>> >   722 root      15   0  1576 1576  1324 S     0.0  0.3   4:58   1 sshd
>> >
>> > Anyone have any ideas?
>> > Literally, the box is sitting there not doing
>> > anything that has been scheduled.
>> >
>> > This happens occasionally, then the load spontaneously goes down. Do
>> > you reckon it has something to do with the RAID 5?
>> >
>> > Thanks
>> > Dan
>> >
>>
>> ps auxw | grep " D "
>
> Hi,
>
> I get the following:
>
> ps auxw | grep " D "
> root     15802  0.0  0.1  3688  660 pts/0   S    16:06   0:00 grep D
>
> Dan
>

Then it's probably not a problem of waiting for I/O. Here are the other
codes; you might want to try S or T:

PROCESS STATE CODES
       Here are the different values that the s, stat and state output
       specifiers (header "STAT" or "S") will display to describe the
       state of a process.

       D    Uninterruptible sleep (usually IO)
       R    Running or runnable (on run queue)
       S    Interruptible sleep (waiting for an event to complete)
       T    Stopped, either by a job control signal or because it is
            being traced.
       W    Paging (not valid since the 2.6.xx kernel)
       X    Dead (should never be seen)
       Z    Defunct ("zombie") process, terminated but not reaped by
            its parent.
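Rather than grepping for one state letter at a time, the whole STAT column
can be summarised in one pass. A minimal sketch using standard procps `ps`
output-format options (these one-liners are illustrative, not from the
original thread):

```shell
#!/bin/sh
# Count processes by the first letter of their STAT field.
# On Linux, the 1/5/15-minute load averages count tasks that are
# runnable (R) or in uninterruptible sleep (D), so a persistent load
# of ~2 with an idle CPU usually means a couple of tasks stuck in D.
ps -eo stat= | cut -c1 | sort | uniq -c | sort -rn

# Show which tasks (if any) are currently contributing to the load,
# i.e. those whose state starts with R or D:
ps -eo stat=,pid=,comm= | awk '$1 ~ /^[RD]/'
```

Because a D-state task blocked in a driver accrues no CPU time, it is
invisible in top's CPU figures; sampling the state counts a few times
during a load spike would show whether two tasks are repeatedly stuck
in D (pointing at the RAID controller) or whether no R/D tasks ever
appear, in which case the reported load itself is suspect.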