> On 7/19/05, nodata <fedora@xxxxxxxxxxxx> wrote:
>> > On 7/19/05, nodata <fedora@xxxxxxxxxxxx> wrote:
>> >> > Hi Guys,
>> >> >
>> >> > Hope you experts can help me out here.
>> >> >
>> >> > Basically I have a server running at a very high load (2.44), although
>> >> > nothing is noticeably high when using top. There aren't any processes
>> >> > running on the box except the standard Linux OS tools. This box is
>> >> > used for backup, and only becomes active during the night.
>> >> >
>> >> > It's a Compaq DL380 with a RAID 5 configuration.
>> >> >
>> >> > Can anyone suggest what I can do to find out why the load is high?
>> >> >
>> >> > Thanks for your help in advance.
>> >> >
>> >> > Dan
>> >> >
>> >>
>> >> I bet you have hanging NFS mounts.
>> >> If the box is constantly at a load of around 2.44, and isn't sluggish, I
>> >> wouldn't worry.
>> >>
>> >> Look at iostat, sar, etc. to find out why the load is like that.
>> >>
>> >
>> > Hi
>> >
>> > I've looked at these but can't see anything. The server doesn't mount
>> > or export any filesystems using NFS or any other protocol.
>> > If it helps, here are the various outputs:
>> >
>> > uptime
>> >  14:45:49  up 62 days, 43 min,  2 users,  load average: 1.46, 1.57, 1.59
>> >
>> > sar 5 10
>> > Linux 2.4.21-27.0.4.ELsmp (orion.gs.moneyextra.com)   19/07/05
>> >
>> > 14:46:02    CPU     %user     %nice   %system     %idle
>> > 14:46:07    all      0.00      0.00      0.00    100.00
>> > 14:46:12    all      0.00      0.00      0.10     99.90
>> > 14:46:17    all      0.00      0.00      0.10     99.90
>> > 14:46:22    all      0.00      0.00      0.00    100.00
>> > 14:46:27    all      0.00      0.00      0.00    100.00
>> > 14:46:32    all      0.00      0.00      0.10     99.90
>> > 14:46:37    all      0.00      0.00      0.00    100.00
>> > 14:46:42    all      0.10      0.00      0.31     99.59
>> > 14:46:47    all      0.00      0.00      0.00    100.00
>> > 14:46:52    all      0.00      0.00      0.00    100.00
>> > Average:    all      0.01      0.00      0.06     99.93
>> >
>> > vmstat -a
>> > procs                      memory      swap          io     system      cpu
>> >  r  b   swpd   free  inact active   si   so    bi    bo   in    cs us sy wa id
>> >  0  0      0  15404 189668 202836    0    0     3     1    0     2  3  4  1  3
>> >
>> > free -m
>> >              total       used       free     shared    buffers     cached
>> > Mem:           498        483         15          0        128        301
>> > -/+ buffers/cache:         53        445
>> > Swap:         1027          0       1027
>> >
>> > iostat
>> > Linux 2.4.21-27.0.4.ELsmp (orion.gs.moneyextra.com)   19/07/05
>> >
>> > avg-cpu:  %user   %nice    %sys   %idle
>> >            3.11    0.00    3.72   93.17
>> >
>> > Device:            tps   Blk_read/s   Blk_wrtn/s    Blk_read    Blk_wrtn
>> > /dev/ida/c0d0    19.68       427.93       279.15  2147483647  1400883506
>> > /dev/ida/c0d0p1   0.00         0.22         0.00     1087144        8986
>> > /dev/ida/c0d0p2   0.65         3.72        10.24    18680778    51401528
>> > /dev/ida/c0d0p3   0.00         0.00         0.00         248           0
>> > /dev/ida/c0d0p4   0.00         0.00         0.00           0           0
>> > /dev/ida/c0d0p5   0.74         3.90         6.88    19570498    34517568
>> > /dev/ida/c0d0p6   0.00         0.00         0.00         168           0
>> > /dev/ida/c0d0p7   0.00         0.00         0.00         168           0
>> > /dev/ida/c0d0p8  18.29       427.93       262.03  2147483647  1314955424
>> >
>> > top
>> >  14:47:51  up 62 days, 45 min,  2 users,  load average: 1.73, 1.61, 1.59
>> > 61 processes: 60 sleeping, 1 running, 0 zombie, 0 stopped
>> > CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
>> >            total    0.4%    0.0%    0.0%   0.0%     0.0%    0.0%   99.5%
>> >            cpu00    0.9%    0.0%    0.0%   0.0%     0.0%    0.0%   99.0%
>> >            cpu01    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
>> > Mem:   510400k av,  495224k used,   15176k free,       0k shrd,  132000k buff
>> >        203040k actv,  182824k in_d,    6852k in_c
>> > Swap: 1052592k av,       0k used, 1052592k free                  308668k cached
>> >
>> >   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
>> > 13100 root      20   0  1092 1092   888 R     0.4  0.2   0:00   0 top
>> >     1 root      15   0   512  512   452 S     0.0  0.1   1:18   0 init
>> >     2 root      RT   0     0    0     0 SW    0.0  0.0   0:00   0 migration/0
>> >     3 root      RT   0     0    0     0 SW    0.0  0.0   0:00   1 migration/1
>> >     4 root      15   0     0    0     0 SW    0.0  0.0   0:00   1 keventd
>> >     5 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0 ksoftirqd/0
>> >     6 root      34  19     0    0     0 SWN   0.0  0.0   0:00   1 ksoftirqd/1
>> >     9 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 bdflush
>> >     7 root      15   0     0    0     0 SW    0.0  0.0  70:21   0 kswapd
>> >     8 root      15   0     0    0     0 SW    0.0  0.0  23:07   1 kscand
>> >    10 root      15   0     0    0     0 SW    0.0  0.0   3:30   0 kupdated
>> >    11 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 mdrecoveryd
>> >    18 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 ahc_dv_0
>> >    19 root      25   0     0    0     0 SW    0.0  0.0   0:00   0 scsi_eh_0
>> >    23 root      15   0     0    0     0 SW    0.0  0.0   2:30   1 kjournald
>> >   192 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kjournald
>> >   193 root      15   0     0    0     0 SW    0.0  0.0  13:57   1 kjournald
>> >   194 root      15   0     0    0     0 SW    0.0  0.0   4:18   0 kjournald
>> >   568 root      15   0   576  576   492 S     0.0  0.1   0:57   0 syslogd
>> >   572 root      15   0   472  472   408 S     0.0  0.0   0:00   1 klogd
>> >   582 root      15   0   452  452   388 S     0.0  0.0   5:33   1 irqbalance
>> >   599 rpc       15   0   600  600   524 S     0.0  0.1   0:22   0 portmap
>> >   618 rpcuser   25   0   720  720   644 S     0.0  0.1   0:00   0 rpc.statd
>> >   629 root      15   0   400  400   344 S     0.0  0.0   0:18   0 mdadm
>> >   712 root      15   0  3160 3160  2024 S     0.0  0.6   3:22   1 snmpd
>> >   713 root      25   0  3160 3160  2024 S     0.0  0.6   0:00   0 snmpd
>> >   722 root      15   0  1576 1576  1324 S     0.0  0.3   4:58   1 sshd
>> >
>> > Anyone have any ideas?
>> > Literally, the box is sitting there not doing
>> > anything that has been scheduled.
>> >
>> > This happens occasionally, then the load spontaneously goes down. Do
>> > you reckon it has something to do with the RAID 5?
>> >
>> > Thanks
>> > Dan
>> >
>>
>> ps auxw | grep " D "
>
> Hi,
>
> I get the following:
>
> ps auxw | grep " D "
> root     15802  0.0  0.1  3688  660 pts/0   S    16:06   0:00 grep D
>
> Dan
>

Then it's probably not a problem of waiting for I/O. Here are the other
codes; you might want to try S or T:

PROCESS STATE CODES
       Here are the different values that the s, stat and state output
       specifiers (header "STAT" or "S") will display to describe the
       state of a process.

       D    Uninterruptible sleep (usually IO)
       R    Running or runnable (on run queue)
       S    Interruptible sleep (waiting for an event to complete)
       T    Stopped, either by a job control signal or because it is
            being traced.
       W    Paging (not valid since the 2.6.xx kernel)
       X    Dead (should never be seen)
       Z    Defunct ("zombie") process, terminated but not reaped by
            its parent.
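Rather than grepping for one state letter at a time, the whole STAT column
can be summarised in one pass. A minimal sketch using standard procps `ps`
output-format options (these one-liners are illustrative, not from the
original thread):

```shell
#!/bin/sh
# Count processes by the first letter of their STAT field.
# On Linux, the 1/5/15-minute load averages count tasks that are
# runnable (R) or in uninterruptible sleep (D), so a persistent load
# of ~2 with an idle CPU usually means a couple of tasks stuck in D.
ps -eo stat= | cut -c1 | sort | uniq -c | sort -rn

# Show which tasks (if any) are currently contributing to the load,
# i.e. those whose state starts with R or D:
ps -eo stat=,pid=,comm= | awk '$1 ~ /^[RD]/'
```

Because a D-state task blocked in a driver accrues no CPU time, it is
invisible in top's CPU figures; sampling the state counts a few times
during a load spike would show whether two tasks are repeatedly stuck
in D (pointing at the RAID controller) or whether no R/D tasks ever
appear, in which case the reported load itself is suspect.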