Help needed with hanging bash script

Bash gurus,

I have a bash script that monitors a directory for files. Whenever it finds files in this directory, it passes them to a support script for processing. The support script moves the files to another directory before processing them, and it runs in the background so that it does not block the main script. A simplified version of the main script loop follows:

 # Execute once every 10 seconds
 while true;
 do
    # Fork a background script to process each file in the spool directory
    for fname in `ls /spool/dir/*.ext 2> /dev/null`
    do
       bname=`basename "$fname"`

       bg_script "$bname" &
    done

    sleep 10
 done

This is pretty simple, and it worked flawlessly for over a year on a dual-processor server running Fedora Core 3. However, since upgrading to an 8-core (2 CPUs x 4 cores) server running Fedora Core 6, the script hangs a few times a week. This is a serious problem, so I have to keep a close eye on the server until the bug is resolved.

The process tree of the script when it's hanging follows:

 [root@server ~]# ps axjf
  PPID   PID  PGID   SID TTY      TPGID STAT   UID   TIME COMMAND
     1  3512  3510  2302 ?           -1 S        0   0:59 /bin/bash /usr/local/bin/script
  3512 21432  3510  2302 ?           -1 R        0  40:50  \_ /bin/bash /usr/local/bin/script

Note that the parent process (PID 3512) is sleeping and has accumulated relatively little CPU time since boot. The child process (PID 21432) is spinning in a tight loop, and top shows it consuming 100% of one core. It also never terminates, so the parent process stays blocked indefinitely. If the child process is killed, the parent process resumes execution without any problems.
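When the hang happens again, it may help to spot the spinning child programmatically, for example from a cron job that raises an alert. The following is a minimal sketch that assumes a Linux /proc filesystem; the function name `running_children` is hypothetical. It prints the PID and name of every direct child of a given PID whose state is R (running), the same state ps reports for PID 21432 above.

```bash
#!/bin/bash
# Sketch: print "PID NAME" for each direct child of $1 whose state is R.
# Reads Linux /proc/<pid>/status; running_children is a hypothetical name.
running_children() {
    local parent=$1 status_file key value name state ppid pid
    for status_file in /proc/[0-9]*/status; do
        name= state= ppid=
        # Lines look like "PPid:<TAB>3512"; split on the colon and the tab.
        # A process may exit between the glob and the read, so errors
        # from the input redirection are discarded.
        while IFS=$':\t' read -r key value; do
            case $key in
                Name)  name=$value ;;
                State) state=${value%% *} ;;   # "R (running)" -> "R"
                PPid)  ppid=$value ;;
            esac
        done 2>/dev/null < "$status_file"

        if [[ $ppid == "$parent" && $state == R ]]; then
            pid=${status_file#/proc/}
            pid=${pid%/status}
            echo "$pid $name"
        fi
    done
}
```

For the process tree above, `running_children 3512` would be expected to list PID 21432 while the hang is in progress.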

The interesting thing is that the script never calls itself; it only runs the support script as a background job. I'm not an expert on the inner workings of bash, but I believe that the child process is a temporary artifact of the fork-exec sequence the parent uses to run commands. It seems that a copy of the existing process is created by fork, but it is never replaced (by exec) with the program it was meant to run.
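The fork part of this guess can be checked for one construct the loop does use: the backquoted `ls` is a command substitution, and bash runs the body of a command substitution in a forked child shell. A forked child starts with a copy of the parent's memory, including its argument list, so it shows up in ps under the same `/bin/bash /usr/local/bin/script` command line as the parent. The sketch below assumes bash 4.0 or later, because it relies on the `BASHPID` variable, which was added in bash 4.0; it only demonstrates the fork and does not explain why the child spins.

```bash
#!/bin/bash
# Sketch (requires bash >= 4.0 for BASHPID).
# $$ always expands to the PID of the main script, even inside a subshell;
# BASHPID expands to the PID of the shell process doing the expansion.

parent_pid=$BASHPID

# Backquotes (like $(...)) run their body in a forked child shell,
# so BASHPID expands to a different PID in here.
child_pid=`echo "$BASHPID"`

echo "main script:          \$\$=$$ BASHPID=$parent_pid"
echo "command substitution: BASHPID=$child_pid"
```

Running it prints two different PIDs: the second belongs to a short-lived copy of the script that exists only while the substitution runs.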

I researched the logs and I'm fairly confident that the script hangs at the top of the for loop, presumably after exhausting the list produced by the "ls" command. There is nothing unusual about the "ls" command itself: there are usually fewer than 20 files in the directory it lists.

I'd appreciate any replies from anyone who has experienced this problem. I have some ideas for working around it, but I'd like to actually understand its cause and how to properly resolve it so that I don't get stuck on something similar in the future.
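One possible workaround, sketched here under assumptions: let bash expand the glob itself instead of parsing `ls` output inside backquotes, and strip the directory with parameter expansion instead of running `basename`. This removes two forks per pass from the loop, but it is only an experiment based on the guess that the command substitution is involved; it does not explain the hang. The names `scan_once` and `main_loop` are hypothetical; `/spool/dir`, `*.ext` and `bg_script` are taken from the loop above.

```bash
#!/bin/bash
# Sketch only: scan_once and main_loop are hypothetical helper names;
# /spool/dir, *.ext and bg_script come from the original loop.

# Start one background handler per matching file in a single pass.
scan_once() {
    local spool_dir=$1 handler=$2
    local fname had_nullglob=0

    # With nullglob, a pattern that matches nothing expands to nothing
    # instead of to the literal string "/spool/dir/*.ext".
    shopt -q nullglob && had_nullglob=1
    shopt -s nullglob

    for fname in "$spool_dir"/*.ext; do
        # ${fname##*/} drops the directory part, like basename, without a fork.
        "$handler" "${fname##*/}" &
    done

    # Put nullglob back the way we found it.
    (( had_nullglob )) || shopt -u nullglob
}

# The outer loop from the post, executed once every 10 seconds.
main_loop() {
    while true; do
        scan_once /spool/dir bg_script
        sleep 10
    done
}
```

To use it, call `main_loop` at the end of the script. As in the original, bg_script still receives only the base name of each file.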

Thank you,

Matthew Roth
InterMedia Marketing Solutions
Software Engineer and Systems Developer

