Job queue in bash

All right, I've been busy starting a new job while finishing my Ph.D. during the weekends, but on the plus side, I have tons of little tricks to post. Let's start with this one.

I configured our server to run a build after each push to our Mercurial repositories: tests, coverage analysis, documentation generation, an email to the team, etc. I wanted to make sure that only one build could run at a time, so I tried the at command in Linux, only to realize later that although at has the concept of a job queue, it does not ensure that jobs are executed one at a time. *

Finally, I found a nice trick in bash to do this. It is not perfect (two jobs could still run concurrently, or the queue could get corrupted), but for our use case that risk is acceptable and quite unlikely.

To create a bash queue, create a file, say /opt/scripts_output/queue, then add this code at the beginning of your script:

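# Read the pid of the last job queued, then add ourselves to the queue.
towait=$(tail -n 1 /opt/scripts_output/queue)
echo $$ >> /opt/scripts_output/queue

# kill -0 sends no signal; it succeeds only while that pid can still be signalled.
while kill -0 "$towait" >/dev/null 2>&1; do
    sleep 1
done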

Essentially,

  1. The bash script reads the pid of the last job.
  2. The script writes its pid to the queue file.
  3. The script waits for the last job to terminate (kill -0) and then proceeds.
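
If the kill -0 test in step 3 is unfamiliar, here is a small sketch of how it behaves on a background job. The helper name is_running is my own and is not part of the script above. Keep in mind that kill -0 also fails when the pid belongs to a process you are not allowed to signal, so the check is only meaningful when all jobs run as the same user.

```bash
#!/usr/bin/env bash
# kill -0 delivers no signal; its exit status only tells us whether
# the given pid could be signalled, i.e. whether the process exists.

# is_running is a hypothetical helper name used for illustration.
is_running() {
    kill -0 "$1" >/dev/null 2>&1
}

sleep 1 &            # a short-lived background job to observe
job=$!

is_running "$job" && echo "job $job is still running"
wait "$job"          # block until the job has exited and been reaped
is_running "$job" || echo "job $job has terminated"
```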

Because steps 1 and 2 are not atomic, this could be a problem if you expect your bash scripts to be launched at exactly the same time. One could add a small but unfair lock file to keep the solution simple, but the ordering of the jobs would not necessarily be preserved.
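
As a rough sketch of that lock idea, one option is to use mkdir as the lock, since creating a directory either succeeds or fails in one step. This is not what we run in production; the names enqueue_self and QUEUE and the ".lock" suffix are assumptions made for the example. The lock is unfair in the sense described above: whichever waiting script retries first wins, regardless of arrival order.

```bash
#!/usr/bin/env bash
# Sketch: make "read last pid" and "append own pid" a single critical section.
# QUEUE, enqueue_self and the ".lock" suffix are hypothetical names.

QUEUE=${QUEUE:-/opt/scripts_output/queue}

enqueue_self() {
    local lock="$QUEUE.lock"
    # mkdir fails if the directory already exists, so only one script
    # at a time gets past this loop. Waiters are not served in order.
    until mkdir "$lock" 2>/dev/null; do
        sleep 1
    done
    # Critical section: steps 1 and 2 from the list above.
    towait=$(tail -n 1 "$QUEUE" 2>/dev/null)
    echo $$ >> "$QUEUE"
    rmdir "$lock"
}
```

After enqueue_self returns, the kill -0 loop can wait on $towait exactly as before. One caveat: if a script is killed between mkdir and rmdir, the lock directory stays behind and has to be removed by hand.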

You can factor this process into a bash function and use it in different scripts. As long as all scripts point to the same queue file, you can be (relatively) sure that they are executed one at a time.
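
Here is one way such a function might look. It is a minimal sketch: the name wait_for_turn and the idea of passing the queue file as an argument are my own choices for the example, and it has the same race between reading and writing as the inline version.

```bash
#!/usr/bin/env bash
# wait_for_turn is a hypothetical name; the queue file is passed as $1.
wait_for_turn() {
    local queue=$1 towait
    # Step 1: read the pid of the last queued job (empty if none yet).
    towait=$(tail -n 1 "$queue" 2>/dev/null)
    # Step 2: register this script at the end of the queue.
    echo $$ >> "$queue"
    # Step 3: poll until the previous job is gone.
    while [ -n "$towait" ] && kill -0 "$towait" >/dev/null 2>&1; do
        sleep 1
    done
}
```

Each build script would then source the file containing the function and call wait_for_turn /opt/scripts_output/queue as its first step. Call it only once per script: a second call would read the script's own pid from the queue and wait on itself.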

For us, if a problem occurs because of this script, I get an error in my mailbox and I just wait for the next push (or I manually relaunch the build process). This has never happened since I wrote this script, but we know it is a possibility. If you want to use this little bash trick for mission-critical systems, that's another story.

* Actually, when a job is started by at, the job goes to another queue and at moves on to the next job and starts it if the time is right. The end result is that all jobs in a queue can run concurrently.