Monitoring tool in bash - need suggestions

Hi Guys,

I have written my own monitoring tool to cover some basic needs of mine. It checks the status of each service every 30 seconds, and if a service is down it attempts to start it. When a service has been down, it writes a small log (that doesn’t really help) and notifies me by mail. That way, if I keep receiving emails, I know that manual intervention is required. I am looking for ideas to improve the script. If you think you can help me, please share your ideas. Code snippets are highly appreciated.

# Define services to monitor:
services=(tor httpd mysql varnish)

while true; do
    # Check whether each service is running
    for i in "${services[@]}"; do
        if pgrep "$i" > /dev/null; then
            echo "$i is running" > /dev/null
        else
            # Log the time of stoppage:
            echo "$(date) $i Stopped" | tee -a /root/notify.txt >> "/var/log/custom/$i.log"
            echo "$(date) Restarting $i:" | tee -a /root/notify.txt >> "/var/log/custom/$i.log"
            # Start the service and record its status
            /sbin/service "$i" start | tee -a /root/notify.txt >> "/var/log/custom/$i.log"
            /sbin/service "$i" status | tee -a /root/notify.txt >> "/var/log/custom/$i.log"
        fi
    done

    # Send notification (the file only exists if a service was found down)
    if [ -f /root/notify.txt ]; then
        echo "Subject: Monitoring: Service Down Detected" | cat - /root/notify.txt | /usr/sbin/sendmail -t
        # Delete old notification
        rm -f /root/notify.txt
    fi
    sleep 30
done

Why do you need the sleep 30? Is there an urgent need to monitor every 30 seconds?
If you monitored every minute, you could use crontab to run your script every minute.

Since your script is short-running (a second or two), it would be better to have it executed via crontab.
The reason: if your script were terminated for some unknown reason, nothing would restart it.
Via crontab, it would start again on the next minute.


Actually, it was my intention to avoid crontab. The cron daemon could be stopped without being restarted. Also, if you want to avoid downtime, you don’t want to wait a whole minute for a service to be restarted. The script was started in the background and has not stopped for over 2 months now. I use it to keep my tor relay up and running :smiley: (wanna win a free t-shirt) :smile:


You could possibly get creative using watch to output to log, kicking off scripts based on differences, and/or output to variables.

@Bergwen Can you provide more details on how to do this? I was thinking about checking the server load and, if the load average is above a certain value, killing the PHP processes, which are most often the reason for high load.
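
For the load-average idea, here is a minimal sketch of the check itself, assuming a Linux system where /proc/loadavg is available. The function name load_exceeds and the threshold of 8 are made up for illustration, and the pkill line is left commented out because which processes are safe to kill depends on your setup.

```shell
# Return success (0) when the 1-minute load average is above the threshold.
# The optional second argument overrides the file to read, which is handy
# for testing; it defaults to /proc/loadavg (Linux only).
load_exceeds() {
    local threshold=$1 loadavg_file=${2:-/proc/loadavg}
    local load1
    [ -r "$loadavg_file" ] || return 2
    # The first field of /proc/loadavg is the 1-minute load average
    read -r load1 _ < "$loadavg_file" || return 2
    # awk does the floating-point comparison that [ ] cannot do
    awk -v l="$load1" -v t="$threshold" 'BEGIN { exit !(l > t) }'
}

# Example use inside the monitoring loop (threshold 8 is an arbitrary choice)
if load_exceeds 8; then
    echo "$(date) load average is above 8"
    # pkill php   # assumption: uncomment only if killing PHP is acceptable
fi
```

Because the function only reports whether the threshold was crossed, the decision about what to kill or restart stays in the calling script.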


pgrep is too slow; it is better to check /var/run/*.pid.
pgrep reads a lot of information from /proc and then selects the
information you requested.

@jalal_hajigholamali Thanks, that is actually a good idea. Will try to implement it.

When you start a service, it creates a file under /var/run/; when you stop the service, it removes that file.
You can check whether a service exists with the following script.
NOTE: all the commands used are shell builtins, which are faster than an external command like pgrep.

if [ ! -f /var/run/SERVICE.pid ]; then
    echo "service terminated or did not start…"
fi

Change SERVICE to the real name (for example, crond).
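
One caveat with the PID-file check: if a service crashes, its PID file can be left behind, so the file existing does not prove the process is alive. A possible refinement, sketched below, reads the PID from the file and probes it with kill -0, which sends no signal and only tests whether the process can be signalled. The function name pidfile_alive and the path /var/run/crond.pid are assumptions for illustration; the actual file name varies by service and distribution.

```shell
# Return success (0) only if the PID file exists, contains a number,
# and a process with that PID can be signalled.
pidfile_alive() {
    local pidfile=$1 pid
    [ -f "$pidfile" ] || return 1
    # Tolerate a PID file that has no trailing newline
    read -r pid < "$pidfile" || [ -n "$pid" ] || return 1
    # Reject empty or non-numeric contents
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac
    # kill -0 sends no signal; it fails if the process is gone.
    # Note: it also fails for processes you lack permission to signal,
    # so run the monitor as root (as the original script does).
    kill -0 "$pid" 2>/dev/null
}

# Example use (the path is an assumption; adjust it for your service)
if ! pidfile_alive /var/run/crond.pid; then
    echo "service terminated or did not start"
fi
```

Like the original check, this uses only shell builtins, so it keeps the speed advantage over pgrep.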
