Monit

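Monit is an AGPL 3.0 licensed system and process monitoring tool (not to be confused with M/Monit). It can automatically restart crashed services and display temperatures from standard hardware (for example through lm_sensors, or from hard drives through smartmontools). It sends service alerts when a problem occurs once or keeps recurring. It can be accessed directly from the command line or run as a web application through its integrated HTTP(S) server, which gives a quick and efficient snapshot of a given system's state.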

Installation

Install the monit package. Optionally, also install software such as lm_sensors or smartmontools. Once configuration is complete, start and enable the monit service.
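The installation steps can be sketched as a small script. This assumes an Arch Linux system with pacman and a systemd unit named monit.service; run and install_monit are hypothetical helper names. By default the commands are only printed, so the sketch is safe to try before running it as root.

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Install the package, then start the service and enable it at boot.
install_monit() {
    run pacman -S --needed monit              # assumption: Arch Linux with pacman
    run systemctl enable --now monit.service  # assumption: unit is monit.service
}
```

Calling install_monit prints the two commands; DRY_RUN=0 install_monit (as root) executes them.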

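Configuration

Monit's main configuration file is /etc/monitrc. You can edit this file directly, but if you want to run scripts (for example to read hard drive temperature and health status), uncomment the final directive, include /etc/monit.d/*, save /etc/monitrc, and create the /etc/monit.d/ directory.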
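The uncomment-and-create step can also be scripted. A minimal sketch, assuming the shipped /etc/monitrc contains the include line commented out with a leading #; enable_include is a hypothetical helper name, not part of Monit.

```shell
#!/bin/sh
# Uncomment a line of the form "#  include /etc/monit.d/*" in the given file.
# The file is rewritten in place with cat so its owner and mode are preserved.
enable_include() {
    conf=$1
    tmp=$(mktemp) || return 1
    sed 's|^#[[:space:]]*\(include[[:space:]]\{1,\}/etc/monit\.d/\*\)|\1|' "$conf" > "$tmp" &&
        cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}

# Typical use, as root:
#   enable_include /etc/monitrc && mkdir -p /etc/monit.d
```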

Note: To use Monit, the /etc/monitrc file (and any files stored in /etc/monit.d) must have permissions no more permissive than 0700, that is, no access for group or others. Monit refuses to start otherwise.
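The permission requirement can be checked before starting the service. A minimal sketch, assuming GNU stat from coreutils (stat -c %a prints the octal mode); mode_ok is a hypothetical helper name.

```shell
#!/bin/sh
# Return 0 if FILE grants no access to group or others (mode 0700 or stricter),
# 1 if it is too permissive, 2 if stat fails.
mode_ok() {
    mode=$(stat -c %a "$1") || return 2
    # The last two octal digits are the group and other bits; both must be 0.
    case "$mode" in
        0|*00) return 0 ;;
        *)     return 1 ;;
    esac
}

# Typical use, as root:
#   mode_ok /etc/monitrc || chmod 600 /etc/monitrc
#   monit -t    # ask Monit to syntax-check its control file
```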

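Configuration syntax

The configuration syntax used by Monit is designed to be very readable. The basic form is check WHAT followed by if THING condition THEN action. The words if, and, with(in), has, us(ing|e), on(ly), then, for, and of exist only to make the file easier for humans to read; Monit ignores all of them when it parses the file.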

Configuration examples

Mail server configuration

set mailserver smtp.myserver.com port 587
    username "MyUser" password "MyPassW0rd"
    using tlsv12

Mail notification format

set mail-format {
      from: Monit@MyServer
   subject: $SERVICE $EVENT at $DATE
   message: Monit $ACTION $SERVICE at $DATE on $HOST: $DESCRIPTION.
}

Note: Variables such as $SERVICE above are not general-purpose variables; they are placeholders that Monit replaces with the details of the alert.

CPU, memory, and swap usage

check system $HOST
    if loadavg (15min) > 15 for 5 times within 15 cycles then alert
    if memory usage > 80% for 4 cycles then alert
    if swap usage > 20% for 4 cycles then alert

Filesystem usage

check filesystem rootfs with path /
    if space usage > 90% then alert

check filesystem NFS with path /mnt/nfs_share
    if space usage > 90% then alert

Process monitoring

check process sshd with pidfile /var/run/sshd.pid
   start program = "/usr/bin/systemctl start sshd"
   stop program = "/usr/bin/systemctl stop sshd"
   if failed port 22 protocol ssh then restart

check process smbd with pidfile /run/samba/smbd.pid
   group samba
   start program = "/etc/init.d/samba start"
   stop  program = "/etc/init.d/samba stop"
   if failed host 192.168.1.250 port 139 type TCP  then restart
   depends on smbd_bin

check file smbd_bin with path /usr/bin/smbd
   group samba
   if failed permission 755 then unmonitor
   if failed uid root then unmonitor
   if failed gid root then unmonitor

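Note: In the Samba example above, depends on smbd_bin in the first block makes the smbd process check depend on the smbd_bin file check that follows it.

Monitoring hard drive health and temperature with scripts

Temperature

Create the /etc/monit.d/scripts/hdtemp.sh file (and the /etc/monit.d/scripts directory if needed):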

/etc/monit.d/scripts/hdtemp.sh

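#!/bin/sh
# Read the raw value (column 10) of the first SMART temperature attribute of /dev/sdX.
HDDTP=$(/usr/bin/smartctl -a "/dev/sd${1}" | awk '/Temp/ {printf "%d", $10; exit}')
#echo "$HDDTP" # for debug only
# Monit reads the temperature from the exit status; exit 0 if nothing was found.
exit "${HDDTP:-0}"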
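The field extraction can be tried without a real drive. The sketch below runs an awk filter over sample attribute lines laid out like the smartctl attribute table; the sample values are made up for illustration, and parse_temp and sample_attrs are hypothetical helper names. This variant stops at the first line containing "Temp".

```shell
#!/bin/sh
# Print the raw value (10th whitespace-separated field) of the first
# attribute line that contains "Temp".
parse_temp() {
    awk '/Temp/ { printf "%d", $10; exit }'
}

# Illustrative input only: columns are ID#, ATTRIBUTE_NAME, FLAG, VALUE,
# WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE.
sample_attrs() {
    cat <<'EOF'
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   064   055   000    Old_age   Always       -       36
EOF
}

# sample_attrs | parse_temp    prints the raw temperature field of line 194
```

Because the value is handed to Monit as an exit status, it has to fit in the range 0 to 255, which is sufficient for drive temperatures in degrees Celsius.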

monitrc or /etc/monit.d/*.monit file

check program SSD-A-Temp with path "/etc/monit.d/scripts/hdtemp.sh a"
    every 5 cycles
    if status > 40 then alert
    group health

check program HDD-B-Temp with path "/etc/monit.d/scripts/hdtemp.sh b"
    every 5 cycles
    if status > 40 then alert
    group health

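In the example above, the /etc/monit.d/scripts/hdtemp.sh script builds the drive path as /dev/sdX, where X is the letter given as the last character of the path string in the check declaration (a or b here). The SMART health status example below works the same way.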
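Since the check blocks differ only in the service name and the drive letter, they can be generated. A minimal sketch; gen_temp_check is a hypothetical helper name, and the default threshold of 40 simply mirrors the examples above.

```shell
#!/bin/sh
# Print a Monit "check program" block for one drive.
#   $1 = service name, $2 = drive letter (X in /dev/sdX),
#   $3 = alert threshold in degrees (optional, defaults to 40)
gen_temp_check() {
    name=$1
    letter=$2
    limit=${3:-40}
    cat <<EOF
check program $name with path "/etc/monit.d/scripts/hdtemp.sh $letter"
    every 5 cycles
    if status > $limit then alert
    group health

EOF
}

# Example: print blocks for /dev/sda and /dev/sdb; redirect the output to a
# file under /etc/monit.d/ (any file name; it is picked up by the include).
#   for d in a b; do gen_temp_check "Disk-$d-Temp" "$d"; done
```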

SMART health status

/etc/monit.d/scripts/hdhealth.sh

#!/bin/sh
STATUS=$(/usr/bin/smartctl -H "/dev/sd${1}" | grep overall-health | awk 'match($0,"result:"){print substr($0,RSTART+8,6)}')
if [ "$STATUS" = "PASSED" ]
then
    # 1 implies PASSED
    TP=1
else 
    # 2 implies FAILED
    TP=2
fi
#echo $TP # for debug only
exit $TP
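The status mapping can be exercised on sample text instead of a real drive. A minimal sketch of the same parsing; health_code is a hypothetical helper name, and the input lines in the usage comment imitate the overall-health line the script greps for.

```shell
#!/bin/sh
# Read smartctl -H style output on stdin and print 1 for PASSED, 2 otherwise.
health_code() {
    # Take the 6 characters that follow "result: " on the matching line.
    status=$(awk 'match($0, "result:") { print substr($0, RSTART + 8, 6) }')
    if [ "$status" = "PASSED" ]; then
        echo 1
    else
        echo 2
    fi
}

# Usage:
#   echo 'SMART overall-health self-assessment test result: PASSED' | health_code
```

Any output that lacks a "result: PASSED" line, including empty output when smartctl cannot read the drive, maps to 2, so the Monit check raises an alert in that case as well.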

monitrc or /etc/monit.d/*.monit file

check program SSD-A-Health with path "/etc/monit.d/scripts/hdhealth.sh a"
    every 120 cycles
    if status != 1 then alert
    group health

check program HDD-B-Health with path "/etc/monit.d/scripts/hdhealth.sh b"
    every 120 cycles
    if status != 1 then alert
    group health

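Tip: The group declaration makes Monit display all checks assigned the same group name together (health in the examples above, samba in the earlier Samba example).

Alert recipients: global and subsystem

There are two kinds of alerts: global alerts, which send every alert event to the specified user or email address, and subsystem alerts, which can be configured per check type (for example, network alerts go to A and process alerts go to B). You can configure as many global and subsystem alerts as you like by writing multiple declarations.

Global alerts

Global alerts are set outside of any subsystem check. For readability, it is recommended to place them next to the mail server configuration: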

SET ALERT email@domain

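Subsystem alerts

Subsystem alerts are nearly identical to global alerts, except that the SET keyword is omitted and the statement goes inside the check block it applies to: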

ALERT email@domain

See also