homepage: http://www.nagios.org/
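Nagios is open-source monitoring software that scans hosts, services, and networks for problems. The main difference between Nagios and similar packages is that Nagios reduces all information to three states, "working", "questionable", and "failure", and that it supports a very rich "ecosystem" of plugins. These features let users set up an effective installation without worrying much about the details, supplying only the information they need.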
install
$ sudo apt-get install nagios3 nagios-nrpe-plugin
add user nagiosadmin for nagios
$ sudo htpasswd -c /etc/nagios3/htpasswd.users nagiosadmin
New password:
Re-type new password:
Adding password for user nagiosadmin
Create a new nagcmd group for allowing external commands to be submitted through the web interface. Add both the nagios user and the apache user to the group.
$ sudo groupadd nagcmd
$ sudo usermod -a -G nagcmd nagios
$ sudo usermod -a -G nagcmd www-data
$ grep nagcmd /etc/group
nagcmd:x:1003:nagios,www-data
reload apache
$ sudo /etc/init.d/apache2 reload
 * Reloading web server config apache2                    [ OK ]
nagios-nrpe-server --------> nagios core (nagios-nrpe-plugin)
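nagios-nrpe-server runs on each monitored host and returns monitoring data to the Nagios server. The server requests and receives that data through nagios-nrpe-plugin (check_nrpe).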
sudo apt-get install nagios-nrpe-server nagios-plugins
/etc/nagios/nrpe.cfg
/etc/nagios/nrpe_local.cfg
$ sudo vim /etc/nagios/nrpe_local.cfg

allowed_hosts=172.16.1.2

command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
command[check_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
command[check_swap]=/usr/lib/nagios/plugins/check_swap -w 20% -c 10%
command[check_all_disks]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -e
command[check_disk_root]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
command[check_disk_home]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /home
command[check_sda_iostat]=/usr/lib/nagios/plugins/check_iostat -d sda -w 100 -c 200
command[check_sdb_iostat]=/usr/lib/nagios/plugins/check_iostat -d sdb -w 100 -c 200
# command[check_uri_user]=/usr/lib/nagios/plugins/check_http -I 127.0.0.1 -p 80 -u http://example.com/test/ok.php
# command[check_mysql]=/usr/lib/nagios/plugins/check_mysql -H localhost -u root -ppassword test -P 3306
Restart the service for the changes to take effect:
/etc/init.d/nagios-nrpe-server restart
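To review which checks an NRPE configuration exposes, it can help to list the command names it defines. The following is a minimal sketch. The function name `nrpe_commands` and the sample file content are illustrative and not part of NRPE.

```shell
#!/usr/bin/env bash
# List the command names defined in an NRPE config file.
# Definitions look like:
#   command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
# Commented-out definitions (leading '#') are skipped.
nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Demo against a throwaway sample file (hypothetical content).
sample=$(mktemp)
cat > "$sample" <<'EOF'
allowed_hosts=172.16.1.2
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
# command[check_mysql]=/usr/lib/nagios/plugins/check_mysql -H localhost
EOF
nrpe_commands "$sample"
rm -f "$sample"
```

The names printed are the ones the monitoring server can request from this host.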
Nagios can also monitor Windows servers. This requires installing the following software on the Windows host.
NSClient++
http://sourceforge.net/projects/nscplus
Install Nagios & Plugins
[root@database ~]# yum -y install nagios nagios-plugins-all nagios-plugins-nrpe
Create the default Nagios web access user & set a password
# htpasswd -c /etc/nagios/passwd nagiosadmin
Verify default config files
nagios -v /etc/nagios/nagios.cfg
Start Nagios
service nagios start
Configure it to start on boot
chkconfig --levels 345 nagios on
http://localhost/nagios/
# yum install nrpe nagios-plugins-all

# vim /etc/nagios/nrpe.cfg

allowed_hosts=172.16.1.2

command[check_users]=/usr/lib64/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib64/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/lib64/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib64/nagios/plugins/check_procs -w 150 -c 200
command[check_http]=/usr/lib64/nagios/plugins/check_http -I 127.0.0.1 -p 80 -u http://www.example.com/index.html
command[check_swap]=/usr/lib64/nagios/plugins/check_swap -w 20% -c 10%
command[check_all_disks]=/usr/lib64/nagios/plugins/check_disk -w 20% -c 10% -e

# chkconfig nrpe on
# service nrpe start
There is actually no need to install every monitoring plugin. Install only the ones you use:
yum install nrpe -y
yum install nagios-plugins-disk nagios-plugins-load nagios-plugins-ping nagios-plugins-procs nagios-plugins-swap nagios-plugins-users -y
$ sudo vim /etc/nagios3/nagios.cfg

cfg_dir=/etc/nagios3/hosts
cfg_dir=/etc/nagios3/servers
cfg_dir=/etc/nagios3/switches
cfg_dir=/etc/nagios3/routers

admin_email=nagios, neo.chen@example.com
add user neo for nagios
$ sudo htpasswd /etc/nagios3/htpasswd.users neo
New password:
Re-type new password:
Adding password for user neo
# grep default_user_name cgi.cfg
#default_user_name=guest

# grep authorized cgi.cfg
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
#authorized_for_read_only=user1,user2
$ sudo vim /etc/nagios3/cgi.cfg

authorized_for_all_services=nagiosadmin,neo
authorized_for_all_hosts=nagiosadmin,neo
$ sudo vim /etc/nagios3/conf.d/contacts_nagios2.cfg

###############################################################################
# contacts.cfg
###############################################################################

define contact{
        contact_name                    neo
        alias                           Neo
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,r
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
        email                           neo.chen@example.com
        }

###############################################################################
###############################################################################
#
# CONTACT GROUPS
#
###############################################################################
###############################################################################

# We only have one contact in this simple configuration file, so there is
# no need to create more than one contact group.

define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 root, neo
        }
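Service notification options: w (warning), u (unknown), c (critical), r (recovery from a problem state). The contact is notified in these four cases.
Host notification options: d (down), u (unreachable), r (recovery from a problem state). The contact is notified in these three cases. The sample contact above enables only d and r.
Confirm that contact_groups is set: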
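When reviewing contact definitions, the option letters are easy to misread. The helper below is a small sketch that expands them into names. The function name is illustrative, and it covers only the letters described in this section (Nagios defines others that are not handled here).

```shell
# Expand notification option letters into readable names.
# kind: "service" or "host"; opts: comma-separated letters, e.g. "w,u,c,r".
describe_notification_options() {
    local kind=$1 opts=$2 letter name out=""
    local IFS=','
    for letter in $opts; do
        letter=${letter// /}            # tolerate "d, r" style spacing
        case "$kind:$letter" in
            service:w) name="warning" ;;
            service:u) name="unknown" ;;
            service:c) name="critical" ;;
            host:d)    name="down" ;;
            host:u)    name="unreachable" ;;
            *:r)       name="recovery" ;;
            *)         name="unrecognized($letter)" ;;
        esac
        out="${out:+$out }$name"
    done
    echo "$out"
}

describe_notification_options service w,u,c,r   # warning unknown critical recovery
describe_notification_options host d,r          # down recovery
```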
neo@monitor:/etc/nagios3$ grep admins conf.d/generic-host_nagios2.cfg
        contact_groups                  admins
neo@monitor:/etc/nagios3$ grep admins conf.d/generic-service_nagios2.cfg
        contact_groups                  admins
$ sudo vim /etc/nagios3/conf.d/hostgroups_nagios2.cfg

define hostgroup {
        hostgroup_name  mysql-servers
        alias           MySQL Servers
        members         *
        }
$ cat /etc/nagios3/conf.d/generic-service_nagios2.cfg
# generic service template definition
define service{
        name                            generic-service ; The 'name' of this service template
        active_checks_enabled           1       ; Active service checks are enabled
        passive_checks_enabled          1       ; Passive service checks are enabled/accepted
        parallelize_check               1       ; Active service checks should be parallelized (disabling this can lead to major performance problems)
        obsess_over_service             1       ; We should obsess over this service (if necessary)
        check_freshness                 0       ; Default is to NOT check service 'freshness'
        notifications_enabled           1       ; Service notifications are enabled
        event_handler_enabled           1       ; Service event handler is enabled
        flap_detection_enabled          1       ; Flap detection is enabled
        failure_prediction_enabled      1       ; Failure prediction is enabled
        process_perf_data               1       ; Process performance data
        retain_status_information       1       ; Retain status information across program restarts
        retain_nonstatus_information    1       ; Retain non-status information across program restarts
        notification_interval           0       ; Only send notifications on status change by default.
        is_volatile                     0
        check_period                    24x7
        normal_check_interval           5
        retry_check_interval            1
        max_check_attempts              4
        notification_period             24x7
        notification_options            w,u,c,r
        contact_groups                  admins
        register                        0       ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
        }
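notification_interval: interval between repeated notifications, in minutes (0 means notify only when the state changes)
normal_check_interval: interval between regular checks, in minutes
retry_check_interval: interval between retries after a failed check, in minutes
max_check_attempts: number of check attempts; an alert is sent after 4 consecutive failures
Play an alert sound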
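The three check settings together determine how long an outage lasts before Nagios alerts. The sketch below does the arithmetic. It assumes the default `interval_length` of 60 seconds and ignores check execution time and scheduling latency. The function names are illustrative.

```shell
# Minutes from the first failed check until the final (hard) state,
# which is when the first notification is sent.
minutes_to_hard_state() {
    local max_attempts=$1 retry_interval=$2
    echo $(( (max_attempts - 1) * retry_interval ))
}

# Worst case measured from the real outage: the failure may begin right
# after a successful check, so add one normal check interval.
worst_case_detection_minutes() {
    local normal_interval=$1 max_attempts=$2 retry_interval=$3
    echo $(( normal_interval + (max_attempts - 1) * retry_interval ))
}

minutes_to_hard_state 4 1            # prints 3
worst_case_detection_minutes 5 4 1   # prints 8
```

With the template values (5, 1, 4), the first failed check is followed by three retries one minute apart, so the alert goes out about 3 minutes after the first failure and at most about 8 minutes after the outage begins.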
$ sudo vim /etc/nagios3/cgi.cfg

# SOUND OPTIONS
# These options allow you to specify an optional audio file
# that should be played in your browser window when there are
# problems on the network.  The audio files are used only in
# the status CGI.  Only the sound for the most critical problem
# will be played.  Order of importance (higher to lower) is as
# follows: unreachable hosts, down hosts, critical services,
# warning services, and unknown services.  If there are no
# visible problems, the sound file optionally specified by
# 'normal_sound' variable will be played.
#
#
# <varname>=<sound_file>
#
# Note: All audio files must be placed in the /media subdirectory
# under the HTML path (i.e. /usr/local/nagios/share/media/).

host_unreachable_sound=hostdown.wav
host_down_sound=hostdown.wav
service_critical_sound=critical.wav
service_warning_sound=warning.wav
service_unknown_sound=warning.wav
normal_sound=noproblem.wav
vim /etc/nagios3/commands.cfg

# 'notify-host-by-sms' command definition
define command{
        command_name    notify-host-by-sms
        command_line    /srv/sms/sms $CONTACTPAGER$ "Host: $HOSTNAME$\nState: $HOSTSTATE$\nAddress: $HOSTADDRESS$\nInfo: $HOSTOUTPUT$\n\nDate/Time: $LONGDATETIME$\n"
        }

# 'notify-service-by-sms' command definition
define command{
        command_name    notify-service-by-sms
        command_line    /srv/sms/sms $CONTACTPAGER$ "Service: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $LONGDATETIME$\n\nAdditional Info:\n\n$SERVICEOUTPUT$"
        }
sudo vim /etc/nagios3/conf.d/contacts_nagios2.cfg

define contact{
        contact_name                    neo
        alias                           Neo
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,r
        service_notification_commands   notify-service-by-email, notify-service-by-sms
        host_notification_commands      notify-host-by-email, notify-host-by-sms
        email                           neo.chen@example.com
        pager                           13113668899
        }
neo@monitor:/etc/nagios3/hosts$ sudo cat www.example.com.cfg
define host{
        use             generic-host            ; Inherit default values from a template
        host_name       www.example.com         ; The name we're giving to this host
        alias           Some Remote Host        ; A longer name associated with the host
        address         172.16.1.10             ; IP address of the host
        hostgroups      http-servers            ; Host groups this host is associated with
        }

# NRPE disk check.
define service {
        use                     generic-service
        host_name               www.example.com
        service_description     nrpe-disk
        check_command           check_nrpe_1arg!check_all_disks!172.16.1.10
        }

define service {
        use                     generic-service
        host_name               www.example.com
        service_description     nrpe-users
        check_command           check_nrpe_1arg!check_users!172.16.1.10
        }

define service {
        use                     generic-service
        host_name               www.example.com
        service_description     nrpe-swap
        check_command           check_nrpe_1arg!check_swap!172.16.1.10
        }

define service {
        use                     generic-service
        host_name               www.example.com
        service_description     nrpe-procs
        check_command           check_nrpe_1arg!check_total_procs!172.16.1.10
        }

define service {
        use                     generic-service
        host_name               www.example.com
        service_description     nrpe-load
        check_command           check_nrpe_1arg!check_load!172.16.1.10
        }

define service {
        use                     generic-service
        host_name               www.example.com
        service_description     nrpe-zombie_procs
        check_command           check_nrpe_1arg!check_zombie_procs!172.16.1.10
        }
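Host files like the one above repeat the same service stanza for every NRPE check, so they lend themselves to being generated. The following is a rough sketch. The function name `make_nrpe_host` is hypothetical, the service descriptions are derived from the check name rather than chosen by hand, and only the single argument that `check_nrpe_1arg` takes is emitted.

```shell
# Emit a host definition plus one NRPE service per check name.
# Usage: make_nrpe_host HOSTNAME ADDRESS CHECK...
make_nrpe_host() {
    local host=$1 addr=$2 check
    shift 2
    cat <<EOF
define host{
        use             generic-host
        host_name       $host
        alias           $host
        address         $addr
        }
EOF
    for check in "$@"; do
        cat <<EOF

define service{
        use                     generic-service
        host_name               $host
        service_description     nrpe-${check#check_}
        check_command           check_nrpe_1arg!$check
        }
EOF
    done
}

make_nrpe_host www.example.com 172.16.1.10 check_all_disks check_users check_swap
```

Redirecting the output to a file under one of the configured `cfg_dir` directories and running the Nagios configuration check would be the natural next step.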
vim /etc/nagios3/routers/firewall.cfg

define host{
        use             generic-host            ; Inherit default values from a template
        host_name       firewall                ; The name we're giving to this switch
        alias           Cisco PIX 515E Firewall ; A longer name associated with the switch
        address         172.16.1.254            ; IP address of the switch
        hostgroups      all,networks            ; Host groups this switch is associated with
        }

define service{
        use                     generic-service ; Inherit values from a template
        host_name               firewall        ; The name of the host the service is associated with
        service_description     PING            ; The service description
        check_command           check_ping!200.0,20%!600.0,60%  ; The command used to monitor the service
        normal_check_interval   5               ; Check the service every 5 minutes under normal conditions
        retry_check_interval    1               ; Re-check the service every minute until its final/hard state is determined
        }

define service{
        use                     generic-service ; Inherit values from a template
        host_name               firewall
        service_description     Uptime
        check_command           check_snmp!-C public -o sysUpTime.0
        }
define service{
        use                     local-service
        host_name               www.example.com
        service_description     Host Alive
        check_command           check-host-alive
        }
hosts
$ cat /etc/nagios3/hosts/www.example.com.cfg
define host{
        use             generic-host            ; Inherit default values from a template
        host_name       www.example.com         ; The name we're giving to this host
        alias           Some Remote Host        ; A longer name associated with the host
        address         120.132.14.6            ; IP address of the host
        hostgroups      all,http-servers        ; Host groups this host is associated with
        }

define service{
        use                     generic-service ; Inherit default values from a template
        host_name               www.example.com
        service_description     HTTP
        check_command           check_http
        }
HTTP status
neo@monitor:~$ /usr/lib/nagios/plugins/check_http -H www.example.com -I 172.16.0.8 -s "HTTs"
HTTP CRITICAL: HTTP/1.1 404 Not Found - string not found - 336 bytes in 0.001 second response time |time=0.000733s;;;0.000000 size=336B;;;0
neo@monitor:~$ /usr/lib/nagios/plugins/check_http -H www.example.com -I 172.16.0.8 -e '404'
HTTP OK: Status line output matched "404" - 336 bytes in 0.001 second response time |time=0.000715s;;;0.000000 size=336B;;;0
$ sudo vim /etc/nagios3/hosts/mysql.cfg

define host{
        use             generic-host                    ; Inherit default values from a template
        host_name       mysql-master.example.com        ; The name we're giving to this host
        alias           Some Remote Host                ; A longer name associated with the host
        address         172.16.1.6                      ; IP address of the host
        hostgroups      all,mysql-servers               ; Host groups this host is associated with
        }

define service{
        use                     generic-service ; Inherit default values from a template
        host_name               mysql-master.example.com
        service_description     MySQL
        check_command           check_mysql_database!user!passwd!database
        }
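Check command configuration files are located in /etc/nagios-plugins/config/
Using the Nagios check_ping command
The options are:
-H      host address
-w      WARNING threshold: round-trip time (ms), packet loss (%)
-c      CRITICAL threshold: round-trip time (ms), packet loss (%)
-p      number of packets to send (default 5)
-t      timeout (default 10 seconds)
-4|-6   use IPv4 or IPv6 addresses (default IPv4)
Example: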
/usr/lib64/nagios/plugins/check_ping -H 74.125.71.106 -w 100.0,20% -c 200.0,50%
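Each -w/-c value pairs a round-trip time with a packet-loss percentage, and either half can trip the threshold. The sketch below illustrates that decision for already-measured values. It is not the plugin's source, and it assumes a threshold is breached when the measured value reaches or exceeds it. The function name is illustrative.

```shell
# Classify a ping result against "RTA,LOSS%" threshold pairs.
# Usage: classify_ping RTA_MS LOSS_PCT WARN CRIT   (e.g. WARN="100.0,20%")
# Prints OK, WARNING or CRITICAL and returns 0, 1 or 2 (plugin exit codes).
classify_ping() {
    local rta=$1 loss=$2 warn=$3 crit=$4
    awk -v rta="$rta" -v loss="$loss" -v warn="$warn" -v crit="$crit" '
        BEGIN {
            split(warn, w, ","); split(crit, c, ",")
            sub(/%/, "", w[2]); sub(/%/, "", c[2])
            if (rta >= c[1] + 0 || loss >= c[2] + 0)      { print "CRITICAL"; exit 2 }
            else if (rta >= w[1] + 0 || loss >= w[2] + 0) { print "WARNING";  exit 1 }
            print "OK"; exit 0
        }'
}

classify_ping 35.2 0 '100.0,20%' '200.0,50%'   # prints OK
```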
# /usr/lib64/nagios/plugins/check_procs
PROCS OK: 75 processes
# /usr/lib64/nagios/plugins/check_procs -a mingetty
PROCS OK: 6 processes with args 'mingetty'
# /usr/lib64/nagios/plugins/check_procs -C crond
PROCS OK: 1 process with command name 'crond'
Warn as soon as any user is logged in
# /usr/lib64/nagios/plugins/check_users -w 0 -c 5
USERS WARNING - 1 users currently logged in |users=1;0;5;0
Warn when more than 5 users are logged in
# /usr/lib64/nagios/plugins/check_users -w 5 -c 50
USERS OK - 1 users currently logged in |users=1;5;50;0
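The two runs above suggest that a threshold is breached only when the user count is strictly greater than it. The following sketch reproduces that decision together with the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL). It takes the count as an argument instead of querying the system, and the function name is illustrative.

```shell
# Reproduce the check_users decision for a given user count.
# Usage: users_status COUNT WARN CRIT
users_status() {
    local count=$1 warn=$2 crit=$3 state rc
    if   [ "$count" -gt "$crit" ]; then state=CRITICAL; rc=2
    elif [ "$count" -gt "$warn" ]; then state=WARNING;  rc=1
    else                                state=OK;       rc=0
    fi
    # Text before "|" is for humans; the part after it is performance
    # data in the form label=value;warn;crit;min
    echo "USERS $state - $count users currently logged in |users=$count;$warn;$crit;0"
    return "$rc"
}

users_status 1 5 50   # USERS OK - 1 users currently logged in |users=1;5;50;0
```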
Command definitions
define command{
        command_name    check_http_404
        command_line    /usr/lib/nagios/plugins/check_http -H '$HOSTADDRESS$' -I '$HOSTADDRESS$' -e '404'
        }

define command{
        command_name    check_http_status
        command_line    /usr/lib/nagios/plugins/check_http -H '$HOSTADDRESS$' -I '$HOSTADDRESS$' -e '$ARG1$'
        }

define command{
        command_name    check_http_url
        command_line    /usr/lib/nagios/plugins/check_http -H '$HOSTADDRESS$' -I '$HOSTADDRESS$' -u '$ARG1$'
        }
The default timeout for the HTTP health check is 10 seconds. If your site takes longer than that to respond, use the -t option to change the timeout.
# 'check_http' command definition
define command{
        command_name    check_http
        command_line    /usr/lib/nagios/plugins/check_http -t 30 -H '$HOSTADDRESS$' -I '$HOSTADDRESS$'
        }
# /srv/nagios/libexec/check_http -H www.163.com
HTTP OK: HTTP/1.0 200 OK - 657627 bytes in 1.772 second response time |time=1.771681s;;;0.000000 size=657627B;;;0

$ /usr/lib/nagios/plugins/check_http -H www.example.com -I 172.16.0.8 -s "HTTs"
HTTP CRITICAL: HTTP/1.1 404 Not Found - string not found - 336 bytes in 0.001 second response time |time=0.000733s;;;0.000000 size=336B;;;0

$ /usr/lib/nagios/plugins/check_http -H www.example.com -I 172.16.0.8 -e '404'
HTTP OK: Status line output matched "404" - 336 bytes in 0.001 second response time |time=0.000715s;;;0.000000 size=336B;;;0
Command parameters
check_mysql [-d database] [-H host] [-P port] [-s socket] [-u user] [-p password] [-S]

$ /usr/lib64/nagios/plugins/check_mysql -d dbname -H 202.176.120.10 -P 3306 -u test -p password
Uptime: 254264  Threads: 16  Questions: 535110791  Slow queries: 21  Opens: 110  Flush tables: 1  Open tables: 81  Queries per second avg: 2104.547
$ /usr/lib64/nagios/plugins/check_mysql --hostname=172.16.1.5 --port=3306 --username=monitor --password=monitor
Uptime: 27001  Threads: 8  Questions: 25280156  Slow queries: 14941  Opens: 1389932  Flush tables: 3  Open tables: 128  Queries per second avg: 936.267
cat > /usr/lib64/nagios/plugins/check_mysql_replication <<'EOF'
#!/bin/bash
# Usage: check_mysql_replication HOST
# The quoted 'EOF' keeps $1, $2 and $(...) from being expanded while the file is written.
declare -a slave_is
slave_is=($(mysql -h"$1" -umonitor -pxmNhj -e "show slave status\G" | grep Running | awk '{print $2}'))
if [ "${slave_is[0]}" = "Yes" ] && [ "${slave_is[1]}" = "Yes" ]
then
    echo "OK - slave is running"
    exit 0
else
    echo "Critical - slave is error"
    exit 2
fi
EOF
$ sudo chmod +x /usr/lib64/nagios/plugins/check_mysql_replication
$ /usr/lib64/nagios/plugins/check_mysql_replication 172.16.1.4
Critical - slave is error
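The script above relies on the order of the lines that contain "Running". A slightly more defensive variation, sketched below, matches the two fields by name and reads the status text from standard input so the logic can be exercised without a database. The function name and the extra detail in the failure message are illustrative, not part of the original plugin.

```shell
# Decide replication health from "SHOW SLAVE STATUS\G" text on stdin.
# Both Slave_IO_Running and Slave_SQL_Running must be "Yes".
replication_status() {
    local io="" sql="" key value
    while IFS=':' read -r key value; do
        key=${key//[[:space:]]/}
        value=${value//[[:space:]]/}
        case "$key" in
            Slave_IO_Running)  io=$value ;;
            Slave_SQL_Running) sql=$value ;;
        esac
    done
    if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
        echo "OK - slave is running"
        return 0
    fi
    echo "CRITICAL - slave is not running (IO=${io:-?} SQL=${sql:-?})"
    return 2
}

# Demo with canned input; a real plugin would pipe in the output of
#   mysql -h HOST -u monitor -p... -e 'show slave status\G'
printf '%s\n' '  Slave_IO_Running: Yes' ' Slave_SQL_Running: Yes' | replication_status
```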
vim /etc/nagios-plugins/config/mysql.cfg

# 'check_mysql_replication' command definition
define command{
        command_name    check_mysql_replication
        command_line    /usr/lib/nagios/plugins/check_mysql_replication $HOSTADDRESS$
        }

define command{
        command_name    check_mysql_replication_host
        command_line    /usr/lib/nagios/plugins/check_mysql_replication '$ARG1$'
        }
nrpe.cfg
cat > /usr/lib64/nagios/plugins/check_mysql_replication <<'EOF'
#!/bin/bash
declare -a slave_is
slave_is=($(mysql -umonitor -pxmNhj -e "show slave status\G" | grep Running | awk '{print $2}'))
if [ "${slave_is[0]}" = "Yes" ] && [ "${slave_is[1]}" = "Yes" ]
then
    echo "OK - slave is running"
    exit 0
else
    echo "Critical - slave is error"
    exit 2
fi
EOF

command[check_mysql_replication]=/usr/lib64/nagios/plugins/check_mysql_replication

/usr/local/nagios/libexec/check_nrpe -H 192.168.1.1
/usr/local/nagios/libexec/check_nrpe -H 192.168.1.1 -c check_mysql_replication

define service {
        host_name               192.168.10.232
        service_description     check_mysql_replication
        check_period            24x7
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        contact_groups          mygroup
        notification_interval   5
        notification_period     24x7
        notification_options    w,u,c,r
        check_command           check_nrpe!check_mysql_replication
        }
$ cat /etc/nagios-plugins/config/disk.cfg
# 'check_disk' command definition
define command{
        command_name    check_disk
        command_line    /usr/lib/nagios/plugins/check_disk -w '$ARG1$' -c '$ARG2$' -e -p '$ARG3$'
        }

# 'check_all_disks' command definition
define command{
        command_name    check_all_disks
        command_line    /usr/lib/nagios/plugins/check_disk -w '$ARG1$' -c '$ARG2$' -e
        }

# 'ssh_disk' command definition
define command{
        command_name    ssh_disk
        command_line    /usr/lib/nagios/plugins/check_by_ssh -H '$HOSTADDRESS$' -C '/usr/lib/nagios/plugins/check_disk -w '\''$ARG1$'\'' -c '\''$ARG2$'\'' -e -p '\''$ARG3$'\'
        }

####
# use these checks, if you want to test IPv4 connectivity on IPv6 enabled systems
####

# 'ssh_disk_4' command definition
define command{
        command_name    ssh_disk_4
        command_line    /usr/lib/nagios/plugins/check_by_ssh -H '$HOSTADDRESS$' -C '/usr/lib/nagios/plugins/check_disk -w '\''$ARG1$'\'' -c '\''$ARG2$'\'' -e -p '\''$ARG3$'\' -4
        }
WARNING/CRITICAL alert thresholds
-w 10% -c 5%
-w 100M -c 50M
-p, --path=PATH, --partition=PARTITION sets the path to monitor and can be given more than once
$ /usr/lib/nagios/plugins/check_disk -w 10% -c 5% -p / -p /opt -p /boot
DISK OK - free space: / 23872 MB (66% inode=92%); /opt 99242 MB (47% inode=93%); /boot 276 MB (63% inode=99%);| /=11767MB;33792;35669;0;37547 /opt=110882MB;199232;210300;0;221369 /boot=160MB;414;437;0;460

$ /usr/lib/nagios/plugins/check_disk -w 100M -c 50M -p / -p /opt -p /boot
DISK OK - free space: / 23872 MB (66% inode=92%); /opt 99242 MB (47% inode=93%); /boot 276 MB (63% inode=99%);| /=11768MB;37447;37497;0;37547 /opt=110882MB;221269;221319;0;221369 /boot=160MB;360;410;0;460
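The thresholds are given as free space, while the performance data after the `|` reports used space, so the numbers look different at first glance. The sketch below shows the conversion as inferred from the sample output above (integer percentages only, values rounded down). The function name is illustrative.

```shell
# Convert a check_disk free-space threshold ("10%" or "100M") into the
# used-space value in MB that appears in the performance data.
# Usage: used_threshold_mb TOTAL_MB THRESHOLD
used_threshold_mb() {
    local total=$1 thr=$2
    case "$thr" in
        *%) echo $(( total * (100 - ${thr%\%}) / 100 )) ;;
        *M) echo $(( total - ${thr%M} )) ;;
        *)  echo $(( total - thr )) ;;
    esac
}

used_threshold_mb 37547 10%    # prints 33792, as in "/=11767MB;33792;35669;0;37547"
used_threshold_mb 37547 100M   # prints 37447, as in "/=11768MB;37447;37497;0;37547"
```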
-x, --exclude_device=PATH excludes a path from monitoring
/usr/lib64/nagios/plugins/check_disk -w 10% -c 5% -e -x /bak -x /u01
$ cat disk-smb.cfg
# 'check_disk_smb' command definition
define command{
        command_name    check_disk_smb
        command_line    /usr/lib/nagios/plugins/check_disk_smb -H '$ARG1$' -s '$ARG2$'
        }

# 'check_disk_smb_workgroup' command definition
define command{
        command_name    check_disk_smb_workgroup
        command_line    /usr/lib/nagios/plugins/check_disk_smb -H '$ARG1$' -s '$ARG2$' -W '$ARG3$'
        }

# 'check_disk_smb_host' command definition
define command{
        command_name    check_disk_smb_host
        command_line    /usr/lib/nagios/plugins/check_disk_smb -a '$HOSTADDRESS$' -H '$ARG1$' -s '$ARG2$'
        }

# 'check_disk_smb_workgroup_host' command definition
define command{
        command_name    check_disk_smb_workgroup_host
        command_line    /usr/lib/nagios/plugins/check_disk_smb -a '$HOSTADDRESS$' -H '$ARG1$' -s '$ARG2$' -W '$ARG3$'
        }

# 'check_disk_smb_user' command definition
define command{
        command_name    check_disk_smb_user
        command_line    /usr/lib/nagios/plugins/check_disk_smb -H '$ARG1$' -s '$ARG2$' -u '$ARG3$' -p '$ARG4$' -w '$ARG5$' -c '$ARG6$'
        }

# 'check_disk_smb_workgroup_user' command definition
define command{
        command_name    check_disk_smb_workgroup_user
        command_line    /usr/lib/nagios/plugins/check_disk_smb -H '$ARG1$' -s '$ARG2$' -W '$ARG3$' -u '$ARG4$' -p '$ARG5$'
        }

# 'check_disk_smb_host_user' command definition
define command{
        command_name    check_disk_smb_host_user
        command_line    /usr/lib/nagios/plugins/check_disk_smb -a '$HOSTADDRESS$' -H '$ARG1$' -s '$ARG2$' -u '$ARG3$' -p '$ARG4$'
        }

# 'check_disk_smb_workgroup_host_user' command definition
define command{
        command_name    check_disk_smb_workgroup_host_user
        command_line    /usr/lib/nagios/plugins/check_disk_smb -a '$HOSTADDRESS$' -H '$ARG1$' -s '$ARG2$' -W '$ARG3$' -u '$ARG4$' -p '$ARG5$'
        }
$ /usr/lib/nagios/plugins/check_tcp -H 172.16.1.2 -p 80
TCP OK - 0.000 second response time on port 80|time=0.000369s;;;0.000000;10.000000
$ /usr/lib64/nagios/plugins/check_tcp -H localhost -p 11211 -t 5 -E -s 'stats\r\nquit\r\n' -e 'uptime' -M crit TCP OK - 0.001 second response time on port 11211 [STAT pid 29253 STAT uptime 36088 STAT time 1311100189 STAT version 1.4.5 STAT pointer_size 64 STAT rusage_user 3.207512 STAT rusage_system 50.596308 STAT curr_connections 10 STAT total_connections 97372 STAT connection_structures 84 STAT cmd_get 84673 STAT cmd_set 273 STAT cmd_flush 0 STAT get_hits 84336 STAT get_misses 337 STAT delete_misses 0 STAT delete_hits 0 STAT incr_misses 0 STAT incr_hits 0 STAT decr_misses 0 STAT decr_hits 0 STAT cas_misses 0 STAT cas_hits 0 STAT cas_badval 0 STAT auth_cmds 0 STAT auth_errors 0 STAT bytes_read 49280152 STAT bytes_written 46326517326 STAT limit_maxbytes 4294967296 STAT accepting_conns 1 STAT listen_disabled_num 0 STAT threads 4 STAT conn_yields 0 STAT bytes 1345 STAT curr_items 14 STAT total_items 241 STAT evictions 0 STAT reclaimed 135 END]|time=0.000658s;;;0.000000;5.000000
# /usr/lib64/nagios/plugins/check_tcp -H 192.168.2.1 -p 6379 -t 5 -E -s 'info\r\n' -q 'quit\r\n' -e 'uptime_in_days' -M crit TCP OK - 0.001 second response time on port 6379 [$1043 redis_version:2.4.10 redis_git_sha1:00000000 redis_git_dirty:0 arch_bits:64 multiplexing_api:epoll gcc_version:4.4.6 process_id:21331 uptime_in_seconds:18152153 uptime_in_days:210 lru_clock:1801614 used_cpu_sys:1579.41 used_cpu_user:2279.26 used_cpu_sys_children:54.32 used_cpu_user_children:54.11 connected_clients:2 connected_slaves:1 client_longest_output_list:0 client_biggest_input_buf:0 blocked_clients:0 used_memory:1158016 used_memory_human:1.10M used_memory_rss:1560576 used_memory_peak:1289920 used_memory_peak_human:1.23M mem_fragmentation_ratio:1.35 mem_allocator:jemalloc-2.2.5 loading:0 aof_enabled:0 changes_since_last_save:2 bgsave_in_progress:0 last_save_time:1423107828 bgrewriteaof_in_progress:0 total_connections_received:594376 total_commands_processed:1350747 expired_keys:12199 evicted_keys:0 keyspace_hits:511525 keyspace_misses:124116 pubsub_channels:0 pubsub_patterns:0 latest_fork_usec:361 vm_enabled:0 role:master slave0:192.168.6.1,58091,online db0:keys=1913,expires=7]|time=0.000815s;;;0.000000;5.000000
http://exchange.nagios.org/directory/Plugins/Network-Connections,-Stats-and-Bandwidth/check_traffic-2Esh/details
https://github.com/cloved/check_traffic
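Network interface traffic monitoring
The NRPE plugin (check_nrpe) on the monitoring server collects the results reported by nagios-nrpe-server.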
cat /etc/nagios3/hosts/host.example.org.cfg
define host{
        use             generic-host            ; Inherit default values from a template
        host_name       host.example.org        ; The name we're giving to this host
        alias           Some Remote Host        ; A longer name associated with the host
        address         172.16.1.3              ; IP address of the host
        hostgroups      all                     ; Host groups this host is associated with
        }

# NRPE disk check.
define service {
        use                     generic-service
        host_name               host.example.org
        service_description     nrpe-disk
        check_command           check_nrpe_1arg!check_all_disks!172.16.1.3
        }

define service {
        use                     generic-service
        host_name               host.example.org
        service_description     nrpe-users
        check_command           check_nrpe_1arg!check_users!172.16.1.3
        }

define service {
        use                     generic-service
        host_name               host.example.org
        service_description     nrpe-swap
        check_command           check_nrpe_1arg!check_swap!172.16.1.3
        }

define service {
        use                     generic-service
        host_name               host.example.org
        service_description     nrpe-procs
        check_command           check_nrpe_1arg!check_procs!172.16.1.3
        }
Define windows services that should be monitored.
# Define a host for the Windows machine we'll be monitoring
# Change the host_name, alias, and address to fit your situation

define host{
        use             windows-server          ; Inherit default values from a template
        host_name       remote-windows-host     ; The name we're giving to this host
        alias           Remote Windows Host     ; A longer name associated with the host
        address         192.168.1.4             ; IP address of the remote windows host
        }

define service{
        use                     generic-service
        host_name               remote-windows-host
        service_description     NSClient++ Version
        check_command           check_nt!CLIENTVERSION
        }

define service{
        use                     generic-service
        host_name               remote-windows-host
        service_description     Uptime
        check_command           check_nt!UPTIME
        }

define service{
        use                     generic-service
        host_name               remote-windows-host
        service_description     CPU Load
        check_command           check_nt!CPULOAD!-l 5,80,90
        }

define service{
        use                     generic-service
        host_name               remote-windows-host
        service_description     Memory Usage
        check_command           check_nt!MEMUSE!-w 80 -c 90
        }

define service{
        use                     generic-service
        host_name               remote-windows-host
        service_description     C:\ Drive Space
        check_command           check_nt!USEDDISKSPACE!-l c -w 80 -c 90
        }

define service{
        use                     generic-service
        host_name               remote-windows-host
        service_description     W3SVC
        check_command           check_nt!SERVICESTATE!-d SHOWALL -l W3SVC
        }

define service{
        use                     generic-service
        host_name               remote-windows-host
        service_description     Explorer
        check_command           check_nt!PROCSTATE!-d SHOWALL -l Explorer.exe
        }
Enable Password Protection
# A literal "$" in command_line must be written as "$$"; the password here is My2Secure$Password.
define command{
        command_name    check_nt
        command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 12489 -s My2Secure$$Password -v $ARG1$ $ARG2$
        }
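Nagios uses `$` to delimit macros such as `$HOSTADDRESS$`, so a password containing a literal `$` has to be written with the character doubled when it is placed in a `command_line`. A small helper, sketched here with an illustrative name, can prepare such a value.

```shell
# Double every "$" so Nagios treats it as a literal character rather
# than the start of a macro.
escape_nagios_dollars() {
    printf '%s\n' "$1" | sed 's/\$/$$/g'
}

escape_nagios_dollars 'My2Secure$Password'   # prints My2Secure$$Password
```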
https://code.google.com/p/jmxquery/
wget https://jmxquery.googlecode.com/files/jmxquery-1.3-bin.zip
unzip jmxquery-1.3-bin.zip
chmod +x check_jmx
# ./check_jmx -help
Usage: check_jmx [-option...] -U url -O object -A attribute
       (to query an attribute)
   or  check_jmx [-option...] -U url -O object -M method
       (to invoke a zero-argument method)
   or  check_jmx -help
       (to display this help page)

Mandatory parameters are:

-U      JMX URL, for example: "service:jmx:rmi:///jndi/rmi://localhost:1616/jmxrmi"
-O      Object name to be checked, for example, "java.lang:type=Memory"
-A      Attribute of the object to be checked, for example, "NonHeapMemoryUsage"
        (not compatible with -M switch)
-M      Zero-argument method to be invoked
        (not compatible with -A switch)

Options are:

-K <key>
        Key for compound data, for example, "used"
-I <info attribute>
        Attribute of the object containing information for text output
-J <info attribute key>
        Attribute key for -I attribute compound data, for example, "used"
-v[v[v[v]]]
        Verbatim level controlled as a number of v
-w <limit>
        Warning long value
-c <limit>
        Critical long value
-default <value>
        Use default value if requested object/attribute/method does not exist
-username <user name>
-password <password>
        Credentials for JMX

Note that if warning level > critical, system checks object attribute value to be LESS THAN OR EQUAL warning, critical
If warning level < critical, system checks object attribute value to be MORE THAN OR EQUAL warning, critical
Example 6.2.
# ./check_jmx -U service:jmx:rmi:///jndi/rmi://localhost:9012/jmxrmi -O java.lang:type=Memory -A HeapMemoryUsage -K used -I HeapMemoryUsage -J used -vvvv -w 731847066 -c 1045495808
JMX OK - HeapMemoryUsage.used=98617544 | HeapMemoryUsage.used=98617544,committed=514850816;init=536870912;max=7635730432;used=98617544

# ./check_jmx -U service:jmx:rmi:///jndi/rmi://localhost:9012/jmxrmi -O org:type=Spring,name=BackgroundService -A QueueSize -w 10 -c 20
JMX CRITICAL - org:type=Spring,name=BackgroundService
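The note at the end of the check_jmx help text means the order of the two thresholds selects the direction of the comparison. The sketch below spells that rule out for integer values. It illustrates the documented behaviour and is not the plugin's code. The function name is illustrative, and treating equal thresholds as "low values are bad" is an assumption, since the help text does not cover that case.

```shell
# Apply the threshold rule from the check_jmx help text.
# Usage: jmx_state VALUE WARN CRIT
#   warn < crit : high values are bad (value >= threshold trips it)
#   warn > crit : low values are bad  (value <= threshold trips it)
jmx_state() {
    local value=$1 warn=$2 crit=$3
    if [ "$warn" -lt "$crit" ]; then
        if   [ "$value" -ge "$crit" ]; then echo CRITICAL; return 2
        elif [ "$value" -ge "$warn" ]; then echo WARNING;  return 1
        fi
    else
        if   [ "$value" -le "$crit" ]; then echo CRITICAL; return 2
        elif [ "$value" -le "$warn" ]; then echo WARNING;  return 1
        fi
    fi
    echo OK
}

jmx_state 98617544 731847066 1045495808   # prints OK, matching the heap usage example
```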