8 mar 2013
in this posting i want to improve my shinken setup with a few fixes.
there already is nice documentation on writing custom nagios plugins [1]. i didn’t find much about this in the shinken wiki, so here is what i did.
#! /bin/bash
#
# js@lastlog.de
# 05/03/2013
#
# This Nagios test plugin was created to demonstrate how it is integrated into shinken
#
PROGNAME=$(basename "$0")
PROGPATH=$(dirname "$0")
REVISION="0.0.1"
# utils.sh from the nagios plugins defines the STATE_* exit codes,
# print_revision and support
. "$PROGPATH/utils.sh"

print_usage() {
    echo "Usage:"
    echo " $PROGNAME --help"
    echo " $PROGNAME --test"
    echo " $PROGNAME --version"
}

print_help() {
    print_revision "$PROGNAME" "$REVISION"
    echo ""
    print_usage
    echo ""
    echo "Nagios test plugin"
    echo ""
    echo "--test"
    echo " Perform a test; it randomly returns STATE_OK, STATE_WARNING, STATE_CRITICAL, STATE_DEPENDENT or STATE_UNKNOWN"
    echo "--help"
    echo " Print this help screen"
    echo "--version"
    echo " Print version and license information"
    echo ""
    support
}

# Information options
case "$1" in
    --help|-h)
        print_help
        exit "$STATE_OK"
        ;;
    --version|-V)
        print_revision "$PROGNAME" "$REVISION"
        exit "$STATE_OK"
        ;;
    --test)
        # pick one of five outcomes (0..4) at random
        case $((RANDOM % 5)) in
            0)
                echo "STATE_OK - $(date)"
                exit "$STATE_OK"
                ;;
            1)
                echo "STATE_WARNING - $(date)"
                exit "$STATE_WARNING"
                ;;
            2)
                echo "STATE_CRITICAL - $(date)"
                exit "$STATE_CRITICAL"
                ;;
            3)
                echo "STATE_DEPENDENT - $(date)"
                exit "$STATE_DEPENDENT"
                ;;
            *)
                echo "STATE_UNKNOWN - $(date)"
                exit "$STATE_UNKNOWN"
                ;;
        esac
        ;;
    *)
        print_usage
        exit "$STATE_UNKNOWN"
        ;;
esac
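the script sources utils.sh from the nagios plugins directory for the STATE_* exit codes, print_revision and support. if that file is missing next to check_test, a minimal stand-in could look like the sketch below. the exit code values are the standard nagios plugin ones; the two function bodies are my own simplified assumption and much shorter than the real utils.sh.

```shell
# hypothetical minimal replacement for utils.sh,
# defining only what check_test uses

# standard nagios plugin return codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
STATE_DEPENDENT=4

# simplified: the real print_revision also prints license information
print_revision() {
    echo "$1 v$2"
}

# simplified: the real support() prints a longer support notice
support() {
    echo "report problems with this plugin to its author"
}
```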
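also make sure the plugin belongs to the shinken user and is executable:
chown shinken:shinken check_test
chmod 0755 check_test
then run it from a shell a few times; every call should print a randomly chosen state: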
root@monitoring-vm:/usr/local/shinken/libexec# ./check_test --test
STATE_CRITICAL - Thu Mar 7 12:34:19 CET 2013
root@monitoring-vm:/usr/local/shinken/libexec# ./check_test --test
STATE_WARNING - Thu Mar 7 12:44:23 CET 2013
root@monitoring-vm:/usr/local/shinken/libexec# ./check_test --test
STATE_UNKNOWN - Thu Mar 7 12:44:23 CET 2013
root@monitoring-vm:/usr/local/shinken/libexec# ./check_test --test
STATE_DEPENDENT - Thu Mar 7 12:44:24 CET 2013
root@monitoring-vm:/usr/local/shinken/libexec# ./check_test --test
STATE_OK - Thu Mar 7 12:44:25 CET 2013
root@monitoring-vm:/usr/local/shinken/libexec# ./check_test --test
STATE_WARNING - Thu Mar 7 12:44:26 CET 2013
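shinken, like nagios, takes the service state from the plugin's exit code rather than from the printed text. to look at the exit codes directly, a small helper can run the plugin repeatedly and count them. this is my own sketch, not part of shinken or the nagios plugins.

```shell
# run a check command RUNS times and count how often each exit code occurs
# usage: count_exit_codes RUNS COMMAND [ARGS...]
count_exit_codes() {
    local runs=$1 i
    shift
    for ((i = 0; i < runs; i++)); do
        # discard the plugin output, keep only its exit code
        "$@" >/dev/null 2>&1
        echo $?
    done | sort -n | uniq -c
}
```

for example count_exit_codes 20 ./check_test --test prints one line per exit code with the number of times it occurred.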
append this command definition to commands.cfg:
define command {
    command_name    check_test
    command_line    $PLUGINSDIR$/check_test $ARG1$
}
and add a service definition which uses it:
define service{
    use                     generic-service
    service_description     CustomTest
    host_name               lastlog.de
    check_command           check_test!"--test"
}
note: use generic-service means that our custom service inherits all attributes from the template generic-service, which is defined in /usr/local/shinken/etc/templates.cfg
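to see which command line ends up being executed for check_test!"--test", here is a simplified illustration of the macro substitution. it is not shinken's actual code: the text after each ! in check_command becomes $ARG1$, $ARG2$ and so on, and $PLUGINSDIR$ is taken from resource.cfg, which i assume points to /usr/local/shinken/libexec in this setup.

```shell
# simplified illustration of how a check_command is turned into a
# command line; this is NOT shinken's actual implementation
# usage: expand_command COMMAND_LINE CHECK_COMMAND PLUGINSDIR
expand_command() {
    local command_line=$1 check_command=$2 pluginsdir=$3
    local IFS='!' result i
    local -a parts
    # split "check_test!arg1!arg2" into (check_test arg1 arg2)
    read -r -a parts <<< "$check_command"
    result=${command_line//'$PLUGINSDIR$'/"$pluginsdir"}
    # replace $ARG1$, $ARG2$, ... with the n-th argument
    for ((i = 1; i < ${#parts[@]}; i++)); do
        result=${result//"\$ARG$i\$"/"${parts[i]}"}
    done
    printf '%s\n' "$result"
}
```

with the definitions above, expand_command '$PLUGINSDIR$/check_test $ARG1$' 'check_test!"--test"' /usr/local/shinken/libexec prints /usr/local/shinken/libexec/check_test "--test". if that line is run through a shell, the quotes around --test are removed before check_test sees the argument.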
restart shinken to test whether the new definitions work. in case of a configuration error, shinken will bail out with:
/etc/init.d/shinken restart
Doing config check
* full result is in /tmp/shinken_checkconfig_result
* ConfigCheck failed: Configuration is incorrect, sorry, I bail out
...fail!
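the full result file can be long, so a quick way to jump to the interesting lines is to grep it. the keywords below are an assumption based on the "Configuration is incorrect" message above, not a complete list of what shinken prints.

```shell
# print (with line numbers) the lines of the config check result that look like problems
# usage: show_config_errors [RESULT_FILE]
show_config_errors() {
    local result=${1:-/tmp/shinken_checkconfig_result}
    grep -inE 'error|incorrect|warning' "$result"
}
```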
have a look at the webinterface: since every invocation picks a random state, the status will very likely change between STATE_OK, STATE_CRITICAL, STATE_WARNING and STATE_UNKNOWN from one check to the next.
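the webinterface only knows four states although check_test has five outcomes, because the state is derived from the exit code and STATE_DEPENDENT (4) is not one of the four standard plugin return codes. the function below sketches that convention; that an exit code of 4 is displayed as UNKNOWN is my assumption, not something i verified in the shinken source.

```shell
# map a plugin exit code to the state name the webinterface shows
# (nagios plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN)
state_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        # 3, and (assumption) anything outside 0..3 such as STATE_DEPENDENT=4
        *) echo UNKNOWN ;;
    esac
}
```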
if everything is working you should see something like this:
note: [4] might be a good introduction into advanced dependencies of service checks!
thanks to Issif (#shinken on irc.freenode.net) for the help, and to [1]
deploy thruk [3] and disable the shinken webinterface
wget http://www.thruk.org/files/pkg/v1.64-2/ubuntu12.10/amd64/thruk_1.64-2_ubuntu12.10_amd64.deb
dpkg -i thruk_1.64-2_ubuntu12.10_amd64.deb
apt-get -f install # as listed in the manual at [5]
shinken already has livestatus enabled by default (a python process listening on port 50000, reachable as 127.0.0.1:50000), so we only need to point thruk at it in thruk_local.cfg. netstat shows that it is listening:
netstat -tulpen
...
tcp 0 0 0.0.0.0:50000 0.0.0.0:* LISTEN 999 35960 6918/python
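to make sure livestatus really answers before pointing thruk at it, one can send it a query by hand. livestatus speaks a simple text protocol (LQL): a GET line naming a table, optional headers such as Columns, and an empty line. the helpers below are a sketch that assumes netcat (nc) is installed.

```shell
# build a livestatus (LQL) query: GET line, Columns header, terminating empty line
# usage: lql_query TABLE "COLUMN1 COLUMN2"
lql_query() {
    printf 'GET %s\nColumns: %s\n\n' "$1" "$2"
}

# send the query to shinken's livestatus module and print the answer;
# host and port default to the address seen in the netstat output
ask_livestatus() {
    lql_query "$1" "$2" | nc "${LS_HOST:-127.0.0.1}" "${LS_PORT:-50000}"
}
```

ask_livestatus hosts 'name state' should print one line per host with its name and state.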
before making this change, apache's child processes kept dying with segmentation faults:
less /var/log/apache2/error.log
...
[Thu Mar 07 15:03:35 2013] [notice] child pid 17670 exit signal Segmentation fault (11)
[Thu Mar 07 15:03:35 2013] [notice] child pid 17671 exit signal Segmentation fault (11)
[Thu Mar 07 15:03:35 2013] [notice] child pid 17672 exit signal Segmentation fault (11)
[Thu Mar 07 15:03:35 2013] [notice] child pid 17673 exit signal Segmentation fault (11)
...
to fix this, uncomment the backend section of thruk_local.cfg (remove the leading # symbols) so that it looks like this:
############################################
# put your own settings into this file
# settings from this file will override
# those from the thruk.conf
############################################
######################################
# Backend Configuration, enter your backends here
<Component Thruk::Backend>
    <peer>
        name    = External Shinken
        type    = livestatus
        <options>
            peer    = 127.0.0.1:50000
        </options>
    </peer>
</Component>
restart thruk and apache2
/etc/init.d/thruk restart
/etc/init.d/apache2 restart
finally visit http://localhost/thruk and log in as user thrukadmin with password thrukadmin.
to get the pnp4nagios graphs into thruk i used parts of the documentation [7], but it required some serious debugging, and my current result looks different from what can be seen on http://demo.thruk.org.
basically, add an action_url to the generic-service template in templates.cfg as shown below:
# Generic service definition template - This is NOT a real service, just a template!
define service{
name generic-service ; The 'name' of this service template
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
# Check part
# By default, there is no check_command here
check_interval 5 ; Check the service every 5 minutes in normal state
retry_interval 1 ; Re-check the service every minute until a hard state can be determined
max_check_attempts 2 ; Check the service up to 2 times in order to determine its final (hard) state
check_period 24x7 ; The service can be checked at any time of the day
# Notification part
notifications_enabled 1 ; Service notifications are enabled
notification_options w,u,c,r ; Send notifications about warning, unknown, critical, and recovery events
notification_interval 60 ; Re-notify about service problems every hour
notification_period 24x7
# If the contacts and contact_groups options are not set, it will notify host contacts instead
# contact_groups admins
# Advanced options. Change with care
#event_handler_enabled 1
# event_handler super_event_kill_everyone!DIE
flap_detection_enabled 1 ; Flap detection is enabled
check_freshness 0
freshness_threshold 3600
#stalking_options w,c
obsess_over_service 0
#escalations ToLevel2
process_perf_data 1 ; Process perf data, like for PNP
is_volatile 0 ; for log monitoring. See doc for more info about it
# For the WebUI
icon_set server ; can be database, disk, network_service, server
register 0
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$' class='tips' rel='/pnp4nagios/index.php/popup?host=$HOSTNAME$&srv=$SERVICEDESC$
}
i also made both NPCDMOD changes in shinken-specific.cfg.
change the password of the user thrukadmin.
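thruk is protected by apache basic auth, so the password can be changed with htpasswd from the apache2-utils package. the htpasswd file path below (/etc/thruk/htpasswd) is what i believe the ubuntu package uses; if it differs, check the AuthUserFile line of the thruk apache configuration.

```shell
# change the password of an existing thruk user; htpasswd prompts for it twice
# usage: set_thruk_password [USER] [HTPASSWD_FILE]
set_thruk_password() {
    local user=${1:-thrukadmin}
    # assumption: htpasswd file location of the thruk .deb package
    local file=${2:-/etc/thruk/htpasswd}
    # without -c, htpasswd updates the existing file instead of recreating it
    htpasswd "$file" "$user"
}
```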
pnp4nagios uses a password, so thruk has to be made aware of it, see pnp_export in [6].
disable the shinken webinterface in /usr/local/shinken/etc/shinken-specific.cfg:
just remove WebUI from the modules list of the broker below
define broker {
broker_name broker-1
data_timeout 120
check_interval 60
modules Livestatus, WebUI, Simple-log, NPCDMOD
port 7772
manage_sub_realms 1
spare 0
timeout 3
address localhost
realm All
max_check_attempts 3
manage_arbiters 1
}
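instead of editing the file by hand, the modules line can also be changed with sed. this is a sketch assuming that WebUI appears in a single comma-separated modules line, as in the broker definition above; it keeps a .bak copy of the original file. if WebUI is the only module in the line, it is left untouched.

```shell
# remove the WebUI module from every "modules ..." line of a shinken config file
# usage: remove_webui_module /usr/local/shinken/etc/shinken-specific.cfg
remove_webui_module() {
    # first expression: WebUI followed by a comma (first or middle entry)
    # second expression: WebUI as the last entry of the list
    sed -i.bak -E \
        -e 's/^([[:space:]]*modules[[:space:]].*)WebUI,[[:space:]]*/\1/' \
        -e 's/^([[:space:]]*modules[[:space:]].*),[[:space:]]*WebUI[[:space:]]*$/\1/' \
        "$1"
}
```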
restart shinken
/etc/init.d/shinken restart
verify that the webinterface is gone: http://localhost:7767 should no longer respond.
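besides the browser, this can be checked from a shell. the sketch below uses bash's /dev/tcp pseudo-device, so no extra tools are needed apart from timeout from coreutils.

```shell
# succeed if no TCP connection to HOST:PORT can be opened within 3 seconds
# usage: port_is_closed localhost 7767
port_is_closed() {
    ! timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}
```

after the restart, port_is_closed localhost 7767 && echo "webui is gone" should print the message.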
thruk seems to be quite memory hungry: in my current setup (13 hosts / 25 services) it needs at least about 1 GB of RAM.
put the shinken webservice (port 7767) behind an apache vhost, i.e. set up a reverse proxy to make the webservice available via port 80 or 443; see [2], and even better [8] and [9].
i’ve added these lines to the vhost configuration file:
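to check how much memory thruk actually uses on your own machine, one can sum the resident memory of the matching processes. this is a sketch assuming the thruk processes carry "thruk" somewhere in their command line, which i have not verified for every setup.

```shell
# sum the resident memory (RSS, reported in KiB by ps) of all processes
# whose command line matches PATTERN and print it in MB
# usage: rss_mb thruk
rss_mb() {
    ps -eo rss=,args= | awk -v pat="$1" '
        # skip the awk process itself, whose arguments contain the pattern
        $0 ~ pat && !/awk/ { sum += $1 }
        END { printf "%.1f MB\n", sum / 1024 }'
}
```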
...
# prevent a forward proxy!
ProxyRequests off
# the User-Agent of the original client is passed through to shinken,
# so shinken will still return either the mobile or desktop version of the webpage!
ProxyVia Off
ProxyPreserveHost On
# on ubuntu proxy access is denied by default, so we have to re-enable it here
# (i don't want to touch /etc/apache2/mods-enabled/proxy.conf)
<Proxy *>
Order deny,allow
Allow from all
</Proxy>
# prevent pnp4nagios from being reverse-proxied
ProxyPass /pnp4nagios !
ProxyPass / http://localhost:7767/
ProxyPassReverse / http://localhost:7767/
</VirtualHost>
as outlined in [10], i think this configuration is safe: ProxyRequests off means it does not act as an open forward proxy, and the reverse proxy only forwards to http://localhost:7767/.
this can be verified with the following commands. in a first shell, watch for outgoing http traffic from the server (1.2.3.4 stands for the server's own address):
tcpdump src host 1.2.3.4 and dst port 80
in a second shell, start telnet:
telnet localhost 80
then issue this GET request and hit return twice:
GET http://www.google.com HTTP/1.0
this is the trace:
# telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET http://www.google.com HTTP/1.0
HTTP/1.1 303 See Other
Date: Sat, 09 Mar 2013 23:35:11 GMT
Server: PasteWSGIServer/0.5 Python/2.6.5
Content-Length: 0
Content-Type: text/html; charset=UTF-8
Location: http://www.google.com/user/login
Vary: Accept-Encoding
Connection: close
Connection closed by foreign host.
note: there MUST be no output in the tcpdump shell! for a short test i enabled ProxyRequests on; after restarting apache you really do have an open forward proxy, which will cause a lot of trouble, and the same request to google then produces a lot of traffic in the tcpdump shell.
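the telnet check can also be scripted, which is handy when repeating it after every apache configuration change. this is a sketch using bash's /dev/tcp instead of telnet; keep tcpdump running in the other shell, since the reply alone is not the proof, the absence of outgoing traffic is.

```shell
# send an absolute-URI request (like the telnet session) and print the
# status line and headers of the reply
# usage: probe_forward_proxy [HOST] [PORT] [URL]
probe_forward_proxy() {
    local host=${1:-localhost} port=${2:-80} url=${3:-http://www.google.com}
    # open a TCP connection on file descriptor 3; fail if nothing listens
    { exec 3<>"/dev/tcp/$host/$port"; } 2>/dev/null || return 1
    printf 'GET %s HTTP/1.0\r\n\r\n' "$url" >&3
    # strip carriage returns and stop at the empty line that ends the headers
    tr -d '\r' <&3 | sed '/^$/q'
    # close the connection
    exec 3<&-
}
```

probe_forward_proxy localhost 80 should print the same headers as the telnet trace.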
thanks VERY MUCH to konkat, Unbelieve and Humbedooh from httpd#irc.freenode.org!
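note: one thing i still don’t understand is why konkat got HTTP/1.1 403 Forbidden where i get HTTP/1.1 303 See Other. judging by the Server: PasteWSGIServer header, my 303 comes from the shinken webservice behind the reverse proxy, not from google.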
links:
* [1] http://community.spiceworks.com/how_to/show/3773-creating-custom-nagios-plugins-scripts-in-bash
* [2] http://www.shinken-monitoring.org/forum/index.php?topic=192.0
* [3] http://www.thruk.org/documentation.html
* [4] http://www.shinken-monitoring.org/wiki/setup_advanced_dependencies_in_shinken
* [6] http://www.thruk.org/documentation.html#_installation
* [7] http://www.shinken-monitoring.org/wiki/use_with_pnp
* [8] http://adolfomaltez.wordpress.com/2011/05/27/apache-as-a-reverse-proxy/
* [9] http://andriigrytsenko.net/2011/02/apache-as-reverse-proxy-for-https-server/
* [10] http://httpd.apache.org/docs/2.2/mod/mod_proxy.html#access