EDDIE - Manual


Installation

Downloading

You need to download the following:

Installing

Follow the QUICKSTART document (also located in the eddie/doc/ directory) or continue with the steps below.


Configuration

Config files

Global configurables

The global configurables are usually in eddie.cf and are listed below:

eddie.cf is well documented, so read through the file and modify the settings to suit your environment.

Configuration format

The EDDIE configuration follows standard Python code layout. Just as Python indicates the methods or child objects of an object by indenting them beneath the parent definition, EDDIE indicates the sub-objects or parameters of a directive object by indenting them beneath the parent object definition. For example, a notification object definition may look like:

         N COMMONALERT:

             # Info
             Level 0:
                 email(INFO_EMAIL,INFO)

             # Warning
             Level 1:
                 email(ALERT_EMAIL,WARN)

             # Alert
             Level 2:
                 email(ALERT_EMAIL,ALERT),ticker(ALERT_P)

             # Serious Alert
             Level 3:
                 email(ALERT_EMAIL,ALERT),email(SYSSUP_EMAIL,ALERT_P),ticker(ALERT_P)
         
The parameters and child objects of the parent object, N, are indented beneath it, and the actions of each Level object are indented in the same way. If you are used to Python coding this will be second nature; if not, it is not hard to pick up.

Similarly, directive definitions are formatted as follows:
             DIRECTIVE name:
                 argument1=value1
                 [argument2=value2
                 ...]
         
where "DIRECTIVE" is the directive name, like PROC or FS, and "name" is the user-defined name of this directive object. The arguments customize the directive appropriately. Some arguments are directive-specific while others are common to all directives. E.g.:
             PROC test:
                procname='syslogd'
                rule=NR
                scanperiod='30s'
                action='COMMONALERT(commonmsg.proc,1)'
         
This is an example definition of a PROC directive, called 'test'. It contains the PROC-specific arguments, 'procname' and 'rule'. 'scanperiod' and 'action' are arguments which are common to all directives. Some arguments are optional while others are required, and errors will be raised if they are missing. In this example 'procname', 'rule' and 'action' are all required. 'scanperiod' is optional.

Simple Configuration

An EDDIE configuration can be kept simple to get basic monitoring started quickly, or made as complicated as required to perform advanced operations. The simple example rules file below monitors basic services on a host. This rules file, named simple.rules, would be placed in the same directory as eddie.cf, and eddie.cf would contain the entry:

        INCLUDE 'simple.rules'

The file simple.rules contains:
        # Process checks
        PROC syslogd:
            procname='syslogd'
            rule=NR
            action="email('root', '%(procp)s is not running on %(h)s')"
        PROC inetd:
            procname='inetd'
            rule=NR
            action="email('root', '%(procp)s is not running on %(h)s')"
        PROC sshd:
            procname='sshd'
            rule=NR
            action="email('root', '%(procp)s is not running on %(h)s')"

        # Filesystem checks
        FS root:
            fs='/'
            rule="capac>=90"
            action="email('root', '%(fsf)s over 90%% on %(h)s')"
        FS varlog:
            fs='/var/log'
            rule="capac>=90"
            action="email('root', '%(fsf)s over 90%% on %(h)s')"

        # Service Port checks
        SP smtp_port:
            port='smtp'
            protocol='tcp'
            bindaddr='0.0.0.0'
            action="email('root', '%(spprot)s/%(spport)s on %(h)s is not listening')"
        SP http_port:
            port='http'
            protocol='tcp'
            bindaddr='0.0.0.0'
            action="email('root', '%(spprot)s/%(spport)s on %(h)s is not listening')"

        # System statistics checks
        SYS loadaverage:
            rule="loadavg1 > 3.00"
            scanperiod='1m'
            action="email('root', '%(h)s load-average > 3.00')"
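The action strings in these examples embed substitution variables such as %(h)s and %(fsf)s, and write a literal percent sign as %%. That is the syntax of Python's dictionary-based % string formatting. The sketch below shows the expansion in plain Python. It assumes EDDIE expands action strings this way, and the dictionary values (and the meanings given in the comments) are made-up illustrations, not output from EDDIE.

```python
# Sketch only: assumes action strings are expanded with Python's
# dictionary-based % formatting. All values below are hypothetical.
variables = {
    "h": "host1",        # assumed meaning: name of the monitored host
    "fsf": "/var/log",   # assumed meaning: filesystem being checked
}

template = "%(fsf)s over 90%% on %(h)s"

# %(name)s is replaced by the matching dictionary value; %% becomes a single %.
message = template % variables
print(message)
```

Under that assumption, a bare % in an action string that is not part of a %(name)s reference needs to be doubled, as the FS examples do.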
   

Directives

The directives are the configuration commands which tell EDDIE what to do. They are of the form:
   DIRECTIVE name: arg1=value1
                   arg2=value2
                   argn=valuen
Where "DIRECTIVE" is the name of the directive itself (see Built-in Directives); "name" is a user-defined name of the directive definition (the directive ID is usually constructed as "DIRECTIVE.name", e.g., "FS.root", and will appear in the logs, console, etc); "args" are arguments to define what the directive should do and how it should do it. Some arguments are common to all directives and others are specific to that type of directive.

Common Directive Arguments:

Built-in Directives:

The built-in directives are as follows:

Note that there may be many more directives depending on the version of EDDIE or any new or optional directives which may have been added to the distribution.

PROC
PROC-specific arguments:
The PROC directive is used to perform process checks. In the simplest case it is used to check if a process is not running when it should be (or running when it should not be). More complex rules can also be written, using most of the process statistics such as memory-usage, owner, percentage cpu used, running time, etc.
Examples:
        # alert if cron is not running
        PROC cron:
            procname='cron'
            rule=NR
            action='email("alert", "cron is not running on %(h)s")'

        # syslog has a memory leak - alert if using over 50MB
        PROC syslogmem:
            procname='syslogd'
            rule='vsz > 50000'
            action='email("alert", "syslogd using over 50MB")'
    

FS
FS-specific arguments:
The FS directive is used to perform filesystem checks. Rules can be simple or complex, based on the size of the filesystem, amount of space used and available, and percentage full.
Examples:
        # alert if / over 95% full
        FS root:
            fs='/'
            rule='capac > 95'
            action='email("alert", "/ is over 95%% full on %(h)s")'

        # alert if /var has less than 100MB available
        FS var:
            fs='/var'
            rule='avail < 100*1024'
            action='email("alert", "/var has less than 100MB free on %(h)s")'
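To make the quantities in these rules concrete, the following Python sketch computes a percentage-full figure and the available space for a mount point. It is an illustration, not EDDIE's implementation: the function name fs_stats is hypothetical, and reporting sizes in kilobytes is an assumption inferred from the rule 'avail < 100*1024' being described as 100MB.

```python
import shutil

def fs_stats(path):
    """Return size, used and avail (in KB, an assumption) plus percent full."""
    usage = shutil.disk_usage(path)  # total, used, free in bytes
    capac = 100.0 * usage.used / usage.total if usage.total else 0.0
    return {
        "size": usage.total // 1024,
        "used": usage.used // 1024,
        "avail": usage.free // 1024,
        "capac": capac,
    }

stats = fs_stats("/")
# The same comparisons as the two example rules above.
print("over 95% full:", stats["capac"] > 95)
print("under 100MB available:", stats["avail"] < 100 * 1024)
```

Tools such as df may report a slightly different percentage because they account for reserved blocks.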
    

SP
SP-specific arguments:
The SP directive is used to perform checks on listening service ports. These can be either tcp or udp ports. If nothing is currently listening on the given port, protocol and bind address combination, the check has failed and the action(s) will be called.
Examples:
        # alert if nothing listening on http port
        SP http:
            port='http'
            protocol='tcp'
            bindaddr='0.0.0.0'
            action='email("alert", "http port not bound to on %(h)s")'

        # alert if nothing listening on port 22 on 10.0.0.5
        SP sshport:
            port=22
            protocol='tcp'
            bindaddr='10.0.0.5'
            action='email("alert", "10.0.0.5:22 not bound to")'
    

PID
PID-specific arguments:
The PID directive is used to perform simple checks using the pid files which some programs generate. The first check is whether the pid file exists, which can often indicate whether the program is running; the second check makes sure the pid found in the pid file is in the process table.
Examples:
        # alert if the sshd pid file doesn't exist
        PID sshdpid1:
            pid='/var/run/sshd.pid'
            rule=EX
            action='email("alert", "sshd pid file not found on %(h)s")'

        # alert if the sshd pid doesn't match the process table
        PID sshdpid2:
            pid='/var/run/sshd.pid'
            rule=PR
            action='email("alert", "sshd pid not in process table on %(h)s")'
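The two checks described above can be sketched in Python as follows. This is an illustration for Unix-like systems only, not EDDIE's own code; the function names are hypothetical, and probing a pid with signal 0 is one common technique that EDDIE may or may not use.

```python
import os

def pidfile_exists(path):
    """First check: does the pid file exist?"""
    return os.path.isfile(path)

def pid_in_process_table(path):
    """Second check: is the pid stored in the file a live process?"""
    try:
        with open(path) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False          # missing, unreadable or malformed pid file
    if pid <= 0:
        return False          # not a valid single-process id
    try:
        os.kill(pid, 0)       # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False          # no such process
    except PermissionError:
        return True           # process exists but belongs to another user
    return True

print(pidfile_exists("/var/run/sshd.pid"))
print(pid_in_process_table("/var/run/sshd.pid"))
```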
    

COM
COM-specific arguments:
The COM directive is a generic directive to be used to perform custom checks that other directives cannot handle. It simply executes the given string via a system() call, and captures the stdout/stderr and return value for testing by a custom rule.
Security note: if EDDIE is run as root, the config files should not be world-writable, because directives like COM can execute arbitrary commands.
Examples:
	# Check load average (the hard way)
	COM loadavg: cmd="uptime | cut -d, -f4 | awk '{print $3}'"
		rule="float(out) > 6.0"
		action='email("alert", "Load on %(h)s is > 6.0")'

	# Check number of netscapes running
	COM netscapes: cmd="ps -ef | grep netscape | wc -l"
		rule="int(out) > 6"
		action='email("alert", "There are %(comout)s netscapes running on %(h)s")'
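As a rough model of what a COM check does, the Python sketch below runs a shell command, captures its output and return value, and evaluates a rule string against them. It is not EDDIE's implementation: run_check is a hypothetical name, and only the variable out appears in the rules above, so err and ret are assumed names added for illustration.

```python
import subprocess

def run_check(cmd, rule):
    """Run cmd through the shell and evaluate rule against its results."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    names = {
        "out": proc.stdout.strip(),   # captured stdout, as used in the rules above
        "err": proc.stderr.strip(),   # assumed name for captured stderr
        "ret": proc.returncode,       # assumed name for the return value
    }
    # Expose only the conversions the example rules need.
    allowed = {"__builtins__": {}, "float": float, "int": int}
    return bool(eval(rule, allowed, names))

# True means the rule matched, so the action would be called.
print(run_check("echo 7", "int(out) > 6"))
```

Because the command string goes to a shell unchanged, the security note above applies equally to any script built this way.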
    

PORT
PORT-specific arguments:
The PORT directive tests remote tcp based services. The simplest test is whether the service is accepting remote connections (when both send and expect are empty strings). The test can be made more complex by defining send and expect with appropriate strings. The send string will be sent to the remote host after connecting, and any reply will be matched against the expect regular expression. Actions are called depending on the match.
Examples:
	# check that 10.0.0.5 is accepting connections on port 80
	PORT webcheck:
		host='10.0.0.5'
		port=80
		send=""
		expect=""
		action="email('alert', 'port 80 not responding on 10.0.0.5')"

	# check that 10.0.0.5 answers on port 25 with an SMTP 220 greeting
	PORT smtpcheck:
		host='10.0.0.5'
		port=25
		send="\n"
		expect="220.*"
		action="email('alert', 'port 25 not responding on 10.0.0.5')"
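The connect, send and expect sequence described above can be sketched in Python with the standard socket and re modules. This is an illustration under stated assumptions, not EDDIE's code: port_check and its timeout parameter are hypothetical, and matching the reply from its start with re.match is a guess at how the expect expression is applied.

```python
import re
import socket

def port_check(host, port, send="", expect="", timeout=5.0):
    """Return True if the service connects and, when given, the reply matches."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            if send:
                conn.sendall(send.encode())
            if not expect:
                return True   # both strings empty: a successful connect is enough
            reply = conn.recv(4096).decode(errors="replace")
            return re.match(expect, reply) is not None
    except OSError:
        # Refused, unreachable or timed out.
        return False

# A False result corresponds to the case where the action would be called.
ok = port_check("127.0.0.1", 25, send="\n", expect="220.*", timeout=2.0)
print("smtp check passed:", ok)
```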
    

IF
IF-specific arguments:
The IF directive provides a mechanism for testing network interfaces. Interfaces listed in "netstat -i" are available for testing. The test can be simply whether the interface exists on a host or not; or it can be a more complex rule based on various statistics about that interface.
Examples:
	# alert if eth0 interface has disappeared
	IF ethexists:
		name='eth0'
		rule=NE
		action="email('alert', 'eth0 has disappeared on %(h)s')"

	# alert if input packet errors are greater than 10%
	IF ierrs:
		name='hme0'
		rule="100.0*ierrs/ipkts > 10.0"
		action="email('alert', 'input packet error > 10%% on hme0')"
    

NET
NET-specific arguments:
The NET directive provides an interface to the kernel network statistics usually provided by a call to 'netstat -s'. Simple or complex rules can be written using those network statistics.
Example:
	# alert if any UDP input errors
	NET udpinerr:
		rule="udpInErrors > 0"
		action="email('alert', '%(h)s has UDP input errors')"
    

SYS
SYS-specific arguments:
The SYS directive provides an interface to the kernel's system statistics. Simple or complex rules can be written using those system statistics.
Example:
	# alert if 1 minute load average > 2
	SYS loadavg1:
		rule="loadavg1 > 2.0"
		action="email('alert', '%(h)s has loadavg1 > 2.0')"
    

STORE
TBA

LOGSCAN
LOGSCAN-specific arguments:
Action string variables:
The LOGSCAN directive is used to watch files. A regular expression is used to pick out important lines from the file. The simplest action is to have the matched lines emailed to a user.
Example:
	# Email all entries from /var/log/messages to alert every 12 hours.
        LOGSCAN messages:
            file='/var/log/messages'
            regex='.*'
            scanperiod='12h'
            action='email("alert", "%(h)s:%(logscanfile)s", "-- Logscan matched %(logscanlinecount)d lines: --\n%(logscanlines)s")'
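The matching step can be pictured with the short Python sketch below, which filters lines with a regular expression and builds the two values used in the action string above. It is an illustration only: scan_lines is a hypothetical name, the sample log lines are invented, and using re.search (match anywhere in the line) is an assumption about how the regex is applied.

```python
import re

def scan_lines(lines, regex):
    """Collect lines matching regex and return action-style variables."""
    pattern = re.compile(regex)
    matched = [line.rstrip("\n") for line in lines if pattern.search(line)]
    return {
        "logscanlinecount": len(matched),
        "logscanlines": "\n".join(matched),
    }

# Invented sample data standing in for new lines read from a log file.
sample = [
    "Jan  1 00:00:01 host1 sshd[99]: Accepted password for fred\n",
    "Jan  1 00:00:02 host1 su: FAILED su for root by fred\n",
]

result = scan_lines(sample, r"FAILED")
body = "-- Logscan matched %(logscanlinecount)d lines: --\n%(logscanlines)s" % result
print(body)
```

With regex='.*', as in the example above, every line matches, so the whole set of lines scanned would be mailed.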
    

POP3TIMING
POP3TIMING-specific arguments:
Action string variables:
The POP3TIMING directive is used to measure the performance of a POP3 server. EDDIE connects to the given POP3 server/port and logs in as the given user, then performs some standard commands before closing the connection. Each step of the connection is timed and the results are stored in variables to be used by the action(s).
Example:
        POP3TIMING pop3test:
            server='pop3.domain.com'
            user='fred'
            password='foo'
            action="email('mary', 'host=%(pop3timinghost)s, username=%(pop3timingusername)s, connecttime=%(pop3timingconnecttime)s, authtime=%(pop3timingauthtime)s, listtime=%(pop3timinglisttime)s, retrtime=%(pop3timingretrtime)s')"
    

RADIUS
TBA

CRON
TBA

METASTAT
TBA

PING
TBA

FILE
TBA

Notification and Message objects

Notification objects define levels of actions to be performed. Usually, the higher the level, the more serious the actions will be. Later versions of EDDIE will use notification objects for advanced features like problem escalation.
Message objects define messages to be used in actions like email or paging. They are grouped together to provide a common way to call them from notification objects.


Other Features

Console

EDDIE features a Console facility which provides live information about the active directives via a TCP connection. The TCP port used is set by the CONSOLE_PORT setting in eddie.cf and defaults to port 33343. Set this to 0 to disable this feature.

By default every directive is shown in the Console output in the format "<ID> - <state>". This can be modified with the console directive argument, or the directive can be hidden from the Console entirely by setting this argument to None.

Substitution variables available to the console argument string are:

Directive examples:

    # check root filesystem usage
    FS rootfs:    fs='/'
                  rule="capac > 95"
                  action='email("root", "%(fsf)s at %(fscapac)s%%")'
                  console='%(state)s %(fscapac)s%%'

    # email me load average every 5mins
    SYS loadavg5: rule="1"
                  action="email('chris', '%(h)s loadavg5: %(sysloadavg5).02f')"
                  scanperiod='5m'
                  console="loadavg5=%(sysloadavg5).02f"

    # store root filesystem data in RRD (don't show on Console)
    FS root_rrd:  fs='/'
                  rule="1"
                  scanperiod='5m'
                  action='elvinrrd("fs-%(h)s_root", "used=%(fsused)s", "size=%(fssize)s")'
                  console=None

Console example:

    $ telnet localhost 33343
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    Eddie Console Gateway
    FS.rootfs - ok 33%
    SYS.loadavg5 - loadavg5=0.14
    Connection closed by foreign host.
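The same information can be collected by a script instead of telnet. The Python sketch below reads everything the Console sends until the connection closes, then splits each "<ID> - <state>" line into a dictionary. Both function names are hypothetical, and the parsing assumes the output looks like the session shown above, with a banner line followed by one line per directive.

```python
import socket

def read_console(host="localhost", port=33343, timeout=5.0):
    """Connect to the Console port and return everything it sends as text."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        while True:
            data = conn.recv(4096)
            if not data:          # the Console closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")

def parse_console(text):
    """Map each directive ID to its state string; skip lines without ' - '."""
    states = {}
    for line in text.splitlines():
        ident, sep, state = line.partition(" - ")
        if sep:
            states[ident.strip()] = state.strip()
    return states

# Parsing the output from the session shown above.
sample = "Eddie Console Gateway\nFS.rootfs - ok 33%\nSYS.loadavg5 - loadavg5=0.14\n"
print(parse_console(sample))
```

Against a running EDDIE, parse_console(read_console()) would return the live states, provided CONSOLE_PORT has not been set to 0.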


Appendix A

Time Definition

The format for specifying time is either:



© Chris Miles 2001