L Steffenel ©2008 1
log, syslog, logrotate
SNMP
tools for monitoring
- Luiz Angelo STEFFENEL -
ASI – Master M2 ASR
Syslog and Log files
Outline
- Log files
  - What needs to be logged
  - Logging policies
  - Finding log files
- Syslog: the system event logger
  - How syslog works
  - Its configuration file
  - The software that uses syslog
  - Debugging syslog
What should be logged?
- The accounting system, the kernel, and various utilities all produce data that needs to be logged.
- Most of this data has a limited useful lifetime: it needs to be summarized, compressed, archived, and eventually thrown away.
Logging policies
Throw away all data immediately
Reset log files at periodic intervals
Rotate log files, keeping data for a fixed time
Compress and archive to tape or other permanent media
Which one to choose?
- It depends on:
  - how much disk space you have
  - how security-conscious you are
- Whatever scheme you select, automate regular log file maintenance with cron.
Throwing away log files
- Not recommended:
  - Security: accounting data and log files provide important evidence of break-ins.
  - Logs also help alert you to hardware and software problems.
- In general, keep one or two months of logs.
  - In the real world, it may take a system administrator (SA) one or two weeks to realize that a site has been compromised and that the logs need to be reviewed.
Throwing away (cont.)
Most sites store each day’s log info on disk, sometimes in a compressed format
These daily files are kept for a specific period of time and then deleted
One common way to implement this policy is called “rotation”
Rotating log files
Keep backup files that are one day old, two days old, and so on.
logfile, logfile.1 , logfile.2, … logfile.7
Each day rename the files to push older data toward the end of the chain
The following script keeps three days of files:
#!/bin/sh
# Rotate logfile, keeping three older generations.
cd /var/log || exit 1
mv logfile.2 logfile.3
mv logfile.1 logfile.2
mv logfile logfile.1
cat /dev/null > logfile

Some daemons keep their log files open all the time, so this script cannot be used with them as is. To install a new log file, you must either signal the daemon or kill and restart it.
#!/bin/sh
# Same rotation with compressed generations; the daemon is signalled so that it reopens its log file.
cd /var/log || exit 1
mv logfile.2.Z logfile.3.Z
mv logfile.1.Z logfile.2.Z
mv logfile logfile.1
cat /dev/null > logfile
kill -signal pid
compress logfile.1

- signal: the appropriate signal for the program writing the log file
- pid: that program's process ID
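The fixed three-generation scripts above can be generalized to any chain length, such as the logfile.1 … logfile.7 chain mentioned earlier. The sketch below is one possible way to do it, not the author's script: the function name rotate_log and the scratch directory are assumptions made for illustration, and it neither compresses files nor signals a daemon.

```shell
#!/bin/sh
# rotate_log FILE KEEP (hypothetical helper): shift FILE.(KEEP-1) -> FILE.KEEP,
# ..., FILE.1 -> FILE.2, FILE -> FILE.1, then recreate an empty FILE.
# The oldest generation (FILE.KEEP) is overwritten.
rotate_log() {
    file=$1
    keep=$2
    i=$keep
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        if [ -f "$file.$prev" ]; then
            mv "$file.$prev" "$file.$i"
        fi
        i=$prev
    done
    if [ -f "$file" ]; then
        mv "$file" "$file.1"
    fi
    : > "$file"
}

# Demonstration in a scratch directory rather than /var/log.
demo_dir=$(mktemp -d)
echo "day 1" > "$demo_dir/logfile"
rotate_log "$demo_dir/logfile" 7
echo "day 2" > "$demo_dir/logfile"
rotate_log "$demo_dir/logfile" 7
ls "$demo_dir"
```

Run daily from cron, a call such as rotate_log /var/log/logfile 7 would keep one week of history. For a daemon that holds its log file open, the signalling step from the second script is still needed.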
Archiving log files
Some sites must archive all accounting data and log files as a matter of policy, to provide data for a potential audit
Log files should first be rotated on disk, then written to tape or other permanent media
Finding log files
- To locate log files, read the system startup scripts (/etc/rc* or /etc/init.d/*) to see:
  - whether logging is turned on when daemons are started
  - where messages are sent
- Some programs handle logging via syslog:
  - check /etc/syslog.conf to find out where this data goes
Finding log files
Different operating systems put log files in different places:
/var/log/*
/var/cron/log
/usr/adm
/var/adm …
On Linux, most log files are in the /var/log directory.
Outline
- Log files
  - What needs to be logged
  - Logging policies
  - Finding log files
- Syslog: the system event logger
  - How syslog works
  - Its configuration file
  - Debugging syslog
  - The software that uses syslog
What is syslog?
- A comprehensive logging system, used to manage information generated by the kernel and system utilities.
- Allows messages to be sorted by source and importance, and routed to a variety of destinations: log files, users' terminals, or even other machines.
Syslog: three parts
- syslogd and /etc/syslog.conf
  - the daemon that does the actual logging
  - its configuration file
- openlog, syslog, closelog
  - library routines that programs use to send data to syslogd
- logger
  - user-level command for submitting log entries
syslog-aware programs
- Syslog-aware programs use the syslog library routines to write log entries to a special file, /dev/log.
- syslogd reads these entries (the diagram also shows /dev/klog as an input), consults /etc/syslog.conf, and dispatches each message to:
  - log files
  - users' terminals
  - other machines
Configuring syslogd
- The configuration file /etc/syslog.conf controls syslogd's behavior.
- It is a text file with a simple format; blank lines and lines beginning with '#' are ignored.
- Each line has the form:
    selector <TAB> action
- For example:
    mail.info    /var/log/maillog
Configuration file - selector
- A selector identifies:
  - the source: the program ('facility') that is sending a log message
  - the importance: the message's severity level
- Syntax: facility.level
  - e.g. mail.info /var/log/maillog
- Facility names and severity levels must be chosen from a list of defined values.
Configuration file - Facility names

Facility   Programs that use it
kern       The kernel
user       User processes (default if not specified)
mail       The mail system
daemon     System daemons
auth       Security and authorization-related commands
lpr        The BSD line printer spooling system
news       The Usenet news system
uucp       Reserved for UUCP
cron       The cron daemon
mark       Timestamps generated at regular intervals
local0-7   Eight flavors of local messages
syslog     Syslog internal messages
authpriv   Private or system authorization messages
ftp        The ftp daemon, ftpd
*          All facilities except "mark"
Configuration file - Facility names
- The mark timestamps can be used to log the time at regular intervals (by default, every 20 minutes).
- This lets you figure out that your machine crashed between 3:00 and 3:20 a.m., not just "sometime last night".
- This can be a big help when debugging problems that occur on a regular basis.
Configuration file - severity level

Level          Approximate meaning
emerg (panic)  Panic situation
alert          Urgent situation
crit           Critical condition
err            Other error conditions
warning        Warning messages
notice         Unusual things that may need investigation
info           Informational messages
debug          For debugging
Configuration file - selector
- A selector can include multiple facilities separated by commas:
    daemon,auth,mail.level    action
- Multiple selectors can be combined with ';':
    daemon.level1;mail.level2    action
- Selectors are ORed together: a message matching any selector is subject to the action.
- A selector can contain '*' or 'none', meaning all or nothing.
Configuration file - selector
- A level indicates the minimum importance that a message must have in order to be logged.
  - mail.warning matches all messages from the mail system at level warning or above.
- A level of 'none' excludes the listed facilities, regardless of what other selectors on the same line say.
    *.level1;mail.none    action
  - All facilities except mail, at level1 or above, are subject to the action.
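The "minimum level" rule can be made concrete with a small sketch. The helper names level_rank and selector_matches are hypothetical, and the function handles only a single facility.level pair (with '*' as the facility and 'none' as the level), not comma or semicolon lists. It illustrates the matching rule and makes no claim about how syslogd is implemented.

```shell
#!/bin/sh
# level_rank NAME: numeric rank of a severity level (0 = most severe),
# in the order of the severity table above.
level_rank() {
    case $1 in
        emerg|panic) echo 0 ;;
        alert)       echo 1 ;;
        crit)        echo 2 ;;
        err)         echo 3 ;;
        warning)     echo 4 ;;
        notice)      echo 5 ;;
        info)        echo 6 ;;
        debug)       echo 7 ;;
        *)           return 1 ;;
    esac
}

# selector_matches SELECTOR FACILITY LEVEL (hypothetical helper)
# Succeeds if a message from FACILITY at LEVEL is selected by SELECTOR.
selector_matches() {
    sel_fac=${1%%.*}
    sel_lvl=${1#*.}
    [ "$sel_fac" = "*" ] || [ "$sel_fac" = "$2" ] || return 1
    [ "$sel_lvl" != "none" ] || return 1
    min=$(level_rank "$sel_lvl") || return 1
    got=$(level_rank "$3") || return 1
    # A message matches when it is at least as severe as the selector's level.
    [ "$got" -le "$min" ]
}

if selector_matches mail.warning mail err; then
    echo "mail.err is selected by mail.warning"
fi
if ! selector_matches mail.warning mail info; then
    echo "mail.info is not selected by mail.warning"
fi
```

A real syslogd also treats '*' as excluding the mark facility and lets a 'none' entry override other selectors on the same line; neither refinement is modelled here.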
Configuration file - action
(Tells what to do with a message)

Action            Meaning
filename          Write message to a file on the local machine
@hostname         Forward message to the syslogd on hostname
@ipaddress        Forward message to the host at that IP address
user1, user2,…    Write message to users' screens if they are logged in
*                 Write message to all users logged in
Configuration file - action
- If a filename action is used, the filename must be an absolute path. The file must exist; syslogd will not create it.
    /var/log/messages
- If a hostname is used, it must be resolvable via a translation mechanism such as DNS or NIS.
- While multiple facilities and levels are allowed in a selector, multiple actions are not allowed.
Config file examples

# Small network or stand-alone syslog.conf file

# emergencies: tell everyone who is logged on
*.emerg                         *

# important messages
*.warning;daemon,auth.info      /var/adm/messages

# printer errors
lpr.debug                       /var/adm/lpd-errs
# network client, typically forwards serious messages to
# a central logging machine

# emergencies: tell everyone who is logged on
*.emerg;user.none               *

# important messages, forward to central logger
*.warning;lpr,local1.none       @netloghost
daemon,auth.info                @netloghost

# local stuff to central logger too
local0,local2,local7.debug      @netloghost

# card syslogs to local1 - to boulder
local1.debug                    @boulder.colorado.edu

# printer errors, keep them local
lpr.debug                       /var/adm/lpd-errs

# sudo logs to local2 - keep a copy here
local2.info                     /var/adm/sudolog
Sample syslog output
Dec 27 02:45:00 x-wing netinfod[71]: can't lookup child
Dec 27 02:50:00 bruno ftpd[27876]: open of pid file failed: not a directory
Dec 27 02:50:47 anchor vmunix: spurious VME interrupt at processor level 5
Dec 27 02:52:17 bruno pingem[107]: moose.cs.colorado.edu has not answered 34 times
Dec 27 02:55:33 bruno sendmail[28040]: host name/address mismatch: 192.93.110.26 != bull.bull.fr
Syslog's functions
- Liberates programmers from the tedious mechanics of writing log files.
- Puts the SA in control of logging.
  - Before syslog, the SA had no control over what information was kept or where it was stored.
- Can centralize logging for a networked system.
Syslogd (cont.)
- A hangup signal (HUP, signal 1) causes syslogd to close its log files, reread its configuration file, and start logging again.
- If you modify the syslog.conf file, you must HUP syslogd for your changes to take effect:
    kill -1 pid
Software that uses syslog

Program        Facility      Levels         Description
amd            auth          err-info       NFS automounter
date           auth          notice         Display and set date
ftpd           daemon        err-debug      ftp daemon
gated          daemon        alert-info     Routing daemon
apache         daemon        err            Internet info server
halt/reboot    auth          crit           Shutdown programs
login/rlogind  auth          crit-info      Login programs
lpd            lpr           err-info       BSD line printer daemon
named          daemon        err-info       Name server (DNS)
passwd         auth          err            Password-setting program
sendmail       mail          debug-alert    Mail transport system
rwho           daemon        err-notice     Remote who daemon
su             auth          crit, notice   Substitute UID program
sudo           local2        notice, alert  Limited su program
syslogd        syslog, mark  err-info       Internal errors, timestamps
Final words
- On Linux, check the following files:
  - /etc/syslog.conf : syslog configuration file
  - /etc/logrotate.conf : logging policy, rotation
  - /etc/logrotate.d/*
  - /var/log/* : log files
- Try the following commands to find out more:
  - man logrotate
  - man syslogd
SNMP
Overview
- Introduction
- Management Information Base (MIB)
- Simple Network Management Protocol (SNMP)
- SNMP Commands
- Tools
  - 'SNMPwalk' (CLI)
  - 'MIB Browser' (GUI)
Introduction
(1) SNMP
  - Application-layer protocol for managing TCP/IP-based networks.
  - Runs over UDP, which runs over IP.
(2) NMS (Network Management Station)
  - Device that polls SNMP agents for information.
(3) SNMP Agent
  - Device (e.g. a router) running software that understands the SNMP language.
(4) MIB
  - Database of information conforming to the SMI.
(5) SMI (Structure of Management Information)
  - Standard that defines how to create a MIB.
MIB – Management Information Base
MIB Breakdown…
- OBJECT-TYPE
  - String that describes the MIB object.
  - Object IDentifier (OID).
- SYNTAX
  - Defines what kind of info is stored in the MIB object.
- ACCESS
  - READ-ONLY, READ-WRITE.
- STATUS
  - State of the object with regard to the SNMP community.
- DESCRIPTION
  - Reason why the MIB object exists.

Standard MIB Object:

sysUpTime OBJECT-TYPE
    SYNTAX TimeTicks
    ACCESS read-only
    STATUS mandatory
    DESCRIPTION
        "Time since the network management portion of the system was last re-initialised."
    ::= {system 3}
MIB – Management Information Base
Object IDentifier (OID)
- Example: .1.3.6.1.2.1.1
  - iso(1) org(3) dod(6) internet(1) mgmt(2) mib-2(1) system(1)
Note:
- .1.3.6.1 is present ~100% of the time.
- mgmt and private are the most common branches.
- MIB-2 is the successor to the original MIB.
- STATUS 'mandatory': all or nothing in a group.

[Figure: OID tree]
iso(1)
  org(3)
    dod(6)
      internet(1)
        directory(1)
        mgmt(2)
          mib-2(1)
            system(1)
            interfaces(2)
            ip(4)
            tcp(6)
        experimental(3)
        private(4)
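The mapping between a dotted OID and its named path can be spelled out with a small lookup. The sketch below knows only the nodes that appear in the tree on this slide; the function names oid_node_name and oid_to_names are made up for the illustration, and a real tool would read the names from MIB files instead.

```shell
#!/bin/sh
# oid_node_name OID: name of the node at this numeric OID.
# Only nodes from the slide's tree are listed; anything else prints "?".
oid_node_name() {
    case $1 in
        .1)             echo iso ;;
        .1.3)           echo org ;;
        .1.3.6)         echo dod ;;
        .1.3.6.1)       echo internet ;;
        .1.3.6.1.1)     echo directory ;;
        .1.3.6.1.2)     echo mgmt ;;
        .1.3.6.1.3)     echo experimental ;;
        .1.3.6.1.4)     echo private ;;
        .1.3.6.1.2.1)   echo mib-2 ;;
        .1.3.6.1.2.1.1) echo system ;;
        .1.3.6.1.2.1.2) echo interfaces ;;
        .1.3.6.1.2.1.4) echo ip ;;
        .1.3.6.1.2.1.6) echo tcp ;;
        *)              echo '?' ;;
    esac
}

# oid_to_names OID: print "name(arc)" for every prefix of a dotted OID.
oid_to_names() {
    rest=${1#.}
    prefix=
    out=
    while [ -n "$rest" ]; do
        arc=${rest%%.*}
        case $rest in
            *.*) rest=${rest#*.} ;;
            *)   rest= ;;
        esac
        prefix="$prefix.$arc"
        out="$out${out:+ }$(oid_node_name "$prefix")($arc)"
    done
    echo "$out"
}

oid_to_names .1.3.6.1.2.1.1
```

For the example OID this prints the same path as the slide: iso(1) org(3) dod(6) internet(1) mgmt(2) mib-2(1) system(1).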
MIB – Management Information Base
system(1) group
- Contains objects that describe some basic information about an entity.
- An entity can be the agent itself or the network object that the agent is on.

[Figure: mib-2(1) with children system(1) and interfaces(2)]

system(1) group objects
- sysDescr(1): description of the entity.
- sysObjectID(2): vendor-defined OID string.
- sysUpTime(3): time since net-mgt was last re-initialised.
- sysContact(4): name of the person responsible for the entity.
MIB – Management Information Base
MIB - tree view

mib-2(1)
  system(1)
    sysDescr(1)
    sysObjectID(2)
    sysUpTime(3)
    sysContact(4)

MIB - syntax view

sysUpTime OBJECT-TYPE
    SYNTAX TimeTicks
    ACCESS read-only
    STATUS mandatory
    DESCRIPTION
        "The time (in hundredths of a second) since the network management portion of the system was last re-initialized."
    ::= {system 3}
MIB – Management Information Base
SNMP Instances
- Each MIB object can have an instance.
- A MIB for a router's (entity) interface information:
    iso(1) org(3) dod(6) internet(1) mgmt(2) mib-2(1) interfaces(2) ifTable(2) ifEntry(1) ifType(3)
- One ifType value is required per interface (e.g. 3).
- One MIB object definition can represent multiple instances through Tables, Entries, and Indexes.
MIB – Management Information Base
Tables, Entries, and Indexes
- Imagine tables as spreadsheets:
  - Three interface types require 3 rows (index numbers).
  - Each column represents a MIB object, as defined by the entry node.

            ifType(3)       ifMtu(4)   Etc…
Index #1    ifType.1 [6]    ifMtu.1
Index #2    ifType.2 [9]    ifMtu.2
Index #3    ifType.3 [15]   ifMtu.3

ENTRY + INDEX = INSTANCE
MIB – Management Information Base
Example MIB Query…
- If we queried the MIB on ifType we could get:
  - ifType.1 : 6
  - ifType.2 : 9
  - ifType.3 : 15
- Which corresponds to:
  - ifType.1 : ethernet
  - ifType.2 : tokenRing
  - ifType.3 : fddi

ifType OBJECT-TYPE
    SYNTAX INTEGER {
        other(1),
        ethernet(6),
        tokenRing(9),
        fddi(15),
        …
    }
    etc…
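The rule ENTRY + INDEX = INSTANCE and the integer-to-name mapping of ifType can be sketched together. The numeric column OID below is simply the ifType path from the earlier slide written out in dotted form; the helper names instance_oid and iftype_name are illustrative, and only the four codes listed on the slide are known to the lookup.

```shell
#!/bin/sh
# Column OID of ifType: iso(1) org(3) dod(6) internet(1) mgmt(2) mib-2(1)
# interfaces(2) ifTable(2) ifEntry(1) ifType(3)
IFTYPE_OID=.1.3.6.1.2.1.2.2.1.3

# instance_oid COLUMN_OID INDEX: ENTRY + INDEX = INSTANCE
instance_oid() {
    echo "$1.$2"
}

# iftype_name CODE: symbolic name for the codes listed on the slide.
iftype_name() {
    case $1 in
        1)  echo other ;;
        6)  echo ethernet ;;
        9)  echo tokenRing ;;
        15) echo fddi ;;
        *)  echo "unknown($1)" ;;
    esac
}

# The index:code pairs are the values from the example query.
for pair in 1:6 2:9 3:15; do
    index=${pair%%:*}
    code=${pair#*:}
    echo "$(instance_oid "$IFTYPE_OID" "$index") = $code ($(iftype_name "$code"))"
done
```

Each output line pairs the full instance OID with the raw integer and its symbolic name, which is the translation a MIB-aware tool performs when it displays ifType.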
Simple Network Management Protocol
- Retrieval protocol for the MIB.
- The MIB can be retrieved by:
  - CLI (snmpwalk),
  - GUI (MIB Browser), or
  - larger applications (Sun Net Manager) called Network Management Software (NMS).
- An NMS is a collection of smaller applications for managing a network with illustrations, graphs, etc.
- NMS software runs on Network Management Stations (also abbreviated NMS), which can run several different NMS applications.
SNMP Commands
SNMP has 5 different functions, referred to as Protocol Data Units (PDUs):
(1) GetRequest, aka Get
(2) GetNextRequest, aka GetNext
(3) GetResponse, aka Response
(4) SetRequest, aka Set
(5) Trap
SNMP Commands [Get]
GetRequest [Get]
- Most common PDU.
- Used to ask an SNMP agent for the value of a particular MIB object.
- The NMS sends out one Get PDU for each instance, which is a unique OID string.
- What happens if you don't know how many instances of a MIB object exist?
SNMP Commands [GetNext]
GetNextRequest [GetNext]
- An NMS application uses GetNext to 'walk' down a table within a MIB.
- Designed to ask for the OID and value of the MIB instance that comes after the one asked for.
- Once the agent responds, the NMS application can increment its count and generate another GetNext.
- This can continue until the NMS application detects that the OID has changed, i.e. it has reached the end of the table.
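The walk described above can be simulated without a network. In this sketch the "agent" is a hard-coded list of instances in OID order, get_next returns the entry that follows the requested one, and walk repeats GetNext until the returned OID leaves the column. The function names are hypothetical and the ifMtu values are invented for the illustration. A real agent compares full numeric OIDs rather than symbolic names.

```shell
#!/bin/sh
# Toy agent data: one "OID value" pair per line, already in OID order.
# ifType values follow the earlier example; ifMtu values are made up.
AGENT_DATA='ifType.1 6
ifType.2 9
ifType.3 15
ifMtu.1 1500
ifMtu.2 4464
ifMtu.3 4352'

# get_next OID: print the "OID value" line that comes after OID.
# A bare column name (e.g. ifType) yields that column's first instance.
get_next() {
    printf '%s\n' "$AGENT_DATA" | awk -v want="$1" '
        after                    { print; exit }      # line following an exact match
        $1 == want               { after = 1; next }
        index($1, want ".") == 1 { print; exit }      # first instance of a column
    '
}

# walk COLUMN: repeat GetNext until the returned OID is outside COLUMN.
walk() {
    current=$1
    while :; do
        reply=$(get_next "$current")
        next_oid=${reply%% *}
        case $next_oid in
            "$1".*) echo "$reply"; current=$next_oid ;;
            *)      break ;;    # OID changed, or no reply: end of the table column
        esac
    done
}

walk ifType
```

Walking ifType prints its three instances and stops when GetNext returns ifMtu.1, which is how the NMS learns the number of instances without knowing it in advance.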
SNMP Commands [GetResponse]
GetResponse [Response]
- Simply a response to a Get, GetNext or Set.
- The SNMP agent responds to all requests or commands via this PDU.
SNMP Commands [SetRequest]
SetRequest [Set]
- Issued by an NMS application to change a MIB instance to the value carried in the Set PDU.
- For example:
  - You could issue a GetRequest against a KDEG server asking for sysLocation.0 and get 'ORI' as the response.
  - Then, if the server was moved, you could issue a Set against that KDEG server to change its location to 'INS'.
- You must have the correct permissions when using the Set PDU.
SNMP Commands [Trap]
Trap
- Asynchronous notification.
- SNMP agents can be programmed to send a trap when a certain set of circumstances arises.
- Circumstances can be viewed as thresholds, i.e. a trap may be sent when the temperature of the core exceeds a predefined level.
SNMP Security
SNMP Community Strings (like passwords)
- 3 kinds:
  - READ-ONLY: you can send a Get or GetNext to the SNMP agent, and if the agent is using the same read-only string it will process the request.
  - READ-WRITE: Get, GetNext, and Set. If a MIB object has an ACCESS value of read-write, then a Set PDU sent with the correct read-write community string can change the value of that object.
  - TRAP: allows administrators to cluster network entities into communities. Fairly redundant.
SNMP Tools
- Command Line Interface
  - e.g. 'snmpwalk'
- Graphical User Interface
  - e.g. iReasoning's MIB Browser, via www.ireasoning.com
SNMP – MIB Browser (1)
Initial set-up...
Breakdown…
- LHS is the SNMP MIB structure.
- Lower LHS has details of MIB structure.
- RHS will present MIB values.
SNMP – MIB Browser (2)
Discovery…
- Subnet: 134.XXX.XXX.*
- Read Community: public
- Start
- Note IP Address.
- Stop
SNMP – MIB Browser (3)
Navigation…
- MIB Tree > System > sysUpTime
- Notice Lower LHS
- Notice OID
SNMP – MIB Browser (4)
SNMP PDUs…
(1) Get
- Select 'Go', then 'Get'.
- RHS has values.
- OID – Value
SNMP – MIB Browser (5)
SNMP PDUs…
(2) GetNext
- Selected OID is: .1.3.6.1.2.1.1.5
- Returned value: (.1.3.6.1.2.1.1.6) or "DSG, O'Reilly Institute, F.35"
SNMP – MIB Browser (6)
SNMP…
(3) Get SubTree
- Position of MIB: .1.3.6.1.2.1.1 (a.k.a. system)
- RHS values: returns all values below system.
SNMP – MIB Browser (7)
SNMP…
(4) Walk
- MIB Location: .1.3.6.1.2.1 (a.k.a. mib-2)
- Returns *ALL* values under mib-2
SNMP – MIB Browser (8)
Tables…
- MIB Location: .1.3.6.1.2.1.2.2 (ifTable, under interfaces)
- Select ifTable, Go, then Table View.
- Refresh/Poll
SNMP – MIB Browser (9)
SNMP…
- Graph
  - Select a value from the RHS, say sysUpTime.
  - Highlight and select 'Go', then 'Graph'.
  - Interval = 1s set.
MRTG/RRDTool
Ganglia
MRTG…
• The Multi Router Traffic Grapher (MRTG) is a tool to monitor the traffic load on network links. MRTG generates HTML pages containing PNG images which provide an almost live visual representation of this traffic.
  – Check http://oss.oetiker.ch/mrtg/ to see what it does.
• MRTG has been the most common network traffic measurement tool among service providers.
• MRTG uses simple SNMP queries at a regular interval to generate graphs.
MRTG…
• External readers for MRTG graphs can create other interpretations of the data.
• MRTG can be used not only to measure network traffic on interfaces, but also to build graphs of anything that has an equivalent SNMP MIB, such as CPU load, disk availability, temperature, etc.
• Data sources can be anything that provides a counter or gauge value, not necessarily SNMP.
  – For example, graphing round trip times
• MRTG can be extended to work with RRDTool.
Running MRTG
Get the required packages
Compile and install the packages
Make cfg files for router interfaces with cfgmaker
Create html pages from the cfg files with indexmaker
Trigger MRTG periodically from Cron or run it in daemon mode
RRDtool
• Round Robin Database for time series data storage
• Command line based
• From the author of MRTG
• Made to be faster and more flexible
• Includes CGI and graphing tools, plus APIs
• Solves the Historical Trends and Simple Interface problems
Define Data Sources (Inputs)

DS:speed:COUNTER:600:U:U
DS:fuel:GAUGE:600:U:U

● DS = Data Source
● speed, fuel = "variable" names
● COUNTER, GAUGE = variable type
● 600 = heartbeat: UNKNOWN is returned for the interval if nothing is received after this amount of time (in seconds)
● U:U = limits on minimum and maximum variable values (U means unknown and any value is permitted)
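The difference between the two variable types and the role of the heartbeat can be illustrated with a deliberately simplified sketch. The function ds_value is hypothetical and is not how rrdtool is implemented: it ignores counter wrap-around, min/max limits, and the interpolation rrdtool performs between updates. It shows only that a GAUGE is stored as read, a COUNTER is turned into a per-second rate, and an update arriving later than the heartbeat yields UNKNOWN (written U).

```shell
#!/bin/sh
# ds_value TYPE HEARTBEAT PREV_VALUE PREV_TIME CUR_VALUE CUR_TIME
# (hypothetical helper; times in seconds, integer arithmetic only)
ds_value() {
    elapsed=$(( $6 - $4 ))
    # No usable interval, or update later than the heartbeat: UNKNOWN.
    if [ "$elapsed" -le 0 ] || [ "$elapsed" -gt "$2" ]; then
        echo U
        return 0
    fi
    case $1 in
        GAUGE)   echo "$5" ;;                               # value as read
        COUNTER) echo $(( ($5 - $3) / $elapsed )) ;;        # rate of change per second
        *)       echo U ;;
    esac
}

echo "speed: $(ds_value COUNTER 600 1000 0 4000 300)"
echo "fuel:  $(ds_value GAUGE 600 0 0 42 300)"
echo "late:  $(ds_value COUNTER 600 1000 0 4000 900)"
```

In the first call the counter rose by 3000 in 300 seconds, so the stored value is a rate of 10 per second. In the third call the update arrives 900 seconds after the previous one, which exceeds the 600-second heartbeat.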
Define Archives (Outputs)

RRA:AVERAGE:0.5:1:24
RRA:AVERAGE:0.5:6:10

● RRA = Round Robin Archive
● AVERAGE = consolidation function
● 0.5 = up to 50% of consolidated points may be UNKNOWN
● 1:24 = this RRA keeps each sample (an average over one 5-minute primary sample), 24 times (which is 2 hours' worth)
● 6:10 = this RRA keeps an average over every six 5-minute primary samples (30 minutes), 10 times (which is 5 hours' worth)
• Clear as mud!
● It all depends on the original step size, which defaults to 5 minutes
RRDtool Database Format
- --step 300 (5 minute input step size)
- RRA 1:24: recent data stored once every 5 minutes for the past 2 hours
- RRA 6:10: medium-length data averaged to one entry per half hour for the last 5 hours
- RRA 288:365: old data averaged to one entry per day for the last 365 days
- All three archives are stored in a single RRD file.
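The retention figures quoted for each archive follow from simple arithmetic: one row covers step × steps-per-row seconds, and the archive as a whole covers that times the number of rows. The sketch below recomputes the three archives from the slides; the helper names rra_resolution and rra_span are made up for the example.

```shell
#!/bin/sh
# rra_resolution STEP STEPS_PER_ROW: seconds covered by one stored row.
rra_resolution() {
    echo $(( $1 * $2 ))
}

# rra_span STEP STEPS_PER_ROW ROWS: seconds of history kept by the archive.
rra_span() {
    echo $(( $1 * $2 * $3 ))
}

step=300    # --step 300, the 5 minute input step size
for rra in 1:24 6:10 288:365; do
    per_row=${rra%%:*}
    rows=${rra#*:}
    res=$(rra_resolution "$step" "$per_row")
    span=$(rra_span "$step" "$per_row" "$rows")
    echo "RRA $rra: one row per $res s, $((span / 3600)) hours of history"
done
```

With a 300-second step, 1:24 gives 7200 seconds (2 hours), 6:10 gives 18000 seconds (5 hours), and 288:365 gives one row per 86400 seconds (one day) for 365 days. Changing the step changes every one of these figures, which is why the step size has to be known before an RRA definition can be read.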
Isn't it simple ?!
• rrdtool create /var/nagios/rrd/host0_load.rrd -s 600 DS:1MIN-Load:GAUGE:1200:0:100 DS:5MIN-Load:GAUGE:1200:0:100 DS:15MIN-Load:GAUGE:1200:0:100 RRA:AVERAGE:0.5:1:50400 RRA:AVERAGE:0.5:60:43800
• rrdtool create /var/nagios/rrd/host0_disk_usage.rrd -s 600 DS:root:GAUGE:1200:0:U DS:home:GAUGE:1200:0:U DS:usr:GAUGE:1200:0:U DS:var:GAUGE:1200:0:U RRA:AVERAGE:0.5:1:50400 RRA:AVERAGE:0.5:60:43800
• rrdtool create /var/nagios/rrd/apricot-INTL_Ping.rrd -s 300 DS:ping:GAUGE:600:0:U RRA:AVERAGE:0.5:1:50400 RRA:AVERAGE:0.5:60:43800
• rrdtool create /var/nagios/rrd/host0_total.rrd -s 300 DS:IN:COUNTER:1200:0:U DS:OUT:COUNTER:600:0:U RRA:AVERAGE:0.5:1:50400 RRA:AVERAGE:0.5:60:43800
Ping Latency Graph
Agenda: Ganglia Monitoring
Introduction and Overview
Ganglia Architecture
Apache Web Frontend
Gmond & Gmetad
Extending Ganglia
GMetrics
Module Development
Introduction and Overview
- Scalable distributed monitoring system
- Targeted at monitoring clusters and grids
- Multicast-based listen/announce protocol
- Depends on open standards:
  - XML
  - XDR: compact, portable data transport
  - RRDTool: Round Robin Database
  - APR: Apache Portable Runtime
  - Apache HTTPD Server
  - PHP-based web interface
- http://ganglia.sourceforge.net or http://www.ganglia.info
Ganglia Architecture
Gmond – Metric gathering agent installed on individual servers
Gmetad – Metric aggregation agent installed on one or more specific task oriented servers
Apache Web Frontend – Metric presentation and analysis server
Attributes
Multicast – All gmond nodes are capable of listening to and reporting on the status of the entire cluster
Failover – Gmetad has the ability to switch which cluster node it polls for metric data
Lightweight and low overhead metric gathering and transport
Ported to various different platforms (Linux, FreeBSD, Solaris, others)
Ganglia Architecture
[Figure: three clusters (Cluster 1, Cluster 2, Cluster 3), each made up of several GMOND nodes. GMETAD daemons poll one node per cluster and can fail over to another node. The Apache Web Frontend sits between GMETAD and the Web Client.]
Ganglia Web Frontend
Built around Apache HTTPD server using mod_php
Uses presentation templates so that the web site “look and feel” can be easily customized
Presents an overview of all nodes within a grid vs all nodes in a cluster
Ability to drill down into individual nodes
Presents both textual and graphical views
Ganglia Customized Web Front-end
Deploying Ganglia Monitoring
See http://ganglia.sourceforge.net/docs/ganglia.html
- Install Gmond on all monitored nodes
  - Edit the configuration file
    - Add cluster and host information
    - Configure network: udp_send_channel, udp_recv_channel, tcp_accept_channel
  - Start gmond
- Install Gmetad on an aggregation node
  - Edit the configuration file
    - Add data and failover sources
    - Add grid name
  - Start gmetad
- Install the web frontend
  - Install Apache httpd server with mod_php
  - Copy Ganglia web pages and PHP code to the appropriate location
  - Add appropriate authentication configuration for access control
Gmond Gathering & Gmetad Aggregation Agents
Gmond – Metric Gathering Agent
- Built-in metrics
  - Various CPU, network I/O, disk I/O and memory metrics
- Extensible
  - Gmetric – Out-of-process utility capable of invoking command-line-based metric gathering scripts
  - Loadable modules capable of gathering multiple metrics or using advanced metric gathering APIs
- Built on the Apache Portable Runtime
  - Supports Linux, FreeBSD, Solaris and more…
Gmond – Metric Gathering Agent
- Automatic discovery of nodes
  - Adding a node does not require configuration file changes
  - Each node is configured independently
  - Each node has the ability to listen to and/or talk on the multicast channel
  - Can be configured for unicast connections if desired
  - Heartbeat metric determines the up/down status
- Thread pools
  - Collection threads – Capable of running specialized functions for gathering metric data
  - Multicast listeners – Listen for metric data from other nodes in the same cluster
  - Data export listeners – Listen for client requests for cluster metric data
Gmond – Metric Collection Groups
- Specify as many collection groups as you like
- Each collection group must contain at least one metric section
- List available metrics by invoking "gmond -m"
- collection_group section:
  - collect_once – Specifies that the group consists of static metrics, collected only once
  - collect_every – Collection interval (only valid for non-static metrics)
  - time_threshold – Max data send interval
- metric section:
  - name – Metric name (see "gmond -m")
  - value_threshold – Metric variance threshold (send if exceeded)
Gmetad – Metric Aggregation Agent
- Polls a designated cluster node for the status of the entire cluster
  - Data collection thread per cluster
  - Ability to poll gmond or another gmetad for metric data
  - Failover capability
- RRDTool – Storage and trend graphing tool
  - Defines fixed-size databases that hold data of various granularity
  - Capable of rendering trending graphs from the smallest granularity to the largest (e.g. last hour vs last year)
  - Never grows larger than the predetermined fixed size
  - Database granularity is configurable through gmetad.conf
Gmetad – Configuration
- Data source and failover designations
    data_source "my cluster" [polling interval] address1:port address2:port ...
- RRD database storage definition
    RRAs "RRA:AVERAGE:0.5:1:244" "RRA:AVERAGE:0.5:24:244" "RRA:AVERAGE:0.5:168:244" "RRA:AVERAGE:0.5:672:244" "RRA:AVERAGE:0.5:5760:374"
- Access control
    trusted_hosts address1 address2 … DN1 DN2 …
    all_trusted OFF/on
- RRD files location
    rrd_rootdir "/var/lib/ganglia/rrds"
- Network
    xml_port 8651
    interactive_port 8652
Gmetad – Configuration Example

data_source "my cluster" 10 localhost my.machine.edu:8649 1.2.3.5:8655
data_source "my grid" 50 1.3.4.7:8655 grid.org:8651 grid-backup.org:8651
data_source "another source" 1.3.4.7:8655 1.3.4.8
trusted_hosts 127.0.0.1 169.229.50.165 my.gmetad.org
xml_port 8651
interactive_port 8652
rrd_rootdir "/var/lib/ganglia/rrds"