Online Monitoring with MonALISA

Post on 01-Feb-2016

Dan Protopopescu Glasgow, UK

MonALISA

MonALISA is a distributed service able to:
- collect any type of information from different systems
- analyze this information in real time
- take automated decisions and perform actions based on it
- optimize work flows in complex environments

Read more at http://monalisa.caltech.edu

Uses

- Monitoring distributed computing, i.e. Grids
- Optimizing flow in complex systems (VRVS, optical cable networks)
- ALICE also uses ML for monitoring online reconstruction

Some benchmark figures for the service:
- ~800k monitored parameters at 50k updates/second
- > 10k running (AliEn) jobs monitored simultaneously
- > 100 WAN links

We propose ML (MonALISA) as a high-level monitoring and, possibly, control system, used alongside (or on top of) existing slow controls systems such as EPICS, PVSS, etc.

Advantages

- MonALISA is simple to install, configure and use
- ApMon APIs are available in C, C++, Java, Python and Perl
- A ROOT plugin allows macros to send data directly to MonALISA
- Can easily interface with (or sit on top of) any existing or future slow controls subsystem (EPICS, PVSS)
- Data is stored in a standard PgSQL (or MySQL) database that can be accessed by other applications, independently of ML
- Automatic data summarizing
- Several data repositories (and hence DBs) can exist (local and remote)
- Easy access via WebService (WS) from service and/or repository
- Fully supported by the development team; work is being done in this direction

Capabilities

Based on monitored information, actions can be taken in:
- ML Service
- ML Repository

Actions can be triggered by:
- Values above/below given thresholds
- Absence/presence of values
- Correlations between several values

Possible action types:
- External command
- Plain event logging
- Annotation of repository charts; RSS feeds
- Email
- Instant messaging
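The three trigger kinds listed above can be illustrated with a small, self-contained Python sketch. It is not the MonALISA action framework: every name here (`above`, `absent`, `all_of`, `Trigger`, `evaluate`) is hypothetical, the 1000.0 threshold and 60 s staleness limit are arbitrary, and the "action" is plain event logging into a list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout; this is not the MonALISA API.
# A sample is (timestamp_seconds, value), keyed by parameter name.
Samples = Dict[str, Tuple[float, float]]
Condition = Callable[[Samples, float], bool]


def above(param: str, limit: float) -> Condition:
    """True when the latest value of `param` is above `limit`."""
    def check(samples: Samples, now: float) -> bool:
        return param in samples and samples[param][1] > limit
    return check


def absent(param: str, max_age_s: float) -> Condition:
    """True when `param` was never seen or its latest sample is stale."""
    def check(samples: Samples, now: float) -> bool:
        return param not in samples or now - samples[param][0] > max_age_s
    return check


def all_of(*conditions: Condition) -> Condition:
    """True when every condition holds: a crude stand-in for correlations."""
    def check(samples: Samples, now: float) -> bool:
        return all(c(samples, now) for c in conditions)
    return check


@dataclass
class Trigger:
    name: str
    condition: Condition
    action: Callable[[str], None]


def evaluate(triggers: List[Trigger], samples: Samples, now: float) -> List[str]:
    """Run the action of every trigger whose condition holds; return their names."""
    fired = []
    for trigger in triggers:
        if trigger.condition(samples, now):
            trigger.action(trigger.name)
            fired.append(trigger.name)
    return fired


event_log: List[str] = []  # "plain event logging" as the action
triggers = [
    Trigger("rate_high", above("pspec_logic_ai_0", 1000.0), event_log.append),
    Trigger("rate_missing", absent("pspec_logic_ai_0", 60.0), event_log.append),
]
fired = evaluate(triggers, {"pspec_logic_ai_0": (100.0, 1500.0)}, now=110.0)
```

Keeping conditions and actions as separate callables means the same threshold test could drive an external command, an email or a chart annotation without change.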

Components

[Diagram of the MonALISA components. Labels: ApMon, Service, LUS/Proxies, Repository, Web Server, GUI. Annotations: quick actions; actions based on local information; actions based on aggregated information.]

Service setup

ML Service setup:

    wget http://nuclear.gla.ac.uk/~protopop/ML/MonaLisa.tar.gz
    tar -zxvf MonaLisa.tar.gz
    cd MonaLisa/
    ./install.sh
    cd ../MonaLisa/Service/CMD/
    ./MLD start

Repository setup

ML Repository setup:

    wget http://nuclear.gla.ac.uk/~protopop/ML/MLrepository.tgz
    tar -zxvf MLrepository.tgz
    [configure it]
    cd MLrepository
    ./start.sh

ApMon setup

ApMon setup:

    wget http://nuclear.gla.ac.uk/~protopop/ML/ApMon_perl.tar.gz
    tar -xzvf ApMon_perl.tar.gz
    cd ApMon_perl
    [create your script, say mysend.pl]
    perl mysend.pl

Simple monitoring script

    [monalisa@glasgow]$ cat mysend.pl
    use ApMon;

    # Destination ML service (host:port); automatic system and
    # general-info monitoring are switched off.
    my $apm = new ApMon({"glasgow.jlab.org:8884" =>
        {"sys_monitoring" => 0, "general_info" => 0}});

    my @pair;
    while (1) {    # loop forever
        # get values from somewhere; getmypar is a placeholder
        # that returns a (name, value) pair
        @pair = getmypar("pspec_logic_ai_0");
        # cluster "Detector", node "MOR"
        $apm->sendParameters("Detector", "MOR", @pair);
        sleep(20);
    }
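For readers who want to test the polling logic without a running ML service, the same read-send-sleep cycle can be sketched in Python with the sender injected as a callable. This is only an illustration: `monitor_loop`, `fake_read` and `fake_send` are hypothetical names, the value 42.0 is made up, and no ApMon call is shown. A real sender would wrap whichever ApMon binding is in use.

```python
import time
from typing import Callable, List, Optional, Tuple


def monitor_loop(
    read_value: Callable[[], Tuple[str, float]],
    send: Callable[[str, str, str, float], None],
    period_s: float = 20.0,
    iterations: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Read one (name, value) pair per cycle and hand it to `send`.

    Runs forever when `iterations` is None; otherwise stops after that
    many sends and returns the count.
    """
    sent = 0
    while iterations is None or sent < iterations:
        name, value = read_value()
        # Cluster "Detector" and node "MOR", as in the Perl script.
        send("Detector", "MOR", name, value)
        sent += 1
        if iterations is None or sent < iterations:
            sleep(period_s)  # no sleep after the final bounded cycle
    return sent


# Stand-ins used only for this demonstration (hypothetical).
sent_records: List[Tuple[str, str, str, float]] = []
naps: List[float] = []


def fake_read() -> Tuple[str, float]:
    return ("pspec_logic_ai_0", 42.0)


def fake_send(cluster: str, node: str, name: str, value: float) -> None:
    sent_records.append((cluster, node, name, value))


count = monitor_loop(fake_read, fake_send, iterations=3, sleep=naps.append)
```

Injecting `sleep` and `send` keeps the loop deterministic under test; in production both defaults would be replaced by the real clock and the real sender.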

Time history

Time history example:

    [monalisa@glasgow]$ cat mor.properties
    page=hist
    Farms=JlabML
    Clusters=Detector
    Nodes=MOR
    Functions=pspec_logic_ai_0
    ylabel=Tagger rate
    title=MOR
    annotation.groups=2
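The chart definition is a list of key=value pairs whose Clusters, Nodes and Functions entries match the cluster ("Detector"), node ("MOR") and parameter name used by the sending script. A minimal Python sketch of reading such a file follows, assuming one pair per line. `parse_properties` is a hypothetical helper, not part of MonALISA, and the slash-joined series name is just a convenient notation for this example.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines; blank lines and # comments are skipped."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        props[key.strip()] = value.strip()
    return props


# Contents of mor.properties, one pair per line.
MOR_PROPERTIES = """\
page=hist
Farms=JlabML
Clusters=Detector
Nodes=MOR
Functions=pspec_logic_ai_0
ylabel=Tagger rate
title=MOR
annotation.groups=2
"""

props = parse_properties(MOR_PROPERTIES)

# Hypothetical notation: farm/cluster/node/function identifies the plotted series.
series = "/".join(props[k] for k in ("Farms", "Clusters", "Nodes", "Functions"))
```

A check like this can catch a mismatch between the names a sender uses and the names a chart expects before the repository is restarted.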

Web interface

Java GUI

Application control

[Diagram of the application control path. Labels: Keystore; ML Clients (your custom Java client, GUI client, ML Repository, your custom view); ML Proxies; ML Services (ML Service, your mon module, your app module); ApMon; AppMon (C, bash); your application; LUS.]

Notes on the diagram:
- ML Clients: TCP-based subscribe mechanism; serialized, compressed objects with optional encryption
- ML Proxies: application commands are encrypted
- ML Services: standard and/or user's sensors and/or application modules

Alert-based Actions

- The MySQL daemon is automatically restarted when it runs out of memory
  Trigger: threshold on VSZ memory usage
- The ALICE production jobs queue is automatically kept full by automatic resubmission
  Trigger: threshold on the number of aliprod waiting jobs
- Administrators are kept up-to-date on the services' status via instant messaging, RSS feeds, toolbar alerts etc.
  Trigger: presence/absence of monitored information
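The first two examples come down to simple threshold arithmetic, sketched below in Python. The function names, the low-water mark and all numbers are assumptions made for illustration. The slides state only that the triggers are thresholds on VSZ memory usage and on the number of waiting jobs.

```python
def should_restart(vsz_kb: int, limit_kb: int) -> bool:
    """Restart decision: virtual memory size above the configured limit."""
    return vsz_kb > limit_kb


def jobs_to_resubmit(waiting: int, low_mark: int, target: int) -> int:
    """Jobs to submit so the queue returns to `target`.

    Nothing is submitted while at least `low_mark` jobs are still waiting,
    so the refill happens in batches rather than one job at a time.
    """
    if low_mark > target:
        raise ValueError("low_mark must not exceed target")
    if waiting >= low_mark:
        return 0
    return target - waiting


# Hypothetical readings and limits.
restart_needed = should_restart(vsz_kb=2_500_000, limit_kb=2_000_000)
refill = jobs_to_resubmit(waiting=120, low_mark=200, target=500)
```

Separating the decision (a pure function of monitored values) from the action (restart command, job submission) keeps the rule easy to test offline.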

Summary

- MonALISA is a very promising tool for online experiment monitoring and for interfacing with a variety of slow control subsystems; GlueX are seriously considering ML for this task
- Easy to configure, understand and use
- Experience from Grid monitoring and more
- Support from the developers group for implementation of new modules/features
- Online experiment monitoring tests of CLAS@JLab were recently carried out; the demo repository is at http://mlr1.gla.ac.uk:7002

More examples / Extras

Integrated Pie Charts

History Plots, Annotations

AliEn Services Monitoring

- AliEn services periodically checked
- PID check + SOAP call
- Simple functional tests
- SE space usage
- Efficiency

Job Network Traffic Monitoring

- Based on the xrootd transfers from every job
- Aggregated statistics for:
  - Sites (incoming, outgoing, site to site, internal)
  - Storage Elements (incoming, outgoing)
- Statistics cover read and written files and transferred MB/s

Individual Job Tracking

- Based on AliEn shell commands: top, ps, spy, jobinfo, masterjob
- Using the GUI ML Client
- Status and resource usage, per job

Head Node Monitoring

Machine parameters, real-time and history: load, memory and swap usage, processes, sockets

MonALISA in AliEn

- The MonALISA framework has been used as a primary monitoring tool for the ALICE Grid since 2004
- Presently the system is used to monitor all (identified) services, jobs and network parameters necessary for Grid operation and debugging
- The number of concurrently monitored and stored parameters today is ~300,000 in 75 ML Services
- The add-on tools for automatic event notification allow a more efficient reaction to problems
- The framework's design and flexibility answer all requirements for a monitoring system
- The accumulated information makes it possible to construct and implement automated decision-making algorithms, further increasing the efficiency of Grid operations