Online Monitoring with MonALISA

Date post: 01-Feb-2016
Transcript
Page 1: Online Monitoring with MonALISA

Online Monitoring with MonALISA

Dan Protopopescu
Glasgow, UK

Page 2: Online Monitoring with MonALISA

MonALISA

Is a distributed service able to:
- collect any type of information from different systems
- analyze this information in real time
- take automated decisions and perform actions based on it
- optimize workflows in complex environments

Read more at http://monalisa.caltech.edu

Page 3: Online Monitoring with MonALISA

Uses

- Monitoring distributed computing, e.g. Grids
- Optimizing flow in complex systems (VRVS, optical cable networks)
- ALICE also uses ML for monitoring online reconstruction

Some benchmark figures for the service:
- ~800k monitored parameters at 50k updates/second
- > 10k running (AliEn) jobs monitored simultaneously
- > 100 WAN links

We are proposing ML as a high-level monitoring and possibly control system, alongside (or on top of) existing slow controls systems such as EPICS, PVSS etc.

Page 4: Online Monitoring with MonALISA

Advantages

- MonALISA is simple to install, configure and use
- ApMon APIs are available in C, C++, Java, Python and Perl
- A ROOT plugin allows macros to send data directly to MonALISA
- Can easily interface with (or sit on top of) any existing or future slow controls subsystem (EPICS, PVSS)
- Data is stored in a standard PgSQL (or MySQL) database that can be accessed by other applications, independently of ML
- Automatic data summarizing
- Several data repositories (and hence DBs) can exist (local and remote)
- Easy access via WebService (WS) from service and/or repository
- Fully supported by the development team; work is being done in this direction

Page 5: Online Monitoring with MonALISA

Capabilities

Based on monitored information, actions can be taken in:
- ML Service
- ML Repository

Actions can be triggered by:
- Values above/below given thresholds
- Absence/presence of values
- Correlations between several values

Possible action types:
- External command
- Plain event logging
- Annotation of repository charts; RSS feeds
- Email
- Instant messaging
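The threshold and absence/presence triggers named above can be illustrated with a minimal shell sketch. This is not MonALISA's own configuration syntax or API: the helper names `check_threshold` and `check_presence`, the limits and the sample reading are all hypothetical, chosen only to show the decision logic behind such triggers.

```shell
#!/bin/sh
# Illustrative sketch only: these helpers are hypothetical, not part of MonALISA.

# check_threshold VALUE MIN MAX
# Prints "below", "above" or "ok" depending on where VALUE falls.
# awk is used so that non-integer values compare correctly.
check_threshold() {
    awk -v v="$1" -v lo="$2" -v hi="$3" 'BEGIN {
        if (v + 0 < lo + 0)      print "below"
        else if (v + 0 > hi + 0) print "above"
        else                     print "ok"
    }'
}

# check_presence VALUE
# Prints "absent" when no value was received (empty string), else "present".
check_presence() {
    if [ -z "$1" ]; then
        echo "absent"
    else
        echo "present"
    fi
}

# Example: decide whether one reading should trigger an action (made-up values).
reading="120.5"
state=$(check_threshold "$reading" 10 100)
if [ "$state" != "ok" ]; then
    echo "trigger: value $reading is $state the allowed range"
fi
```

In a real deployment the `echo` would be replaced by one of the action types listed above, for example an external command or an email.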

Page 6: Online Monitoring with MonALISA

Components

[Diagram of the components: ApMon, Service, LUS/Proxies, Repository, Web Server, GUI. Labels: "Quick actions", "Actions based on local information", "Actions based on aggregated information".]

Page 7: Online Monitoring with MonALISA

Service setup

ML Service setup:

wget http://nuclear.gla.ac.uk/~protopop/ML/MonaLisa.tar.gz
tar -zxvf MonaLisa.tar.gz
cd MonaLisa/
./install.sh
cd ../MonaLisa/Service/CMD/
./MLD start

Page 8: Online Monitoring with MonALISA

Repository setup

ML Repository setup:

wget http://nuclear.gla.ac.uk/~protopop/ML/MLrepository.tgz
tar -zxvf MLrepository.tgz
[configure it]
cd MLrepository
./start.sh

Page 9: Online Monitoring with MonALISA

ApMon setup

ApMon setup:

wget http://nuclear.gla.ac.uk/~protopop/ML/ApMon_perl.tar.gz
tar -xzvf ApMon_perl.tar.gz
cd ApMon_perl
[create your script, say mysend.pl]
perl mysend.pl

Page 10: Online Monitoring with MonALISA

Simple monitoring script

[monalisa@glasgow]$ cat mysend.pl

use ApMon;

my $apm = new ApMon({"glasgow.jlab.org:8884" => {"sys_monitoring" => 0, "general_info" => 0}});

my @pair;
while (1) {    # loop forever
    # get values from somewhere (getmypar is user-supplied and not shown here)
    @pair = getmypar("pspec_logic_ai_0");
    $apm->sendParameters("Detector", "MOR", @pair);
    sleep(20);
}

Page 11: Online Monitoring with MonALISA

Time history

Time history example:

[monalisa@glasgow]$ cat mor.properties

page=hist
Farms=JlabML
Clusters=Detector
Nodes=MOR
Functions=pspec_logic_ai_0
ylabel=Tagger rate
title=MOR
annotation.groups=2
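Since mor.properties is a flat list of key=value entries, individual settings are easy to read back from a script, for instance to check which cluster and node a chart is bound to. The sketch below assumes nothing about how the ML Repository itself parses the file; `get_prop` is a hypothetical helper and the file is recreated in a temporary location purely for the demonstration.

```shell
#!/bin/sh
# Illustrative sketch only: get_prop is a hypothetical helper, not a MonALISA tool.

# get_prop FILE KEY
# Prints the value of the first "KEY=value" line in FILE (nothing if absent).
get_prop() {
    awk -F= -v key="$2" '$1 == key { print substr($0, length(key) + 2); exit }' "$1"
}

# Recreate the example file in a temporary location (removed at the end).
props=$(mktemp)
cat > "$props" <<'EOF'
page=hist
Farms=JlabML
Clusters=Detector
Nodes=MOR
Functions=pspec_logic_ai_0
ylabel=Tagger rate
title=MOR
annotation.groups=2
EOF

echo "cluster: $(get_prop "$props" Clusters)"
echo "y label: $(get_prop "$props" ylabel)"
rm -f "$props"
```

Note that the Clusters, Nodes and Functions values match the cluster name, node name and parameter name passed to sendParameters in mysend.pl.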


Page 12: Online Monitoring with MonALISA

Web interface

Page 13: Online Monitoring with MonALISA

Java GUI

Page 14: Online Monitoring with MonALISA

Application control

[Diagram. Labels recovered from the slide:
- ML Clients: GUI client; your custom Java client; ML Repository (your custom view); key, keystore
- ML Proxies: TCP-based subscribe mechanism; serialized, compressed objects with optional encryption; application commands are encrypted
- ML Services: standard and/or user's sensors and/or application modules; your mon module; your app module
- Applications: your application (ApMon); your application (AppMonC); bash
- LUS]

Page 15: Online Monitoring with MonALISA

Alert-based Actions

- MySQL daemon is automatically restarted when it runs out of memory
  Trigger: threshold on VSZ memory usage
- ALICE production jobs queue is automatically kept full by automatic resubmission
  Trigger: threshold on the number of aliprod waiting jobs
- Administrators are kept up to date on the services' status via instant messaging, RSS feeds, toolbar alerts etc.
  Trigger: presence/absence of monitored information
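The first alert above (restart a daemon when its VSZ crosses a threshold) follows a simple pattern that can be sketched in shell. This is only an illustration of the pattern, not the mechanism used in the ALICE setup: the function names, the limit and the stand-in restart command are assumptions.

```shell
#!/bin/sh
# Illustrative sketch only: function names, the limit and the restart
# command are assumptions, not taken from the ALICE configuration.

# vsz_kb PID: print the virtual memory size of PID in kB
# (prints nothing if the process does not exist or ps is unavailable).
vsz_kb() {
    ps -o vsz= -p "$1" 2>/dev/null | tr -d ' '
}

# over_limit VALUE LIMIT: succeed when VALUE is non-empty and above LIMIT.
over_limit() {
    [ -n "$1" ] && [ "$1" -gt "$2" ]
}

# restart_if_too_big PID LIMIT_KB COMMAND...
# Runs COMMAND when the process uses more virtual memory than LIMIT_KB.
restart_if_too_big() {
    pid=$1
    limit=$2
    shift 2
    if over_limit "$(vsz_kb "$pid")" "$limit"; then
        "$@"
    fi
}

# Example: check this shell itself against a deliberately huge limit;
# the echo stands in for a real restart command.
restart_if_too_big "$$" 1000000000 echo "would restart the daemon"
```

Keeping the measurement (`vsz_kb`), the comparison (`over_limit`) and the action separate means the same skeleton also fits the other alerts, such as a threshold on the number of waiting jobs.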

Page 16: Online Monitoring with MonALISA

Summary

- MonALISA is a very promising tool for online experiment monitoring and interfacing with a variety of slow control subsystems; GlueX are seriously considering ML for this task
- Easy to configure, understand and use
- Experience from Grid monitoring and more
- Support from the developers group for implementation of new modules/features
- Online experiment monitoring tests with CLAS@Jlab were recently carried out; a demo repository is at http://mlr1.gla.ac.uk:7002

Page 17: Online Monitoring with MonALISA

More examples / Extras

Page 18: Online Monitoring with MonALISA

Integrated Pie Charts

Page 19: Online Monitoring with MonALISA

History Plots, Annotations

Page 20: Online Monitoring with MonALISA

AliEn Services Monitoring

AliEn services:
- Periodically checked
- PID check + SOAP call
- Simple functional tests
- SE space usage
- Efficiency
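The "PID check" half of such a service test can be sketched in a few lines of shell. The helper name `is_running` and the use of a PID file are assumptions made for illustration; the slides do not say how AliEn locates the PID, and the SOAP call is not shown here.

```shell
#!/bin/sh
# Illustrative sketch only: is_running and the PID-file convention are assumptions.

# is_running PIDFILE
# Succeeds when PIDFILE is readable and names a process that is still alive.
# "kill -0" sends no signal; it only tests whether the process can be signalled,
# so it also fails for processes owned by another user.
is_running() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Example: record this shell's own PID and check it.
pidfile=$(mktemp)
echo "$$" > "$pidfile"
if is_running "$pidfile"; then
    echo "service is up"
else
    echo "service is down"
fi
rm -f "$pidfile"
```

A live PID only shows that the process exists, which is presumably why the check is paired with a SOAP call and simple functional tests.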

Page 21: Online Monitoring with MonALISA

Job Network Traffic Monitoring

- Based on the xrootd transfer from every job
- Aggregated statistics for:
  - Sites (incoming, outgoing, site to site, internal)
  - Storage Elements (incoming, outgoing)
- Of:
  - Read and written files
  - Transferred MB/s
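Aggregating per-job transfer figures into per-site totals is a plain group-and-sum operation. The sketch below shows the idea with awk; the record layout "job_id site megabytes", the function name `sum_by_site` and the sample numbers are invented for illustration and do not reflect the format reported by xrootd or stored by MonALISA.

```shell
#!/bin/sh
# Illustrative sketch only: the record layout "job_id site megabytes" is an
# assumption, not the format used by xrootd or MonALISA.

# sum_by_site: read "job_id site megabytes" records on stdin and print the
# total megabytes per site, one site per line, sorted by site name.
sum_by_site() {
    awk '{ total[$2] += $3 }
         END { for (s in total) printf "%s %.1f\n", s, total[s] }' | sort
}

# Example with made-up per-job transfer records.
sum_by_site <<'EOF'
job1 CERN 120.5
job2 Glasgow 30.0
job3 CERN 79.5
EOF
```

With these sample records the two CERN jobs are combined into a single line and Glasgow keeps its single entry.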

Page 22: Online Monitoring with MonALISA

Individual Job Tracking

- Based on AliEn shell commands: top, ps, spy, jobinfo, masterjob
- Using the GUI ML Client
- Status and resource usage, per job

Page 23: Online Monitoring with MonALISA

Head Node Monitoring

Machine parameters, real-time & history: load, memory & swap usage, processes, sockets

Page 24: Online Monitoring with MonALISA

MonALISA in AliEn

The MonALISA framework has been used as the primary monitoring tool for the ALICE Grid since 2004

Presently the system is used for monitoring all (identified) services, jobs and network parameters necessary for Grid operation and debugging

The number of concurrently monitored and stored parameters today is ~300,000 in 75 ML Services

The add-on tools for automatic event notification allow a more efficient reaction to problems

The framework's design and flexibility meet all requirements for a monitoring system

The accumulated information makes it possible to construct and implement automated decision-making algorithms, further increasing the efficiency of Grid operations

