M. Biasotto, Padova, 24 April 2002
LNL
CMS
Farm monitoring
Massimo Biasotto - LNL
Local Farm Monitoring
LNL experiences with “local” farm monitoring
July 2001: we first started with MRTG, which gave a lot of problems
– heavy footprint on the server
– unreliable (processes hanging with unreachable hosts)
– not scalable
November 2001: remstats, with improvements
– lighter and more robust than MRTG
– more flexibility in graph display and alarm management
– still scalability problems (it works in sequential mode)
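Why sequential mode limits scalability can be illustrated with a minimal Python sketch. It is not taken from MRTG or remstats: the function names (`poll_host`, `poll_sequential`, `poll_concurrent`), the host names and the simulated delay are all assumptions made for illustration. When hosts are queried one after another, the cycle time grows with the number of hosts and one slow host delays all the others; a pool of workers keeps the cycle time bounded.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def poll_host(host, delay=0.05):
    """Hypothetical stand-in for a real metric query; sleeps to simulate latency."""
    time.sleep(delay)
    return host, {"load": 0.0}

def poll_sequential(hosts):
    """Query hosts one after another: cycle time is roughly len(hosts) * delay."""
    return dict(poll_host(h) for h in hosts)

def poll_concurrent(hosts, workers=8):
    """Query hosts through a thread pool, so slow hosts overlap instead of adding up."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(poll_host, hosts))

if __name__ == "__main__":
    hosts = [f"node{i:02d}" for i in range(16)]  # hypothetical farm nodes
    for poll in (poll_sequential, poll_concurrent):
        start = time.perf_counter()
        results = poll(hosts)
        elapsed = time.perf_counter() - start
        print(poll.__name__, len(results), "hosts in", round(elapsed, 2), "s")
```

Both functions return the same data; only the time needed for a full polling cycle differs.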
Remstats example
Remstats example
Remstats vs MRTG
Ganglia
March 2001, ganglia: many advantages
– much greater resolution: metrics sampled every 15 sec instead of 5 min
– scalability: based on a distributed architecture, with data exchange via multicast channel
– single host metrics easily integrated to produce “cumulative” overview graphs
– there is still need to customize the tool (adding more metrics, customizing web pages, etc.)
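The “cumulative” overview can be thought of as a per-metric aggregation over the latest sample from each host. The following is a minimal sketch of that idea only. It does not use Ganglia's data format or API: the layout of the `samples` dictionary, the host and metric names, and the `cluster_summary` function are assumptions made for illustration.

```python
def cluster_summary(samples):
    """Aggregate the latest per-host samples into cluster-wide figures.

    `samples` maps host name -> {metric name -> value}. This layout is an
    assumption for illustration, not Ganglia's wire format.
    """
    totals = {}
    for metrics in samples.values():
        for name, value in metrics.items():
            total, count = totals.get(name, (0.0, 0))
            totals[name] = (total + value, count + 1)
    # Report the sum (useful for traffic) and the mean (useful for load, % cpu).
    return {name: {"sum": total, "mean": total / count}
            for name, (total, count) in totals.items()}

if __name__ == "__main__":
    samples = {
        "node01": {"load_one": 1.0, "bytes_in": 2000.0},  # hypothetical values
        "node02": {"load_one": 3.0, "bytes_in": 1000.0},
    }
    print(cluster_summary(samples))
```

With samples arriving every 15 seconds rather than every 5 minutes, a summary like this would be recomputed 20 times more often for the same set of hosts.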
Ganglia example
Ganglia example
Ganglia example
Netsaint
During our survey of the existing monitoring tools, Netsaint was considered and discarded
– Main reason: it didn’t monitor host performance metrics, like % cpu, load, network traffic, etc. (at least, not without heavy customization). Maybe now the necessary plugins have been added.
– It didn’t have a database to record the historical data
It monitors the status of the hosts (up or down) and of some network services
It provides a log of all relevant events (hosts/services going up or down, etc.)
Probably other features, but I’ve never investigated the tool deeply
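The kind of monitoring described above (service up or down, plus a log of state changes) can be sketched in a few lines of Python. This is only an illustration of the idea, not Netsaint's plugin interface or log format: the function names `check_tcp` and `log_transitions`, the service labels and the event strings are assumptions.

```python
import socket

def check_tcp(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def log_transitions(previous, current, log):
    """Append an event to `log` for every service whose state changed.

    `previous` and `current` map a service label to True (up) or False (down).
    Returns a copy of `current` to be used as `previous` on the next cycle.
    """
    for service, is_up in current.items():
        if previous.get(service) != is_up:
            log.append(f"{service} {'UP' if is_up else 'DOWN'}")
    return dict(current)
```

Only transitions are logged, so a service that stays down produces one event rather than one per polling cycle.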
Grid monitoring
Grid monitoring is different from “local” farm monitoring
– you cannot monitor on a WAN all the performance metrics of all the farm nodes (and you probably don’t want to)
Currently, Netsaint is used on the DataGrid Testbed to monitor the status of the testbed nodes and their grid services
http://infngrid.ct.infn.it/index-orig.html (infn-tb/guest)
Is this useful for CMS?
Can other useful features be added?
Netsaint example
Adapting CMS monitoring to Grid
What are the CMS requirements for “Grid monitoring”?
What do we want to monitor and why?
Once these questions have been addressed, we can decide if Netsaint fulfills the requirements
Integrating Netsaint into existing CMS farms shouldn’t be difficult
– the main issue is probably the setup (and maintenance) of the central repository
But it should be done only if there is a real need, not just for the sake of it