Iosif Legrand November 2008
Iosif Legrand, Harvey Newman, Ramiro Voicu, Costin Grigoras, Catalin Cirstoiu, Ciprian Dobre
An Agent Based, Dynamic Service System to Monitor, Control and Optimize Distributed Systems
ACAT - November 2008 ERICE
The MonALISA Framework
MonALISA is a Dynamic, Distributed Service System capable of collecting any type of information from different systems, analyzing it in near real time, and providing support for automated control decisions and global optimization of workflows in complex grid systems.
The MonALISA system is designed as an ensemble of autonomous, multi-threaded, self-describing agent-based subsystems which are registered as dynamic services and are able to collaborate and cooperate in performing a wide range of monitoring tasks. These agents can analyze and process the information in a distributed way and provide optimization decisions in large-scale distributed applications.
Distributed Object Systems: CORBA, DCOM
"Traditional" Distributed Object Models (CORBA, DCOM)
[Diagram labels: Lookup Service; Stub; Skeleton; CLIENT; Server; "IDL" Compiler]
The Stub is linked to the Client. The Client must know about the service from the beginning and needs the right stub for it.
The Server and the Client code must be created together!
Distributed Object Systems: Web Services WSDL/SOAP
[Diagram labels: Lookup Service; WSDL; CLIENT; Server; Interface; SOAP]
The client can dynamically generate the data structures and the interfaces for using remote objects based on WSDL.
Platform independent.
Mobile Code and Distributed Services
Act as a true dynamic service and provide the necessary functionality to be used by any other services that require such information (Jini, interface to WSDL / SOAP):
• mechanism to dynamically discover all the "Service Units"
• remote event notification for changes in any system
• lease mechanism for each registered unit
Dynamic Code Loading
[Diagram labels: Lookup Service; Proxy; CLIENT; Service]
Services can be used dynamically:
• Remote Services: Proxy == RMI Stub
• Mobile Agents: Proxy == Entire Service
• "Smart Proxies": Proxy adjusts to the client
Any well suited protocol for the application
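The discovery, remote event notification and lease mechanism listed on this slide can be illustrated with a small sketch. This is not the Jini API and not MonALISA code: the class `LeaseRegistry`, its method names and the injectable clock are hypothetical, chosen only to show how a registration disappears unless its lease is renewed.

```python
import time


class LeaseRegistry:
    """Toy lookup service: an entry vanishes unless its lease is renewed."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock        # injectable so the sketch can be tested
        self._expiry = {}         # service name -> lease expiry time
        self._listeners = []      # callbacks standing in for remote events

    def subscribe(self, callback):
        self._listeners.append(callback)

    def _notify(self, event, name):
        for callback in self._listeners:
            callback(event, name)

    def register(self, name):
        is_new = name not in self._expiry
        self._expiry[name] = self.clock() + self.lease_seconds
        if is_new:
            self._notify("added", name)

    # Renewing a lease is the same operation as registering again.
    renew = register

    def discover(self):
        """Drop expired leases, notify listeners, return live services."""
        now = self.clock()
        expired = [n for n, t in self._expiry.items() if t <= now]
        for name in expired:
            del self._expiry[name]
            self._notify("removed", name)
        return sorted(self._expiry)
```

A service that stops renewing (for example because its host went down) is removed automatically at the next discovery, which is the property a lease mechanism is meant to provide.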
MonALISA Service & Data Handling
[Diagram labels: Data Store; Data Cache Service & DB; Postgres; Configuration Control (SSL); Predicates & Agents; Data (via ML Proxy); Applications, Clients or Higher Level Services; WS Clients and service; WebService (WSDL, SOAP); Lookup Service (Registration, Discovery); AGENTS; FILTERS / TRIGGERS]
Monitoring Modules: collect any type of information; Dynamic Loading; Push and Pull
The MonALISA Architecture
Regional or Global High Level Services, Repositories & Clients
Secure and reliable communication; Dynamic load balancing; Scalability & Replication; AAA for Clients
Distributed Dynamic Registration and Discovery, based on a lease mechanism and remote events
[Diagram labels: JINI-Lookup Services (Secure & Public); MonALISA services; Proxies; HL services; Network of Agents]
Distributed System for gathering and analyzing information based on mobile agents: Customized aggregation, Triggers, Actions
Fully Distributed System with no Single Point of Failure
Registration / Discovery, Admin Access and AAA for Clients
[Diagram labels: Lookup Service; MonALISA Service; Registration (signed certificate); Discovery; Services Proxy Multiplexer; Client (other service); Admin SSL connection; Trust keystore; AAA services; Client authentication; Data; Filters & Agents; Applications]
Monitoring Grid sites, Running Jobs, Network Traffic, and Connectivity
[Figure panels: TOPOLOGY; JOBS; ACCOUNTING; Running Jobs]
Monitoring architecture in ALICE
[Diagram: components reporting through ApMon: AliEn Job Agent, AliEn CE, AliEn SE, ClusterMonitor, AliEn TQ, AliEn IS, AliEn Optimizers, AliEn Brokers, MySQL Servers, Castor Grid Scripts, API Services]
[Diagram: collecting services: MonALISA @Site, MonALISA @CERN, MonALISA LCG Site, LCG Tools]
[Diagram: outputs: MonALISA Repository, Long History DB, Aggregated Data, Alerts, Actions]
[Diagram: monitored parameters: rss, vsz, cpu time, run time, job slots, free space, nr. of files, open files, Queued JobAgents, cpu ksi2k, job status, disk used, processes, load, net In/out, jobs status, sockets, migrated mbytes, active sessions, MyProxy status]
ALICE: Global Views, Status & Jobs
http://pcalimonitor.cern.ch
ALICE: Resource Usage monitoring
Cumulative parameters:
• CPU Time & CPU KSI2K
• Wall time & Wall KSI2K
• Read & written files
• Input & output traffic (xrootd)
Running parameters:
• Resident memory
• Virtual memory
• Open files
• Workdir size
• Disk usage
• CPU usage
Aggregated per site
ALICE: Job agents monitoring
From Job Agent itself:
• Requesting job
• Installing packages
• Running job
• Done
• Error statuses
From Computing Element:
• Available job slots
• Queued Job Agents
• Running Job Agents
Monitoring the Execution of Jobs and the Time Evolution
[Figure labels: SPLIT JOBS; LIFELINES for JOBS; Submit a Job; DAG; Job, Job1, Job2, Job3, Job31, Job32]
Two levels of decisions:
local (autonomous),
global (correlations).
Actions triggered by:
values above/below given thresholds,
absence/presence of values,
correlations between any values.
Action types:
alerts (emails/instant msg/atom feeds),
running an external command,
automatic chart annotations in the repository,
running custom code, like securely ordering a ML service to (re)start a site service.
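The three trigger kinds listed above (thresholds, absence/presence of values, correlations) can be sketched as simple predicates over the latest monitored sample. This is only an illustration, not the MonALISA configuration syntax: the parameter names, threshold values and action names below are hypothetical.

```python
def above(threshold):
    """Fires when a value is present and exceeds the threshold."""
    return lambda value: value is not None and value > threshold


def below(threshold):
    """Fires when a value is present and is under the threshold."""
    return lambda value: value is not None and value < threshold


def absent():
    """Fires when the parameter was not reported at all."""
    return lambda value: value is None


def evaluate(rules, sample):
    """Return the actions whose condition fires for this sample."""
    fired = []
    for parameter, condition, action in rules:
        if condition(sample.get(parameter)):
            fired.append(action)
    return fired


# Hypothetical rules: (monitored parameter, condition, action name)
rules = [
    ("mysql_vsz_mb", above(2048), "restart_mysql"),
    ("waiting_jobs", below(100), "submit_jobs"),
    ("vobox_status", absent(), "email_admins"),
]

# MySQL memory is high and the VoBox status is missing, so two actions fire.
print(evaluate(rules, {"mysql_vsz_mb": 3000, "waiting_jobs": 500}))
```

A correlation trigger would follow the same pattern with a condition that takes the whole sample instead of a single value.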
Local and Global Decision Framework
[Diagram labels: Sensors (Traffic, Jobs, Hosts, Apps; Temperature, Humidity, A/C Power, …); ML Service; Local decisions: actions based on local information; Global ML Services; Global decisions: actions based on global information]
ALICE: Automatic job submission, Restarting Services
MySQL daemon is automatically restarted when it runs out of memory. Trigger: threshold on VSZ memory usage
ALICE Production jobs queue is kept full by the automatic submission. Trigger: threshold on the number of aliprod waiting jobs
Administrators are kept up-to-date on the services’ status. Trigger: presence/absence of monitored information
ALICE is using the monitoring information to automatically:
resubmit error jobs until a target completion percentage is reached,
submit new jobs when necessary (watching the task queue size for each service account)
production jobs,
RAW data reconstruction jobs, for each pass,
restart site services, whenever tests of VoBox services fail but the central services are OK,
send email notifications / add chart annotations when a problem was not solved by a restart,
dynamically modify the DNS aliases of central services for an efficient load-balancing.
Most of the actions are defined by configuration files of only a few lines.
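Two of the automatic decisions above, keeping the task queue full and resubmitting error jobs until a target completion percentage is reached, reduce to very small rules. The sketch below is an assumption about how such rules could look; the function names and the example numbers are hypothetical and are not taken from the ALICE configuration.

```python
def jobs_to_submit(waiting_jobs, target_queue_size):
    """How many new jobs are needed to bring the queue back to its target."""
    return max(0, target_queue_size - waiting_jobs)


def should_resubmit(done_jobs, total_jobs, target_fraction):
    """Keep resubmitting error jobs while completion is below the target."""
    if total_jobs == 0:
        return False
    return done_jobs / total_jobs < target_fraction


# Hypothetical numbers: 40 jobs waiting, target of 100 in the queue.
print(jobs_to_submit(40, 100))
# 90 of 100 jobs done, target completion of 95 percent.
print(should_resubmit(90, 100, 0.95))
```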
Automatic actions in ALICE
Monitoring USLHCnet
Operations & management assisted by agent-based software. Used on the new CIENA equipment used for network management.
USLHCnet: Precise measurements for the Operational Status on the WAN Link
USLHCnet: Traffic on different segments
USLHCnet: Accounting for Integrated Traffic
Available Bandwidth Measurements
Embedded Pathload module.
Monitoring Network Topology, Latency, Routers
[Figure panels: NETWORKS; AS; ROUTERS]
Real Time Topology Discovery & Display
EVO: Real-Time monitoring for Reflectors and the quality of all possible connections
EVO: Creating a Dynamic, Global, Minimum Spanning Tree to optimize the connectivity
w(T) = \sum_{(u,v) \in T} w((u,v))
A weighted connected graph G = (V,E) with n vertices and m edges. The quality of connectivity between any two reflectors is measured every second. A minimum spanning tree with additional constraints is built in near real time.
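For a weighted connected graph like the one described here, a minimum spanning tree can be built with Kruskal's algorithm. The sketch below is a textbook version and is only an assumption about the approach: the slide does not say which algorithm EVO uses, and the additional constraints it mentions are not modelled. The reflector names and weights in the example are hypothetical.

```python
def minimum_spanning_tree(vertices, edges):
    """Kruskal's algorithm. edges is an iterable of (weight, u, v)."""
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # edge joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree


# Hypothetical link-quality weights between four reflectors (lower is better).
edges = [(1, "A", "B"), (4, "A", "C"), (2, "B", "C"), (7, "C", "D"), (3, "B", "D")]
tree = minimum_spanning_tree("ABCD", edges)
print(tree, sum(w for _, _, w in tree))
```

Re-running the function on fresh measurements gives a new tree whenever link quality changes, which is the simplest way to keep the tree dynamic.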
Dynamic MST to optimize the Connectivity for Reflectors
Frequent measurements of RTT, jitter, traffic and lost packets. The MST is recreated in ~ 1 s in case of communication problems.
EVO: Optimize how clients connect to the system for best performance and load balancing
FDT – Fast Data Transfer
FDT is an application for efficient data transfers.
Easy to use. Written in Java and runs on all major platforms.
It is based on an asynchronous, multithreaded system which uses the NIO library and is able to:
stream continuously a list of files
use independent threads to read and write on each physical device
transfer data in parallel on multiple TCP streams, when necessary
use appropriate size of buffers for disk IO and networking
resume a file transfer session
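The idea of one independent reader thread per physical device feeding a bounded pool of buffers can be sketched as follows. This is not FDT code (FDT is written in Java on top of NIO): the sketch is in Python, uses in-memory byte strings in place of real disks and sockets, and all names, the chunk size and the pool size are hypothetical.

```python
import queue
import threading

CHUNK = 4  # bytes per buffer; deliberately tiny for illustration


def read_device(files, buffers):
    """Reader for one device: stream its files as (name, offset, chunk)."""
    for name, data in files:
        # max(..., 1) makes an empty file still yield one (empty) chunk.
        for offset in range(0, max(len(data), 1), CHUNK):
            buffers.put((name, offset, data[offset:offset + CHUNK]))


def write_all(buffers, received):
    """Single consumer standing in for the network and writer side."""
    while True:
        item = buffers.get()
        if item is None:  # sentinel: all readers are done
            return
        name, offset, chunk = item
        received.setdefault(name, {})[offset] = chunk


def transfer(files_by_device):
    buffers = queue.Queue(maxsize=8)  # bounded pool of in-flight buffers
    received = {}
    writer = threading.Thread(target=write_all, args=(buffers, received))
    readers = [threading.Thread(target=read_device, args=(files, buffers))
               for files in files_by_device.values()]
    writer.start()
    for t in readers:
        t.start()
    for t in readers:
        t.join()
    buffers.put(None)
    writer.join()
    # Reassemble each file from its chunks, ordered by offset.
    return {name: b"".join(chunks[o] for o in sorted(chunks))
            for name, chunks in received.items()}
```

Keeping one reader per device means each disk is read sequentially, while the bounded queue stops fast readers from filling memory when the consumer is slower.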
FDT – Fast Data Transfer
[Diagram labels: Pool of buffers (Kernel Space) on each side; Data Transfer Sockets / Channels; Independent threads per device; Restore the files from buffers; Control connection / authorization]
FDT features
The FDT architecture allows external security APIs to be plugged in and used for client authentication and authorization. It supports several security schemes:
• IP filtering
• SSH
• GSI-SSH
• Globus-GSI
• SSL
User defined loadable modules for Pre and Post Processing to provide support for dedicated MS system, compression …
FDT can be monitored and controlled dynamically by MonALISA services
FDT – Memory to Memory Tests in WAN
CPUs: Dual Core Intel Xeon @ 3.00 GHz, 4 GB RAM, 4 x 320 GB SATA Disks, connected with 10 Gb/s Myricom
[Plot labels: ~9.0 Gb/s; ~9.4 Gb/s]
Disk-to-Disk transfers in WAN
NEW YORK - GENEVA (1U Nodes with 4 Disks)
Reads and writes on 4 SATA disks in parallel on each server
Mean traffic ~ 210 MB/s, ~ 0.75 TB per hour
CERN - CALTECH (4U Disk Servers with 24 Disks)
Reads and writes on two 12-port RAID Controllers in parallel on each server
Mean traffic ~ 545 MB/s, ~ 2 TB per hour
[Plot axis: MB/s]
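The hourly volumes quoted on this slide follow directly from the mean rates. The short check below assumes decimal units (1 TB = 1,000,000 MB); the function name is hypothetical.

```python
def tb_per_hour(mb_per_second):
    """Convert a sustained rate in MB/s to TB per hour (decimal units)."""
    return mb_per_second * 3600 / 1_000_000


print(tb_per_hour(210))  # about 0.756, quoted as ~0.75 TB per hour
print(tb_per_hour(545))  # about 1.962, quoted as ~2 TB per hour
```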
• Lustre read/write ~ 320 MB/s between Florida and Caltech
• Works with xrootd
• Interface to dCache using the dcap protocol
Monitoring Optical Switches
Dynamic restoration of lightpath if a segment has problems
Monitoring the Topology and Optical Power on Fibers for Optical Circuits
[Figure labels: Port power monitoring; Controlling; Glimmerglass Switch Example]
"On-Demand", End to End Optical Path Allocation
[Diagram labels: A; B; Internet; Regular IP path; Active light path; Optical Switch (Monitor, Control, TL1); OS Agent; MonALISA Service; MonALISA Distributed Service System; Real time monitoring; APPLICATION; DATA; LISA Agent]
>FDT A/fileX B/path/
OS path available; Configuring interfaces; Starting Data Transfer
LISA AGENT: LISA sets up Network Interfaces, TCP stack, Kernel parameters, Routes
LISA to APPLICATION: "use eth1.2, …"
CREATES AN END TO END PATH < 1s
Detects errors and automatically recreates the path in less than the TCP timeout
Controlling Optical Planes: Automatic Path Recovery
[Diagram labels: CERN Geneva; CALTECH Pasadena; Starlight; Manlan; USLHCnet; Internet2; FDT Transfer]
"Fiber cut" simulations:
• The traffic moves from one transatlantic line to the other one
• FDT transfer (CERN – CALTECH) continues uninterrupted
• TCP fully recovers in ~ 20s
[Plot labels: 4 fiber cut emulations (marked 1 to 4); 200+ MBytes/sec from a 1U Node]
End to End Path Provisioning on different layers
Layer 3: Default IP route
Layer 2: VCAT and VLAN channels
Layer 1: Optical path
[Diagram labels: Site A; Site B]
Monitor layout / Setup circuit
Monitor host & end-to-end paths / Setup end-host parameters
Control transfers and bandwidth reservations
Monitor interfaces traffic
"On-Demand", L2 Dynamic Channel and Path Allocation
APPLICATION
>FDT A/fileX B/path/
path or channel allocation; Configuring interfaces; Starting Data Transfer
[Diagram labels: Regular IP path; Local VLANs]
Recommended to use two NICs (one for management, one for data); bonding two NICs to the same IP
MAP Local VLANs to WAN channels or light paths
The Need for Planning and Scheduling for Large Data Transfers
[Plot panels: In Parallel; Sequential]
2.5 X Faster to perform the two reading tasks sequentially
Dynamic Path Provisioning, Queueing and Scheduling
[Diagram labels: User; Scheduling; Control; Monitoring; End Host Agents; Realtime Feedback; Request]
• Channel allocation based on VO/Priority, [ + Wait time, etc.]
• Create on demand an End-to-end path or Channel & configure end-hosts
• Automatic recovery (rerouting) in case of errors
• Dynamic reallocation of throughputs per channel: to manage priorities, control time to completion, where needed
• Reallocate resources requested but not used
Dynamic priority for FDT Transfers on common segments
[Plot labels: Priority 4; Priority 2; Priority 8]
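One simple way to share a common segment among transfers with the priorities shown here (4, 2 and 8) is to give each transfer a share proportional to its priority. The slide does not state the actual allocation rule, so proportional sharing is an assumption, and the function name, transfer names and link capacity are hypothetical.

```python
def share_bandwidth(capacity, priorities):
    """Split capacity among transfers in proportion to their priority."""
    total = sum(priorities.values())
    return {name: capacity * p / total for name, p in priorities.items()}


# Hypothetical 7000 Mb/s segment shared by three transfers.
shares = share_bandwidth(7000, {"t1": 4, "t2": 2, "t3": 8})
print(shares)
```

Changing a priority and calling the function again yields the new shares, which is the essence of dynamic reallocation on a shared segment.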
FDT & MonALISA Used at SC 2006
17.7 Gb/s Disk to Disk on 10 Gb/s link used in both directions from Florida to Caltech
SC2006
[Figure labels: Official BWC; Hyper BWC]
Communities using MonALISA
Major Communities
ALICE CMS ATLAS EVO LGC RUSSIA UNAM Grid (Mx) ITU
USLHCNET ULTRALIGHT GLORIAD ABILENE RoEduNET Enlightened
[Figure labels: VRVS; ALICE; USLHCnet; EVO; OSG]
MonALISA Today
Running 24 X 7 at ~340 Sites
• Collecting ~ 1 000 000 parameters in near real-time
• Update rate of 20,000 parameter updates per second
• Monitoring 12,000 computers, > 100 WAN Links
• Thousands of Grid jobs running concurrently
http://monalisa.caltech.edu