Monitoring for GridNNN project
Sergey Belov, LIT JINR
15 September, NEC’2011, Varna, Bulgaria
S. Belov, GridNNN monitoring
Grid support for the national nanotechnology network of Russia
◦ To provide science and industry with effective access to distributed computational, informational and networking facilities
◦ Expecting a breakthrough in nanotechnologies
◦ Supported by a special federal program
Main technical points
◦ based on a network of supercomputers (about 15-30)
◦ has two grid operations centers (main and backup)
◦ is a set of grid services with a unified interface
◦ partially based on Globus Toolkit 4
GridNNN project (I)
Main aim
◦ integration of small and medium supercomputers into a unified distributed computing environment
Highly heterogeneous grid environment (hardware, software)
Oriented to parallel tasks rather than single batch tasks
Workflow management
◦ Jobs consist of tasks
Follows core OGSA principles
GSI-based security model
RESTful grid services
GridNNN project (II)
GridNNN architecture layers
Based on the report of A. Kryukov et al., Architecture of GridNNN, GRID’2010
WebUI server
Resource Broker/metascheduler + Workflow management (RESTful)
Information Service (RESTful / WS MDS)
Monitoring & Accounting
Registration service (RESTful)
GSI services
◦ CA, MyProxy, VOMS
GridFTP servers
Core grid services
Based on the report of A. Kryukov et al., Architecture of GridNNN, GRID’2010
State of sites and services
◦ Availability
◦ Real operational state
Monitoring of users' jobs and tasks
Keeping a history of various system parameters
Information representation
◦ General state of the infrastructure as a whole
◦ Running jobs and tasks
◦ Separate sites and services (real-time and history)
◦ Visualization of job events
Monitoring goals
State of computational resources by site (based on data from the information index(es))
Slots available for tasks
Jobs (total on site), jobs belonging to GridNNN
Structure and properties of clusters
◦ Subclusters, nodes, slots, operating system, architecture
◦ Application software
◦ Supported VOs (with ACLs, Access Control Lists)
Monitoring of jobs running on sites (using information from Pilot servers)
Monitoring of resources
Goal: checks of services' operation
Simple tests for services registered in the Service for Registration of Resources and Services
Connection to the declared port of the machine (plain or secured, depending on the specified protocol)
Information requests to some services
Separate test scenarios for MDS information indexes and the Service for Registration of Resources and Services: information
Web page with the history of functional test results
Simple functional tests of services
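The connection test described on this slide can be illustrated with a minimal Python sketch. It is not the GridNNN implementation: the function names `check_port` and `run_tests`, the tuple layout of `services`, and the result fields are assumptions made for illustration. The secured branch uses the default system trust store, whereas a real GSI service would normally require grid CA certificates, so a failed handshake here does not by itself mean the service is down.

```python
import socket
import ssl
from datetime import datetime, timezone


def check_port(host, port, secured=False, timeout=5.0):
    """Return True if a plain TCP (or TLS, when secured=True) connection succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if secured:
                # Assumption: default trust store; grid CAs would need to be loaded here.
                context = ssl.create_default_context()
                with context.wrap_socket(sock, server_hostname=host):
                    pass  # reaching this point means the TLS handshake completed
            return True
    except OSError:  # also covers ssl.SSLError and timeouts
        return False


def run_tests(services):
    """Test each (name, host, port, secured) entry and return timestamped results."""
    results = []
    for name, host, port, secured in services:
        results.append({
            "service": name,
            "checked_at": datetime.now(timezone.utc).isoformat(),
            "ok": check_port(host, port, secured),
        })
    return results
```

Storing the timestamped results from `run_tests` after each run would give the kind of history that a results web page could display.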
Goal: to get information, both real-time and historical, on resource utilization and jobs running on the GridNNN infrastructure (by users, VOs, sites)
Information sources: Pilot servers, GRAMs and local resource managers
Collecting data on jobs and tasks in the system
◦ Timestamps of all job events, actual consumed CPU time
Accounting information reports in different views:
◦ by sites, VOs and single users
Aggregation of actual job execution time from all sites
Accounting and job monitoring
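As an illustration of the per-site, per-VO and per-user views, the following Python sketch sums consumed CPU time over job records collected from several sites. The `JobRecord` fields, the function name `cpu_hours_by`, and the sample values are hypothetical; the actual GridNNN record format is not given in the slides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    """Hypothetical accounting record for one finished job."""
    user: str
    vo: str
    site: str
    cpu_seconds: float


def cpu_hours_by(records, key):
    """Sum consumed CPU time in hours, grouped by 'user', 'vo' or 'site'."""
    totals = defaultdict(float)
    for rec in records:
        totals[getattr(rec, key)] += rec.cpu_seconds / 3600.0
    return dict(totals)


# Made-up sample data, for illustration only.
records = [
    JobRecord("alice", "vo-a", "site-1", 7200.0),
    JobRecord("alice", "vo-a", "site-2", 3600.0),
    JobRecord("bob", "vo-b", "site-1", 1800.0),
]

print(cpu_hours_by(records, "user"))  # {'alice': 3.0, 'bob': 0.5}
print(cpu_hours_by(records, "site"))  # {'site-1': 2.5, 'site-2': 1.0}
```

The same function serves all three report views because only the grouping key changes.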
Gathering statistics on CPU time consumed by users and VOs
◦ In plain hours, later with allowance for the productivity of the computational system
Displaying the statistics of CPU resource usage
◦ Different report kinds: for user, VO manager, site admin, GridNNN project admins
◦ Statistics access roles to protect private information of users and VOs
GridNNN accounting
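The two ideas on this slide, weighting CPU hours by system productivity and restricting statistics by role, might be sketched as follows in Python. Everything here is an assumption for illustration: the slides do not define the productivity factor, the role names, or the row layout, so `performance_factor`, `ROLE_FIELD` and `visible_rows` are hypothetical.

```python
# Hypothetical mapping from a role to the row field that role is allowed to match on.
ROLE_FIELD = {"user": "user", "vo_manager": "vo", "site_admin": "site"}


def normalized_cpu_hours(cpu_hours, performance_factor=1.0):
    """Scale plain CPU hours by an assumed per-system productivity factor."""
    return cpu_hours * performance_factor


def visible_rows(rows, role, name=None):
    """Return only the accounting rows that the given role may see."""
    if role == "project_admin":
        return list(rows)  # project admins see everything
    field = ROLE_FIELD[role]  # raises KeyError for an unknown role
    return [row for row in rows if row[field] == name]


# Made-up sample rows, for illustration only.
rows = [
    {"user": "alice", "vo": "vo-a", "site": "site-1", "cpu_hours": 2.0},
    {"user": "bob", "vo": "vo-a", "site": "site-2", "cpu_hours": 4.0},
    {"user": "carol", "vo": "vo-b", "site": "site-1", "cpu_hours": 1.0},
]

print(visible_rows(rows, "vo_manager", "vo-a"))  # the rows for alice and bob
print(normalized_cpu_hours(4.0, 1.5))            # 6.0
```

Filtering before rendering keeps each report limited to the data its reader is entitled to see.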
Accounting and jobs monitoring: screenshots
[Screenshot: bar chart "Job submissions by user" (chart text translated from Russian)]
Total jobs: 106990. Users with certificates: 44, active: 33
Legend: completed successfully; completed with error
Vertical axis: 0 to 60000
Horizontal axis (users): Andrey Demichev, Eygene Ryabinkin, Andrey Kiryanov, Alexey Tarasov, Taras Shapovalov, Lev Shamardin, Mikalai Kutouski, Ilya Gorbunov, Alexandr Pivushkov, Eic Dushanov, Alexey Shmelkin, Nikolay Prikhodko, Gregory Shpiz, Natalia Chirskaya, Sergey Malkovsky, others
Monitoring and accounting information flows
[Diagram labels: Monitoring and accounting data storage; Information collector; PilotJob management services; Monitoring website; Monitoring data provisioning (Web Services); Accounting information publisher; Functional tests of the services; Infosys central information index]
More than 15 resource centers at the moment in different regions of Russia
◦ RRC KI, «Chebyshev» (MSU), IPCP RAS, CC FEB RAS, ICMM RAS, JINR, SINP MSU, PNPI, KNC RAS, SPbSU, SPII RAS and others
GridNNN centers on the map
http://mon.ngrid.ru
Infrastructure operation visualization with Google Earth
The GridNNN project was successfully completed this summer
The resulting software and infrastructure will be used to develop the Russian Grid Network project
Fully operational monitoring and accounting tools are in production
Further user interface improvements are planned within the Russian Grid Network project
Conclusion