Polish Infrastructure for Supporting Computational Science
in the European Research Space
EUROPEAN UNION
Services and Operations in Polish NGI
M. Radecki, T. Szymocha, T. Szepieniec, M. Pawlik
ACC Cyfronet AGH
Cracow Grid Workshop, 9 Nov 2011
Outline
Users and Resources
Services for Users
Guarantees - Service Level Agreement
Supporting SLAs in Operations
Users & Resources
Polish researchers holding a PhD are the base user group;
they must confirm affiliation with a Polish research institution
MSc & PhD students need a supervisor who
confirms collaboration on the research subject
International collaborations need a collaborating Polish
researcher (a role similar to the supervisor); international VOs belong here
Computing: ~23k cores
Storage: ~2PB
Each site has autonomy in
resource allocation to users
All site resources are accessible through all middlewares:
gLite
UNICORE
QosCosGrid
Local batch system
Numbers (as of 1st Nov):
782 users
4.4 Mhours of computing in October 2011: 47% LHC VOs, 53% PL-Grid users
Model of Delivering Services
[Diagram labels: Service Level Agreement (Access and Use Conditions + Guarantees), Metrics, Access Service, Resources]
„Access Services” – access to resources
Global access: gLite, UNICORE, QosCosGrid
Local access: batch system, MySQL, GPGPU, vSMP
User Interface machine – at each site, with clients for all middlewares
Stages of service use – user view
1. Become a PL-Grid user
   1. Get credentials - easy, on-line
2. Request access
   1. User applies for access to the service to be activated
   2. The application is answered by the service administrator
   3. The service admin can manage the access afterwards
3. Establish SLA
4. Use
   1. User should be able to observe service status and their current usage
   2. Overuse is blocked (fair play)
5. Account
   1. User accounts for consumed resources
Lesson learnt: pass auth* info to service
Pass access info from the central point to the service, securely and reliably:
what if the central database does not work?
what if it desynchronizes?
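One way to handle both questions above is to have each service keep its own last known good copy of the access list and to replace it wholesale on every successful sync. The sketch below is only an illustration under assumptions: `sync_access_list`, the `fetch_central` callable and the JSON cache file are hypothetical names, not part of PL-Grid tooling.

```python
import json
import os
import tempfile


def sync_access_list(fetch_central, cache_path):
    """Return (sorted user logins, source), preferring the central point.

    fetch_central is a hypothetical callable returning an iterable of logins.
    """
    try:
        users = sorted(set(fetch_central()))
    except Exception:
        # Central database unreachable: fall back to the last good copy.
        # If no copy exists yet, the error propagates (fail closed).
        with open(cache_path) as fh:
            return json.load(fh), "cache"
    # Replace the whole list rather than patching it, so a local copy that
    # drifted out of sync is corrected on the next successful fetch.
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as fh:
        json.dump(users, fh)
    os.replace(tmp_path, cache_path)  # atomic swap, never a partial file
    return users, "central"
```

Writing to a temporary file and swapping it in means the service never reads a half-written list if the sync is interrupted.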
SLAs for users = computing grant
Why? – to build a relation between Provider and User
Provider needs to know users' expectations – necessary for resource use planning
User needs to share their plans to get any guarantees on what they want – more guarantees, more user satisfaction
What is needed?
Grant submission
• Researcher states objectives, expected results, resources, additional services
• Review - grant evaluation by a committee that gives a recommendation to support the decision at sites
Grant negotiation
Resource Allocation – setup of resources according to SLA
Resource Use Monitoring
• observe use, make sure agreed thresholds are met, block on overuse
Accounting
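The Resource Use Monitoring step (observe use, check thresholds, block on overuse) can be sketched as a small classification function. This is a minimal illustration, not the PL-Grid implementation: the `GrantUsage` type, the `usage_state` function and the 80% warning level are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class GrantUsage:
    granted_hours: float  # walltime agreed in the SLA
    used_hours: float     # walltime consumed so far


def usage_state(usage, warn_fraction=0.8):
    """Classify usage as 'ok', 'warning' or 'blocked'.

    warn_fraction is a hypothetical early-warning level, not from the slides.
    """
    if usage.used_hours >= usage.granted_hours:
        return "blocked"  # agreed amount exhausted: further use is overuse
    if usage.used_hours >= warn_fraction * usage.granted_hours:
        return "warning"  # close to the agreed threshold
    return "ok"
```

A monitoring job could evaluate `usage_state` for each grant on a schedule and hand the 'blocked' cases to the site scheduler.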
Grants as implemented in PL-Grid
Two types of grants, depending on the size of requested resources:
Personal grant – testing, trying – 6 months, 1000h walltime, 40GB
Regular grant – for intensive computing
Tools: Portal + Bazaar
Grant belongs to a User Team
self-organizing
all members can use the grant
Resource Allocation – Bazaar Site Admin Toolkit
takes grant details from Bazaar and generates site scheduler config, run daily
site administrators retain full control over their services
Accounting – User Portal
a short report every 6 months
final report after the grant finishes
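The split between grant types and the daily generation of scheduler configuration can be sketched as follows. Only the 1000h and 40GB limits come from the slide; treating them as inclusive upper bounds, the function names and the `key=value` output format are assumptions made for illustration and do not reflect the actual Bazaar Site Admin Toolkit or any real scheduler syntax.

```python
PERSONAL_MAX_WALLTIME_H = 1000  # from the slide
PERSONAL_MAX_STORAGE_GB = 40    # from the slide


def grant_type(walltime_h, storage_gb):
    """Pick the grant type by requested size (limits assumed inclusive)."""
    if (walltime_h <= PERSONAL_MAX_WALLTIME_H
            and storage_gb <= PERSONAL_MAX_STORAGE_GB):
        return "personal"
    return "regular"


def render_site_config(grants):
    """Render one hypothetical config line per grant.

    grants: list of dicts with 'id', 'team' (member logins), 'walltime_h'.
    """
    lines = []
    for grant in sorted(grants, key=lambda g: g["id"]):
        # Every team member may use the grant, so all logins are listed.
        members = ",".join(sorted(grant["team"]))
        lines.append(
            f"grant={grant['id']} users={members} "
            f"walltime_h={grant['walltime_h']}"
        )
    return "\n".join(lines)
```

Because the output is regenerated from scratch on each run, a site administrator can review or override the generated file before it reaches the scheduler.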
Grant Monitoring
Computing Resource Use monitoring system
resources consumed within a given grant
notion of middlewares – match jobs as being executed within a specific middleware
Matching Job and Grant ID
user must declare it during job submission – e.g. in JDL
„default” grant declared in Portal – all jobs are accounted to it unless specified otherwise
Support for grants in gLite, UNICORE, QosCosGrid
UNICORE – under development (XSEDE requirement)
gLite – requirement submitted: EGI RT #2983
• temporary workaround: use VO_TAGS
QCG – under development
Show consumed resources per grant in User Portal
walltime, number of jobs, site – daily stats.
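The matching rule (an explicitly declared grant ID wins, otherwise the user's default grant applies) and the per-grant daily statistics can be sketched like this. The job record fields (`user`, `grant`, `site`, `day`, `walltime_h`) and the function names are hypothetical; real accounting records will differ.

```python
from collections import defaultdict


def resolve_grant(job, default_grants):
    """Use the grant declared at submission, else the user's default grant."""
    return job.get("grant") or default_grants[job["user"]]


def daily_stats(jobs, default_grants):
    """Sum walltime and count jobs per (grant, site, day)."""
    stats = defaultdict(lambda: {"walltime_h": 0.0, "jobs": 0})
    for job in jobs:
        key = (resolve_grant(job, default_grants), job["site"], job["day"])
        stats[key]["walltime_h"] += job["walltime_h"]
        stats[key]["jobs"] += 1
    return dict(stats)
```

A user with no default grant and no declared grant raises a `KeyError` here, which makes unaccountable jobs visible instead of silently dropping them.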
Important „elements” of IT Service Management
Policies – general guidelines and objectives defined by the PL-Grid Strategy Team
Processes – sets of interrelated activities that convert input into output
Procedures – specified ways to perform activities
Plans
People, teams
Tools
Other resources: budget, technologies
© T.Schaaf „Tutorial: Towards better managed Grids. IT Service Management best practices based on ITIL”
Processes in PL-Grid Operations
User and Group Management
User/Group Registration, de-registration
Access Management
granting access to service
blocking access to service
Service Level Management
Handling incoming SLA applications
SLA Monitoring
SLA Accounting
Service Availability Management
Service Availability Monitoring
Technical Support for Administrators
Knowledge Base Maintenance
User Support
Handling tickets
Internal knowledge DB maintenance
User FAQ maintenance
New application deployment
User documentation maintenance
Configuration Management
Sites
Services
Downtimes
Change Management
Known type change mgmt - follow procedure
New type change mgmt – evaluation, discussion, decision, write new procedure
Summary
PL-Grid Infrastructure has a rich service offer for users
Users can get some guarantees together with access to resources
The idea of „computing grants” is essential for making a step forward in PL-Grid service level
Well-organized Operations is a key factor for successful SLA adoption