VMs at a Tier-1 site

EGEE’09, 21-09-2009

Sander Klous, Nikhef


Contents
• Introduction
– Who are we?
• Motivation
– Why are we interested in VMs?
– What are we going to do with VMs?
• Status
– How do we approach this issue?
– Where do we stand?
• Challenges

03-09-2009 BIG Grid - Virtualization working group 2


Introduction

• Collaboration between
– NCF: national computing facilities
– Nikhef: national institute for subatomic physics
– NBIC: national bioinformatics center
• Participation from Philips, SARA, etc.

Goal:

“Enables access to grid infrastructures for scientific research in the Netherlands”


Motivation: Why Virtual Machines?
• Site perspective
– Resource flexibility (e.g. SL4 / SL5)
– Resource management
  • Scheduling / multi-core / sandboxing
• User perspective
– Isolation from environment
  • Identical environment on multiple sites
  • Identical environment on local machine


Different VM classes
• Class 1: Site generated Virtual Machines
– No additional trust issues
– Benefits for system administration
• Class 2: Certified Virtual Machines
– Inspection and certification to establish trust
– Requirements for monitoring / integration
• Class 3: User generated Virtual Machines
– No trust relation
– Requires appropriate security measures
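
The three classes above can be read as a small lookup from VM class to the checks a site applies before starting the image. The following is a minimal sketch of that reading, not something taken from the slides: the names `VMClass`, `TrustProfile`, `PROFILES` and `admission_requirements` are hypothetical, and the wording of the checks is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class VMClass(Enum):
    SITE_GENERATED = 1   # Class 1: image built by the site itself
    CERTIFIED = 2        # Class 2: image inspected and certified
    USER_GENERATED = 3   # Class 3: arbitrary image supplied by a user


@dataclass(frozen=True)
class TrustProfile:
    trusted: bool              # is there a trust relation with the image?
    needs_certification: bool  # must the image pass inspection first?
    needs_isolation: bool      # must it run behind extra security measures?


# Hypothetical encoding of the three classes listed on the slide.
PROFILES = {
    VMClass.SITE_GENERATED: TrustProfile(True, False, False),
    VMClass.CERTIFIED: TrustProfile(True, True, False),
    VMClass.USER_GENERATED: TrustProfile(False, False, True),
}


def admission_requirements(vm_class: VMClass) -> List[str]:
    """Return the checks applied before a VM of this class may start."""
    profile = PROFILES[vm_class]
    steps = []
    if profile.needs_certification:
        steps.append("inspect and certify image")
    if profile.needs_isolation:
        steps.append("apply additional security measures")
    return steps
```

For example, `admission_requirements(VMClass.SITE_GENERATED)` returns an empty list, reflecting "no additional trust issues" for Class 1.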


Typical use case Class 1 VM

[Diagram labels: Site infrastructure; Resource management; Torque/PBS; Virtual Machine Manager; Job queue; VM queue; Box 1 “Normal WN”; Box 2 “8 Virtual SL4 WNs”; Box 3 “8 Virtual SL5 WNs”]


Typical use case Class 2 VM

Analysis on Virtual Machines
• Run minimal analysis on desktop/laptop
– Access to grid services
• Run full analysis on the grid
– Identical environment
– Identical access to grid services
• No interest in becoming a system administrator
– Standard experiment software is sufficient


Typical use case Class 3 VM

Identification and classification of GPCRs
• Requires very specific software set
– Blast 2.2.16
– HMMER 2.3.2
– BioPython 1.50
• Even non-x86 (binary) applications!
• Specific software for this user
• No common experiment software


Project status
• Working group: virtualization of worker nodes
  https://wiki.nbic.nl/index.php/BigGrid_virtualisatie
• Kick-off meeting July 6th 2009
– System administrators, User support, management
• Phase 1 (3 months)
– Collect site and user requirements
– Identify other ongoing efforts in Europe
– First design
• Phase 2 (3 months)
– Design and implement proof of concept


Active working group topics

• Policies/Security issues for Class 2/3 VMs
• Technology study
– Managing Virtual Machines
– Distributing VM images
– Interfacing the VM infrastructure with ‘the grid’
• Identify missing functionality and alternatives
– Accounting and fair share, image management, authentication/authorization, etc.


The Amazon identity crisis

• The three most confronting questions:
1. What is the difference between a job and a VM?
2. Why can I do it at Amazon, but not at the grid?
3. What is the added value of grids over clouds?

“We don’t want to compete with Amazon!”


Policy and security issues

E-science services and functionality
• Data integrity, confidentiality and privacy
• Non-repudiation of user actions

System administrator point of view
• Trust user intentions, not their implementations
• Incident response more costly than certification
• Forensics is time consuming


Compromised user space is often already enough trouble

Security 101 = Attack surface


Available policies

• Grid Security Policy, version 5.7a
• VO Portal Policy, version 1.0 (draft)
• Big Grid Security Policy, version 2009-025
– Grid Acceptable Use Policy, version 3.1
– Grid Site Operations Policy, version 1.4a
– LCG/EGEE Incident Handling and Response Guide, version 2.1
– Grid Security Traceability and Logging Policy, version 2.0
• VO-Box Security Recommendations and Questionnaire, version 0.6 (draft, not ratified)


Relevant policy statements

• Network security is covered by site local security policies and practices

• A VO Box is part of the trusted network fabric. Privileged access is limited to resource administrators

• Software deployed in the grid must include sufficient and relevant site central logging.


First compromise
• Certified package repository
– Base templates
– Certified packages
• Separate user disk
– User specific stuff
– Permanent storage
• At run time
– No privileged access
– Comparable to VO box
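
One way to picture the certified-repository idea is a check that an image request uses only a certified base template and certified packages. The sketch below is illustrative only: the template name `sl5-base`, the package strings and both function names are hypothetical, and the slides do not describe how such a check would actually be implemented.

```python
# Hypothetical contents of the certified repository.
CERTIFIED_TEMPLATES = {"sl5-base"}
CERTIFIED_PACKAGES = {"blast-2.2.16", "hmmer-2.3.2"}


def uncertified_packages(requested, certified):
    """Return the requested packages that are not in the certified set, sorted."""
    return sorted(set(requested) - set(certified))


def image_is_acceptable(base_template, requested,
                        certified_templates=CERTIFIED_TEMPLATES,
                        certified_packages=CERTIFIED_PACKAGES):
    """Accept an image only if its template and every package are certified."""
    if base_template not in certified_templates:
        return False
    return not uncertified_packages(requested, certified_packages)
```

Anything user specific that is not certified would then live on the separate user disk rather than in the image itself.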

Licenses?

Second compromise
• Make separate grid DMZ for Class 3 VMs
• Comparable to “Guest networks”
– Only outbound connectivity
• Detection of compromised guests
– Extended security monitoring
  • Packet inspection, netflows (SNORT, nfsen)
  • Honeypots, etc.
• Simple policy: one warning, you’re out.
• Needs approval (network policy) from OST (Operations Steering Team)
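
The guest-network rules can be sketched as a tiny policy object: outbound connections only, and a VM is removed after a single alert from the monitoring tools. This is a toy model under stated assumptions. The class `GuestNetworkPolicy` and its methods are hypothetical, "one warning, you're out" is read here as "the first alert bans the VM", and no real SNORT or nfsen interface is used.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class GuestNetworkPolicy:
    """Toy model of the Class 3 DMZ rules (hypothetical, not a firewall)."""
    banned: Set[str] = field(default_factory=set)

    def allow_connection(self, vm_id: str, direction: str) -> bool:
        """Permit only outbound connections from VMs that are not banned."""
        if vm_id in self.banned:
            return False
        return direction == "outbound"

    def report_alert(self, vm_id: str) -> None:
        """Record a monitoring alert; a single alert bans the VM."""
        self.banned.add(vm_id)
```

In a real deployment the decision would be enforced in the network fabric; the object above only makes the policy explicit.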


TECHNOLOGY STUDY


Managing VMs

[Diagram labels: Site; Resource management; Torque/PBS; OpenNebula; Haizea; Job queue; VM queue; Box 1 “Normal WN”; Box 2 “8 Virtual WNs”; Box 3 “8 Class 2/3 VMs”]


Distributing VM images

[Diagram labels: Repository (SAN) holding five images; iSCSI/LVM; Class 2/3 upload solution; Box 1 “Normal WN”; Box 2 “8 Virtual WNs”; Box 3 “8 Class 2/3 VMs”]


Cached copy-on-write

[Diagram labels: Repository with Image; Box 1 and Box 2, each with a Cache holding an Image, two COW layers and two VMs]
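
The cached copy-on-write idea can be illustrated in a few lines: each box fetches an image from the repository once into a local cache, and every VM writes into its own overlay while reads fall through to the shared cached image. This is a conceptual sketch only. The classes `BaseImage`, `CowOverlay` and `ImageCache` are hypothetical, blocks are modelled as byte strings in memory, and the real setup would rely on the storage layer rather than Python objects.

```python
class BaseImage:
    """Read-only image, e.g. the cached copy of a repository image."""

    def __init__(self, blocks):
        self._blocks = tuple(blocks)

    def read(self, index):
        return self._blocks[index]

    def __len__(self):
        return len(self._blocks)


class CowOverlay:
    """Per-VM layer: writes stay local, unmodified blocks come from the base."""

    def __init__(self, base):
        self._base = base
        self._delta = {}  # block index -> modified data

    def read(self, index):
        if index in self._delta:
            return self._delta[index]
        return self._base.read(index)

    def write(self, index, data):
        if not 0 <= index < len(self._base):
            raise IndexError("block index out of range")
        self._delta[index] = data  # the base image is never touched

    def delta_size(self):
        return len(self._delta)


class ImageCache:
    """Per-box cache: each image is fetched from the repository at most once."""

    def __init__(self, repository):
        self._repository = repository  # maps image name -> list of blocks
        self._cached = {}
        self.fetches = 0

    def get(self, name):
        if name not in self._cached:
            self._cached[name] = BaseImage(self._repository[name])
            self.fetches += 1
        return self._cached[name]
```

Two VMs on the same box then share one cached image: a write in one overlay is invisible to the other, and the repository is contacted only once.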


Interfacing VMs with ‘the grid’

[Diagram labels: Resource management; Torque/PBS; OpenNebula; Repository (SAN) holding five images; Class 2/3 upload solution; Class 2; Class 3; discussion; Nimbus/OCCI]

Grid middleware
• globus-job-run
• globus-gatekeeper
• globus-job-manager
• contact-string
  • jm-pbs-long
  • jm-opennebula
• qsub / opennebula


VM contact-string
• User management mapping
– Mapping to OpenNebula users
• Authentication / Authorization
– Access to different VM images
• Grid middleware components involved:
– Cream-CE, BLAHp, glexec
– Execution Environment Service
  https://edms.cern.ch/document/1018216/1
– Authorization Service Design
  https://edms.cern.ch/document/944192/1
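
The two mapping questions above (which OpenNebula user a grid identity becomes, and which images it may start) can be sketched as plain lookups. Everything in this example is an assumption made for illustration: the VO names, the account names, the image names and the functions `map_to_opennebula_user` and `may_use_image` are hypothetical, and no OpenNebula or glexec interface is called.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GridIdentity:
    subject_dn: str  # certificate subject of the grid user
    vo: str          # virtual organisation the user belongs to


# Hypothetical mapping tables maintained by the site.
VO_TO_ONE_USER = {"atlas": "one_atlas", "bioinfo": "one_bioinfo"}
IMAGE_ACL = {
    "sl5-atlas-certified": frozenset({"atlas"}),
    "gpcr-user-image": frozenset({"bioinfo"}),
}


def map_to_opennebula_user(identity: GridIdentity) -> Optional[str]:
    """Return the OpenNebula account for this identity, or None if unmapped."""
    return VO_TO_ONE_USER.get(identity.vo)


def may_use_image(identity: GridIdentity, image: str) -> bool:
    """Authorize access to a VM image based on the user's VO."""
    return identity.vo in IMAGE_ACL.get(image, frozenset())
```

A request for an unknown image or from an unmapped VO is refused by default, which keeps the sketch on the conservative side.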

Coffee table discussion

Parameter passing issue

Monitoring/Performance testing

[Screenshot: Ganglia Web Frontend version 3.1.1, network report for the Experimental cluster over the last hour, nodes colored by 1-minute load, dated 3-9-2009 (http://ploeg.nikhef.nl/ganglia/?m=network_report&r=ho…)]


Performance
• Small cluster
– 4 dual CPU quad core machines
– Image server with 2 TB storage
• Integration with experimental testbed
– Existing Cream-CE / Torque
• Testing
– Network I/O, is NAT feasible?
– File I/O, what is the COW overhead?
– Realistic jobs
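
The overhead questions above come down to timing the same workload with and without the layer under test and comparing the two. The helper below is a minimal sketch of that comparison; `timed` and `relative_overhead` are hypothetical names, and the slides do not say which benchmark tools were actually used.

```python
import time


def timed(func, *args, repeats=3):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def relative_overhead(baseline_s, measured_s):
    """Fractional slowdown of a measurement relative to the baseline."""
    if baseline_s <= 0:
        raise ValueError("baseline must be positive")
    return (measured_s - baseline_s) / baseline_s
```

For instance, a file I/O test taking 10 s on a plain disk and 12 s through a COW layer gives an overhead of 0.2, i.e. 20 %.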


Other challenges
• Accounting, scheduling based on Fair Share
• Scalability!
• Rapidly changing landscape
– New projects every week
– New versions every month
• So many alternatives
– VMWare, SGE, Eucalyptus, Enomaly
– iSCSI, NFS, GFS, Hadoop
– Monitoring and security tools


Conclusions

• Maintainability: no home grown scripting
– Each solution should be part of a product
– Validation procedure with each upgrade
• Deployment
– Gradually move VM functionality into production

1. Introduce VM worker nodes

2. Virtual machine endpoint in grid middleware

3. Test with a few specific Class 2/3 VMs

4. Scaling and performance tuning
