
VMs at a Tier-1 site

EGEE’09, 21-09-2009

Sander Klous, Nikhef

Contents

• Introduction
  – Who are we?
• Motivation
  – Why are we interested in VMs?
  – What are we going to do with VMs?
• Status
  – How do we approach this issue?
  – Where do we stand?
• Challenges


Introduction

• Collaboration between
  – NCF: national computing facilities
  – Nikhef: national institute for subatomic physics
  – NBIC: national bioinformatics center

• Participation from Philips, SARA, etc.

Goal:

“To enable access to grid infrastructures for scientific research in the Netherlands”


Motivation: Why Virtual Machines?

• Site perspective
  – Resource flexibility (e.g. SL4 / SL5)
  – Resource management
    • Scheduling / multi-core / sandboxing
• User perspective
  – Isolation from the environment
    • Identical environment on multiple sites
    • Identical environment on a local machine


Different VM classes

• Class 1: Site generated Virtual Machines
  – No additional trust issues
  – Benefits for system administration
• Class 2: Certified Virtual Machines
  – Inspection and certification to establish trust
  – Requirements for monitoring / integration
• Class 3: User generated Virtual Machines
  – No trust relation
  – Requires appropriate security measures


Typical use case Class 1 VM

[Diagram: resource management on the site infrastructure. The Torque/PBS job queue feeds Box 1 ("Normal WN"), while a separate VM queue drives a Virtual Machine Manager that provisions Box 2 ("8 Virtual SL4 WNs") and Box 3 ("8 Virtual SL5 WNs").]

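As an illustration of the Class 1 setup, here is a minimal, hedged sketch of how the separate VM queue next to the regular job queue could be defined in Torque. The queue name "vmqueue" and its limits are assumptions, not taken from the slides.

    # Hedged sketch: a dedicated VM queue beside the regular job queue.
    # Queue name and walltime limit are assumptions.
    qmgr -c "create queue vmqueue queue_type=execution"
    qmgr -c "set queue vmqueue resources_default.walltime = 36:00:00"
    qmgr -c "set queue vmqueue enabled = true"
    qmgr -c "set queue vmqueue started = true"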

Typical use case Class 2 VM

Analysis on Virtual Machines

• Run minimal analysis on desktop/laptop (see the KVM sketch below)
  – Access to grid services
• Run full analysis on the grid
  – Identical environment
  – Identical access to grid services
• No interest in becoming a system administrator
  – Standard experiment software is sufficient

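The "identical environment on a local machine" point can be made concrete with a hedged sketch: booting the same certified image on a desktop under KVM. The image name and memory size are assumptions.

    # Hedged sketch: run the certified analysis image locally with KVM.
    # "-net user" gives the guest NAT-only, outbound connectivity.
    qemu-kvm -m 2048 -hda certified-analysis.img -net nic -net user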

Typical use case Class 3 VM

Identification and classification of GPCRs

• Requires a very specific software set (install sketch below)
  – Blast 2.2.16
  – HMMER 2.3.2
  – BioPython 1.50
• Even non-x86 (binary) applications!
• Specific software for this user
• No common experiment software

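A hedged sketch of what preparing such a user image might involve: pinning exactly the versions named on the slide. The tarball names are assumptions; the point is that the user, not the site, controls the stack.

    # Hedged sketch: install pinned tool versions inside the user image.
    tar xzf blast-2.2.16-x64-linux.tar.gz -C /opt     # prebuilt binaries
    tar xzf hmmer-2.3.2.tar.gz
    (cd hmmer-2.3.2 && ./configure && make && make install)
    tar xzf biopython-1.50.tar.gz
    (cd biopython-1.50 && python setup.py install)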

Project status

• Working group: virtualization of worker nodes
  https://wiki.nbic.nl/index.php/BigGrid_virtualisatie
• Kick-off meeting July 6th, 2009
  – System administrators, user support, management
• Phase 1 (3 months)
  – Collect site and user requirements
  – Identify other ongoing efforts in Europe
  – First design
• Phase 2 (3 months)
  – Design and implement proof of concept


Active working group topics

• Policies/security issues for Class 2/3 VMs
• Technology study
  – Managing Virtual Machines
  – Distributing VM images
  – Interfacing the VM infrastructure with ‘the grid’
• Identify missing functionality and alternatives
  – Accounting and fair share, image management, authentication/authorization, etc.


The Amazon identity crisis

• The three most confronting questions:
  1. What is the difference between a job and a VM?
  2. Why can I do it at Amazon, but not on the grid?
  3. What is the added value of grids over clouds?

“We don’t want to compete with Amazon!”


Policy and security issues

E-science services and functionality
• Data integrity, confidentiality and privacy
• Non-repudiation of user actions

System administrator point of view
• Trust user intentions, not their implementations
• Incident response is more costly than certification
• Forensics is time-consuming


Compromised user space is often already enough trouble

Security 101 = Attack surface


Available policies

• Grid Security Policy, version 5.7a
• VO Portal Policy, version 1.0 (draft)
• BIG Grid Security Policy, version 2009-025
  – Grid Acceptable Use Policy, version 3.1
  – Grid Site Operations Policy, version 1.4a
  – LCG/EGEE Incident Handling and Response Guide, version 2.1
  – Grid Security Traceability and Logging Policy, version 2.0
• VO-Box Security Recommendations and Questionnaire, version 0.6 (draft, not ratified)


Relevant policy statements

• Network security is covered by site local security policies and practices

• A VO Box is part of the trusted network fabric. Privileged access is limited to resource administrators

• Software deployed on the grid must include sufficient and relevant site-central logging.


First compromise

• Certified package repository
  – Base templates
  – Certified packages
• Separate user disk (sketch below)
  – User-specific stuff
  – Permanent storage
• At run time
  – No privileged access
  – Comparable to a VO box


Licenses?
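The separate user disk could, as a hedged sketch, be a plain second disk image that is created once and reattached on every run, keeping the certified base image untouched. All names and sizes here are assumptions.

    # Hedged sketch: persistent user disk next to the certified base image.
    qemu-img create -f raw user-data.img 5G
    qemu-kvm -m 1024 -hda certified-base.img -hdb user-data.img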

Second compromise

• Make a separate grid DMZ for Class 3 VMs
• Comparable to “Guest networks”
  – Only outbound connectivity (firewall sketch below)
• Detection of compromised guests
  – Extended security monitoring
    • Packet inspection, netflows (SNORT, nfsen)
    • Honeypots, etc.
• Simple policy: one warning, you’re out.
• Needs approval (network policy) from the OST (Operations Steering Team)
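A hedged sketch of the "only outbound connectivity" rule with iptables, assuming the Class 3 guests sit behind a bridge called br-dmz with eth0 as the uplink; the interface names are assumptions.

    # Hedged sketch: guests may initiate outbound connections; inbound
    # traffic is only accepted for connections the guests opened.
    iptables -A FORWARD -i br-dmz -o eth0 -j ACCEPT
    iptables -A FORWARD -i eth0 -o br-dmz -m state --state ESTABLISHED,RELATED -j ACCEPT
    iptables -A FORWARD -i eth0 -o br-dmz -j DROP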

TECHNOLOGY STUDY


Managing VMs

[Diagram: resource management at the site. The Torque/PBS job queue serves Box 1 ("Normal WN"), while the VM queue feeds OpenNebula, scheduled by Haizea, which runs Box 2 ("8 Virtual WNs") and Box 3 ("8 Class 2/3 VMs").]

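To make the OpenNebula step concrete, a hedged sketch of registering and starting a VM from a template file, in the spirit of the OpenNebula CLI of that period; every name, path, and size below is an assumption.

    # Hedged sketch: minimal OpenNebula VM template plus submission.
    cat > sl5-wn.one <<'EOF'
    NAME   = sl5-wn
    CPU    = 1
    MEMORY = 2048
    DISK   = [ source = "/srv/images/sl5-wn.img", target = "hda" ]
    NIC    = [ network = "private" ]
    EOF
    onevm create sl5-wn.one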

Distributing VM images

[Diagram: a SAN-backed repository holds the master images; Class 2/3 images enter through an upload solution, and images are distributed over iSCSI/LVM to Box 1 ("Normal WN"), Box 2 ("8 Virtual WNs"), and Box 3 ("8 Class 2/3 VMs").]

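As a hedged illustration of the iSCSI/LVM option, exporting an image from an LVM volume with the Linux tgt target tools; the volume group, IQN, and paths are assumptions.

    # Hedged sketch: copy an image onto a logical volume and export it.
    lvcreate -L 10G -n sl5-wn vg_images
    dd if=/srv/images/sl5-wn.img of=/dev/vg_images/sl5-wn bs=1M
    tgtadm --lld iscsi --op new --mode target --tid 1 \
           --targetname iqn.2009-09.nl.nikhef:sl5-wn
    tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 \
           --backing-store /dev/vg_images/sl5-wn
    tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL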

Cached copy-on-write


[Diagram: the repository's master image is cached once per box; on Box 1 and Box 2, each VM then runs from its own copy-on-write overlay on top of the cached image.]
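One common way to realize such overlays, offered here as a hedged illustration rather than the working group's chosen tool, is a qcow2 backing file; the paths are assumptions.

    # Hedged sketch: thin copy-on-write overlay over the cached base image.
    # Writes land in the overlay; unchanged blocks are read from the cache.
    qemu-img create -f qcow2 -b /var/cache/images/sl5-base.img vm01.qcow2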

Interfacing VMs with ‘the grid’

[Diagram: the resource-management picture again: Torque/PBS alongside OpenNebula, the SAN image repository with the Class 2/3 upload solution, and an open discussion point on where Class 2 and Class 3 VMs enter the system.]

Grid middleware
• globus-job-run
• globus-gatekeeper
• globus-job-manager
• contact-string
• jm-pbs-long
• jm-opennebula
• qsub / opennebula

Nimbus / OCCI

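A hedged sketch of how the contact-string idea might look from the user side; jm-opennebula is the slide's proposal rather than an existing job manager, and the gatekeeper host name is an assumption.

    # Today: jobs reach PBS through a gatekeeper contact string.
    globus-job-run gatekeeper.example.org/jobmanager-pbs -queue long /bin/hostname
    # Proposal: an analogous job manager handing the request to OpenNebula
    # instead of qsub (jobmanager-opennebula is hypothetical).
    globus-job-run gatekeeper.example.org/jobmanager-opennebula /bin/hostname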

VM contact-string

• User management mapping
  – Mapping to OpenNebula users (mapfile sketch below)
• Authentication / Authorization
  – Access to different VM images
• Grid middleware components involved:
  – Cream-CE, BLAHp, glexec
  – Execution Environment Service
    https://edms.cern.ch/document/1018216/1
  – Authorization Service Design
    https://edms.cern.ch/document/944192/1

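The user-mapping step could, as a hedged sketch, reuse the familiar grid-mapfile pattern: a certificate DN is translated into a local account that the VM layer then maps onto an OpenNebula user. The DN and account name are made up.

    # Hedged sketch: grid-mapfile-style entry mapping a DN to a local
    # account that corresponds to an OpenNebula user (names are examples).
    "/DC=org/DC=example/O=nikhef/CN=Jane Doe" one_jdoe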

Coffee table discussion

Parameter passing issue

Monitoring/Performance testing

[Screenshot: Ganglia web frontend version 3.1.1, experimental cluster network report over the last hour, nodes colored by 1-minute load; http://ploeg.nikhef.nl/ganglia/?m=network_report&r=ho…]


Performance

• Small cluster
  – 4 dual-CPU quad-core machines
  – Image server with 2 TB storage
• Integration with experimental testbed
  – Existing Cream-CE / Torque
• Testing (sketch below)
  – Network I/O: is NAT feasible?
  – File I/O: what is the COW overhead?
  – Realistic jobs

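For the file-I/O question, a hedged sketch of a first, crude overhead measurement: the same direct-I/O write to a raw-backed disk and to a COW overlay. The mount points are assumptions, and realistic jobs would follow.

    # Hedged sketch: write 1 GiB with direct I/O to both backends and
    # compare throughput (raw disk vs. copy-on-write overlay).
    dd if=/dev/zero of=/mnt/raw/testfile bs=1M count=1024 oflag=direct
    dd if=/dev/zero of=/mnt/cow/testfile bs=1M count=1024 oflag=direct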

Other challenges

• Accounting, scheduling based on fair share
• Scalability!
• Rapidly changing landscape
  – New projects every week
  – New versions every month
• So many alternatives
  – VMware, SGE, Eucalyptus, Enomaly
  – iSCSI, NFS, GFS, Hadoop
  – Monitoring and security tools


Conclusions

• Maintainability: no home-grown scripting
  – Each solution should be part of a product
  – Validation procedure with each upgrade
• Deployment
  – Gradually move VM functionality into production:

1. Introduce VM worker nodes

2. Virtual machine endpoint in grid middleware

3. Test with a few specific Class 2/3 VMs

4. Scaling and performance tuning


