Indranil Gupta (Indy)
Lecture 4: Cloud Computing: Older Testbeds
January 28, 2010
CS 525 Advanced Distributed Systems, Spring 2010
All Slides © IG
Administrative Announcements
Office hours changed from today onwards:
• Tuesdays 2-3 pm (same as before)
• Thursdays 3-4 pm (new)
• My office: 3112 SC
Administrative Announcements
Student-led paper presentations (see instructions on website)
• Start from February 11th
• Groups of up to 2 students present each class, responsible for a set of 3 "Main Papers" on a topic
  – 45-minute presentations (total) followed by discussion
  – Set up an appointment with me to show slides by 5 pm the day prior to the presentation
  – Select your topic by Jan 31st
• List of papers is up on the website
• Each of the other students (non-presenters) is expected to read the papers before class and turn in a one- to two-page review of any two of the main set of papers (summary, comments, criticisms, and possible future directions)
  – Email the review and bring in a hardcopy before class
Announcements (contd.)
Projects
• Groups of 2 (need not be the same as presentation groups)
• We'll start detailed discussions "soon" (a few classes into the student-led presentations)
"A Cloudy History of Time" © IG 2010
[Timeline figure, 1940-2010, labeling eras: timesharing companies & the data-processing industry, PCs (not distributed!), clusters, the first datacenters!, peer-to-peer systems, grids, and clouds and datacenters.]
More Discussion Points
• Can there be a course devoted purely to cloud computing that touches only on results from within the last 5 years?
  – No!
• Since cloud computing is not completely new, where do we start learning about its basics?
  – From the beginning: distributed algorithms, peer-to-peer systems, sensor networks
That's what we do in CS525
• Basics of Peer-to-Peer Systems
  – Read papers on Gnutella and Chord
• Basics of Sensor Networks
  – See links
• Basics of Distributed Algorithms
Yeah! Let's go to the basics.
PlanetLab
Hmm, CCT and OpenCirrus are new. What about classical testbeds?
• A community resource open to researchers in academia and industry
• http://www.planet-lab.org/
• Currently, 1077 nodes at 494 sites across the world
• Founded at Princeton University (led by Prof. Larry Peterson), but owned in a federated manner by the 494 sites
• Node: a dedicated server that runs components of PlanetLab services.
• Site: a location, e.g., UIUC, that hosts a number of nodes.
• Sliver: a virtual division of each node. Currently uses VMs, but it could also use other technology. Needed for timesharing across users.
• Slice: a spatial cut-up of the PL nodes, per user. A slice is a way of giving each user (Unix-shell-like) access to a subset of PL machines, selected by the user. A slice consists of multiple slivers, one at each component node. (See the sketch after this slide.)
• Thus, PlanetLab allows you to run real world-wide experiments.
• Many services have been deployed atop it, used by millions (not just researchers): application-level DNS services, monitoring services, CDN services.
• If you need a PlanetLab account and slice for your CS525 experiment, let me know asap! There are a limited number of these available for CS525.
All images © PlanetLab
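To make the node/site/sliver/slice terminology concrete, here is a minimal Python sketch of how the pieces relate. The class and field names (and the hostnames) are illustrative assumptions, not PlanetLab's actual API.

```python
# Minimal sketch of PlanetLab terminology as plain data classes.
# Class/field names and hostnames are illustrative, not the PlanetLab API.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    hostname: str   # a dedicated server running components of PlanetLab services
    site: str       # the site (e.g., "UIUC") hosting this node

@dataclass
class Sliver:
    node: Node      # the virtual division of this node given to one user
    user: str       # slivers provide timesharing across users

@dataclass
class Slice:
    user: str
    slivers: List[Sliver] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        """Extend the slice with one sliver on the given node."""
        self.slivers.append(Sliver(node=node, user=self.user))

# A slice gives a user access to a subset of PL machines, one sliver per node.
n1 = Node("planetlab1.cs.uiuc.edu", "UIUC")
n2 = Node("planetlab1.cs.princeton.edu", "Princeton")
my_slice = Slice(user="cs525_student")
my_slice.add_node(n1)
my_slice.add_node(n2)
print(len(my_slice.slivers))   # -> 2
```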
Emulab
• A community resource open to researchers in academia and industry
• https://www.emulab.net/
• A cluster, with currently 475 nodes
• Founded and owned by University of Utah (led by Prof. Jay Lepreau)
• As a user, you can:
  – Grab a set of machines for your experiment
  – Get root-level (sudo) access to these machines
  – Specify a network topology for your cluster (ns file format; see the sketch after this slide)
• Thus, you are not limited to only single-cluster experiments; you can emulate any topology
• Is Emulab a cloud? Is PlanetLab a cloud?
• If you need an Emulab account for your CS525 experiment, let me know asap! There are a limited number of these available for CS525.
All images © Emulab
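Emulab itself consumes an NS-2-style (Tcl) topology file; the Python sketch below is not that format. It only illustrates the kind of information such a topology description carries: nodes, plus links annotated with bandwidth and delay. All names and numbers are made up.

```python
# Illustrative sketch of what an experiment topology describes: nodes plus
# duplex links with bandwidth and delay. Emulab's real input is an NS-2-style
# Tcl script; this Python form and all names/numbers are hypothetical.
from dataclasses import dataclass
from typing import List

@dataclass
class Link:
    a: str                  # endpoint node name
    b: str                  # endpoint node name
    bandwidth_mbps: float
    delay_ms: float

@dataclass
class Topology:
    nodes: List[str]
    links: List[Link]

# Three nodes in a line: clientA -- router -- serverB
topo = Topology(
    nodes=["clientA", "router", "serverB"],
    links=[
        Link("clientA", "router", bandwidth_mbps=100.0, delay_ms=10.0),
        Link("router", "serverB", bandwidth_mbps=10.0, delay_ms=50.0),
    ],
)
print(f"{len(topo.nodes)} nodes, {len(topo.links)} links")
```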
And then there were… Grids!
What is it?
Example: Rapid Atmospheric Modeling System, ColoState U
• Hurricane Georges, 17 days in Sept 1998
  – "RAMS modeled the mesoscale convective complex that dropped so much rain, in good agreement with recorded data"
  – Used 5 km spacing instead of the usual 10 km
  – Ran on 256+ processors
• Can one run such a program without access to a supercomputer?
[Figure: distributed computing resources at three sites: Wisconsin, MIT, and NCSA.]
An Application Coded by a Physicist
[Figure: a four-job workflow (Job 0, Job 1, Job 2, Job 3). Output files of Job 0 are input to Job 2; output files of Job 2 are input to Job 3. Jobs 1 and 2 can be concurrent.]
An Application Coded by a Physicist (contd.)
[Figure: the same workflow, annotated. Each job may take several hours/days and is computation intensive, so massively parallel. Stages of a job: Init, Stage in, Execute, Stage out, Publish. The files passed between jobs (output of Job 0 into Job 2, output of Job 2 into Job 3) are several GBs. The workflow is sketched in code below.]
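A minimal sketch of how such a workflow can be executed as a dependency DAG, running jobs whose inputs are ready concurrently (so Jobs 1 and 2 overlap). The slides only show the Job 0 -> Job 2 -> Job 3 file dependencies explicitly; making Job 1 depend on Job 0 is an assumption for illustration.

```python
# Minimal sketch: execute the four-job workflow as a dependency DAG, running
# jobs whose inputs are ready concurrently. Only Job 0 -> Job 2 -> Job 3 is
# explicit on the slides; Job 1 depending on Job 0 is an assumption.
from concurrent.futures import ThreadPoolExecutor, wait

deps = {0: set(), 1: {0}, 2: {0}, 3: {2}}   # job -> jobs it waits for

def run_job(job_id: int) -> None:
    # A real job would init, stage in several GBs of input files, execute
    # (possibly for hours/days), stage out its outputs, and publish them.
    print(f"running job {job_id}")

def run_dag(deps) -> None:
    done = set()
    with ThreadPoolExecutor() as pool:
        while len(done) < len(deps):
            # Wave-based for simplicity: run every job whose deps are all done.
            ready = [j for j in deps if j not in done and deps[j] <= done]
            wait([pool.submit(run_job, j) for j in ready])
            done.update(ready)

run_dag(deps)   # Job 0 first, then Jobs 1 and 2 concurrently, then Job 3
```

In the Grid setting, the same dependency structure would be handed to a workflow manager (for example, Condor's DAGMan) rather than a local thread pool.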
[Figure: the four jobs (Job 0, Job 1, Job 2, Job 3) mapped onto the three sites: Wisconsin, MIT, and NCSA.]
[Figure: the same placement, now showing the protocols involved: the Condor protocol within a site (Wisconsin) and the Globus protocol across the sites (Wisconsin, MIT, NCSA).]
Globus Protocol
[Figure: the Globus protocol spans Wisconsin, MIT, and NCSA. It handles external allocation & scheduling and stage in & stage out of files; the internal structure of the different sites is invisible to Globus.]
Condor Protocol
[Figure: within a site (Wisconsin, here running Jobs 0 and 3), the Condor protocol handles internal allocation & scheduling, monitoring, and distribution and publishing of files. The two-level split is sketched in code below.]
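To summarize the split the last three figures describe, here is a minimal sketch of two-level scheduling: an external, Globus-like broker only chooses a site for each job, while each site's internal, Condor-like scheduler chooses a concrete machine. The site and machine names and the round-robin policies are illustrative assumptions.

```python
# Minimal sketch of the two-level split shown in the figures: an external,
# Globus-like broker only picks a site for each job, while each site's
# internal, Condor-like scheduler picks a concrete machine. Site and machine
# names and the round-robin policies are illustrative assumptions.
import itertools

class Site:
    """A site's internal structure (its machines) is invisible from outside."""
    def __init__(self, name, machines):
        self.name = name
        self._machines = itertools.cycle(machines)   # trivial internal policy

    def submit(self, job):
        machine = next(self._machines)               # internal allocation & scheduling
        print(f"[{self.name}] job {job} -> {machine}")

class ExternalBroker:
    """External allocation & scheduling: picks a site, never sees machines."""
    def __init__(self, sites):
        self._sites = itertools.cycle(sites)

    def submit(self, job):
        site = next(self._sites)
        # Stage in of input files to the chosen site would happen here.
        site.submit(job)

broker = ExternalBroker([
    Site("Wisconsin", ["wisc-01", "wisc-02"]),
    Site("NCSA", ["ncsa-01"]),
    Site("MIT", ["mit-01", "mit-02"]),
])
for job in range(4):   # Jobs 0..3
    broker.submit(job)
```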
Tiered Architecture (OSI 7-layer-like)
[Figure: a layered stack. From applications down: high-energy physics apps; resource discovery, replication, brokering; Globus, Condor; workstations, LANs.]
The Grid Recently
[Figure: a map of Grid sites and their interconnects, "a parallel Internet." Some are 40 Gbps links (the TeraGrid links).]
Globus Alliance
• The Alliance involves U. Illinois Chicago, Argonne National Laboratory, USC-ISI, U. Edinburgh, and the Swedish Center for Parallel Computers
• Activities: research, testbeds, software tools, applications
• Globus Toolkit (latest version: GT3): "The Globus Toolkit includes software services and libraries for resource monitoring, discovery, and management, plus security and file management. Its latest version, GT3, is the first full-scale implementation of new Open Grid Services Architecture (OGSA)."
Some Things Grid Researchers Consider Important
• Single sign-on: the collective job set should require once-only user authentication
• Mapping to local security mechanisms: some sites use Kerberos, others use Unix
• Delegation: credentials to access resources are inherited by subcomputations, e.g., from job 0 to job 1 (see the sketch after this list)
• Community authorization: e.g., third-party authentication
• For clouds, you additionally need to worry about failures, scale, the on-demand nature, and so on.
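A minimal sketch of the delegation idea: a subcomputation receives a credential derived from its parent's, carrying only a subset of the parent's rights. The toy class is purely illustrative; real grid toolkits use proxy certificates for this.

```python
# Toy sketch of delegation: a subcomputation (job 1) receives a credential
# derived from job 0's, carrying only a subset of job 0's rights.
# Purely illustrative; real grid toolkits use proxy certificates for this.
from dataclasses import dataclass

@dataclass(frozen=True)
class Credential:
    owner: str
    rights: frozenset

    def delegate(self, to: str, rights: frozenset) -> "Credential":
        # You can only delegate rights you actually hold.
        if not rights <= self.rights:
            raise ValueError("cannot delegate rights the parent does not hold")
        return Credential(owner=to, rights=rights)

job0_cred = Credential("job0", frozenset({"read:inputs", "write:outputs", "submit"}))
job1_cred = job0_cred.delegate("job1", frozenset({"read:inputs"}))
print(job1_cred)
```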
Discussion Points
• Cloud computing vs. Grid computing: what are the differences?
• National Lambda Rail: hot in the 2000s, funding pulled in 2009
• What has happened to the Grid computing community?
  – See the Open Cloud Consortium
  – See the CCA conference (2008, 2009)
Backups
Sort
[Figure: three panels of the Sort experiment: normal execution, no backup tasks, and 200 processes killed.]
• Workload: 10^10 100-byte records (modeled after the TeraSort benchmark)
• M = 15000, R = 4000
• Backup tasks reduce job completion time a lot! (see the sketch after this slide)
• The system deals well with failures
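A minimal sketch of the backup-task idea behind that result: when a task may be straggling, launch a duplicate copy and take whichever copy finishes first, so one slow machine cannot stall the whole job. Task durations here are simulated; a real system would also kill the losing copy rather than let it run to completion.

```python
# Minimal sketch of the backup (speculative) task idea: run a duplicate copy
# of a possibly straggling task and take whichever copy finishes first, so one
# slow machine cannot stall the whole job. Durations are simulated; a real
# system would also kill the losing copy.
import random
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_task(task_id: int, copy: str) -> str:
    time.sleep(random.uniform(0.01, 0.2))   # simulate a fast or slow machine
    return f"task {task_id} finished via {copy} copy"

def run_with_backup(task_id: int) -> str:
    with ThreadPoolExecutor(max_workers=2) as pool:
        copies = [pool.submit(run_task, task_id, c) for c in ("primary", "backup")]
        done, _ = wait(copies, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()

for t in range(3):
    print(run_with_backup(t))
```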
More
• An entire community, with multiple conferences, get-togethers (GGF), and projects
• Grid projects: http://www-fp.mcs.anl.gov/~foster/grid-projects/
• Grid users:
  – Today: the core is the physics community (since the Grid originates from the GriPhyN project)
  – Tomorrow: biologists, large-scale computations (nug30 already)?
Grid History – 1990s
• CASA network: linked 4 labs in California and New Mexico
  – Paul Messina: massively parallel and vector supercomputers for computational chemistry, climate modeling, etc.
• Blanca: linked sites in the Midwest
  – Charlie Catlett, NCSA: multimedia digital libraries and remote visualization
• More testbeds in Germany & Europe than in the US
• I-WAY experiment: linked 11 experimental networks
  – Tom DeFanti, U. Illinois at Chicago, and Rick Stevens, ANL: for a week in Nov 1995, a national high-speed network infrastructure; 60 application demonstrations, from distributed computing to virtual reality collaboration
• I-Soft: secure sign-on, etc.