Cloud Computing: Past, Present, and Future
Professor Anthony D. Joseph*, UC Berkeley
Reliable Adaptive Distributed Systems (RAD) Lab
*Director, Intel Research Berkeley
http://abovetheclouds.cs.berkeley.edu/
RWTH Aachen, 22 March 2010
RAD Lab 5-year Mission
Enable 1 person to develop, deploy, and operate a next-generation Internet application
• Key enabling technology: statistical machine learning
  – debugging, monitoring, power management, auto-configuration, performance prediction, ...
• Highly interdisciplinary faculty & students
  – PIs: Patterson/Fox/Katz (systems/networks), Jordan (machine learning), Stoica (networks & P2P), Joseph (security), Shenker (networks), Franklin (DB)
  – 2 postdocs, ~30 PhD students, ~6 undergrads
• Grad/undergrad teaching integrated with research
Course Timeline
• Friday
  – 10:00-12:00 History of Cloud Computing: time-sharing, virtual machines, datacenter architectures, utility computing
  – 12:00-13:30 Lunch
  – 13:30-15:00 Modern Cloud Computing: economics, elasticity, failures
  – 15:00-15:30 Break
  – 15:30-17:00 Cloud Computing Infrastructure: networking, storage, computation models
• Monday
  – 10:00-12:00 Cloud Computing research topics: scheduling, multiple datacenters, testbeds
NEXUS: A COMMON SUBSTRATE FOR CLUSTER COMPUTING
Joint work with Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Scott Shenker, and Ion Stoica
Recall: Hadoop on HDFS
[Architecture diagram: a job submission node runs the jobtracker; the namenode runs the namenode daemon; each slave node runs a tasktracker and a datanode daemon on top of the local Linux file system.]
Adapted from slides by Jimmy Lin, Christophe Bisciglia, Aaron Kimball, & Sierra Michels-Slettvet, Google Distributed Computing Seminar, 2007 (licensed under the Creative Commons Attribution 3.0 License)
Problem
• Rapid innovation in cluster computing frameworks
• No single framework is optimal for all applications
• Energy efficiency means maximizing cluster utilization
• Want to run multiple frameworks in a single cluster
What do we want to run in the cluster?
[Examples of frameworks: Dryad, Apache Hama, Pregel, Pig, ...]
Why share the cluster between frameworks?
• Better utilization and efficiency (e.g., take advantage of diurnal patterns)
• Better data sharing across frameworks and applications
Solution
Nexus is an “operating system” for the cluster over which diverse frameworks can run
– Nexus multiplexes resources between frameworks
– Frameworks control their own job execution
Goals
• Scalable
• Robust (i.e., simple enough to harden)
• Flexible enough to support a variety of cluster frameworks
• Extensible enough to encourage innovative future frameworks
Question 1: Granularity of Sharing
Option: Coarse-grained sharing
– Give each framework a (slice of a) machine for its entire duration
[Figure: machines statically partitioned among Hadoop 1, Hadoop 2, and Hadoop 3.]
• Data locality is compromised if a machine is held for a long time
• Hard to account for new frameworks and changing demands -> hurts utilization and interactivity
Nexus: Fine-grained sharing
– Support frameworks that use smaller tasks (in time and space) by multiplexing them across all available resources
Question 1: Granularity of Sharing
[Figure: with fine-grained tasks, slots on each node alternate among Hadoop 1, Hadoop 2, and Hadoop 3 over time.]
• Frameworks can take turns accessing data on each node
• Framework shares can be resized to balance utilization and interactivity
Question 2: Resource Allocation
Option: Global scheduler
– Frameworks express their needs in a specification language; a global scheduler matches resources to frameworks
• Requires encoding a framework’s semantics in the language, which is complex and can lead to ambiguities
• Restricts frameworks whose needs the specification language did not anticipate
Designing a general-purpose global scheduler is hard
Question 2: Resource Allocation
Nexus: Resource offers
– Offer free resources to frameworks and let each framework pick the resources that best suit its needs
+ Keeps Nexus simple and allows it to support future kinds of jobs
– Distributed decisions might not be optimal
Outline
• Nexus Architecture
• Resource Allocation
• Multi-Resource Fairness
• Implementation
• Results
NEXUS ARCHITECTURE
Overview
[Architecture diagram: a Nexus master coordinates a set of Nexus slaves. Framework schedulers (e.g., Hadoop v19, Hadoop v20, MPI) register their jobs with the master; framework-specific executors (Hadoop v19/v20 executors, MPI executors) run tasks on the slaves.]
Resource Offers
[Diagram: the Nexus master picks a framework to offer resources to; the framework's scheduler performs framework-specific scheduling and replies with tasks to launch; the slaves launch and isolate the executors that run those tasks.]
A resource offer is a list of {machine, free_resources} pairs.
Example: [ {node1, <2 CPUs, 4 GB>}, {node2, <2 CPUs, 4 GB>} ]
A sketch of this offer/response loop is shown below.
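The following is a minimal Python sketch of the offer-based model described above; the class and field names are illustrative, not the actual Nexus (C++) API. The framework-side scheduler inspects each offer and replies with the tasks it wants to launch; anything it does not use is implicitly declined.

```python
# Illustrative sketch of the offer/response loop (not the real Nexus API).
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Offer:
    node: str
    resources: Dict[str, float]   # e.g. {"cpus": 2, "mem_gb": 4}

@dataclass
class TaskRequest:
    node: str
    resources: Dict[str, float]

class FrameworkScheduler:
    """Framework-side logic: pick which offered resources to use."""
    def __init__(self, task_size: Dict[str, float], tasks_remaining: int):
        self.task_size = task_size
        self.tasks_remaining = tasks_remaining

    def resource_offer(self, offers: List[Offer]) -> List[TaskRequest]:
        launched = []
        for offer in offers:
            free = dict(offer.resources)
            # Pack as many tasks as fit on this node.
            while self.tasks_remaining > 0 and all(
                free[r] >= need for r, need in self.task_size.items()
            ):
                launched.append(TaskRequest(offer.node, dict(self.task_size)))
                for r, need in self.task_size.items():
                    free[r] -= need
                self.tasks_remaining -= 1
        return launched   # resources not claimed here are implicitly declined

# The offer from the slide:
offers = [Offer("node1", {"cpus": 2, "mem_gb": 4}),
          Offer("node2", {"cpus": 2, "mem_gb": 4})]
sched = FrameworkScheduler(task_size={"cpus": 1, "mem_gb": 2}, tasks_remaining=3)
print(sched.resource_offer(offers))   # launches 2 tasks on node1, 1 on node2
```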
Resource Offer Details
• Min and max task sizes control fragmentation
• Filters let a framework restrict which offers are sent to it
  – by machine list
  – by quantity of resources
• Timeouts can be attached to filters
• Frameworks can signal when filters should be destroyed, or when they want more offers
(A small sketch of a filter follows below.)
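A hedged sketch of an offer filter, with illustrative names (not the real API): the master skips sending offers that a framework's filter would reject anyway, and an optional timeout lets the filter expire on its own.

```python
# Illustrative offer filter: restrict offers by machine list and by
# minimum free resources, with an optional expiration time.
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

@dataclass
class Filter:
    machines: Optional[Set[str]] = None           # only offer these machines
    min_resources: Dict[str, float] = field(default_factory=dict)
    expires_at: Optional[float] = None             # optional timeout

    def allows(self, node: str, free: Dict[str, float]) -> bool:
        if self.expires_at is not None and time.time() > self.expires_at:
            return True    # an expired filter no longer suppresses offers
        if self.machines is not None and node not in self.machines:
            return False
        return all(free.get(r, 0) >= need for r, need in self.min_resources.items())

# Only accept offers from node1/node2 with at least 1 CPU and 2 GB free,
# and drop the filter automatically after 30 seconds.
f = Filter(machines={"node1", "node2"},
           min_resources={"cpus": 1, "mem_gb": 2},
           expires_at=time.time() + 30)
print(f.allows("node1", {"cpus": 2, "mem_gb": 4}))   # True
print(f.allows("node3", {"cpus": 8, "mem_gb": 16}))  # False (not in machine list)
```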
Using Offers for Data Locality
We found that a simple policy called delay scheduling can give very high locality:
– The framework waits for offers on nodes that hold its data
– If it has waited longer than a certain delay, it starts launching non-local tasks
(A sketch of this policy follows below.)
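A minimal sketch of delay scheduling as described above (illustrative, not the actual Hadoop/Nexus code): the framework declines non-local offers until it has waited longer than a fixed delay.

```python
# Delay scheduling sketch: prefer nodes holding the job's data, but fall
# back to non-local nodes after waiting `delay_s` seconds.
import time

class DelayScheduler:
    def __init__(self, local_nodes, delay_s=5.0):
        self.local_nodes = set(local_nodes)   # nodes that store this job's data
        self.delay_s = delay_s
        self.waiting_since = None

    def on_offer(self, node):
        """Return True to launch a task on `node`, False to decline the offer."""
        if node in self.local_nodes:
            self.waiting_since = None          # got locality, reset the clock
            return True
        if self.waiting_since is None:
            self.waiting_since = time.time()   # start waiting for a local node
        if time.time() - self.waiting_since > self.delay_s:
            return True                        # waited long enough: go non-local
        return False

sched = DelayScheduler(local_nodes={"node1", "node4"}, delay_s=1.0)
print(sched.on_offer("node2"))   # False: decline and hope for a local offer
print(sched.on_offer("node1"))   # True: local data, launch immediately
```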
Framework Isolation
• The isolation mechanism is pluggable, due to the inherent performance/isolation tradeoff
• The current implementation supports Solaris projects and Linux containers
  – Both isolate CPU, memory, and network bandwidth
  – Linux developers are working on disk I/O isolation
• Other options: VMs, Solaris zones, policing
(A rough cgroups-based sketch of the idea follows below.)
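A rough sketch of container-style isolation using Linux cgroups (v1). This is not the Nexus isolation module; it only illustrates capping an executor's CPU share and memory. It assumes the cpu and memory controllers are mounted at /sys/fs/cgroup/cpu and /sys/fs/cgroup/memory and that the caller has root privileges.

```python
# cgroup v1 sketch: create a per-executor cgroup, set limits, attach a PID.
import os

def create_executor_cgroup(name: str, cpu_shares: int, mem_limit_bytes: int) -> None:
    cpu_dir = f"/sys/fs/cgroup/cpu/{name}"      # assumes cgroup v1 mount points
    mem_dir = f"/sys/fs/cgroup/memory/{name}"
    os.makedirs(cpu_dir, exist_ok=True)
    os.makedirs(mem_dir, exist_ok=True)
    with open(os.path.join(cpu_dir, "cpu.shares"), "w") as f:
        f.write(str(cpu_shares))                 # relative CPU weight
    with open(os.path.join(mem_dir, "memory.limit_in_bytes"), "w") as f:
        f.write(str(mem_limit_bytes))            # hard memory cap

def attach_pid(name: str, pid: int) -> None:
    # Moving a process into the cgroup applies the limits to it and its children.
    for ctrl in ("cpu", "memory"):
        with open(f"/sys/fs/cgroup/{ctrl}/{name}/tasks", "w") as f:
            f.write(str(pid))

# e.g. give a Hadoop executor 2 CPUs' worth of shares and 4 GB of memory:
# create_executor_cgroup("hadoop-executor-1", cpu_shares=2048, mem_limit_bytes=4 << 30)
# attach_pid("hadoop-executor-1", executor_pid)
```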
RESOURCE ALLOCATION
Allocation Policies
• Nexus picks which framework to offer resources to, and hence controls how many resources each framework can get (but not which ones)
• Allocation policies are pluggable, via allocation modules, to suit an organization's needs (see the sketch below)
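The following is a hedged sketch of a pluggable allocation module; the interface is illustrative, not the real Nexus plugin API. The master asks the module which framework should receive the next resource offer, here using simple weighted fair sharing.

```python
# Illustrative allocation module: offer to the framework furthest below
# its weighted fair share.
class FairShareAllocator:
    def __init__(self, weights):               # e.g. {"hadoop1": 1.0, "mpi": 2.0}
        self.weights = weights
        self.allocated = {f: 0.0 for f in weights}   # resources currently held

    def framework_added(self, name, weight=1.0):
        self.weights[name] = weight
        self.allocated[name] = 0.0

    def resources_changed(self, name, delta):
        self.allocated[name] += delta           # called on task launch/finish

    def next_offer_target(self):
        # Offer to the framework with the smallest allocation relative to its weight.
        return min(self.weights, key=lambda f: self.allocated[f] / self.weights[f])

alloc = FairShareAllocator({"hadoop1": 1.0, "mpi": 2.0})
alloc.resources_changed("hadoop1", 4)     # hadoop1 currently holds 4 CPUs
print(alloc.next_offer_target())          # "mpi": it is furthest below its share
```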
Example: Hierarchical Fairshare Policy
[Figure: a cluster share policy for Facebook.com splits the cluster 80%/20% between the Ads and Spam departments; a department's share is further split among its users (e.g., 70%/30% between User 1 and User 2, i.e., 14% and 6% of the cluster) and their jobs (Job 1 to Job 4). A companion plot shows cluster utilization over time as the shares are enforced.]
Revocation
Killing tasks to make room for other users
• Not the normal case, because fine-grained tasks enable quick reallocation of resources
• Sometimes necessary:
  – Long-running tasks that never relinquish resources
  – A buggy job running forever
  – A greedy user who decides to make their tasks long
Revocation Mechanism
The allocation policy defines a safe share for each user
– Users will get at least their safe share within a specified time
Revoke only if a user is below its safe share and is interested in offers
– Revoke tasks from the users farthest above their safe shares
– A framework is warned before its task is killed
(A sketch of this victim-selection policy follows below.)
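A small Python sketch of the victim-selection policy described above, under assumed data structures (user shares and task sizes expressed as fractions of the cluster); this is illustrative, not the Nexus implementation.

```python
# Pick tasks to revoke from the users furthest above their safe shares.
def pick_revocation_victims(users, needed):
    """
    users:  dict name -> {"share": current fraction, "safe": safe fraction,
                          "tasks": list of (task_id, fraction_of_cluster)}
    needed: fraction of the cluster that must be freed for a user below
            its safe share.
    Returns the task ids whose frameworks should be warned and then killed.
    """
    victims = []
    # Consider users furthest above their safe share first.
    over = sorted(users.items(), key=lambda kv: kv[1]["share"] - kv[1]["safe"],
                  reverse=True)
    for name, u in over:
        for task_id, size in u["tasks"]:
            if needed <= 0 or u["share"] <= u["safe"]:
                break
            victims.append(task_id)
            u["share"] -= size
            needed -= size
    return victims

users = {
    "torque": {"share": 0.70, "safe": 0.40,
               "tasks": [("t1", 0.10), ("t2", 0.10), ("t3", 0.10)]},
    "ads":    {"share": 0.25, "safe": 0.30, "tasks": [("a1", 0.25)]},
}
print(pick_revocation_victims(users, needed=0.15))   # ['t1', 't2'] from torque
```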
How Do We Run MPI?
Users are always told their safe share
– They can avoid revocation by staying below it
Giving each user a small safe share may not be enough if jobs need many machines
Can run a traditional grid or HPC scheduler as a user with a larger safe share of the cluster, and have MPI jobs queue up on it
– E.g., Torque gets 40% of the cluster
Example: Torque on Nexus
[Figure: Torque runs as a Nexus framework with a safe share of 40% of the cluster and queues MPI jobs on it; the remaining share is allocated through the hierarchical policy (e.g., Facebook.com's Ads and Spam departments, their users, and their jobs).]
MULTI-RESOURCE FAIRNESS
What is Fair?
• Goal: define a fair allocation of cluster resources between multiple users
• Example: suppose we have:
  – 30 CPUs and 30 GB RAM
  – Two users with equal shares
  – User 1 needs <1 CPU, 1 GB RAM> per task
  – User 2 needs <1 CPU, 3 GB RAM> per task
• What is a fair allocation?
Definition 1: Asset Fairness
• Idea: give weights to resources (e.g., 1 CPU = 1 GB) and equalize the total value of the resources given to each user
• Algorithm: when resources are free, offer them to whoever has received the least value
• Result:
  – User 1: 12 tasks: 12 CPUs, 12 GB (value 12 + 12 = 24)
  – User 2: 6 tasks: 6 CPUs, 18 GB (value 6 + 18 = 24)
• Problem: User 1 gets less than 50% of both CPUs and RAM (40% of each), despite having an equal share
[Figure: bar chart of each user's share of CPU and RAM under asset fairness.]
Lessons from Definition 1
• “You shouldn’t do worse than if you ran a smaller, private cluster equal in size to your share”
• Thus, given N users, each user should get ≥ 1/N of his dominant resource (i.e., the resource he consumes most of)
Def. 2: Dominant Resource Fairness
• Idea: give every user an equal share of her dominant resource (i.e., the resource she consumes most of)
• Algorithm: when resources are free, offer them to the user with the smallest dominant share (i.e., the fractional share of her dominant resource)
• Result:
  – User 1: 15 tasks: 15 CPUs, 15 GB (50% of CPU, her dominant resource)
  – User 2: 5 tasks: 5 CPUs, 15 GB (50% of RAM, her dominant resource)
[Figure: bar chart of each user's share of CPU and RAM under DRF.]
(A short simulation of this allocation follows below.)
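The following short Python simulation reproduces the DRF example above; the function names are illustrative. It repeatedly launches one task for the user with the smallest dominant share, as long as that user's task still fits in the remaining resources.

```python
# DRF allocation sketch for the 30-CPU / 30-GB example.
def drf(capacity, demands):
    remaining = dict(capacity)
    tasks = {u: 0 for u in demands}
    usage = {u: {r: 0.0 for r in capacity} for u in demands}

    def dominant_share(u):
        return max(usage[u][r] / capacity[r] for r in capacity)

    def fits(u):
        return all(remaining[r] >= demands[u][r] for r in capacity)

    while True:
        candidates = [u for u in demands if fits(u)]
        if not candidates:
            break
        u = min(candidates, key=dominant_share)   # lowest dominant share goes next
        tasks[u] += 1
        for r in capacity:
            usage[u][r] += demands[u][r]
            remaining[r] -= demands[u][r]
    return tasks, usage

# 30 CPUs, 30 GB RAM; User 1 needs <1 CPU, 1 GB>, User 2 needs <1 CPU, 3 GB>.
tasks, usage = drf({"cpu": 30, "gb": 30},
                   {"u1": {"cpu": 1, "gb": 1}, "u2": {"cpu": 1, "gb": 3}})
print(tasks)   # {'u1': 15, 'u2': 5}: each user gets 50% of its dominant resource
```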
Fairness Properties
| Property ↓ / Scheduler → | Asset | Dynamic | CEEI | DRF |
| Pareto efficiency        |   x   |    x    |  x   |  x  |
| Single-resource fairness |   x   |    x    |  x   |  x  |
| Bottleneck fairness      |       |    x    |  x   |  x  |
| Share guarantee          |       |         |  x   |  x  |
| Population monotonicity  |   x   |         |      |  x  |
| Envy-freedom             |   x   |         |  x   |  x  |
| Resource monotonicity    |       |         |      |     |
IMPLEMENTATION
Implementation Stats
• ~7,000 lines of C++
• APIs in C, C++, Java, Python, and Ruby
• Executor isolation using Linux containers and Solaris projects
Frameworks
• Ported frameworks:
  – Hadoop (900-line patch)
  – MPI (160-line wrapper scripts)
• New frameworks:
  – Spark, a Scala framework for iterative jobs (1,300 lines)
  – Apache + haproxy, an elastic web server farm (200 lines)
RESULTS
Overhead
[Bar charts comparing completion times with and without Nexus:]
• MPI LINPACK: 50.9 s unmodified vs. 51.8 s on Nexus
• Hadoop WordCount: 159.9 s unmodified vs. 166.2 s on Nexus
Less than 4% overhead seen in practice
Dynamic Resource Sharing
[Plot: share of the cluster over time (roughly 0-350 s) for MPI, Hadoop, and Spark frameworks dynamically sharing the cluster.]
Multiple Hadoops Experiment
[Figure: three Hadoop instances (Hadoop 1, 2, 3) first run on statically partitioned machines, then share all machines at fine granularity under Nexus, with tasks from the three instances interleaved across nodes.]
Results with 16 Hadoops
| Configuration                       | Percent local maps | Job running time |
| Separate Hadoops                    | 18%                | 565 s            |
| Hadoops on Nexus, no delay sched.   | 50%                | 486 s            |
| Hadoops on Nexus, 1 s delay sched.  | 92%                | 369 s            |
| Hadoops on Nexus, 5 s delay sched.  | 97%                | 338 s            |
WEB SERVER FARM FRAMEWORK
Web Framework Experiment
[Diagram: a web framework whose scheduler is an haproxy load balancer; its executors run Apache tasks on Nexus slaves and serve HTTP requests. A load-generation framework runs httperf tasks on other slaves to issue requests. The Nexus master mediates resource offers, task status updates, and load calculation.]
Web Framework Results
Future Work
• Experiment with parallel programming models
• Further explore low-latency services on Nexus (web applications, etc.)
• Shared services (e.g., BigTable, GFS)
• Deploy to users and open source the code
CLOUD COMPUTING TESTBEDS
OPEN CIRRUS™: SEIZING THE OPEN SOURCE CLOUD STACK OPPORTUNITY
A JOINT INITIATIVE SPONSORED BY HP, INTEL, AND YAHOO!
http://opencirrus.org/
Proprietary Cloud Computing stacks (only parts of each stack are publicly accessible)
GOOGLE
• Applications
• Application Frameworks: MapReduce, Sawzall, Google App Engine, Protocol Buffers
• Software Infrastructure: VM Management; Job Scheduling: Borg; Storage Management: GFS, BigTable; Monitoring: Borg
• Hardware Infrastructure: Borg
AMAZON
• Applications
• Application Frameworks: EMR (Hadoop)
• Software Infrastructure: VM Management: EC2; Job Scheduling; Storage Management: S3, EBS; Monitoring
• Hardware Infrastructure
MICROSOFT
• Applications
• Application Frameworks: .NET Services
• Software Infrastructure: VM Management: Fabric Controller; Job Scheduling: Fabric Controller; Storage Management: SQL Services, blobs, tables, queues; Monitoring: Fabric Controller
• Hardware Infrastructure: Fabric Controller
Open-source components (heavily fragmented today!)
• Hardware Infrastructure: PRS, Emulab, Cobbler, xCat
• VM Management: Eucalyptus, Enomalism, Tashi, Reservoir, Nimbus, oVirt
• Job Scheduling: Maui/Torque
• Storage Management: HDFS, KFS, Gluster, Lustre, PVFS, MooseFS, HBase, Hypertable
• Monitoring: Ganglia, Nagios, Zenoss, MON, Moara
Open Cloud Computing stack
• Applications
• Application Frameworks: Pig, Hadoop, MPI, Sprout, Mahout
• Software Infrastructure: VM Management, Job Scheduling, Storage Management, Monitoring
• Hardware Infrastructure: PRS, Emulab, Cobbler, xCat
Open Cirrus™ Cloud Computing Testbed
• Shared: research, applications, infrastructure (12K cores), data sets
• Global services: sign-on, monitoring, storage; open-source stack (PRS, Tashi, Hadoop)
• Sponsored by HP, Intel, and Yahoo! (with additional support from NSF)
• 9 sites currently, with a target of around 20 in the next two years
Open Cirrus Goals
• Goals
  • Foster new systems and services research around cloud computing
  • Catalyze an open-source stack and APIs for the cloud
• How are we unique?
  • Support for both systems research and applications research
  • Federation of heterogeneous datacenters
Open Cirrus Organization
• Central Management Office oversees Open Cirrus
  • Currently owned by HP
• Governance model
  • Research team
  • Technical team
  • New site additions
  • Support (legal (export, privacy), IT, etc.)
• Each site
  • Runs its own research and technical teams
  • Contributes individual technologies
  • Operates some of the global services
• E.g.:
  • HP site supports the portal and PRS
  • Intel site is developing and supporting Tashi
  • Yahoo! contributes to Hadoop
Intel BigData Open Cirrus Site
[Rack diagram: a 45 Mb/s T3 link to the Internet; rows of racks connected by 48 Gb/s switches with 1 Gb/s point-to-point links to nodes; node types range from single-core Xeon (Irwindale/Pentium 4) machines with 6 GB DRAM to quad-core Xeon E5440/E5520 (Harpertown/Nehalem-EP) machines with 8-16 GB DRAM and 1 TB disks, plus a 3U storage rack (5 nodes, 12 x 1 TB disks each), a mobile rack of 8 1U nodes, and PDUs with per-port power monitoring and control.]
Totals: 198 nodes, 1,364 cores, 1,768 GB DRAM, 746 spindles, 610 TB storage
http://opencirrus.intel-research.net
Open Cirrus Sites
| Site        | #Cores | #Srvrs | Public | Memory  | Storage                    | Spindles | Network                     | Focus                          |
| HP          | 1,024  | 256    | 178    | 3.3 TB  | 632 TB                     | 1,152    | 10G internal, 1 Gb/s x-rack | Hadoop, Cells, PRS, scheduling |
| IDA         | 2,400  | 300    | 100    | 4.8 TB  | 43 TB + 16 TB SAN          | 600      | 1 Gb/s                      | Apps based on Hadoop, Pig      |
| Intel       | 1,364  | 198    | 145    | 1.8 TB  | 610 TB local, 60 TB attach | 746      | 1 Gb/s                      | Tashi, PRS, MPI, Hadoop        |
| KIT         | 2,048  | 256    | 128    | 10 TB   | 1 PB                       | 192      | 1 Gb/s                      | Apps with high throughput      |
| UIUC        | 1,024  | 128    | 64     | 2 TB    | ~500 TB                    | 288      | 1 Gb/s                      | Datasets, cloud infrastructure |
| CMU         | 1,024  | 128    | 64     | 2 TB    | --                         | --       | 1 Gb/s                      | Storage, Tashi                 |
| Yahoo (M45) | 3,200  | 480    | 400    | 2.4 TB  | 1.2 PB                     | 1,600    | 1 Gb/s                      | Hadoop on demand               |
| Total       | 12,074 | 1,746  | 1,029  | 26.3 TB | 4 PB                       |          |                             |                                |
Testbed Comparison
| Testbed               | Type of research                               | Approach                                             | Participants                              | Distribution                                |
| Open Cirrus           | Systems & services                             | Federation of heterogeneous data centers             | HP, Intel, IDA, KIT, UIUC, Yahoo!, CMU    | 7(9) sites, 1,746 nodes, 12,074 cores       |
| IBM/Google            | Data-intensive applications research           | A cluster supported by Google and IBM                | IBM, Google, Stanford, U. Washington, MIT | 1 site                                      |
| TeraGrid              | Scientific applications                        | Multi-site heterogeneous clusters, supercomputers    | Many schools and orgs                     | 11 partners in the US                       |
| PlanetLab             | Systems and services                           | A few hundred nodes hosted by research institutions  | Many schools and orgs                     | >700 nodes worldwide                        |
| EmuLab                | Systems                                        | A single-site cluster with flexible control          | University of Utah                        | >300 nodes at Utah                          |
| Open Cloud Consortium | Interoperability across clouds using open APIs | Multi-site heterogeneous clusters, focus on network  | 4 centers                                 | 480 cores, distributed in four locations    |
| Amazon EC2            | Commercial use                                 | Raw access to virtual machines                       | Amazon                                    | 1 site                                      |
| LANL/NSF cluster      | Systems                                        | Re-use of LANL's retiring clusters                   | CMU, LANL, NSF                            | 1 site, 1000s of older, still-useful nodes  |
Open Cirrus Stack
[Figure: compute + network + storage resources and power + cooling, managed by a management and control subsystem, the Physical Resource Set (Zoni) service.]
Credit: John Wilkes (HP)
Open Cirrus Stack (continued)
[Figure, built up over several slides: the Zoni service carves out PRS clients, each with its own “physical data center”; on top of these run research clusters, Tashi development, an NFS storage service, and an HDFS storage service; Tashi provides virtual clusters; platform services, user services, and experiment save/restore facilities run above.]
Layering example: a BigData application runs 1. as an application, 2. on Hadoop, 3. on a Tashi virtual cluster, 4. on a PRS, 5. on real hardware.
System Organization
• Compute nodes are divided into dynamically allocated, VLAN-isolated PRS subdomains
• Applications switch back and forth between virtual and physical resources
[Figure: example subdomains include open service research, Tashi development, proprietary service research, apps running in a VM management infrastructure (e.g., Tashi), open workload monitoring and trace collection, and a production storage service.]
Open Cirrus Stack - Zoni
• Zoni service goals
  • Provide mini-datacenters to researchers
  • Isolate experiments from each other
  • Provide a stable base for other research
• Zoni service approach
  • Allocate sets of physically co-located nodes, isolated inside VLANs
• Zoni code from HP is being merged into the Tashi Apache project and extended by Intel
  • Running on the HP site
  • Being ported to the Intel site
  • Will eventually run on all sites
Open Cirrus Stack - Tashi
• An open-source Apache Software Foundation (incubator) project sponsored by Intel (with CMU, Yahoo, and HP)
  • Infrastructure for cloud computing on Big Data
  • http://incubator.apache.org/projects/tashi
• Research focus:
  • Location-aware co-scheduling of VMs, storage, and power
  • Seamless physical/virtual migration
• Joint work with Greg Ganger (CMU), Mor Harchol-Balter (CMU), and Milan Milenkovic (CTG)
Tashi High-Level Design
[Figure: a Cluster Manager (CM), a Scheduler, a Storage Service, and a Virtualization Service spanning the cluster nodes.]
• Cluster nodes are assumed to be commodity machines
• Services are instantiated through virtual machines
• Data location and power information is exposed to the scheduler and services
• The CM maintains databases and routes messages; its decision logic is limited
• Most decisions happen in the scheduler, which manages compute, storage, and power in concert
• The storage service aggregates the capacity of the commodity nodes to house Big Data repositories
(A sketch of location-aware placement follows below.)
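The following is an illustrative sketch (not Tashi code) of the location-aware placement idea: given the nodes that hold replicas of a VM's input data, prefer one of those nodes, then fall back to the same rack, then to anywhere with capacity. All names and the data structures are assumptions for the example.

```python
# Location-aware placement sketch: node-local, then rack-local, then anywhere.
def place(vm, replicas, racks, free_slots):
    """
    replicas:   set of nodes storing the VM's data
    racks:      dict node -> rack id
    free_slots: dict node -> free VM slots
    """
    candidates = [n for n in free_slots if free_slots[n] > 0]
    # 1. Node-local placement.
    local = [n for n in candidates if n in replicas]
    if local:
        return local[0]
    # 2. Rack-local placement.
    replica_racks = {racks[n] for n in replicas}
    rack_local = [n for n in candidates if racks[n] in replica_racks]
    if rack_local:
        return rack_local[0]
    # 3. Anywhere with capacity.
    return candidates[0] if candidates else None

free = {"n1": 0, "n2": 1, "n3": 2}
racks = {"n1": "r1", "n2": "r1", "n3": "r2"}
print(place("vm1", replicas={"n1"}, racks=racks, free_slots=free))  # "n2": same rack as the data
```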
Location Matters (calculated)
[Calculated for 40 racks * 30 nodes * 2 disks: per-disk throughput (MB/s) with random vs. location-aware placement. Location-aware placement improves throughput by roughly 3.6x (Disk, 1G network), 11x (SSD, 1G), 3.5x (Disk, 10G), and 9.2x (SSD, 10G).]
Open Cirrus Stack - Hadoop
• An open-source Apache Software Foundation project sponsored by Yahoo!
  • http://wiki.apache.org/hadoop/ProjectDescription
• Provides a parallel programming model (MapReduce), a distributed file system (HDFS), and a parallel database (HBase)
(A minimal word-count example in the MapReduce style follows below.)
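To illustrate the MapReduce model, here is a minimal word count written in the style of Hadoop Streaming, where Hadoop pipes input lines to the mapper on stdin and feeds the reducer key-sorted pairs; it is a sketch, not part of the Open Cirrus stack.

```python
# mapper.py -- emit "word<TAB>1" for every word read from stdin
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
# reducer.py -- input arrives sorted by key, so counts for a word are adjacent
import sys
from itertools import groupby

pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
for word, group in groupby(pairs, key=lambda kv: kv[0]):
    print(f"{word}\t{sum(int(count) for _, count in group)}")
```

These would typically be launched with the Hadoop Streaming jar (exact path and jar name depend on the installation), e.g. `hadoop jar hadoop-streaming.jar -input in/ -output out/ -mapper mapper.py -reducer reducer.py`.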
What kinds of research projects are Open Cirrus sites looking for?
• Open Cirrus is seeking research in the following areas (different centers will weight these differently):
  • Datacenter federation
  • Datacenter management
  • Web services
  • Data-intensive applications and systems
• The following kinds of projects are generally not of interest:
  • Traditional HPC application development
  • Production applications that just need lots of cycles
  • Closed-source system development
How do users get access to Open Cirrus sites?
• Project PIs apply to each site separately
• Contact names, email addresses, and web links for applications to each site will be available on the Open Cirrus web site (which goes live Q2 2009): http://opencirrus.org
• Each Open Cirrus site decides which users and projects get access to its site
• A global sign-on for all sites is being developed (Q2 2009)
  – Users will be able to log in to each Open Cirrus site for which they are authorized using the same login and password
Summary and Lessons
• Intel is collaborating with HP and Yahoo! to provide a cloud computing testbed for the research community
• Using the cloud as an accelerator for interactive streaming/big-data apps is an important usage model
• Primary goals are to:
  • Foster new systems research around cloud computing
  • Catalyze an open-source reference stack and APIs for the cloud
    – Access model, local and global services, application frameworks
  • Explore location-aware and power-aware workload scheduling
  • Develop integrated physical/virtual allocation to combat cluster squatting
  • Design cloud storage models
    – GFS-style storage systems are not mature, and the impact of SSDs is unknown
  • Investigate new application framework alternatives to MapReduce/Hadoop
OTHER CLOUD COMPUTING RESEARCH TOPICS: ISOLATION AND DC ENERGY
Heterogeneity in Virtualized Environments
• VM technology isolates CPU and memory, but disk and network are shared
  – Full bandwidth when there is no contention
  – Equal shares when there is contention
• 2.5x performance difference
[Plot: I/O performance per VM (MB/s) versus the number of VMs on a physical host, measured on EC2 small instances.]
Isolation Research
• Need predictable performance (bounded variance), even over raw performance
• Some resources that people have run into problems with:
  – Power, disk space, disk I/O rate (drive, bus), memory space (user/kernel), memory bus, caches at all levels (TLB, etc.), hyperthreading, CPU rate, interrupts
  – Network: NIC (Rx/Tx), switch, cross-datacenter, cross-country
  – OS resources: file descriptors, ports, sockets
Datacenter Energy
• EPA, 8/2007:
  – 1.5% of total U.S. energy consumption
  – Growing from 60 to 100 billion kWh in 5 years
  – 48% of a typical IT budget spent on energy
• 75 MW of new DC deployments in PG&E's service area (that they know about!), with another 2x expected
• Microsoft: $500M new Chicago facility
  – Three substations with a capacity of 198 MW
  – 200+ shipping containers with 2,000 servers each
  – Overall growth of 20,000 servers/month
Power/Cooling Issues
First Milestone: DC Energy Conservation
• DCs are limited by power
  – For each dollar spent on servers, add $0.48 (2005) / $0.71 (2010) for power and cooling
  – The $26B spent to power and cool servers in 2005 grows to $45B in 2010
• Within DC racks, network equipment is often the “hottest” component in the hot spot
Thermal Image of Typical Cluster Rack
[Thermal image showing the rack switch as the hot spot.]
M. K. Patterson, A. Pratt, P. Kumar, “From UPS to Silicon: an end-to-end evaluation of datacenter efficiency”, Intel Corporation
DC Networking and Power
• Selectively power down ports and portions of network elements
• Enhanced power-awareness in the network stack
  – Power-aware routing and support for system virtualization
    • Support for datacenter “slice” power-down and restart
  – Application- and power-aware media access/control
    • Dynamic selection of full/half duplex
    • Directional asymmetry to save power, e.g., 10 Gb/s send, 100 Mb/s receive
  – Power-awareness in applications and protocols
    • Hard state (proxying), soft state (caching), protocol/data “streamlining” for power as well as bandwidth reduction
• Power implications for topology design
  – Tradeoffs in redundancy/high availability vs. power consumption
  – VLAN support for power-aware system virtualization
Summary
• Many areas for research into cloud computing!
  – Datacenter design, languages, scheduling, isolation, energy efficiency (at all levels)
• Opportunities to try out research at scale!
  – Amazon EC2, Open Cirrus, ...