Date post: 05-Jun-2020
Source: web.eecs.umich.edu/~sugih/courses/eecs589/papers/Joshi+Lagar-Cavilla11.pdf

Cloud Computing: Recent Trends, Challenges and Open Problems

Kaustubh Joshi, H. Andrés Lagar-Cavilla {kaustubh,andres}@research.att.com

AT&T Labs – Research

Tutorial?

Our assumptions about this audience
• You’re in research
• You can code
  – (or once upon a time, you could code)
• Therefore, you can google and follow a tutorial
• You’re not interested in “how to”s
• You’re interested in the issues

Outline

• Historical overview
  – IaaS, PaaS
• Research Directions
  – Users: scaling, elasticity, persistence, availability
  – Providers: provisioning, elasticity, diagnosis
• Open Challenges
  – Security, privacy

The Alphabet Soup

• IaaS, PaaS, CaaS, SaaS
• What are all these aaSes?
• Let’s answer a different question
• What was the tipping point?


Before

• A “cloud” meant the Internet/the network

August 2006

• Amazon Elastic Compute Cloud, EC2
• Successfully articulated IaaS offering
• IaaS == Infrastructure as a Service
• Swipe your credit card, and spin up your VM
• Why VM?
  – Easy to maintain (black box)
  – User can be root (forego sys admin)
  – Isolation, security

IaaS can only go so far

• A VM is an x86 container
  – Your least common denominator is assembly
• Elastic Block Store (EBS)
  – Your least common denominator is a byte
• Rackspace, Mosho, GoGrid, etc.

Evolution into PaaS

• Platform as a Service is higher level
• SimpleDB (Relational tables)
• Simple Queue Service
• Elastic Load Balancing
• Flexible Payment Service
• Beanstalk (upload your JAR)


PaaS diversity (and lock-in)

• Microsoft Azure
  – .NET, SQL
• Google App Engine
  – Python, Java, GQL, memcached
• Heroku
  – Ruby
• Joyent
  – Node.js and JavaScript

Our Focus

• Infrastructure
• and Platform
• as a Service
  – (not Gmail)

[Figure: IaaS vs. PaaS abstractions: x86 vs. JAR for compute, byte vs. key & value for storage]

What Is So Different?

• Hardware-centric vs. API-centric
• Never care about drivers again
  – Or sys-admins, or power bills
• You can scale if you have the money
  – You can deploy on two continents
  – And ten thousand servers
  – And 2TB of storage
• Do you know how to do that?

Your New Concerns

User
• How will I horizontally scale my application?
• How will my application deal with distribution?
  – Latency, partitioning, concurrency
• How will I guarantee availability?
  – Failures will happen. Dependencies are unknown.

Provider
• How will I maximize multiplexing?
• Can I scale *and* provide SLAs?
• How can I diagnose infrastructure problems?


Thesis Statement from User POV

• Cloud is an IP layer
  – It provides a best-effort substrate
  – Cost-effective
  – On-demand
  – Compute, storage
• But you have to build your own TCP
  – Fault tolerance!
  – Availability, durability, QoS

Let’s Take the Example of Storage

Horizontal Scaling in Web Services

• X servers -> f(X) throughput
  – X load -> f(X) servers
• Web and app servers are mostly SIMD
  – Process requests in parallel, independently
• But down there, there is a data store
  – Consistent
  – Reliable
  – Usually relational
• DB defines your horizontal scaling capacity

Data Stores Drive System Design

• Alexa GrepTheWeb Case Study
• Storage APIs changing how applications are built
• Elasticity of demand means elasticity of storage QoS


Cloud SQL

• Traditional Relational DBs
• If you don’t want to build your relational TCP
  – Azure
  – Amazon RDS
  – Google Query Language (GQL)
  – You can always bundle MySQL in your VM
• Remember: Best effort. Might not suit your needs

Key Value Stores

• Two primitives: PUT and GET
• Simple -> highly replicated and available
• One or more of
  – No range queries
  – No secondary keys
  – No transactions
  – Eventual consistency
• Are you missing MySQL already?

Scalable Data Stores: Elasticity via Consistent Hashes

• E.g.: Dynamo, Cassandra key-value stores
• Each node mapped to k pseudo-random angles on circle
• Each key hashed to a point on the circle
• Object assigned to next w nodes on circle
• Permanent node removal:
  – Objects dispersed uniformly among remaining nodes (for large k)
• Node addition:
  – Steals data from k random nodes
• Node temporarily unavailable?
  – Sloppy quorums
  – Choose new node
  – Invoke consistency mechanisms on rejoin

[Figure: object key hashed onto the circle; 3 nodes, w=3, r=1; object stored at the next w nodes]
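A minimal sketch of the ring described above, assuming MD5 as the hash, 16 points per node, and hypothetical names (`HashRing`, `replicas`); Dynamo and Cassandra each differ in the details. Each physical node claims k pseudo-random points on a 32-bit circle, and a key is stored on the next w distinct nodes clockwise from its hash.

```python
import bisect
import hashlib


def ring_position(label: str) -> int:
    """Map a string to a pseudo-random point on a 32-bit circle."""
    return int.from_bytes(hashlib.md5(label.encode()).digest()[:4], "big")


class HashRing:
    def __init__(self, nodes, k=16):
        self.k = k          # pseudo-random points ("angles") per physical node
        self.points = []    # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # A new node claims k points, taking over arcs from whichever nodes held them.
        for i in range(self.k):
            bisect.insort(self.points, (ring_position(f"{node}#{i}"), node))

    def remove(self, node):
        # The departed node's arcs fall to the next points clockwise.
        self.points = [(p, n) for p, n in self.points if n != node]

    def replicas(self, key, w=3):
        """Walk clockwise from hash(key) and return the next w distinct nodes."""
        w = min(w, len({n for _, n in self.points}))
        i = bisect.bisect_left(self.points, (ring_position(key), ""))
        chosen = []
        while len(chosen) < w:
            node = self.points[i % len(self.points)][1]
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen
```

Because removing a node only deletes that node's points, any key whose w replicas did not include it keeps exactly the same replica set; only the departed node's arcs move, and with many points per node they spread across the survivors.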

Eventual Consistency

• Clients A and B concurrently write to same key
  – Network partitioned
  – Or, too far apart: USA – Europe
• Later, client C reads key
  – Conflicting vector (A, B)
  – Timestamp-based tie-breaker: Cassandra [LADIS 09], SimpleDB, S3
• Poor!
  – Application-level conflict solver: Dynamo [SOSP 07], Amazon shopping carts

[Figure: key K=X starts with V=Y; client A writes (K=X, V=A) and client B writes (K=X, V=B) concurrently; client C later reads K=X and gets V = <A,B> (or even V = <A,B,Y>)!]
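To make the two resolution styles concrete, here is a sketch in which client C's read returns the sibling versions <A,B> as shopping carts. The `Version` type and both resolver functions are hypothetical: last-writer-wins stands in for timestamp tie-breaking, and set union stands in for an application-level merge such as a shopping-cart resolver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    items: frozenset   # contents of the cart written by one client
    timestamp: float   # client-supplied write time, used only by last-writer-wins
    writer: str        # client id, breaks exact timestamp ties deterministically


def last_writer_wins(siblings):
    """Timestamp tie-breaker: keep the newest version, silently drop the others."""
    return max(siblings, key=lambda v: (v.timestamp, v.writer)).items


def merge_carts(siblings):
    """Application-level resolver: union concurrent carts so no added item is lost."""
    merged = frozenset()
    for v in siblings:
        merged |= v.items
    return merged


a = Version(frozenset({"book"}), timestamp=10.0, writer="A")
b = Version(frozenset({"lamp"}), timestamp=10.5, writer="B")
print(last_writer_wins([a, b]))  # only B's cart survives
print(merge_carts([a, b]))       # both items survive
```

Union never loses an added item, but it can resurrect an item that one replica had deleted, which is why this kind of merge logic has to be written per application.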


KV Store Key Properties

• Very simple: PUT & GET
• Simplicity -> replication & availability
• Consistent hashing -> elasticity, scalability
• Replication & availability -> eventual consistency

EC2 Key Value Stores

• Amazon Simple Storage Service (S3)
  – “Classical” KV store
  – “Classically” eventually consistent
    • <K,V1>
    • Write <K,V2>
    • Read K -> V1!
  – Read-your-writes consistency
    • Read K -> V2 (phew!)
  – Timestamp-based tie-breaking

EC2 Key Value Stores

• Amazon SimpleDB
  – Is it really a KV store?
• It certainly isn’t a relational DB
  – Tables and selects
  – No joins, no transactions
  – Eventually consistent
• Timestamp tie-breaking
  – Optional Consistent Reads
    • Costly! Reconcile all copies
  – Conditional Put for “transactions”
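A conditional put only writes if the stored value still matches what the client expects, which is enough to build optimistic read-modify-write “transactions”. The toy in-memory store below illustrates that pattern; `ConditionalPutStore`, `conditional_put` and `increment` are hypothetical names, not the SimpleDB API.

```python
import threading


class ConditionalPutStore:
    """Toy in-memory KV store with a conditional put (compare-and-set)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def conditional_put(self, key, value, expected):
        """Store `value` only if the current value equals `expected` (None = absent)."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = value
            return True


def increment(store, key, max_attempts=100):
    """Optimistic read-modify-write: retry if another writer got in between."""
    for _ in range(max_attempts):
        current = store.get(key)
        new = (current or 0) + 1
        if store.conditional_put(key, new, expected=current):
            return new
    raise RuntimeError(f"gave up on {key!r} after {max_attempts} attempts")
```

Under contention, a failed conditional put simply means “someone else won; re-read and try again”, so no update is lost even though the store offers no multi-key transactions.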

Pick your poison

• Perhaps the most obvious instance of “BUILD YOUR OWN TCP”
• Do you want scalability?
• Consistency?
• Survivability?


EC2 Storage Options: TPC-W Performance

Flavor | Throughput (WIPS) | Cost at High Load ($/WIPS)
MySQL in your own VM (EBS underneath) | 477 | 0.005
RDS (MySQL aaS) | 462 | 0.005
SimpleDB (non-relational DB, range queries) | 128 | 0.005
S3 (B-trees, update queues on top of KV store) | 1100 | 0.009

Kossmann et al. [SIGMOD 10, 08]

Durability use case: Disaster Recovery

• Disaster Recovery (DR) typically too expensive
  – Dedicated infrastructure
  – “mirror” datacenter
• Cloud: not anymore!
  – Infrastructure is a Service
• But cloud storage SLAs become key
• Do you feel confident about backing up to a single cloud?

Will My Data Be Available?

• Maybe ….

Availability Under Uncertainty

• DepSky [EuroSys 11], Skute [SoCC 10]
• Write-many, read-any (availability)
  – Increased latency on writes
• By distributing, we can get more properties “for free”
  – Confidentiality?
  – Privacy?


Availability Under Uncertainty

• DepSky [EuroSys 11], Skute [SoCC 10]
• Confidentiality. Privacy.
• Write 2f+1, read f+1
  – Information Dispersal Algorithms
• Need f+1 parts to reconstruct item
  – Secret sharing -> need f+1 key fragments
  – Erasure Codes -> need f+1 data chunks
• Increased latency
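One standard way to get the “need f+1 key fragments” property is Shamir secret sharing: hide the secret as the constant term of a random degree-f polynomial over a prime field and hand out 2f+1 points on it, one per cloud. Any f+1 points determine the polynomial; any f points reveal nothing about the secret. This sketch assumes a 127-bit prime field and integer secrets (for example, an encryption key); pairing it with an erasure code for the data chunks is not shown, and it is not DepSky's actual implementation.

```python
import secrets

PRIME = 2**127 - 1  # Mersenne prime; secrets must be integers in [0, PRIME)


def split(secret: int, f: int) -> list[tuple[int, int]]:
    """Create 2f+1 shares; any f+1 of them reconstruct `secret`."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    # Random polynomial of degree f whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(f)]
    shares = []
    for x in range(1, 2 * f + 2):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares: list[tuple[int, int]]) -> int:
    """Lagrange interpolation at x = 0 over the integers mod PRIME."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With f = 2, a write goes to five clouds, and any three that answer are enough to rebuild the key; that extra fan-out is exactly where the increased latency comes from.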

How to Deal with Latency

• It is a problem, but also an opportunity
• Multiple Clouds!
  – “Regions” in EC2
• Minimize client RTT
  – Client in the East: should the server be in the West?
  – Nature is tyrannical
• But, CAP will bite you

Wide-area Data Stores: CAP Theorem

• Pick 2: Consistency, Availability, Partition-Tolerance

[Figure: three C/A/P triangles]

• Role of A and P interchangeable for multi-site
• ACID guarantees possible, but can’t have system available when there is a network partition
• Traditional DBs: MySQL, Oracle
• But what about latency?
• Latency-consistency tradeoff is fundamental

• “Eventual consistency”, e.g., Dynamo, Cassandra
• Must be able to resolve conflicts
• Suitable for cross-DC replication

Brewer, PODC 00 keynote

Build Your Own NoSQL

• Netflix Use Case Scenario
  – Cassandra, MongoDB, Riak, Translattice
• Multiple “Clouds”
  – EC2 availability zones
  – Do you automatically replicate?
  – How are reads/writes satisfied in the normal case?
• Partitioned behavior
  – Write availability? Consistency?


Build Your Own NoSQL

• The (r,w) parameter for n replicas
  – Read succeeds after contacting r ≤ n replicas
  – Write succeeds after contacting w ≤ n replicas
  – (r+w) > n: quorum, clients resolve inconsistencies
  – (r+w) ≤ n: sloppy quorum, transient inconsistency
• Fixed (r=1, w=n/2 + 1) -> e.g. MongoDB
  – Write availability lost on one side of a partition
• Configurable (r,w) -> e.g. Cassandra
  – Always write available
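The (r, w) rule is the pigeonhole principle: if r + w > n, any r replicas read and any w replicas written must share at least one replica. The sketch below checks this exhaustively for small n and shows why a fixed majority write quorum loses write availability on the minority side of a partition. Function names are hypothetical, and real stores add machinery (hinted handoff, read repair) not modeled here.

```python
from itertools import combinations


def always_overlap(n: int, r: int, w: int) -> bool:
    """True if every possible read set of r replicas intersects every write set of w replicas."""
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))


def side_can_write(replicas_on_side: int, w: int) -> bool:
    """During a partition, a side accepts writes only if it can reach w replicas."""
    return replicas_on_side >= w


# Exhaustive check for small n: every read set overlaps every write set iff r + w > n.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert always_overlap(n, r, w) == (r + w > n)

# n = 5 replicas split 3 | 2 by a partition.
w_majority = 5 // 2 + 1  # 3, the slide's fixed w = n/2 + 1
print(side_can_write(3, w_majority), side_can_write(2, w_majority))  # True False
print(side_can_write(2, 1))  # True: with w = 1 both sides stay write-available
```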

Remember

• Cloud is IP
  – Key value stores are not as feature-full as MySQL
  – Things fail
• You need to build your own TCP
  – Throughput in horizontally scalable stores
  – Data durability by writing to multiple clouds
  – Consistency in the event of partitions

Provider Point of View

[Figure: Cloud User and Cloud Provider, separated by a question mark]

Provider Concerns

• Let’s focus on VMs
• Better multiplexing means more money
  – But less isolation
  – Less security
  – More performance interference
• The trick
  – Isolate namespaces
  – Share resources
  – Manage performance interference


Multiplexing: The Good News…

• Data from a static data center hosting business
• Several customers
• Massive over-provisioning
• Large opportunity to increase efficiency
• How do we get there?
• CPU usage is too elastic…
• Median lifetime < 10min
• What does this imply for VM lifecycle operations?

Multiplexing: The Bad News…

[Figure: frequency of VM lifetimes, in minutes (0–2000)]

• But memory is not…
• < 2x of peak usage

[Figure: memory usage over days 1–31]

The Elasticity Challenge

• Make efficient use of memory
  – Memory oversubscription
  – De-duplication
• Make VM instantiation fast and cheap
  – VM granularity
  – Cached resume/cloning
• Allow dynamic reallocation of resources
  – VM migration and resizing
  – Efficient bin-packing

How do VMs Isolate Memory? Shadow Page Tables: another level of indirection

[Figure: each guest process’s page table maps its virtual pages (a, b, c) to guest-physical addresses; the hypervisor’s physical-to-machine map translates guest-physical addresses to machine addresses (100–500); the shadow page tables combine both mappings and are the tables the CPU uses]
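A shadow page table is, in effect, the composition of two mappings. The sketch below models pages as dictionary entries and composes a guest page table with the hypervisor's physical-to-machine (P2M) map; the names and numbers are illustrative, not a reading of the figure. A real hypervisor must also keep the shadow in sync, for example by trapping guest page-table writes, which is not shown.

```python
def build_shadow(guest_pt: dict[str, int], p2m: dict[int, int]) -> dict[str, int]:
    """Compose guest-virtual -> guest-physical with guest-physical -> machine.

    guest_pt: one guest process's page table, maintained by the guest OS.
    p2m:      the hypervisor's physical-to-machine map for that VM.
    Returns the shadow table (guest-virtual -> machine) that the CPU would walk.
    """
    shadow = {}
    for vpage, ppage in guest_pt.items():
        if ppage in p2m:  # guest-physical page not backed by machine memory: leave unmapped
            shadow[vpage] = p2m[ppage]
    return shadow


# Illustrative numbers only.
p2m = {1: 100, 2: 200, 3: 300, 4: 400, 5: 500}
process_1 = {"a": 1, "b": 2, "c": 5}
print(build_shadow(process_1, p2m))  # {'a': 100, 'b': 200, 'c': 500}
```

The guest only ever sees its own page table, while the hypervisor alone controls the P2M map; that split is what lets the VMM move, share or reclaim machine pages behind the guest's back.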


Memory Oversubscription

• Populate on demand: only works one way
• Hypervisor paging
  – To disk: IO-bound
  – Network memory: Overdriver [VEE’11]
• Ballooning [Waldspurger’02]
  – Respect guest OS paging policies
  – Allocates memory to free memory
  – When to stop? Handle with care

[Figure: inflating the balloon: the balloon driver inside the guest OS allocates pinned pages, the guest OS uses its own paging to free memory, and the driver releases those pages to the VMM]

Memory Consolidation

• Trade computation for memory
• Memory Buddies [VEE’09]
  – Bloom filters to compare cross-machine similarity and find migration targets
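Memory Buddies uses Bloom filters so hosts can compare memory contents without shipping full page lists. The sketch below is a generic Bloom filter over page contents plus a hypothetical `best_host` helper that picks the host whose filter matches the most of a VM's pages; the filter size, probe count and hashing scheme are assumptions, not the paper's parameters.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: set membership with false positives, no false negatives."""

    def __init__(self, m_bits: int = 8192, k: int = 4):
        if not 1 <= k <= 8:
            raise ValueError("k must be 1..8 (each probe uses 4 bytes of a SHA-256 digest)")
        self.m, self.k = m_bits, k
        self.bits = bytearray((m_bits + 7) // 8)

    def _probes(self, item: bytes):
        digest = hashlib.sha256(item).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.m

    def add(self, item: bytes) -> None:
        for p in self._probes(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._probes(item))


def best_host(vm_pages, host_filters):
    """Pick the host whose filter (apparently) already holds the most of this VM's pages."""
    return max(host_filters, key=lambda h: sum(page in host_filters[h] for page in vm_pages))
```

Each host only needs to publish its bit array (here 1 KB) rather than one hash per page, at the price of a small, tunable false-positive rate.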

[Figure: VM 1 and VM 2 page tables mapped through the VMM P2M map onto physical RAM; pages with identical contents (A, B) share one machine frame, leaving other frames free]

• Page Sharing [OSDI’02]
  – VMM fingerprints pages
  – Maps matching pages COW
  – 33% savings
• Difference Engine [OSDI’08]
  – Identify similar pages
  – Delta compression
  – Up to 75% savings
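The page-sharing idea (fingerprint pages, back identical ones with a single copy-on-write frame) can be sketched as follows. SHA-1 fingerprints and the `share_pages`/`savings` helpers are assumptions for illustration; copy-on-write itself and Difference Engine's delta compression of merely similar pages are not shown.

```python
import hashlib


def share_pages(vms: dict[str, list[bytes]]):
    """Content-based page sharing: keep one machine frame per distinct page content.

    Returns (frames, mapping), where mapping[(vm, i)] is the frame backing page i of vm.
    A real VMM would also mark shared frames copy-on-write.
    """
    frames: list[bytes] = []
    by_fingerprint: dict[bytes, int] = {}
    mapping: dict[tuple[str, int], int] = {}
    for vm, pages in vms.items():
        for i, page in enumerate(pages):
            fp = hashlib.sha1(page).digest()
            idx = by_fingerprint.get(fp)
            # Full comparison guards against fingerprint collisions before sharing.
            if idx is None or frames[idx] != page:
                frames.append(page)
                idx = len(frames) - 1
                by_fingerprint.setdefault(fp, idx)
            mapping[(vm, i)] = idx
    return frames, mapping


def savings(vms, frames) -> float:
    """Fraction of guest pages that no longer need their own machine frame."""
    total = sum(len(pages) for pages in vms.values())
    return 1 - len(frames) / total
```

For example, two VMs holding pages [A, B, C] and [A, D, B] need only four machine frames instead of six, a one-third saving in this toy case.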

Page-granular VMs

• Cloning
  – Logical replicas
  – State copied on demand
  – Allocated on demand
• Fast VM Instantiation

[Figure: the parent VM (disk, OS, processes) is summarized as a VM descriptor (metadata, page tables, GDT, vcpu; ~1MB for a 1GB VM) that is sent to each clone; each clone keeps private state and fetches the rest on demand]

Fast VM Instantiation?

• A full VM is, well, full … and big
• Spin up new VMs
  – Swap in VM (IO-bound copy)
  – Boot
• 80 seconds → 220 seconds → 10 minutes


Clone Time

[Figure: clone time in milliseconds (0–900) for 2, 4, 8, 16 and 32 clones, broken down into Devices, Spawn, Multicast, Start Clones, Xend and Descriptor]

Scalable Cloning: Roughly Constant

Memory Coloring

• Introspective coloring
  – code/data/process/kernel
• Different policy by region
  – Prefetch, page sharing
• Network demand fetch has poor performance
• Prefetch!?
• Semantically related regions are interwoven

Clone Memory Footprints

• For scientific computing jobs (compute)
  – 99.9% footprint reduction (40MB instead of 32GB)
• For server workloads
  – More modest
  – 0%-60% reduction
• Transient VMs improve efficiency of approach

Implications for Data Centers

• vs. Today’s clouds
• 30% smaller datacenters possible
• With better QoS
  – 98% fewer overloads

[Figure: physical machines needed (35–85) vs. % memory pages shareable (0–30), Status Quo vs. Kaleidoscope]


Dynamic Resource Reallocation

• Monitor:
  – demand, utilization, performance
• Decide:
  – Are there any bottlenecks?
  – Who is affected?
  – How much more do they need?
• Act:
  – Adjust VM sizes
  – Migrate VMs
  – Add/remove VM replicas
  – Add/remove capacity

Blackbox Techniques

• Hotspot Detection [NSDI’07]
  – Application agnostic profiles
  – CPU, network, disk – can monitor in VMM
  – Migrate VM when high utilization
  – e.g., Volume = 1/(1-CPU)*1/(1-Net)*1/(1-Disk)
  – Pick migrations to maximize volume per byte moved
• Drawbacks
  – What is a good high utilization watermark?
  – Detect problems only after they’ve happened
  – No predictive capability – how much more is needed?
  – Dependencies between VMs?
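The Volume metric grows sharply as any one resource nears saturation, and dividing it by the amount of state to transfer favors VMs that relieve the most pressure per byte moved. A minimal sketch, assuming memory size stands in for migration cost and using hypothetical VM tuples and function names:

```python
def volume(cpu: float, net: float, disk: float) -> float:
    """Volume = 1/(1-CPU) * 1/(1-Net) * 1/(1-Disk); utilizations are fractions in [0, 1)."""
    for u in (cpu, net, disk):
        if not 0 <= u < 1:
            raise ValueError("utilizations must be in [0, 1)")
    return 1 / ((1 - cpu) * (1 - net) * (1 - disk))


def pick_migration(vms):
    """vms: list of (name, cpu, net, disk, memory_gb).

    Returns the VM with the highest volume per GB of memory that would have to be moved.
    """
    return max(vms, key=lambda v: volume(v[1], v[2], v[3]) / v[4])[0]


candidates = [
    ("db", 0.90, 0.10, 0.10, 8.0),   # very hot CPU, but large to move
    ("web", 0.50, 0.50, 0.50, 1.0),  # moderately loaded everywhere, small
]
print(pick_migration(candidates))
```

Here the small “web” VM wins (8.0 volume per GB versus roughly 1.5 for “db”), which illustrates the drawback listed above: the heuristic knows nothing about which VM is actually hurting the application.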

[Figure axes: Fraction of Most Popular Transaction vs. Fraction of 2nd Most Popular Transaction]

Up the Stack: Graybox Techniques

• Queuing models
• Response time
• Predictive
• Dependencies
• Learn models on the fly
  – Exploit non-stationarity
  – Online regression [NSDI’07]
  – Graybox

[Figure: layered queuing model of a three-tier application (Apache, Tomcat and MySQL servers); each tier is modeled with CPU, disk, network and VMM queues, parameterized through LD_PRELOAD instrumentation, Servlet.jar instrumentation and network ping measurement]

Comparative Analysis of Actions

• Different actions, costs, outcomes
• Change VM allocations
• VM migrations, add/remove VM clones
• Add or remove physical capacity

[Figure: response time penalty: Δ response time (ms, 0–800) vs. number of concurrent sessions (100–800)]
[Figure: energy penalty: Δ watts (%, 8–17) vs. number of concurrent sessions (100–800)]


Acting to Balance Cost vs. Benefit

• Adaptation costs are immediate, benefits accrued over time
• Pick actions to maximize benefit after recouping costs

[Figure: timeline: adaptation starts; adaptation completes after a known adaptation duration; benefits then accrue over an unknown window W (forecasting); a marked interval shows the time needed to recoup costs]

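$$U = \underbrace{\Big(W - \sum_{a_k \in A} d_{a_k}\Big) \sum_{s \in S} \big(\Delta\mathit{Perf}_s + \Delta\mathit{Resources}_s\big)}_{\text{Benefit}} \;-\; \underbrace{\sum_{a_k \in A} d_{a_k} \sum_{s \in S} \big(\mathit{Perf}_{a_k,s} + \mathit{Resources}_{a_k,s}\big)}_{\text{Adaptation Cost}}$$

where W is the window of benefit accrual and d_{a_k} is the duration of adaptation action a_k ∈ A.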
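A worked version of the utility formula, with all numbers hypothetical: two actions lasting 30 s and 60 s, two entries in S (treated here as two hosted services, which is an assumption about what S indexes), and a forecast window W. With W = 600 s the plan pays off: benefit (600 − 90) × 3 = 1530 minus cost 30 × 1.5 + 60 × 2 = 165 gives U = 1365. With W = 60 s the window closes before the adaptation finishes and U is negative, so the controller should not act.

```python
def utility(window, durations, gains, penalties):
    """U = (W - sum_k d_k) * sum_s gain_s  -  sum_k d_k * sum_s penalty_{k,s}.

    window:    W, how long the new configuration is expected to stay useful (s)
    durations: d_{a_k} for each adaptation action a_k in the plan (s)
    gains:     per-s benefit rate, i.e. dPerf_s + dResources_s
    penalties: penalties[k][s] = Perf_{a_k,s} + Resources_{a_k,s}, cost rate while a_k runs
    """
    if len(durations) != len(penalties):
        raise ValueError("need one penalty row per action")
    benefit = (window - sum(durations)) * sum(gains)
    cost = sum(d * sum(row) for d, row in zip(durations, penalties))
    return benefit - cost


plan = dict(durations=[30, 60], gains=[2.0, 1.0], penalties=[[1.0, 0.5], [2.0, 0.0]])
print(utility(600, **plan))  # 1365.0
print(utility(60, **plan))   # -255.0
```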

[Figure: controller architecture. A controller runs conjoint sequential optimization over a performance model, a power model and a reconfiguration model; it takes demand and the infrastructure state (active hosts running a hypervisor and Domain-0 with web, app and DB server VMs, plus storage holding OS images) and emits adaptation actions. Its search moves from the current configuration through candidate configurations c_new1 … c_newn toward the ideal configuration c_max, ending with “stop reconf. (benefit)” and the final reconfiguration.]

• Adjust VM quotas
• Add VM replicas
• Remove VM replicas
• Migrate VMs
• Remove capacity
• Add capacity

Optimize performance, infrastructure use, adaptation penalties

Let’s talk about failures

Assume Anything can Fail
• But can it fail all at once?
  – How to avoid single failure points?
• EC2 availability zones
  – Independent DCs, close proximity
  – March outage was across zones
  – EBS control plane dependency across zones
  – Ease of use/efficiency/independence tradeoff
• What about racks, switches, power circuits?
  – Fine-grained availability control
  – Without exposing proprietary information?

[Figure: resource tiers connected by a WAN: Data Center level, L(DataCenter) = 100ms, MTBF(DataCenter) = 5 yrs; Rack level, L(Rack) = 5ms, MTBF(Rack) = 1 year; Host level, L(Host) = 10µs, MTBF(Host) = 1 month]


Peeking over the Wall

• Users provide VM-level HA groups [DCDV’11]
  – Application-level constraints
  – e.g., primary and backup VMs
  – Provider places HA group to avoid common risk factors
• Users provide desired MTBF for HA groups [DSN’10]
  – Providers use infrastructure dependencies and MTBF values to guide placement
  – Optimization problem: capacity, availability, performance

Data Center Diagnosis

• Whose problem is it?
  – Application? Host? Network?
• Who detects it?
  – Cloud users don’t know topology
  – Providers don’t know applications

[Figure: Logical DAC Manager]

• Lightweight, application independent monitors [NSDI’11]

Network Security

• Every VM gets private/public IP
• VMs can choose access policy by IP/groups
• IP firewalls ensure isolation
• Good enough?

Information Leakage

• Is your target in a cloud?
  – Traceroute
  – Network triangulation
• Are you on the same machine?
  – IP addresses
  – Latency checks
  – Side channels (cache interference)
• Can you get on the same machine?
  – Pigeon-hole principle
  – Placement locality


Network Security Evolved

• Remove external addressability
• Doesn’t protect external facing assets
• Virtual private clouds
  – Amazon, AT&T, Verizon
  – MPLS VPN connection to cloud gateway
  – Internal VLANs within cloud
  – Virtual gateways, firewalls

Source: Amazon AWS

Security: Trusted Computing Bases

• Isolation is the fundamental property of IaaS
• That’s why we have VMs … and not a cloud OS
• Narrower interfaces
• Smaller TCBs
• Really?

The Xen TCB

Hypervisor
Domain0
• Linux Kernel
• Linux distribution
  – Network services
  – Shell
• Control stack
• VM mgmt tools
  – Boot-loader
  – Checkpointing

Smaller TCBs

• Dom0 disaggregation, Nova
• No TCB? Homomorphic encryption!


Remember

• Moving up the stack helps
  – Multiplexing
  – Resource allocation
  – Design for availability
  – Diagnosability
• Moving down the stack helps
  – Security
  – Privacy

Learn From a Use Case: Netflix

• Transcoding Farm
• It does not hold customer sensitive data
• It has a clean failure model: restart
• You can horizontally scale this at will

Learn From a Use Case: Netflix

• Search Engine
• It does not hold customer sensitive data
• It has a clean failure model: no updates
• You can horizontally scale this at will
• It can tolerate eventual consistency

Learn From a Use Case: Netflix

• Recommendation Engine
• It does not hold customer sensitive data
• It has a clean failure model: global index
• You can horizontally scale this at will
• It can tolerate eventual consistency


Learn From a Use Case: Netflix

• “Learn with real scale, not toy models”
  – Why not? It costs you ten bucks
• Chaos Monkey
  – Why not? Things will fail eventually
• Nothing is fast, everything is independent

Source: Voas, Jeffrey; Zhang, Jia. Cloud Computing: New Wine or Just a New Bottle? In IT Professional, March 2009, Volume 11, Issue 2, pp 15-17.

The circle is now complete…

…or is it?

Questions?

• Tradeoffs driven by application rather than technology needs
• Scale, global reach
• Mobility of users, servers
• Increasing democratization

