
Thus Far

Page 1: Thus Far
Page 2: Thus Far

Thus Far

• Locality is important!!!
  – Need to get processing closer to storage
  – Need to get tasks close to data
• Rack locality: Hadoop
• Kill a task and restart it if a local slot becomes available: Quincy
• Why? The network is bad: gives horrible performance
  – Why? Over-subscription of the network

Page 3: Thus Far

What Has Changed?

• The network is no longer over-subscribed
  – Fat-tree, VL2
• The network has fewer congestion points
  – Helios, c-Through, Hedera, MicroTE
• Server uplinks are much faster
• Implication: network transfers are much faster
  – The network is now just as fast as disk I/O
  – The difference between local and rack-local reads is only 8%
• Storage practices have also changed
  – Compression is being used: a smaller amount of data needs to be transferred
  – De-replication is being practiced: with only one copy, locality is really hard to achieve

Page 4: Thus Far

So What Now?

• No need to worry about locality when doing placement
  – Placement can happen faster
  – Scheduling algorithms can be smaller/simpler
• The network is as fast as a SATA disk, but still a lot slower than an SSD
  – If SSDs are used, then disk locality is a problem AGAIN!
  – However, SSDs are too costly to be used for all storage

Page 5: Thus Far

Caching with Memory/SSD

• 94% of all jobs can have their input fit in memory
• So a new problem is memory locality
  – Want to place a task where it will have access to data already in memory
• Interesting challenges:
  – 46% of tasks use data that is never re-used, so these tasks need pre-fetching
  – Current caching schemes are ineffective

Page 6: Thus Far

How do you build a FS that ignores locality?

• FDS (Flat Datacenter Storage) from MSR ignores locality
• Eliminate networking problems to remove the importance of locality
• Eliminate metadata-server problems to improve the throughput of the whole system

Page 7: Thus Far

Meta-data Server

• Current metadata server (the name node)
  – Stores the mapping of chunks to servers
  – Central point of failure
  – Central bottleneck
    • Processing issue: before anyone reads/writes, they must consult the metadata server
    • Storage issue: must store the location and size of EVERY chunk

Page 8: Thus Far

FDS's Meta-data Server

• Only stores a list of servers
  – Smaller memory footprint: # servers <<< # chunks
• Clients only interact with it at startup
  – Not every time they need to read/write
  – # client boots <<< # reads/writes, so the metadata server stays off the data path
• To determine where to read/write: consistent hashing
  – Read/write data at the server at this index in the server list: Hash(GUID) mod # servers (see the sketch below)
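A minimal Python sketch of the locator idea on this slide: hash a blob's GUID into the server list obtained at startup. The function and server names are illustrative, not FDS's actual API.

```python
import hashlib

def locate_server(guid: str, servers: list) -> str:
    """Map a blob GUID to a server: the Hash(GUID) mod #servers idea from the slide."""
    digest = hashlib.sha1(guid.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

# The server list is the only state a client fetches from the metadata server,
# and only at startup; afterwards placement is computed locally per read/write.
servers = ["storage-01", "storage-02", "storage-03", "storage-04"]
print(locate_server("blob-7f3a", servers))
```

Because placement is a pure function of the GUID and the server list, every client computes the same answer without touching the metadata server on the read/write path.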

Page 9: Thus Far

Network Changes

• Uses a VL2-style Clos network
  – Eliminates over-subscription and congestion
• One TCP connection doesn't saturate a server's 10-gig NIC
  – Use 5 TCP connections to saturate the link
• With VL2 there is no congestion in the core, but possibly at the receiver
  – The receiver controls the senders' sending rate
  – The receiver sends rate-limiting messages to the senders

Page 10: Thus Far

Disk locality is fast becoming a distant problem

• Advances in networking
  – Eliminate over-subscription/congestion
• We have a prototype, FDS, that doesn't need locality
  – Uses VL2
  – Eliminates the metadata-server bottleneck
• New problem, new challenges: memory locality
  – New cache-replacement techniques
  – New pre-caching schemes

Page 11: Thus Far

Class Wrap-Up

• What have we covered and learned?
• The big-data stack
  – How to optimize each layer?
  – What are the challenges in each layer?
  – Are there any opportunities to optimize across layers?

Page 12: Thus Far

Big-Data Stack: App Paradigms

• Commodity devices impact the design of application paradigms
  – Hadoop: dealing with failures
    • Addresses network over-subscription: rack-aware placement
    • Straggler detection and mitigation: restart tasks
  – Dryad: Hadoop for smarter programmers
    • Can create more expressive task DAGs (acyclic)
    • Can determine which tasks should run locally on the same devices
    • Dryad does optimizations: adds extra nodes to do intermediate aggregation

Page 13: Thus Far

[Stack diagram: App layer populated with Hadoop and Dryad; the remaining layers (Sharing, Virt Drawbacks, N/W Paradigm, Tail Latency, N/W Sharing, Storage, SDN) are still empty]

Page 14: Thus Far

Big-Data Stack: App Paradigms Revisited

• User-visible services are complex and composed of multiple M-R jobs
  – FlumeJava & DryadLINQ
    • Delay execution until output is required
    • Allows for various optimizations
    • Storing output to HDFS between M-R jobs adds time: eliminate HDFS between jobs
    • Programmers aren't always smart and often add unnecessary steps: knowing what is required for the output, you can eliminate the unnecessary ones

Page 15: Thus Far

[Stack diagram: App layer now also includes FlumeJava and DryadLINQ]

Page 16: Thus Far

Big-Data Stack: App Paradigms Revisited Yet Again

• User-visible services require interactivity, so jobs need to be fast; jobs should return results before processing completes
  – Hadoop Online:
    • Pipeline results from map to reduce before the map is done
    • Pipeline too early and the reducer has to do the sorting, which increases processing overhead on reduce: BAD!!!
  – RDDs: Spark
    • Store data in memory: much faster than disk
    • Instead of processing immediately, build an abstract graph of the processing and do the processing when output is required (see the sketch below)
      – Allows for optimizations
    • Failure recovery is the challenge
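A minimal plain-Python sketch of the lazy-evaluation idea behind RDDs (illustrative only, not Spark's actual API): transformations just record a lineage graph, and the data is only touched when an action asks for output.

```python
class LazyDataset:
    """Records transformations as lineage; nothing executes until collect()."""

    def __init__(self, source, lineage=()):
        self.source = source          # base data (here, an in-memory list)
        self.lineage = lineage        # recorded ("map"/"filter", function) steps

    def map(self, fn):
        return LazyDataset(self.source, self.lineage + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self.source, self.lineage + (("filter", pred),))

    def collect(self):
        # The action: replay the recorded lineage over the source data.
        data = list(self.source)
        for kind, fn in self.lineage:
            data = [fn(x) for x in data] if kind == "map" else [x for x in data if fn(x)]
        return data

# Transformations are cheap bookkeeping; only collect() does real work.
words = LazyDataset(["spark", "hadoop", "dryad", "mesos"])
print(words.filter(lambda w: "a" in w).map(str.upper).collect())   # ['SPARK', 'HADOOP', 'DRYAD']
```

Keeping the lineage rather than materialized intermediate data is also how Spark approaches the failure-recovery challenge noted above: a lost partition can be recomputed by replaying its lineage.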

Page 17: Thus Far

[Stack diagram: App layer adds Hadoop Online and Spark; Sharing layer adds Mesos and Omega]

Page 18: Thus Far

Big-Data Stack: Sharing is Caring (How to Share a Non-Virtualized Cluster)

• Sharing is good: you have too much data, and it costs too much to build many clusters for the same data
• Need dynamic sharing: with static sharing, you waste resources
• Mesos:
  – Resource offers: give apps options of resources and let them pick
  – The app knows best
• Omega:
  – Optimistic allocation: each scheduler picks resources; if there's a conflict, Omega detects it and gives the resources to only one scheduler, and the others pick new resources (see the sketch below)
  – Even with conflicts, this is much better than a centralized entity
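A minimal sketch of the optimistic-allocation idea described above (illustrative, not Omega's actual implementation): schedulers claim machines against shared cell state, conflicts are detected at commit time, and the losing scheduler retries.

```python
# Shared cell state: machine -> owning job (None means free).
shared_cell_state = {"m1": None, "m2": None, "m3": None}

def try_commit(job, wanted_machines):
    """Claim machines optimistically; fail if any were taken since the scheduler looked."""
    if any(shared_cell_state[m] is not None for m in wanted_machines):
        return False                      # conflict detected: another scheduler won
    for m in wanted_machines:
        shared_cell_state[m] = job
    return True

# Two schedulers optimistically pick overlapping machines.
print(try_commit("jobA", ["m1", "m2"]))   # True: first claim succeeds
print(try_commit("jobB", ["m2", "m3"]))   # False: m2 already taken, so jobB must retry
print(try_commit("jobB", ["m3"]))         # True: retry with fresh choices succeeds
```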

Page 19: Thus Far

[Stack diagram: same as the previous diagram: App (Hadoop, Dryad, FlumeJava, DryadLINQ, Hadoop Online, Spark) and Sharing (Mesos, Omega)]

Page 20: Thus Far

Big-Data Stack: Sharing is Caring (Cloud Sharing)

• Clouds give the illusion of equality
  – H/W differences mean different performance
  – Poor isolation: tenants can impact each other
    • I/O-bound and CPU-bound jobs can conflict

Page 21: Thus Far

[Stack diagram: Virt Drawbacks layer adds BobTail, RFA, and Cloud Gaming]

Page 22: Thus Far

Big-Data Stack: Better Networks

• Networks give bad performance
  – Cause: congestion + over-subscription
• VL2/PortLand
  – Eliminate over-subscription + congestion with commodity devices + ECMP
• Helios/c-Through
  – Mitigate congestion by carefully adding new capacity

Page 23: Thus Far

[Stack diagram: N/W Paradigm layer adds VL2, PortLand, Helios, c-Through, Hedera, and MicroTE]

Page 24: Thus Far

Big-Data Stack: Better Networks

• When you need multiple servers to service a request, tail latency compounds
  – 0.99^100 ≈ 0.37 (HORRIBLE): if each server meets its deadline 99% of the time, a 100-way fan-out meets it only about 37% of the time (see the worked example below)
  – Duplicate requests: send the same request to 2 servers
    • At least one will finish within an acceptable time
  – Dolly: be smart when selecting the 2 servers
    • You don't want I/O contention, because that leads to bad performance
    • Avoid maps using the same replicas
    • Avoid reducers reading the same intermediate output
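A quick worked check of the fan-out numbers above, in plain Python (the 99% per-server figure and the 2-way duplication are from the slide; the independence assumption is mine):

```python
# Probability that ALL of n fan-out servers respond within the deadline, given
# each does so independently with probability p, plus the effect of sending each
# sub-request to two servers (the duplicate wins if either copy is on time).
p, n = 0.99, 100

print(f"single copy: {p ** n:.2f}")                 # ~0.37: most requests miss the deadline
p_dup = 1 - (1 - p) ** 2                            # per-sub-request success with 2 copies
print(f"duplicated:  {p_dup ** n:.2f}")             # ~0.99, assuming the copies are independent
```

That independence assumption is exactly what Dolly protects: two copies contending for the same replica or the same intermediate output are not independent, which is why it avoids co-locating them.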

Page 25: Thus Far

[Stack diagram: Tail Latency layer adds Dolly (Clones) and Mantri]

Page 26: Thus Far

Big-Data Stack: Network Sharing

• How to share efficiently while making guarantees
• ElasticSwitch
  – Two-level bandwidth-allocation system
• Orchestra
  – M/R has barriers, and completion depends on a set of flows, not individual flows
  – Make optimizations over a set of flows
• HULL: trade bandwidth for latency
  – Want zero buffering, but TCP needs buffering
  – Limit traffic to 90% of the link and use the remaining 10% as headroom for buffering

Page 27: Thus Far

[Stack diagram: N/W Sharing layer adds ElasticSwitch, Orchestra, and HULL]

Page 28: Thus Far

Big-Data Stack: Enter SDN

• Remove the control plane from the switches and centralize it
• Centralization == scalability challenges
  – NOX: how does it scale to data centers? How many controllers do you need?
• How should you design these controllers?
  – Kandoo: a hierarchy (many local controllers and one global controller; the local controllers communicate with the global one)
  – ONIX: a mesh (communication through a DHT or a DB)

Page 29: Thus Far

[Stack diagram: SDN layer adds Kandoo and ONIX]

Page 30: Thus Far

Big-Data Stack: SDN + Big Data

• FlowComb:
  – Detect application patterns and have the SDN controller assign paths based on knowledge of traffic patterns and contention
• Sinbad:
  – HDFS writes are important
  – Let the SDN controller tell HDFS the best place to write data to, based on knowledge of network congestion

Page 31: Thus Far

[Stack diagram: SDN layer adds FlowComb and Sinbad]

Page 32: Thus Far

Big-Data Stack: Distributed Storage

• Ideal: nice API, low latency, scalable
• Problem: H/W fails a lot, exists in limited locations, and has limited resources
• Partition: gives good performance
  – Cassandra: uses consistent hashing (see the sketch below)
  – Megastore: each partition == an RDBMS with good consistency guarantees
• Replicate: multiple copies ride out failures
  – Megastore: replicas allow for low latency
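A minimal sketch of ring-based consistent hashing of the kind Cassandra uses (illustrative; the node and key names are made up, and this is not Cassandra's actual partitioner): each node owns a token on a ring, a key is stored on the first node clockwise from its hash, and adding or removing a node only remaps one ring segment.

```python
import bisect
import hashlib

def token(value: str) -> int:
    """Hash a string onto the ring [0, 2^64)."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes):
        self.ring = sorted((token(n), n) for n in nodes)   # (token, node) pairs ordered around the ring

    def owner(self, key: str) -> str:
        # First node clockwise from the key's position, wrapping around at the end.
        tokens = [t for t, _ in self.ring]
        idx = bisect.bisect_right(tokens, token(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
for key in ["user:1", "user:2", "user:3"]:
    print(key, "->", ring.owner(key))   # placement is stable: only a neighbor's keys move when a node joins or leaves
```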

Page 33: Thus Far

[Stack diagram: Storage layer adds Megastore and Cassandra]

Page 34: Thus Far

Big-Data Stack: Disk Locality Irrelevant

• Disk locality is becoming irrelevant
  – Data is getting smaller (compressed), so transfer times shrink
  – Networks are getting much faster (rack-local reads are only 8% slower than local)
• Memory locality is the new challenge
  – The input of 94% of jobs fits in memory
  – Need new caching + prefetching schemes

Page 35: Thus Far

[Stack diagram: FDS added under "Disk-locality irrelevant"]

Page 36: Thus Far

[Stack diagram (complete): App (Hadoop, Dryad, FlumeJava, DryadLINQ, Hadoop Online, Spark); Sharing (Mesos, Omega); Virt Drawbacks (BobTail, RFA, Cloud Gaming); N/W Paradigm (VL2, PortLand, Helios, c-Through, Hedera, MicroTE); Tail Latency (Dolly, Mantri); N/W Sharing (ElasticSwitch, Orchestra, HULL); SDN (Kandoo, ONIX, FlowComb, Sinbad); Storage (Megastore, Cassandra); Disk-locality irrelevant (FDS)]

