MATE-EC2: A Middleware for Processing Data with Amazon Web Services

Tekin Bicer, David Chiu* and Gagan Agrawal

Department of Computer Science and Engineering, Ohio State University

* Presenter; School of Engineering and Computer Science, Washington State University, Vancouver

The 4th Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS’11)

T Bicer, D Chiu, G Agrawal. Mate-EC2: A Middleware for Processing Data with AWS (MTAGS 2011, Seattle WA)

The Problem

• Today’s applications are increasingly data- and compute-intensive
• Many-Task Computing paradigms, such as workflows (WF) and Map-Reduce (MR), are becoming pervasive
• For example, Map-Reduce applications solve common problems:
  – Data mining
  – Graph processing
  – etc.

Clouds

• Infrastructure-as-a-Service (IaaS)
  – Anyone, anywhere can allocate “unlimited” virtualized compute/storage resources
• Amazon Web Services (AWS)
  – Most popular IaaS provider
  – Elastic Compute Cloud (EC2)
  – Simple Storage Service (S3)

Amazon Web Services: EC2

• On-Demand Instances (virtual machines)
  – Types: Extra Large, Large, Small, etc.
• For example, an Extra Large Instance:
  – Cost: $0.68 per allocated hour
  – 15GB memory; 1.7TB disk (ephemeral)
  – 8 Compute Units
  – High I/O performance
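
To make the per-allocated-hour pricing concrete, the sketch below estimates the cost of a small cluster run. It assumes, as the phrase "per allocated hour" suggests, that any partial hour is billed as a full hour; the function name and the five-instance scenario are our own illustration, not figures from the talk.

```python
import math

# Hourly rate for an Extra Large instance, as quoted on the slide.
XL_RATE_PER_HOUR = 0.68


def ec2_cost(num_instances: int, runtime_hours: float, rate_per_hour: float) -> float:
    """Estimate on-demand cost (hypothetical helper).

    Assumes every instance is billed per allocated hour, with any partial
    hour rounded up to a full hour.
    """
    if num_instances < 0 or runtime_hours < 0:
        raise ValueError("instances and runtime must be non-negative")
    billed_hours = math.ceil(runtime_hours)  # partial hours count as full hours
    return round(num_instances * billed_hours * rate_per_hour, 2)


# Five Extra Large instances running for 2.5 hours are billed for 3 hours each.
print(ec2_cost(5, 2.5, XL_RATE_PER_HOUR))  # 10.2
```

Under this assumption, shaving a job from 2.5 to 2.0 hours saves a full billed hour per instance, which is one reason runtime improvements translate directly into cost savings.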

Amazon Web Services: S3

• Accessible anywhere, with high reliability and availability
• Objects are arbitrary data blobs
• Objects are stored as (key, value) pairs in buckets
  – Up to 5TB of data per object
  – Unlimited objects per bucket

Amazon Web Services: S3 (Cont.)

• Simple FTP-like interface using web service protocols
  – Put, Get (including partial Get), and Delete
  – SOAP and REST
• High throughput (~40MB/sec)
  – Scales well to multiple clients
• Low costs
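
In S3's REST interface, a partial Get is an ordinary GET request carrying a standard HTTP Range header. The hypothetical helper below only formats that header value for a given byte range; it does not issue any request. Note that HTTP byte ranges are inclusive at both ends.

```python
def range_header(offset: int, length: int) -> str:
    """Format an HTTP Range header value for a partial GET (hypothetical helper).

    HTTP byte ranges are inclusive on both ends, so a request for `length`
    bytes starting at `offset` ends at offset + length - 1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


# Request the second 4 MB piece of an object.
print(range_header(4 * 1024 * 1024, 4 * 1024 * 1024))  # bytes=4194304-8388607
```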

Amazon Web Services: S3 (Cont.)

• 449 billion objects in S3 as of July 2011
  – Doubling each year

Motivation and Focus

• Virtualization is characteristic of every cloud environment: clouds are black boxes
• Storage and elastic compute services exhibit performance variability that we should leverage

Goals

• Users are increasingly moving to cloud-based solutions for computing
• We therefore need services and tools that can:
  – Get the most out of cloud resources for data-intensive processing
  – Provide a simple programming interface

Outline

• Background
• System Overview
• Experiments
• Conclusion

MATE-EC2 System Design

• Cloud middleware that can:
  – Use a set of possibly heterogeneous EC2 instances to scalably and efficiently process data stored in S3
• MATE is a generalized-reduction PDC structure, similar to Map-Reduce

MATE and Map-Reduce

[Figure: panels labeled MATE and Map-Reduce]
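
The contrast between the two structures can be illustrated with a small word-count sketch. It assumes the commonly described generalized-reduction model: a user-defined local reduction folds each element directly into a reduction object, and a global reduction merges the per-node objects, instead of emitting, shuffling, and then reducing intermediate (key, value) pairs. All function names are hypothetical and are not MATE's actual API.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


# --- Map-Reduce style: emit intermediate (key, value) pairs, sort, reduce ---
def mapreduce_count(records):
    pairs = [(word, 1) for rec in records for word in rec.split()]  # map
    pairs.sort(key=itemgetter(0))                                    # shuffle/sort
    return {k: sum(v for _, v in grp)                                # reduce
            for k, grp in groupby(pairs, key=itemgetter(0))}


# --- Generalized reduction style: update a reduction object in place ---
def local_reduction(robj, record):
    # Each element is folded directly into the reduction object;
    # no intermediate (key, value) pairs are materialized.
    for word in record.split():
        robj[word] += 1


def global_reduction(robjs):
    # Combine the per-node reduction objects into one result.
    merged = defaultdict(int)
    for robj in robjs:
        for k, v in robj.items():
            merged[k] += v
    return dict(merged)


def generalized_reduction_count(partitions):
    robjs = []
    for part in partitions:  # one partition per (simulated) node
        robj = defaultdict(int)
        for rec in part:
            local_reduction(robj, rec)
        robjs.append(robj)
    return global_reduction(robjs)


data = [["a b a", "c"], ["b a"]]
flat = [rec for part in data for rec in part]
assert mapreduce_count(flat) == generalized_reduction_count(data)
print(generalized_reduction_count(data))  # {'a': 3, 'b': 2, 'c': 1}
```

Both versions compute the same answer; the difference is that the generalized reduction never builds or sorts the list of intermediate pairs, so its memory footprint depends on the size of the reduction object rather than on the number of emitted pairs.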

MATE-EC2 Design

MATE-EC2 Design: Data Organization

• Objects: physical representation of the data in S3
• Chunks: logical data partitions within objects (exploits memory utilization)
• Units: fixed-size data units within a chunk, used for retrieval (exploits concurrency)
• Metadata: chunk offset, chunk size, unit size
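
The object/chunk/unit hierarchy can be sketched as follows. This is a minimal illustration under our own assumptions: `ChunkMeta` and `split_object` are hypothetical names, sizes are in bytes, and the final chunk or unit is simply allowed to be shorter than the nominal size.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ChunkMeta:
    """Hypothetical per-chunk metadata: offset and size within an S3 object,
    plus the unit size used when retrieving it."""
    object_key: str
    chunk_offset: int
    chunk_size: int
    unit_size: int

    def unit_ranges(self) -> List[Tuple[int, int]]:
        """(offset, length) of each unit; the last unit may be shorter."""
        end = self.chunk_offset + self.chunk_size
        return [(off, min(self.unit_size, end - off))
                for off in range(self.chunk_offset, end, self.unit_size)]


def split_object(object_key: str, object_size: int,
                 chunk_size: int, unit_size: int) -> List[ChunkMeta]:
    """Partition one object into logical chunks (the last may be shorter)."""
    if min(chunk_size, unit_size) <= 0:
        raise ValueError("chunk and unit sizes must be positive")
    return [ChunkMeta(object_key, off, min(chunk_size, object_size - off), unit_size)
            for off in range(0, object_size, chunk_size)]


chunks = split_object("data-00", object_size=10, chunk_size=4, unit_size=3)
print([(c.chunk_offset, c.chunk_size) for c in chunks])  # [(0, 4), (4, 4), (8, 2)]
print(chunks[0].unit_ranges())                           # [(0, 3), (3, 1)]
```

Chunks bound how much data a compute node holds in memory at once, while the unit ranges are what would be fetched in parallel.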

MATE-EC2 Design: Data Retrieval

Threaded chunk retrieval: each chunk is retrieved concurrently by a number of threads
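
A sketch of threaded chunk retrieval, assuming each unit of a chunk is fetched by its own ranged read. `fetch_range` is a hypothetical stand-in for an S3 partial Get, and the default of 16 threads mirrors the setting used in the experiments; this is not MATE-EC2's actual retrieval code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# fetch_range(offset, length) -> bytes stands in for an S3 partial GET;
# it is a hypothetical callback, not a real S3 client call.
FetchFn = Callable[[int, int], bytes]


def retrieve_chunk(fetch_range: FetchFn, unit_ranges: List[Tuple[int, int]],
                   num_threads: int = 16) -> bytes:
    """Fetch all units of a chunk concurrently and reassemble them in order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map yields results in input order, so units come back in
        # offset order even though the fetches may finish in any order.
        parts = pool.map(lambda r: fetch_range(*r), unit_ranges)
        return b"".join(parts)


# Simulated object: "fetching" slices an in-memory byte string.
blob = bytes(range(256)) * 4
fake_fetch = lambda off, n: blob[off:off + n]
units = [(off, min(100, 600 - off)) for off in range(200, 600, 100)]  # bytes 200..599
assert retrieve_chunk(fake_fetch, units, num_threads=4) == blob[200:600]
```

With a real object store, each call would block on network I/O, which is why overlapping several ranged reads can raise the aggregate throughput per chunk.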

Dynamic Load Balancing

[Figure: Job Pool]

Units are scheduled for processing with a simple greedy heuristic: select a unit belonging to the least connected chunk.
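
The greedy heuristic might look like the following. We assume that a chunk's "connections" are the workers currently fetching units from it, and that ties are broken by chunk id; `JobPool` and its methods are hypothetical names, not part of MATE-EC2.

```python
from typing import Dict, List, Optional, Tuple


class JobPool:
    """Hypothetical job pool: pending units grouped by chunk, plus a count of
    active connections (workers currently fetching) per chunk."""

    def __init__(self, units_by_chunk: Dict[str, List[int]]):
        self.pending = {c: list(u) for c, u in units_by_chunk.items()}
        self.connections = {c: 0 for c in units_by_chunk}

    def assign(self) -> Optional[Tuple[str, int]]:
        """Greedy choice: a unit from the least connected chunk with work left."""
        candidates = [c for c, units in self.pending.items() if units]
        if not candidates:
            return None
        # Ties are broken by chunk id so the choice is deterministic.
        chunk = min(candidates, key=lambda c: (self.connections[c], c))
        self.connections[chunk] += 1
        return chunk, self.pending[chunk].pop(0)

    def complete(self, chunk: str) -> None:
        """Called when a worker finishes fetching a unit from `chunk`."""
        self.connections[chunk] -= 1


pool = JobPool({"c0": [0, 1], "c1": [0], "c2": [0, 1]})
print([pool.assign() for _ in range(3)])  # [('c0', 0), ('c1', 0), ('c2', 0)]
pool.complete("c2")
print(pool.assign())                      # ('c2', 1)
```

Spreading requests across chunks in this way avoids piling many concurrent connections onto the same piece of data while others sit idle.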

MATE-EC2 Processing Flow

(1) A compute node requests a job from the master

(2) The chunk is retrieved in units

(3) The retrieved chunk is passed to the compute layer and processed

(4) The compute node requests another job from the master

[Figure: multiple slave instances]
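
The four steps can be simulated in a few lines. This is a sketch under our own assumptions: Python threads play the slave instances, a shared queue plays the master's job pool, chunks are in-memory lists of numbers, and "processing" is a sum folded into a per-slave reduction object that is combined at the end. None of these names come from MATE-EC2.

```python
import queue
import threading
from typing import Dict, List


def run_flow(chunks: Dict[str, List[int]], num_slaves: int = 4) -> int:
    """Simulate the master/slave processing flow (hypothetical sketch)."""
    jobs: "queue.Queue[str]" = queue.Queue()  # the master's pool of jobs
    for chunk_id in chunks:
        jobs.put(chunk_id)

    partials: List[int] = [0] * num_slaves  # one reduction object per slave

    def slave(idx: int) -> None:
        while True:
            try:
                chunk_id = jobs.get_nowait()  # (1)/(4) request a job
            except queue.Empty:
                return                        # no jobs left: stop
            units = chunks[chunk_id]          # (2) stand-in for unit-wise retrieval
            for unit in units:                # (3) process in the compute layer
                partials[idx] += unit         #     via a local reduction

    threads = [threading.Thread(target=slave, args=(i,)) for i in range(num_slaves)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)                      # combine per-slave results


print(run_flow({"c0": [1, 2], "c1": [3], "c2": [4, 5, 6]}))  # 21
```

Because each slave pulls a new job only when it finishes the previous one, faster instances naturally take more jobs, which is the pull-based load balancing the flow relies on.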

Experiments

• Goals
  – Find the most suitable settings for AWS
  – Evaluate the performance of MATE-EC2 on heterogeneous and homogeneous compute environments
  – Compare the performance of MATE-EC2 and Map-Reduce

Experiments (Cont.)

• Setup:
  – 4 Large EC2 slave instances
  – 1 Large instance for the master
  – For each application, the dataset is split into 16 data objects on S3
• Large Instance:
  – 4 compute units (each comparable to a 1.0-1.2GHz processor)
  – 7.5GB memory
  – 850GB disk (ephemeral)
  – High I/O performance

Experiments (Cont.)

App                 I/O       Computation   Reduction Object Size   Dataset
KMeans Clustering   Low/Med   Med/High      Small                   8.2GB (10.7 billion points)
PCA                 Low       High          Large                   8.2GB
PageRank            High      Low/Med       Very Large              1GB (9.6M nodes, 131M edges)

Effect of Chunk Sizes (KMeans)

• Performance increase, 128KB vs. >8M chunks:
  – 2.07x to 2.49x speedup

[Figure: 1 data retrieval thread]
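
The speedups on this and the following slides are ratios of execution times. Assuming the usual definition, and taking the 128KB configuration as the baseline on this slide:

```latex
S = \frac{T_{\mathrm{baseline}}}{T_{\mathrm{new}}}, \qquad
\frac{1}{2.49} \approx 0.40, \qquad \frac{1}{2.07} \approx 0.48
```

Read this way, the >8M configurations would take roughly 40% to 48% of the 128KB execution time.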

Data Retrieval (KMeans)

• 16 data retrieval threads:
  – 8M vs. others speedup: 1.13x to 1.30x
• 128M chunk size:
  – One thread vs. others speedup: 1.37x to 1.90x

Job Assignment (KMeans)

• Speedup:
  – 1.01x for 8M
  – 1.1x to 1.14x for others

MATE-EC2 vs. Elastic MapReduce

• PageRank: speedups vs. EMR-combine of 4.08x to 7.54x
• KMeans: speedups vs. EMR-combine of 3.54x to 4.58x
• Chunk size: 128MB; data retrieval threads: 16

MATE-EC2 on Heterogeneous Instances

• Overheads:
  – KMeans: 1%
  – PCA: 1.1%, 7.4%, 11.7%

In Conclusion...

• We explored the AWS environment for data-intensive computing
  – 64M and 128M data chunks with 16 data retrieval threads appear to be optimal for our middleware
• Our data retrieval and processing optimizations significantly improve the performance of our middleware
• MATE-EC2 outperforms Map-Reduce in both scalability and performance

Thank You

• Questions & Discussion

• Acknowledgments:
  – We want to thank the MTAGS’11 organizers and the reviewers for their constructive comments
  – This work was sponsored by:

