
Big Data Use Cases and Solutions in the AWS Cloud

Date post: 08-Sep-2014
Upload: amazon-web-services
Description:
The AWS cloud computing platform has disrupted big data. Managing big data applications used to be possible only for well-funded research organizations and large corporations, but not any longer. Hear from Ben Butler, Big Data Solutions Marketing Manager for AWS, to learn how our customers are using big data services in the AWS cloud to innovate faster than ever before. AWS technology is not only available to everyone; it is also self-service and on-demand, with innovative technology and flexible, low-cost pricing models that require no commitments. Learn from customer success stories, as Ben shares real-world case studies describing the specific big data challenges being solved on AWS. We will conclude with a discussion of the tutorials, public datasets, test drives, and our grants program: all of the resources needed to get you started quickly.
Transcript
Page 1: Big Data Use Cases and Solutions in the AWS Cloud

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Big Data Use Cases and

Solutions in the AWS Cloud

Ben Butler, @bensbutler, Sr. Mgr., Big Data & HPC

July 10, 2014

Page 2: Big Data Use Cases and Solutions in the AWS Cloud

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 3: Big Data Use Cases and Solutions in the AWS Cloud

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 4: Big Data Use Cases and Solutions in the AWS Cloud

Big Data: Unconstrained data growth

95% of the 1.2 zettabytes of data in the digital universe is unstructured

70% of this is user-generated content

Unstructured data growth is explosive, with estimates of compound annual growth (CAGR) at 62%

Source: IDC

[Scale graphic: GB, TB, PB, EB, ZB]

Page 5: Big Data Use Cases and Solutions in the AWS Cloud

The amount of information generated during the first day of

a baby’s life today is equivalent to 70 times the information

contained in the Library of Congress

Page 6: Big Data Use Cases and Solutions in the AWS Cloud

Lower cost, higher throughput

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 7: Big Data Use Cases and Solutions in the AWS Cloud

Highly constrained

Lower cost, higher throughput

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 8: Big Data Use Cases and Solutions in the AWS Cloud

Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011

IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares

[Chart: data volume, 1990 to 2020, showing the gap between generated data and data available for analysis]

Page 9: Big Data Use Cases and Solutions in the AWS Cloud

Elastic and highly scalable

+ No upfront capital expense

+ Only pay for what you use

+ Available on-demand

= Remove constraints

Page 10: Big Data Use Cases and Solutions in the AWS Cloud

Accelerated

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 11: Big Data Use Cases and Solutions in the AWS Cloud

Technologies and techniques for working

productively with data, at any scale.

Big Data

Page 12: Big Data Use Cases and Solutions in the AWS Cloud

Big data and AWS Cloud computing

Big data: Variety, volume, and velocity requiring new tools

Cloud computing: Variety of compute, storage, and networking options

Page 13: Big Data Use Cases and Solutions in the AWS Cloud

Big data and AWS Cloud computing

Big data: Potentially massive datasets

Cloud computing: Massive, virtually unlimited capacity

Page 14: Big Data Use Cases and Solutions in the AWS Cloud

Big data and AWS Cloud computing

Big data: Iterative, experimental style of data manipulation and analysis

Cloud computing: Iterative, experimental style of infrastructure deployment/usage

Page 15: Big Data Use Cases and Solutions in the AWS Cloud

Big data and AWS Cloud computing

Big data: Frequently not steady-state workload; peaks and valleys

Cloud computing: At its most efficient with highly variable workloads

Page 16: Big Data Use Cases and Solutions in the AWS Cloud

Big data and AWS Cloud computing

Big data: Absolute performance not as critical as “time to results”; shared resources are a bottleneck

Cloud computing: Parallel compute projects allow each workgroup to have more autonomy, get faster results

Page 17: Big Data Use Cases and Solutions in the AWS Cloud

One tool to

rule them all

Page 18: Big Data Use Cases and Solutions in the AWS Cloud

Use the right tools

Amazon S3

Amazon Kinesis

Amazon DynamoDB

Amazon Redshift

Amazon Elastic MapReduce

Page 19: Big Data Use Cases and Solutions in the AWS Cloud

Store anything

Object storage

Scalable

99.999999999% durability

Amazon

S3

Page 20: Big Data Use Cases and Solutions in the AWS Cloud

Real-time processing

High throughput; elastic

Easy to use

Integrations: EMR, S3, Redshift, DynamoDB

Amazon

Kinesis

Page 21: Big Data Use Cases and Solutions in the AWS Cloud

NoSQL Database

Seamless scalability

Zero admin

Single digit millisecond latency

Amazon

DynamoDB

Page 22: Big Data Use Cases and Solutions in the AWS Cloud

Relational data warehouse

Massively parallel

Petabyte scale

Fully managed

$1,000/TB/Year

Amazon

Redshift

Page 23: Big Data Use Cases and Solutions in the AWS Cloud

Try Amazon Redshift with BI & ETL for Free!

aws.amazon.com/redshift/free-trial

2 months | 750 hours/month | dw2.large SSD instance

160GB of compressed storage per node

Try BI & ETL for free from nine partners at

aws.amazon.com/redshift/partners

Page 24: Big Data Use Cases and Solutions in the AWS Cloud

Hadoop/HDFS clusters

Hive, Pig, Impala, HBase

Easy to use; fully managed

On-demand and spot pricing

Tight integration with S3,

DynamoDB, and Kinesis

Amazon

Elastic

MapReduce

Page 25: Big Data Use Cases and Solutions in the AWS Cloud

Amazon EMR now ships with ODBC and JDBC drivers for

Hive, Impala, and HBase

Easier to use popular BI tools like:

Microsoft Excel, Tableau, MicroStrategy, and QlikView

ODBC and JDBC drivers now for Amazon EMR

Page 26: Big Data Use Cases and Solutions in the AWS Cloud

The right tools.

At the right scale.

At the right time.

Page 27: Big Data Use Cases and Solutions in the AWS Cloud

HDFS

Amazon EMR

Page 28: Big Data Use Cases and Solutions in the AWS Cloud

HDFS

Amazon S3 Amazon

DynamoDB

Amazon EMR

AWS Data Pipeline

Page 29: Big Data Use Cases and Solutions in the AWS Cloud

HDFS

Amazon S3 Amazon

DynamoDB

Amazon EMR

Amazon

Kinesis

AWS Data Pipeline

Data

Sources

Page 30: Big Data Use Cases and Solutions in the AWS Cloud

HDFS

Amazon S3 Amazon

DynamoDB

Amazon EMR

Amazon

Kinesis

AWS Data Pipeline

Data

Sources

Data management Hadoop Ecosystem analytical tools

Page 31: Big Data Use Cases and Solutions in the AWS Cloud

HDFS

Amazon

Redshift

Amazon

RDS

Amazon S3 Amazon

DynamoDB

Amazon EMR

Amazon

Kinesis

AWS Data Pipeline

Data management Hadoop Ecosystem analytical tools

Data

Sources

Page 32: Big Data Use Cases and Solutions in the AWS Cloud

HDFS

Amazon

Redshift

Amazon

RDS

Amazon S3 Amazon

DynamoDB

Amazon EMR

Amazon

Kinesis

AWS Data Pipeline

Data management Hadoop Ecosystem analytical tools

Data

Sources

AWS Data

Pipeline

Page 33: Big Data Use Cases and Solutions in the AWS Cloud

Free steak campaign

Disaster recovery

Web site & media sharing

Facebook app

Ground campaign

SAP & SharePoint

Marketing web site

Business line of sight

Consumer social app

IT operations

Mars exploration ops

Interactive TV apps

Media streaming

Consumer social app

Facebook page

Securities Trading Data Archiving

Financial markets analytics

Web and mobile apps

Big data analytics

Digital media

Ticket pricing optimization

Streaming webcasts

Mobile analytics

Consumer social app

Core IT and media

Page 34: Big Data Use Cases and Solutions in the AWS Cloud

Customer Use Cases of Big Data

Page 35: Big Data Use Cases and Solutions in the AWS Cloud
Page 36: Big Data Use Cases and Solutions in the AWS Cloud

Dropcam is the biggest inbound video service

on the Web

More data uploaded per

minute than YouTube

Petabytes of data

processed every month

Billions of motion events

detected

Page 37: Big Data Use Cases and Solutions in the AWS Cloud
Page 38: Big Data Use Cases and Solutions in the AWS Cloud

4 months to production

300% speed gain

$500k - $1M in CAPEX saved

Page 39: Big Data Use Cases and Solutions in the AWS Cloud
Page 40: Big Data Use Cases and Solutions in the AWS Cloud
Page 41: Big Data Use Cases and Solutions in the AWS Cloud
Page 42: Big Data Use Cases and Solutions in the AWS Cloud
Page 43: Big Data Use Cases and Solutions in the AWS Cloud
Page 44: Big Data Use Cases and Solutions in the AWS Cloud
Page 45: Big Data Use Cases and Solutions in the AWS Cloud

500MM tweets/day = ~ 20.8MM tweets/hr

2k/tweet is ~12MB/sec, need 6 shards, ~1TB/day

$0.015/hour per shard, $0.028/million PUTS

Kinesis cost is $0.765/hour

Redshift cost is $0.850/hour (for a 2TB dw1.xlarge)

Total: $1.615/hour

Cost &

Scale
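The arithmetic on this slide can be retraced in a few lines. The sketch below uses only the figures quoted above (2014 list prices, which may no longer apply) plus two assumptions that are not on the slide: "2k/tweet" is read as 2,000 bytes, and each Kinesis shard is assumed to accept up to 1 MB/s and 1,000 records/s of writes. Under those assumptions the record rate alone calls for 6 shards, as the slide says, while the byte rate would call for 12; the slide's $0.765/hour figure sits closer to the 12-shard estimate, so treat the shard count as approximate.

```python
import math

# Figures quoted on the slide (2014 list prices; they may no longer apply).
TWEETS_PER_DAY = 500_000_000
BYTES_PER_TWEET = 2_000          # assumption: "2k/tweet" read as 2,000 bytes
SHARD_PRICE_PER_HOUR = 0.015     # USD per shard-hour
PUT_PRICE_PER_MILLION = 0.028    # USD per million PUT records
KINESIS_COST_ON_SLIDE = 0.765    # USD per hour, as stated on the slide
REDSHIFT_COST_ON_SLIDE = 0.850   # USD per hour, 2 TB dw1.xlarge

# Assumed per-shard write limits (not stated on the slide).
SHARD_MB_PER_SEC = 1.0
SHARD_RECORDS_PER_SEC = 1_000

tweets_per_hour = TWEETS_PER_DAY / 24
tweets_per_sec = tweets_per_hour / 3600
mb_per_sec = tweets_per_sec * BYTES_PER_TWEET / 1e6
tb_per_day = TWEETS_PER_DAY * BYTES_PER_TWEET / 1e12

# Shards needed under each limit; a real stream needs the larger of the two.
shards_by_records = math.ceil(tweets_per_sec / SHARD_RECORDS_PER_SEC)
shards_by_bytes = math.ceil(mb_per_sec / SHARD_MB_PER_SEC)

put_cost_per_hour = tweets_per_hour / 1e6 * PUT_PRICE_PER_MILLION


def kinesis_cost_per_hour(shards: int) -> float:
    """Hourly Kinesis cost: shard-hours plus PUT charges."""
    return shards * SHARD_PRICE_PER_HOUR + put_cost_per_hour


total_on_slide = KINESIS_COST_ON_SLIDE + REDSHIFT_COST_ON_SLIDE

print(f"{tweets_per_hour / 1e6:.1f}MM tweets/hr, {mb_per_sec:.1f} MB/s, {tb_per_day:.1f} TB/day")
print(f"shards by record rate: {shards_by_records}, by byte rate: {shards_by_bytes}")
print(f"Kinesis $/hr with 6 shards: {kinesis_cost_per_hour(6):.3f}")
print(f"Kinesis $/hr with 12 shards: {kinesis_cost_per_hour(12):.3f}")
print(f"slide total $/hr: {total_on_slide:.3f}")
```

In both cases the PUT charge (about $0.58/hour) dominates the shard charge, so the estimate is far more sensitive to message count than to shard count.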

Page 46: Big Data Use Cases and Solutions in the AWS Cloud

http://wefeel.csiro.au/#/

Page 47: Big Data Use Cases and Solutions in the AWS Cloud

“THANKS TO AMAZON WEB SERVICES, WE CAN DELIGHT OUR PLAYERS WORLDWIDE.”

Sami Yliharju | Services Lead

Page 48: Big Data Use Cases and Solutions in the AWS Cloud
Page 49: Big Data Use Cases and Solutions in the AWS Cloud

The Climate Corporation - Weather Insurance for Farms

Challenge: Volatile weather is deadly to crops like grapes

Solution:

Built a predictive model based on freely available

data:

• 60 years of crop data,

• 14 TBs of soil data, and

• 1M government Doppler radar points

• 50 EMR clusters process new data as it comes

into S3 each day, continuously updating the

model.

Page 50: Big Data Use Cases and Solutions in the AWS Cloud

150B Soil

Observations

3M Daily Weather

Measurements

850K Precision Rainfall

Grids Tracked

200 TB in Amazon S3

Page 51: Big Data Use Cases and Solutions in the AWS Cloud

Foursquare…

33 million users

1.3 million businesses

…generates a lot of data

3.5 billion check-ins, 15M+ venues, terabytes of log data

Page 52: Big Data Use Cases and Solutions in the AWS Cloud

Uses EMR for

Evaluation of new features

Machine learning

Exploratory analysis

Daily customer usage reporting

Long-term trend analysis

Page 53: Big Data Use Cases and Solutions in the AWS Cloud

Benefits of Amazon EMR

Ease of use: “We have decreased the processing time for urgent data-analysis”

Flexibility: To deal with changing requirements & dynamically expand reporting clusters

Costs: “We have reduced our analytics costs by over 50%”

Page 54: Big Data Use Cases and Solutions in the AWS Cloud

Who is checking in?

[Charts: distribution by gender (Female, Male) and by age (0 to 80)]

Page 55: Big Data Use Cases and Solutions in the AWS Cloud

When do people go to a place?

[Chart: Gorilla Coffee, Gray's Papaya, and Amorino, Thursday through Sunday]

Page 56: Big Data Use Cases and Solutions in the AWS Cloud

User Sign-ups

Page 57: Big Data Use Cases and Solutions in the AWS Cloud

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 58: Big Data Use Cases and Solutions in the AWS Cloud

Amazon DynamoDB

Amazon RDS

Amazon Redshift

AWS Direct Connect

AWS Storage Gateway

AWS Import/Export

Amazon Glacier

Amazon S3

Amazon Kinesis

Amazon EMR

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 59: Big Data Use Cases and Solutions in the AWS Cloud

Amazon EC2

Amazon EMR

Amazon Kinesis

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 60: Big Data Use Cases and Solutions in the AWS Cloud

Amazon Redshift

Amazon DynamoDB

Amazon RDS

Amazon S3

Amazon EC2

Amazon EMR

Amazon CloudFront

AWS CloudFormation

AWS Data Pipeline

Generation

Collection & storage

Analytics & computation

Collaboration & sharing

Page 61: Big Data Use Cases and Solutions in the AWS Cloud


DataXu in the Cloud

Yekesa Kosuru, V.P Technology

July 10th 2014

Page 62: Big Data Use Cases and Solutions in the AWS Cloud

What is DataXu?

• Digital Marketing Platform, Ad Tech Platform

• Real-time Multivariate Decision System

• 5th Fastest Growing Private Company in the U.S. (Inc. 500)

• Optimize Digital Marketing Campaigns

– …put the right ad campaign in front of the right customer

– …find customers who left a site without converting

– …find more customers who are likely to convert

– …offer insight into who the respondents are, and why, when, and where they respond

• 950,000 times per second

Page 63: Big Data Use Cases and Solutions in the AWS Cloud

Big Data, Little Decisions

The Evolution of Real-Time Decision Systems

[Chart: decision impact (also proportional to risk) against decision rate; volume ~ value]

1990’s – “Should we advertise on the Superbowl? Should we run direct mail this qtr.?” Batch mode

2000’s – “How often can we run a permission-based email mktg. campaign?” Rules-based alerts

2010’s – Millions of decisions and actions taken, all in less than a blink of an eye

Page 64: Big Data Use Cases and Solutions in the AWS Cloud

Real Time Bidding

User Opens Browser

Goes to Sports Site

Site Auctions Ads, e.g. Google

DataXu Bids (others bid too)

DataXu Wins Bid

Ad Shown, Page loads

Page 65: Big Data Use Cases and Solutions in the AWS Cloud

Quick Statistics

• 950K bid requests per second

• Billions of impressions per month, Petabyte of

data

• 100 ms round trip response time

• 100+TB of warehouse data

• 3000+ Servers powering the platform

Page 66: Big Data Use Cases and Solutions in the AWS Cloud

Why AWS

• Automation, API

• Costs, Pay As You Go

• Auto Scaling (elasticity – up and down)

• All Data in One Place (S3 foundational store)

• Improved Testability

• Security, Privacy

• Disaster Recovery and Business Continuity

Page 67: Big Data Use Cases and Solutions in the AWS Cloud

DataXu Stack

Campaign Management

Business Intelligence

Data Mart

Interactive Queries

Batch Queries

Real Time Bidding System

Activity Logs

1st Party

3rd Party

Distributed Log Ingestion

S3/HDFS Warehouse

CDN

User Profiles

Campaign Metadata

ETL

Attribution

Machine Learning

Spend

Decision System

Audience Calculation

Uniques/Segment

Big Velocity: 950K TPS

Big Volume: Petabyte of Data

Big Variety: Data Providers

Page 68: Big Data Use Cases and Solutions in the AWS Cloud

High Level Deployment

ON PREMISE

SSL

Meta

Amazon S3

RTB

System

Elastic Load

Balancing

Availability Zone

Route

53

EC2

Auto scaling Group

Volumes

AMI

Availability Zone

Log

Ingestion

System

Machine Learning System

Auto scaling Group

EMR

CloudWatch

Page 69: Big Data Use Cases and Solutions in the AWS Cloud

Traditional Hadoop vs EMR

• Traditional Hadoop

– Anticipate and provision for peaks

– Can't decouple storage and compute

– 75% of the cluster is idle

– Data Duplication/Multiple Clusters

• EMR to the rescue

• Monthly savings of 72% using EMR
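The two percentages on this slide are plausibly related, and a rough model shows how. The sketch below assumes a fixed cluster is billed for every hour while an elastic cluster is billed only for busy hours; `price_ratio` is a hypothetical parameter for how much more (or less) an elastic hour costs than a fixed-cluster hour. This is an illustration, not DataXu's actual accounting.

```python
def elastic_savings(idle_fraction: float, price_ratio: float = 1.0) -> float:
    """Fraction of cost saved by paying only for busy hours.

    idle_fraction: share of fixed-cluster capacity that sits idle (0 to 1).
    price_ratio:   hypothetical ratio of the elastic hourly price to the
                   fixed-cluster hourly price (1.0 means equal prices).
    """
    if not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("idle_fraction must be between 0 and 1")
    busy_fraction = 1.0 - idle_fraction
    return 1.0 - busy_fraction * price_ratio


# With 75% of the cluster idle and equal hourly prices, the ceiling is 75%.
print(f"{elastic_savings(0.75):.0%}")

# A 12% higher hourly price for elastic capacity would give 72%.
print(f"{elastic_savings(0.75, price_ratio=1.12):.0%}")
```

Under this model the reported 72% is close to the 75% upper bound implied by the idle figure, which suggests most of the saving comes from no longer paying for idle capacity.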

Page 70: Big Data Use Cases and Solutions in the AWS Cloud

S3 Provides Linearly Scalable Bandwidth

• Big volume workloads combine several datasets and terabytes of data

• Aggregate bandwidth matters

• S3 scales pretty linearly

S3 Streaming Performance (m1.xlarge @ $0.34/hr)

100 VMs; 9.6GB/s; $34/hr

350 VMs; 28.7GB/s; $119/hr

34 secs per terabyte
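The streaming figures above can be checked with simple arithmetic. The sketch below takes the VM counts, aggregate bandwidths, and hourly price from the slide and derives per-VM throughput, hourly cost, and time to stream one terabyte; it assumes decimal units (1 TB = 1,000 GB), which the slide does not specify.

```python
PRICE_PER_VM_HOUR = 0.34  # USD, m1.xlarge price quoted on the slide

# (number of VMs, aggregate S3 streaming bandwidth in GB/s), from the slide
RUNS = [(100, 9.6), (350, 28.7)]


def summarize(vms: int, gb_per_sec: float) -> dict:
    """Derive cost and throughput figures for one cluster size."""
    return {
        "vms": vms,
        "cost_per_hour": vms * PRICE_PER_VM_HOUR,
        "mb_per_sec_per_vm": gb_per_sec * 1000 / vms,
        "secs_per_tb": 1000 / gb_per_sec,  # assumes 1 TB = 1,000 GB
    }


summaries = [summarize(vms, bw) for vms, bw in RUNS]
for s in summaries:
    print(s)

# How close to linear is the scaling between the two runs?
vm_growth = RUNS[1][0] / RUNS[0][0]
bandwidth_growth = RUNS[1][1] / RUNS[0][1]
scaling_efficiency = bandwidth_growth / vm_growth
print(f"{vm_growth:.1f}x VMs gave {bandwidth_growth:.2f}x bandwidth "
      f"({scaling_efficiency:.0%} of linear)")
```

By this estimate, 3.5 times as many VMs delivered about 3 times the aggregate bandwidth, and the 350-VM run streams a terabyte in roughly 35 seconds, in line with the slide's "34 secs per terabyte".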

Page 71: Big Data Use Cases and Solutions in the AWS Cloud

Thank You

www.dataxu.com

Yekesa Kosuru, @ykosuru


Page 72: Big Data Use Cases and Solutions in the AWS Cloud


Getting Started with

Big Data on AWS

Page 73: Big Data Use Cases and Solutions in the AWS Cloud

AWS is here to help

Solution Architects

Professional Services

Premium Support

AWS Partner Network (APN)

Page 74: Big Data Use Cases and Solutions in the AWS Cloud

aws.amazon.com/partners/competencies/big-data

Partner with an AWS Big Data expert

Page 75: Big Data Use Cases and Solutions in the AWS Cloud

https://aws.amazon.com/architecture/

Processing large amounts of parallel

data using a scalable cluster

AWS Architecture Diagrams

Page 76: Big Data Use Cases and Solutions in the AWS Cloud

http://aws.amazon.com/marketplace

Big Data Case Studies

Learn from other AWS customers

aws.amazon.com/solutions/case-studies/big-data

Page 77: Big Data Use Cases and Solutions in the AWS Cloud

AWS Marketplace

AWS Online Software Store

aws.amazon.com/marketplace

Shop the big data category

Page 78: Big Data Use Cases and Solutions in the AWS Cloud

http://aws.amazon.com/marketplace

AWS Public Data Sets

Free access to big data sets

aws.amazon.com/publicdatasets

Page 79: Big Data Use Cases and Solutions in the AWS Cloud

AWS Grants Program

AWS in Education

aws.amazon.com/grants

Page 80: Big Data Use Cases and Solutions in the AWS Cloud

AWS Big Data Test Drives

APN Partner-provided labs

aws.amazon.com/testdrive/bigdata

Page 81: Big Data Use Cases and Solutions in the AWS Cloud

https://aws.amazon.com/training

AWS Training & Events

Webinars, Bootcamps,

and Self-Paced Labs

aws.amazon.com/events

Page 82: Big Data Use Cases and Solutions in the AWS Cloud

Big Data on AWS

Course on Big Data

aws.amazon.com/training/course-descriptions/bigdata

Page 83: Big Data Use Cases and Solutions in the AWS Cloud

reinvent.awsevents.com

Page 84: Big Data Use Cases and Solutions in the AWS Cloud

aws.amazon.com/big-data

Page 85: Big Data Use Cases and Solutions in the AWS Cloud


Thank you!

Ben Butler, @bensbutler, Sr. Mgr., Big Data

July 10, 2014 – http://aws.amazon.com/big-data

