Big Data and Analytics

Post on 21-Jul-2015


©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

Big Data and Analytics End to End on AWS

Russell Nash – AWS Solutions Architect

Agenda

End to End Framework

Use Cases

Demonstration

Religion

Greater Good

FOMO

Greater Good

Big Data End to End Framework

Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Apache Storm

PIG

Amazon Machine Learning

Amazon EMR

Amazon Glacier

Amazon DynamoDB

"I got kicked out of the bookshop last week, because I moved all of the Big Data books into the Religion section"

Data -> Ingest -> Store -> Process -> Analyse -> Answers

Simplify Big Data Processing

INGEST

Databases: Database Data

Flat Files: File Data

Event Producer (Android, iOS): Streaming Data

Sales Data, Customer Data, Web Logs, Server Logs, Clickstream data, Sensor data

STORE

Database


INGEST

Amazon Redshift

Amazon RDS

STORE

Data Tier

Search Cache Object Store

RDBMS NoSQL Data Warehouse

logging analytics

webscale transactions

rich search, hot reads, complex queries and transactions

Data Tier

Amazon DynamoDB

Amazon RDS

Amazon ElastiCache

Amazon S3

Amazon Redshift

Amazon CloudSearch

Traditional Relational Database

              Amazon RDS       Amazon Redshift
Scaling       Vertical         Horizontal
Storage       Row              Column
Workload      Transactional    Analytical
Architecture  SMP              MPP
Type          SQL Relational   SQL Relational
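The Row versus Column storage distinction in the comparison above can be illustrated with a small sketch. This is not how either service is implemented; the sample sales records and every name in the block (`rows`, `columns`, and the four functions) are made up for illustration. It only shows why an aggregate over one field touches less data in a column layout, while fetching one whole record is more natural in a row layout.

```python
# Illustrative only: the same three sales records laid out two ways.
# The records and all names here are hypothetical.

# Row-oriented layout: each record is stored together (suits transactional lookups).
rows = [
    {"order_id": 1, "customer": "alice", "amount": 30},
    {"order_id": 2, "customer": "bob", "amount": 45},
    {"order_id": 3, "customer": "alice", "amount": 25},
]

# Column-oriented layout: each column is stored together (suits analytical scans).
columns = {
    "order_id": [1, 2, 3],
    "customer": ["alice", "bob", "alice"],
    "amount": [30, 45, 25],
}


def total_amount_row_store(records):
    """Sum one field; every whole record has to be visited."""
    return sum(record["amount"] for record in records)


def total_amount_column_store(table):
    """Sum one field; only the 'amount' column is touched."""
    return sum(table["amount"])


def fetch_order_row_store(records, order_id):
    """Fetch one full record; the matching row already holds every field."""
    return next(r for r in records if r["order_id"] == order_id)


def fetch_order_column_store(table, order_id):
    """Fetch one full record; every column has to be consulted."""
    i = table["order_id"].index(order_id)
    return {name: values[i] for name, values in table.items()}
```

Both layouts return the same answers; what differs is how much data each query has to read, which is the trade-off behind the Transactional versus Analytical workload row.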

"Some of the queries we're running are 98 percent faster, and most things are running 90 percent faster" - FT CTO John O'Donovan


Storage

INGEST

Amazon Redshift

Amazon RDS

Application

Amazon S3

STORE

Impala PIG

Amazon EMR

Amazon S3

Amazon Redshift

Amazon EMR

Glacier

Amazon

DynamoDB

Amazon Machine Learning

Applications

            Amazon Redshift   Amazon S3
Scaling     Add nodes         Automatic
Speed       Fastest           Fast
Cost        Higher            Lower
Durability  Configurable      Built-in

"Avoid vendor lock-in" - Saman Michael Far - SVP


Stream Processor

INGEST

Amazon Redshift

Amazon RDS

Amazon S3

Amazon Kinesis

STORE

Why Stream Storage?

Sensors

Amazon Kinesis

Apache Kafka

Availability Zone

Data Sources

Logging

Metrics

Analysis

Complex Processing

S3

DynamoDB

Redshift

Apache Storm

Amazon Kinesis

Stream

Amazon

Redshift

             Amazon Kinesis   Apache Kafka
Ordering     Yes              Yes
Persistence  24 Hours         Configurable
Size         50 KB            Configurable
Scaling      High             High
Latency      Low              Low
Managed      Yes              No
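The Ordering and Scaling rows can be made concrete with a small sketch of how a stream spreads records across shards. Amazon Kinesis documents that it takes the MD5 hash of a record's partition key and assigns the record to the shard whose hash key range contains that value. The sketch below assumes the shards divide the 128-bit hash space evenly, which is a simplification; the function names `shard_for_key` and `route` and the sample player events are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 digests are 128 bits wide


def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index.

    Assumption: shards split the hash space into equal ranges.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE


def route(records, shard_count):
    """Group (partition_key, payload) pairs by shard, keeping arrival order."""
    shards = {i: [] for i in range(shard_count)}
    for key, payload in records:
        shards[shard_for_key(key, shard_count)].append((key, payload))
    return shards


# Hypothetical usage: events for one player always land on the same shard,
# so a consumer of that shard sees them in the order they were sent.
events = [
    ("player-1", "login"),
    ("player-2", "login"),
    ("player-1", "score"),
    ("player-1", "logout"),
]
by_shard = route(events, shard_count=4)
```

Ordering in this model holds per shard, not across the whole stream, so the choice of partition key decides which events stay in sequence relative to each other.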

"The world of gaming never sleeps. We owe every player a great experience, and AWS is our main tool to make that happen." - Sami Yliharju, Services Lead

INGEST STORE PROCESS

Event Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Amazon EMR

Flat Files Database

Data

Event Data

Streaming Data

Interactive

Batch

Streaming

Hadoop

            Amazon Redshift   Impala
Scaling     2 PB+             Nodes
Storage     Native            HDFS/S3
BI Tools    High              Medium
Durability  High              High
Latency     Low               Low
Managed     Fully             Semi (EMR)


Amazon EMR

Hadoop

PIG

SQL on Hadoop

Eats anything

New Processing Engine

Amplab Big Data Benchmark

https://amplab.cs.berkeley.edu/benchmark/


INGEST STORE PROCESS

Event Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Apache Storm

Kinesis Consumers

Flat Files Database

Data

Event Data

Streaming Data

Interactive

Batch

Streaming

PIG

ANALYSE

Amazon Machine Learning


Amazon EMR

Hadoop

Use Cases

FOMO

Amazon EMR

Hadoop

Amazon Machine Learning

Kinesis Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Apache Storm

Kinesis Consumer

Flat Files Database

Data

Event Data

Streaming Data

SQL Analytics

Databases

Database Data

Amazon Redshift


Clickstream Analysis - Batch

Amazon Elastic MapReduce

Event Data

Amazon EMR

Hadoop
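As a rough illustration of the batch clickstream use case, the sketch below counts page views in the map, shuffle, reduce style that Hadoop popularised. It runs in a single process in plain Python and does not use EMR or Hadoop APIs. The log line format (timestamp, user id, URL separated by spaces) and all function names are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (url, 1) for one log line of the assumed form 'timestamp user_id url'."""
    _timestamp, _user_id, url = line.split()
    yield url, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def count_page_views(lines):
    """Run the three phases over a batch of log lines and return {url: views}."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample batch
sample_log = [
    "2015-07-21T10:00:00 u1 /home",
    "2015-07-21T10:00:05 u2 /home",
    "2015-07-21T10:01:00 u1 /cart",
]
page_views = count_page_views(sample_log)
```

On a real cluster the map and reduce functions would run in parallel over files in the store, but the per-record logic would look much the same.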


Clickstream Analysis – Near Real Time

Event Producer

Amazon Kinesis

Amazon S3

Amazon Redshift

Kinesis Consumers

Streaming Data
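For the near-real-time variant, a stream consumer typically aggregates events over short time windows instead of waiting for a full batch. The sketch below shows one such approach, a tumbling window of 60 seconds, over plain `(timestamp, url)` tuples. Reading from the stream is left out; the event shape, the window length, and the name `windowed_counts` are assumptions for illustration, and events are assumed to arrive in timestamp order.

```python
from collections import Counter

WINDOW_SECONDS = 60  # assumed window length


def windowed_counts(events, window_seconds=WINDOW_SECONDS):
    """Yield (window_start, Counter of urls) for each tumbling window.

    Assumes events are (epoch_seconds, url) tuples in timestamp order.
    Windows that contain no events are skipped.
    """
    current_window = None
    counts = Counter()
    for timestamp, url in events:
        window_start = timestamp - timestamp % window_seconds
        if current_window is None:
            current_window = window_start
        if window_start != current_window:
            # The previous window is complete, so emit it and start a new one.
            yield current_window, counts
            current_window, counts = window_start, Counter()
        counts[url] += 1
    if current_window is not None:
        # Emit the last, possibly partial, window when the input ends.
        yield current_window, counts


# Hypothetical sample events
sample_events = [(0, "/home"), (10, "/cart"), (59, "/home"), (60, "/home"), (125, "/cart")]
results = list(windowed_counts(sample_events))
```

Each emitted window could then be written to the store or loaded into the warehouse, which is what keeps the results only a window length behind the live traffic.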


Data Lake – Self Service Analysis

Databases

Amazon S3

Database Data

Event Data

Streaming Data Android

iOS

Impala

Amazon Redshift

Amazon Machine Learning

Amazon EMR

Hadoop