©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved
Big Data and Analytics End to End on AWS
Russell Nash – AWS Solutions Architect
Agenda
End to End Framework
Use Cases
Demonstration
Religion
Greater Good
FOMO
Greater Good
Big Data End to End Framework
Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Apache Storm
PIG
Amazon Machine Learning
Amazon EMR
Amazon Glacier
Amazon DynamoDB
“I got kicked out of the bookshop last week, because I moved all of the Big Data books into the Religion section”
Ingest Store Process Analyse Data Answers
Simplify Big Data Processing
Databases
Database Flat Files Database
Data
File Data
Event Producer
Android iOS
Streaming Data
Sales Data Customer Data
Web Logs Server Logs
Clickstream data Sensor data
Database
INGEST STORE
Databases
Database Flat Files Database
Data
File Data
Event Producer
Android iOS
Streaming Data
Sales Data Customer Data
Web Logs Server Logs
Clickstream data Sensor data
Database
INGEST
Amazon Redshift
Amazon RDS
STORE
Data Tier
Search Cache Object Store
RDBMS NoSQL Data Warehouse
logging analytics
webscale transactions
rich search hot reads complex queries and transactions
Data Tier
Amazon DynamoDB
Amazon RDS
Amazon ElastiCache
Amazon S3
Amazon Redshift
Amazon CloudSearch
Attribute      Amazon RDS (Traditional Relational Database)   Amazon Redshift
Scaling        Vertical                                       Horizontal
Storage        Row                                            Column
Workload       Transactional                                  Analytical
Architecture   SMP                                            MPP
Type           SQL Relational                                 SQL Relational
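The Storage row above (Row versus Column) can be illustrated with a minimal sketch. The sample orders, field names, and function names are hypothetical and exist only for illustration; this is not how either service is implemented internally.

```python
# Hypothetical sample data, for illustration only.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 30.0},
    {"order_id": 2, "customer": "bob", "amount": 12.5},
    {"order_id": 3, "customer": "alice", "amount": 7.5},
]

# Row layout: all fields of one record are kept together, which suits
# transactional reads and writes of whole records.
def total_amount_row_store(records):
    # Every full record is visited even though only one field is needed.
    return sum(record["amount"] for record in records)

# Column layout: all values of one field are kept together, which suits
# analytical scans over a few fields of many records.
columns = {name: [record[name] for record in rows] for name in rows[0]}

def total_amount_column_store(cols):
    # Only the "amount" column is read; the other columns are never touched.
    return sum(cols["amount"])
```

Both functions return the same total; the difference is how much data each layout has to read to answer an analytical question.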
“Some of the queries we’re running are 98 percent faster, and most things are running 90 percent faster” - FT CTO John O’Donovan
Databases
Database Flat Files Database
Data
File Data
Event Producer
Android iOS
Streaming Data
Sales Data Customer Data
Web Logs Server Logs
Clickstream data Sensor data
Storage
INGEST
Amazon Redshift
Amazon RDS
Application
Amazon S3
STORE
Impala PIG
Amazon EMR
Amazon S3
Amazon Redshift
Amazon EMR
Glacier
Amazon DynamoDB
Amazon Machine Learning
Applications
Attribute    Amazon Redshift   Amazon S3
Scaling      Add nodes         Automatic
Speed        Fastest           Fast
Cost         Higher            Lower
Durability   Configurable      Built-in
“Avoid vendor lock-in” - Saman Michael Far - SVP
Databases
Database Flat Files Database
Data
File Data
Event Producer
Android iOS
Streaming Data
Sales Data Customer Data
Web Logs Server Logs
Clickstream data Sensor data
Stream Processor
INGEST
Amazon Redshift
Amazon RDS
Amazon S3
Amazon Kinesis
STORE
Why Stream Storage?
Sensors
Amazon Kinesis
Apache Kafka
Availability Zone
Availability Zone
Availability Zone
Data Sources
Data Sources
Data Sources
Data Sources
Data Sources
Logging
Metrics
Analysis
Complex Processing
S3
DynamoDB
Redshift
Apache Storm
Amazon Kinesis
Stream
Amazon Redshift
Attribute     Amazon Kinesis   Apache Kafka
Ordering      Yes              Yes
Persistence   24 Hours         Configurable
Size          50 KB            Configurable
Scaling       High             High
Latency       Low              Low
Managed       Yes              No
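The Ordering row above holds per shard: Amazon Kinesis hashes each record's partition key with MD5 and routes the record to the shard whose hash key range contains the result, so records that share a key stay in order. A minimal sketch of that routing follows, assuming the 128-bit hash space is split evenly across shards; the function name `shard_for` and the even split are illustrative assumptions, not part of the Kinesis API.

```python
import hashlib

# Size of the 128-bit space that MD5 hash values fall into.
HASH_SPACE = 2 ** 128

def shard_for(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard a partition key maps to.

    Assumes shards own equal, contiguous slices of the hash space,
    which is an illustrative simplification.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    # Scale the hash value into the range [0, shard_count).
    return hash_value * shard_count // HASH_SPACE
```

Choosing a partition key such as a device or user identifier keeps all events from that source on one shard, which is what preserves their order.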
“The world of gaming never sleeps. We owe every player a great experience, and AWS is our main tool to make that happen.” - Sami Yliharju, Services Lead
INGEST STORE PROCESS
Event Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Amazon EMR
Flat Files Database
Data
Event Data
Streaming Data
Interactive
Batch
Streaming
Hadoop
Attribute    Amazon Redshift   Impala
Scaling      2 PB+             Nodes
Storage      Native            HDFS/S3
BI Tools     High              Medium
Durability   High              High
Latency      Low               Low
Managed      Fully             Semi (EMR)
INGEST STORE PROCESS
Event Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Flat Files Database
Data
Event Data
Streaming Data
Interactive
Batch
PIG
Streaming
Amazon EMR
Hadoop
PIG
SQL on Hadoop
Eats anything
New Processing Engine
Amplab Big Data Benchmark
https://amplab.cs.berkeley.edu/benchmark/
INGEST STORE PROCESS
Event Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Kinesis Consumers
Flat Files Database
Data
Event Data
Streaming Data
Interactive
Batch
Streaming
PIG
Amazon EMR
Hadoop
INGEST STORE PROCESS
Event Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Kinesis Consumers
Flat Files Database
Data
Event Data
Streaming Data
Interactive
Batch
Streaming
PIG
ANALYSE
Amazon Machine Learning
Amazon EMR
Hadoop
Use Cases
FOMO
Amazon EMR
Hadoop
Amazon Machine Learning
Kinesis Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Kinesis Consumer
Flat Files Database
Data
Event Data
Streaming Data
Databases Amazon Redshift
Amazon Redshift
Database Data
SQL Analytics
Amazon Machine Learning
Kinesis Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Kinesis Consumer
Amazon Elastic MapReduce
Flat Files Database
Data
Event Data
Streaming Data
Clickstream Analysis - Batch
Amazon Elastic MapReduce
Event Data
Amazon EMR
Hadoop
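A batch clickstream job of the kind this slide places on Amazon EMR can be sketched as a map step and a reduce step. This is a plain Python illustration rather than an actual Hadoop job; the log format, the sample lines, and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical web log lines in the form "<timestamp> <user_id> <page>".
LOG_LINES = [
    "2015-06-01T10:00:00 u1 /home",
    "2015-06-01T10:00:05 u2 /home",
    "2015-06-01T10:00:09 u1 /products",
    "2015-06-01T10:01:30 u3 /home",
]

def map_phase(line):
    """Emit a (page, 1) pair for one log line."""
    _timestamp, _user_id, page = line.split()
    return page, 1

def reduce_phase(pairs):
    """Sum the counts emitted for each page."""
    totals = defaultdict(int)
    for page, count in pairs:
        totals[page] += count
    return dict(totals)

# Page views per page across the whole batch of log lines.
page_views = reduce_phase(map_phase(line) for line in LOG_LINES)
```

On a cluster the map step would run in parallel over many log files and the reduce step would run per page, but the shape of the computation is the same.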
Amazon Machine Learning
Kinesis Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Kinesis Consumer
Amazon Elastic MapReduce
Flat Files Database
Data
Event Data
Streaming Data
Clickstream Analysis – Near Real Time
Event Producer
Amazon Kinesis
Amazon S3
Amazon Redshift
Kinesis Consumers
Streaming Data
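For the near-real-time path, a stream consumer typically keeps a small rolling aggregate instead of waiting for a full batch. The sketch below counts events in a sliding time window. The class name, the 60-second window, and the sample timestamps are hypothetical, and it assumes events arrive in timestamp order.

```python
from collections import deque

class SlidingWindowCounter:
    """Count the events seen in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp):
        """Add one event and return how many events are still in the window."""
        self._timestamps.append(timestamp)
        # Drop events that have fallen out of the window. The event just
        # added always stays, so the deque is never empty here.
        while self._timestamps[0] <= timestamp - self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps)

# Hypothetical event times in seconds, already in arrival order.
counter = SlidingWindowCounter(window_seconds=60)
counts = [counter.record(t) for t in (0, 10, 30, 65, 100)]
```

A consumer could write each rolling count to a store such as Amazon S3 or Amazon Redshift at intervals, which matches the flow drawn on this slide.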
Amazon Machine Learning
Kinesis Producer
Android iOS
Databases Amazon Redshift
Amazon Kinesis
Amazon S3
Amazon RDS
Impala
Amazon Redshift
Apache Storm
Kinesis Consumer
Amazon Elastic MapReduce
Flat Files Database
Data
Event Data
Streaming Data
Data Lake – Self Service Analysis
Databases
Amazon S3
Database Data
Event Data
Streaming Data Android
iOS
Impala
Amazon Redshift
Amazon Machine Learning
Amazon EMR
Hadoop