Big Data and Analytics

Date post: 21-Jul-2015
Upload: amazon-web-services
©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved Big Data and Analytics End to End on AWS Russell Nash – AWS Solutions Architect
Transcript
Page 1: Big Data and Analytics

©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

Big Data and Analytics End to End on AWS

Russell Nash – AWS Solutions Architect

Page 2: Big Data and Analytics

Agenda

End to End Framework

Use Cases

Demonstration

Religion

Greater Good

FOMO

Page 3: Big Data and Analytics

Greater Good

Page 4: Big Data and Analytics
Page 5: Big Data and Analytics
Page 6: Big Data and Analytics

Big Data End to End Framework

Page 7: Big Data and Analytics

Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Apache Storm

PIG

Amazon Machine Learning

Amazon EMR

Amazon Glacier

Amazon DynamoDB

Page 8: Big Data and Analytics

“I got kicked out of the bookshop last week, because I moved all of the Big Data books into the Religion section”

Page 9: Big Data and Analytics

Data → Ingest → Store → Process → Analyse → Answers

Simplify Big Data Processing
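
The four stages named on this slide (ingest, store, process, analyse) can be sketched as a chain of small functions. This is only an illustrative in-memory model: the function names, the 'user,page' record format and the sample data are assumptions made for the example, and no AWS service is called.

```python
from collections import Counter

def ingest(raw_lines):
    """Ingest: parse raw clickstream lines of the assumed form 'user,page'."""
    for line in raw_lines:
        user, page = line.strip().split(",")
        yield {"user": user, "page": page}

def store(events):
    """Store: keep events in a list (a stand-in for a durable store)."""
    return list(events)

def process(stored):
    """Process: count views per page."""
    return Counter(event["page"] for event in stored)

def analyse(counts, top_n=1):
    """Analyse: turn the counts into an answer, the most viewed pages."""
    return [page for page, _ in counts.most_common(top_n)]

raw = ["alice,/home", "bob,/pricing", "carol,/home"]
answers = analyse(process(store(ingest(raw))))
print(answers)
```

Each stage only depends on the output of the previous one, which is the property that lets a different service be swapped in per stage.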

Page 10: Big Data and Analytics

Databases

Database Flat Files Database

Data

File Data

Event Producer

Android iOS

Streaming Data

Sales Data Customer Data

Web Logs Server Logs

Clickstream data Sensor data

Database

INGEST STORE

Page 11: Big Data and Analytics

Databases

Database Flat Files Database

Data

File Data

Event Producer

Android iOS

Streaming Data

Sales Data Customer Data

Web Logs Server Logs

Clickstream data Sensor data

Database

INGEST

Amazon Redshift

Amazon RDS

STORE

Page 12: Big Data and Analytics

Data Tier

Search Cache Object Store

RDBMS NoSQL Data Warehouse

logging analytics

webscale transactions

rich search, hot reads, complex queries and transactions

Data Tier

Amazon DynamoDB

Amazon RDS

Amazon ElastiCache

Amazon S3

Amazon Redshift

Amazon CloudSearch

Traditional Relational Database

Page 13: Big Data and Analytics

               Amazon RDS        Amazon Redshift

Scaling        Vertical          Horizontal

Storage        Row               Column

Workload       Transactional     Analytical

Architecture   SMP               MPP

Type           SQL Relational    SQL Relational
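
The Storage and Workload rows above (row versus column, transactional versus analytical) can be illustrated with a toy model. This is a sketch under stated assumptions, not how either service is implemented: the order data and the helper names `fetch_order`, `total_amount_row_store` and `total_amount_column_store` are hypothetical.

```python
# Row-oriented layout: each record is kept together,
# which suits transactional lookups of whole records.
rows = [
    {"order_id": 1, "customer": "alice", "amount": 30},
    {"order_id": 2, "customer": "bob", "amount": 45},
    {"order_id": 3, "customer": "alice", "amount": 25},
]

# Column-oriented layout: each column is kept together,
# which suits analytical scans over a few columns.
columns = {key: [record[key] for record in rows] for key in rows[0]}

def fetch_order(order_id):
    """Transactional access: return one whole record from the row layout."""
    return next(record for record in rows if record["order_id"] == order_id)

def total_amount_row_store():
    """Analytical scan on the row layout: visits every full record."""
    return sum(record["amount"] for record in rows)

def total_amount_column_store():
    """Analytical scan on the column layout: reads only the 'amount' column."""
    return sum(columns["amount"])

print(fetch_order(2))
print(total_amount_row_store(), total_amount_column_store())
```

Both totals agree; the difference is how much data each layout has to touch to produce them.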

Page 14: Big Data and Analytics

“Some of the queries we’re running are 98 percent faster, and most things are running 90 percent faster” - FT CTO John O’Donovan

Page 15: Big Data and Analytics

Databases

Database Flat Files Database

Data

File Data

Event Producer

Android iOS

Streaming Data

Sales Data Customer Data

Web Logs Server Logs

Clickstream data Sensor data

Storage

INGEST

Amazon Redshift

Amazon RDS

Application

Amazon S3

STORE

Page 16: Big Data and Analytics

Impala PIG

Amazon EMR

Page 17: Big Data and Analytics

Amazon S3

Amazon Redshift

Amazon EMR

Glacier

Amazon

DynamoDB

Amazon Machine Learning

Applications

Page 18: Big Data and Analytics

               Amazon Redshift   Amazon S3

Scaling        Add nodes         Automatic

Speed          Fastest           Fast

Cost           Higher            Lower

Durability     Configurable      Built-in

Page 19: Big Data and Analytics

“Avoid vendor lock-in” - Saman Michael Far - SVP

Page 20: Big Data and Analytics

Databases

Database Flat Files Database

Data

File Data

Event Producer

Android iOS

Streaming Data

Sales Data Customer Data

Web Logs Server Logs

Clickstream data Sensor data

Stream Processor

INGEST

Amazon Redshift

Amazon RDS

Amazon S3

Amazon Kinesis

STORE

Page 21: Big Data and Analytics

Why Stream Storage?

Sensors

Amazon Kinesis

Apache Kafka

Page 22: Big Data and Analytics

Availability Zone

Availability Zone

Availability Zone

Data Sources

Data Sources

Data Sources

Data Sources

Data Sources

Logging

Metrics

Analysis

Complex Processing

S3

DynamoDB

Redshift

Apache Storm

Amazon Kinesis

Stream

Page 23: Big Data and Analytics

               Amazon Kinesis    Apache Kafka

Ordering       Yes               Yes

Persistence    24 Hours          Configurable

Size           50 KB             Configurable

Scaling        High              High

Latency        Low               Low

Managed        Yes               No
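
The "Ordering: Yes" entry for Amazon Kinesis holds per shard: records that share a partition key are routed to the same shard and are read back in arrival order. Kinesis maps partition keys to shards with an MD5 hash over a 128-bit key space. The sketch below is a simplified model of that routing, not the Kinesis API: `shard_for` and `put_records` are hypothetical helpers, and it assumes the hash space is split evenly across shards, which a real stream does not guarantee.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields 128-bit values

def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    return hash_value * shard_count // HASH_SPACE

def put_records(records, shard_count):
    """Append (partition_key, data) pairs to per-shard lists in arrival order."""
    shards = [[] for _ in range(shard_count)]
    for key, data in records:
        shards[shard_for(key, shard_count)].append((key, data))
    return shards

events = [
    ("device-1", "t=1"),
    ("device-2", "t=1"),
    ("device-1", "t=2"),
    ("device-1", "t=3"),
]
shards = put_records(events, shard_count=4)

# All records for one key sit in one shard, in the order they arrived.
device_1_shard = shards[shard_for("device-1", 4)]
print([data for key, data in device_1_shard if key == "device-1"])
```

Ordering across different partition keys is not implied by this model, since those records may land in different shards.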

Page 24: Big Data and Analytics

“The world of gaming never sleeps. We owe every player a great experience, and AWS is our main tool to make that happen.” - Sami Yliharju, Services Lead

Page 25: Big Data and Analytics

INGEST STORE PROCESS

Event Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Amazon EMR

Flat Files Database

Data

Event Data

Streaming Data

Interactive

Batch

Streaming

Hadoop

Page 26: Big Data and Analytics

               Amazon Redshift   Impala

Scaling        2 PB+             Nodes

Storage        Native            HDFS/S3

BI Tools       High              Medium

Durability     High              High

Latency        Low               Low

Managed        Fully             Semi (EMR)

Page 27: Big Data and Analytics

INGEST STORE PROCESS

Event Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Flat Files Database

Data

Event Data

Streaming Data

Interactive

Batch

PIG

Streaming

Amazon EMR

Hadoop

Page 28: Big Data and Analytics

PIG

SQL on Hadoop

Eats anything

New Processing Engine

Page 29: Big Data and Analytics

Amplab Big Data Benchmark

https://amplab.cs.berkeley.edu/benchmark/

Page 30: Big Data and Analytics

INGEST STORE PROCESS

Event Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Apache Storm

Kinesis Consumers

Flat Files Database

Data

Event Data

Streaming Data

Interactive

Batch

Streaming

PIG

Amazon EMR

Hadoop

Page 31: Big Data and Analytics

INGEST STORE PROCESS

Event Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Apache Storm

Kinesis Consumers

Flat Files Database

Data

Event Data

Streaming Data

Interactive

Batch

Streaming

PIG

ANALYSE

Amazon Machine Learning

Amazon EMR

Hadoop

Page 32: Big Data and Analytics

Use Cases

Page 33: Big Data and Analytics

FOMO

Page 34: Big Data and Analytics

Amazon EMR

Hadoop

Amazon Machine Learning

Kinesis Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Apache Storm

Kinesis Consumer

Flat Files Database

Data

Event Data

Streaming Data

Databases Amazon Redshift

Amazon Redshift

Database Data

SQL Analytics

Page 35: Big Data and Analytics

Amazon Machine Learning

Kinesis Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Apache Storm

Kinesis Consumer

Amazon Elastic MapReduce

Flat Files Database

Data

Event Data

Streaming Data

Clickstream Analysis - Batch

Amazon Elastic MapReduce

Event Data

Amazon EMR

Hadoop

Page 36: Big Data and Analytics

Amazon Machine Learning

Kinesis Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Apache Storm

Kinesis Consumer

Amazon Elastic MapReduce

Flat Files Database

Data

Event Data

Streaming Data

Clickstream Analysis – Near Real Time

Event Producer

Amazon Kinesis

Amazon S3

Amazon Redshift

Kinesis Consumers

Streaming Data
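
One kind of work a stream consumer in a near-real-time clickstream flow like this might do is counting page views per fixed time window before the results are loaded downstream. The sketch below is an assumption about such a consumer, not something taken from the slides: `tumbling_window_counts`, the (timestamp, page) event shape and the 60-second window are all hypothetical.

```python
from collections import Counter, defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Count page views per fixed, non-overlapping time window.

    `events` is an iterable of (timestamp_in_seconds, page) pairs.
    """
    windows = defaultdict(Counter)
    for timestamp, page in events:
        # Align each event to the start of the window it falls into.
        window_start = timestamp - (timestamp % window_seconds)
        windows[window_start][page] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}

clicks = [(0, "/home"), (15, "/pricing"), (59, "/home"), (60, "/home"), (125, "/docs")]
per_minute = tumbling_window_counts(clicks)
print(per_minute)
```

Each completed window could then be written out as one small batch, which keeps the downstream store from being hit once per click.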

Page 37: Big Data and Analytics

Amazon Machine Learning

Kinesis Producer

Android iOS

Databases Amazon Redshift

Amazon Kinesis

Amazon S3

Amazon RDS

Impala

Amazon Redshift

Apache Storm

Kinesis Consumer

Amazon Elastic MapReduce

Flat Files Database

Data

Event Data

Streaming Data

Data Lake – Self Service Analysis

Databases

Amazon S3

Database Data

Event Data

Streaming Data Android

iOS

Impala

Amazon Redshift

Amazon Machine Learning

Amazon EMR

Hadoop

Page 38: Big Data and Analytics
