Post on 22-Sep-2019
Igor Roiter – Big Data Cloud Solution Architect
• Working as a Data Specialist for the last 11 years
• 9 of them as a consultant specializing in the ad-tech, fin-tech and gaming industries
• Certified NoSQL & Big Data Cloud Engineer
• IoT enthusiast in my spare time
Building a Big Data Application on AWS
[Diagram: AWS big data services across the Ingest, Store and Analyze stages: Amazon Kinesis, AWS IoT, AWS Snowball, AWS Database Migration Service, AWS Data Pipeline, Amazon S3, Amazon Glacier, DynamoDB, Amazon Redshift, EMR, Amazon Athena, EC2, Amazon Elasticsearch Service, Lambda, Amazon QuickSight]
How to get corporate data into the cloud
[Diagram: Corporate DBMS, Application hosts, AWS Cloud]
Getting data into AWS Cloud
[Diagram: Corporate DBMS, Application hosts, AWS Cloud, AWS Database Migration Service]
AWS Database Migration Service
• Simple to use
• No data loss, no downtime
• Low cost: pay only for the compute resources used
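To make the "simple" point concrete, here is a minimal sketch of the table-mapping document a DMS replication task uses to choose what gets migrated. The schema name `sales` and the helper `build_table_mappings` are hypothetical. As far as I know, the resulting JSON string is what one would pass as `TableMappings` to boto3's `create_replication_task`, together with endpoint and replication-instance ARNs that are not shown here.

```python
import json


def build_table_mappings(schema, tables=("%",)):
    """Build a DMS selection-rule document for the given schema.

    `schema` and `tables` are illustrative inputs; "%" is the DMS
    wildcard that matches every table in the schema.
    """
    rules = []
    for index, table in enumerate(tables, start=1):
        rules.append({
            "rule-type": "selection",
            "rule-id": str(index),
            "rule-name": f"include-{index}",
            "object-locator": {"schema-name": schema, "table-name": table},
            "rule-action": "include",
        })
    return json.dumps({"rules": rules})


if __name__ == "__main__":
    # Hypothetical schema name; prints the JSON to pass as TableMappings.
    print(build_table_mappings("sales"))
```

One selection rule per table keeps the migration scope explicit; the wildcard default migrates the whole schema.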
Where do we migrate the data to?
[Diagram: Corporate DBMS, Application hosts, AWS Cloud, AWS Database Migration Service]
Data Warehouse
[Diagram: Corporate DBMS, Application hosts, AWS Cloud, AWS Database Migration Service, Amazon Redshift]
Amazon Redshift – Structured Data Processing
• Fully managed columnar data warehouse
• Standard SQL support
• Petabyte-scale
• Fault-tolerant
How about some BI?
[Diagram: Corporate DBMS, Application hosts, AWS Cloud, AWS Database Migration Service, Amazon Redshift]
BI tool
[Diagram: Corporate DBMS, Application hosts, AWS Cloud, AWS Database Migration Service, Amazon Redshift, Amazon QuickSight]
Amazon QuickSight
• Managed BI tool
• Scales to hundreds of users
• Auto-suggests the best visualizations for your data
• 1/10th the cost of other popular BI software
What About Unstructured Data?
• What if your data is unstructured?
• What if you don’t need all the raw data?
• What if you need to combine multiple data sets?
Handling unstructured data
[Diagram: Corporate DBMS, Application hosts, AWS Cloud, AWS Database Migration Service, Amazon Redshift, Amazon QuickSight]
AWS Lambda – Event-driven data transformation
[Diagram: Corporate DBMS, Application hosts, AWS Database Migration Service, Amazon Redshift, Amazon QuickSight, unstructured raw data in S3, Lambda trigger, AWS Lambda, structured data in Amazon S3]
Amazon Simple Storage Service (S3)
• Managed object storage
• No administration
• No capacity limit
• Data resilience
• $0.02 per GB/month
AWS Lambda – Serverless Event Processing
• Function as a service
• Write code in Node.js, Python or Java
• Event-driven
• Low cost
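As an illustration of event-driven transformation, the following is a minimal sketch of a Python Lambda handler that reacts to S3 "object created" notifications and writes a structured copy of each raw object. The raw line format (`timestamp|user|action`), the bucket name `example-structured-data` and all function names are assumptions made for the example, not details from the talk.

```python
import json
from urllib.parse import unquote_plus

# Hypothetical destination bucket for the structured output.
STRUCTURED_BUCKET = "example-structured-data"


def s3_objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def transform_line(line):
    """Turn one raw 'timestamp|user|action' line (an assumed format) into a dict."""
    timestamp, user, action = line.strip().split("|")
    return {"timestamp": timestamp, "user": user, "action": action}


def handler(event, context):
    """Lambda entry point: read each new raw object, write JSON lines back to S3."""
    import boto3  # imported here so the helpers above run without AWS access

    s3 = boto3.client("s3")
    for bucket, key in s3_objects_from_event(event):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        rows = [transform_line(line) for line in body.splitlines() if line.strip()]
        output = "\n".join(json.dumps(row) for row in rows)
        s3.put_object(Bucket=STRUCTURED_BUCKET, Key=key + ".json",
                      Body=output.encode("utf-8"))
```

Keeping the parsing in plain functions means the transformation can be unit-tested locally, while only `handler` touches S3.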
Need more throughput…
[Diagram: Corporate DBMS, Application hosts, AWS Database Migration Service, Amazon Redshift, Amazon QuickSight, unstructured raw data in S3, Lambda trigger, AWS Lambda, structured data in Amazon S3]
Unstructured data transformation using EMR
[Diagram: Corporate DBMS, Application hosts, AWS Database Migration Service, Amazon Redshift, Amazon QuickSight, unstructured raw data in S3, Amazon EMR, structured data in Amazon S3]
Amazon EMR - Unstructured Data Processing
• Fully managed Hadoop ecosystem, pre-tuned for the cloud
• Hadoop 2.7.3, Hive 2.3.1, Spark 2.2.0, HBase 1.3.1 + HBase on S3, Presto, Flink…
• On-demand and Spot Instances
• Fully integrated with S3
• Provision a cluster for a job, then terminate it
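The "provision for a job, then terminate" pattern can be sketched as a request for a transient cluster. The helper below only builds the keyword arguments; to my knowledge they match what boto3's EMR `run_job_flow` accepts. The cluster name, instance types, script path and the release label `emr-5.10.0` are assumptions chosen for illustration, so check them against your own account and required versions.

```python
def transient_cluster_request(script_s3_path, core_nodes=2):
    """Build keyword arguments for a run-once EMR cluster.

    All names, instance types and the release label are illustrative.
    """
    return {
        "Name": "transient-transform-job",
        "ReleaseLabel": "emr-5.10.0",  # assumed release; pick one matching your versions
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "master", "InstanceRole": "MASTER", "Market": "ON_DEMAND",
                 "InstanceType": "m4.large", "InstanceCount": 1},
                # Spot capacity can be reclaimed; tolerable here because
                # input and output both live in S3 rather than on the cluster.
                {"Name": "core", "InstanceRole": "CORE", "Market": "SPOT",
                 "InstanceType": "m4.large", "InstanceCount": core_nodes},
            ],
            # False makes the cluster shut down once all steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "transform-raw-data",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", script_s3_path],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }
```

A caller with AWS credentials would pass the result on, for example `boto3.client("emr").run_job_flow(**transient_cluster_request("s3://example-bucket/jobs/transform.py"))`, where the bucket and script are again hypothetical.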
Ad-hoc query on raw data
[Diagram: Corporate DBMS, Application hosts, AWS Database Migration Service, Amazon Redshift, Amazon QuickSight, unstructured raw data in S3, Amazon EMR, structured data in Amazon S3]
Ad-hoc query service on S3 buckets
[Diagram: Corporate DBMS, Application hosts, AWS Database Migration Service, Amazon Redshift, Amazon QuickSight, unstructured raw data in S3, Amazon EMR, structured data in Amazon S3, Amazon Athena]
Amazon Athena – Serverless Query Processing
• Serverless query service for querying data in S3
• No data load or ETL; data is queried in place on S3
• Pay-per-query – based on scanned data amount
• Standard SQL
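Because billing is based on the amount of data scanned, a rough cost estimate is simple arithmetic. The sketch below assumes a price of $5.00 per TB scanned and a 10 MB minimum charge per query; both figures are assumptions that may not match current or regional pricing, and the function name is hypothetical.

```python
PRICE_PER_TB_USD = 5.00               # assumed list price; check current pricing
MIN_BYTES_PER_QUERY = 10 * 1024 ** 2  # assumed 10 MB minimum charge per query
BYTES_PER_TB = 1024 ** 4


def query_cost_usd(bytes_scanned):
    """Estimate the cost of one query from the number of bytes it scans."""
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    # Scanning half a terabyte under the assumed price.
    print(query_cost_usd(BYTES_PER_TB // 2))
```

Under this model the bill scales linearly with scanned bytes, so anything that reduces the scan (partitioning, compression, or a columnar format) reduces cost proportionally.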
Pre-process data to columnar data format
[Diagram: Corporate DBMS, Application hosts, AWS Database Migration Service, Amazon Redshift, Amazon QuickSight, unstructured raw data in S3, Amazon EMR (appears twice), structured data in Amazon S3, Parquet (columnar) data in Amazon S3, Amazon Athena]
No more batch
[Diagram: Corporate DBMS, Application hosts, AWS Database Migration Service, Amazon Redshift, Amazon QuickSight, unstructured raw data in S3, Amazon EMR (appears twice), structured data in Amazon S3, Parquet (columnar) data in Amazon S3, Amazon Athena]
Add a real-time layer – Kinesis Streams
[Diagram: Corporate DBMS, Application hosts, KCL, Kinesis Streams, AWS Database Migration Service, Amazon Redshift, Amazon QuickSight, unstructured raw data in S3, Amazon EMR (appears twice), structured data in Amazon S3, Parquet (columnar) data in Amazon S3, Amazon Athena]
Amazon Kinesis Streams
• Managed real-time stream processing
• Dynamically adjust the throughput of the stream
• Data resilience
• Produce data into the stream using the KPL (Kinesis Producer Library), read it with the KCL (Kinesis Client Library)
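Stream throughput is adjusted by changing the number of shards, so sizing a stream comes down to a small calculation. This sketch assumes the commonly documented per-shard limits of 1 MB/s or 1,000 records/s for writes and 2 MB/s for reads; treat these constants, and the function name, as assumptions to verify against the current service limits.

```python
import math

# Assumed per-shard limits; verify against current Kinesis documentation.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s):
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,  # a stream always has at least one shard
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )


if __name__ == "__main__":
    # Hypothetical workload: 3.5 MB/s in, 2,000 records/s, 7 MB/s out.
    print(shards_needed(3.5, 2000, 7.0))
```

The resulting number could then be applied to a running stream, for instance through boto3's `update_shard_count`, which is how the throughput adjustment would be done in practice.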
Amazon Kinesis Firehose
• Load streaming data into AWS data stores: S3, Redshift
• Fully managed, auto-scaled
• Integrated with Lambda
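The Lambda integration lets Firehose hand each batch of records to a function before delivery. Below is a minimal sketch of such a transformation function. It follows the record contract as I understand it (base64 `data`, a `recordId`, and a `result` of `Ok` or `ProcessingFailed`); the payload fields `user` and `action` and the name `transform_handler` are hypothetical.

```python
import base64
import json


def transform_handler(event, context):
    """Firehose transformation: decode, reshape and re-encode each record."""
    output = []
    for record in event["records"]:
        raw = base64.b64decode(record["data"]).decode("utf-8")
        try:
            payload = json.loads(raw)
            # Hypothetical reshaping: keep only two assumed fields.
            slim = {"user": payload["user"], "action": payload["action"]}
            line = json.dumps(slim) + "\n"  # newline-delimited JSON for S3
            data = base64.b64encode(line.encode("utf-8")).decode("utf-8")
            output.append({"recordId": record["recordId"],
                           "result": "Ok", "data": data})
        except (ValueError, KeyError, TypeError):
            # Hand unparseable records back unchanged, flagged as failed.
            output.append({"recordId": record["recordId"],
                           "result": "ProcessingFailed",
                           "data": record["data"]})
    return {"records": output}
```

Every incoming `recordId` is returned exactly once, which is what allows Firehose to match transformed records to their originals.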
Summary – Big Data on AWS
• Pay only for what you use
• Avoid over- and under-provisioning
• Everything scales dynamically
• Go to prod faster than ever
S3/Glacier Select Preview
• Pulls out only the data you need from an object
• Offloads filter processing from application to S3 service
• S3/Glacier Select SDK supports Java & Python
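To show what "pulling out only the data you need" looks like from the Python SDK, here is a minimal sketch that builds the arguments for an S3 Select call over a CSV object with a header row. The bucket, key and column names are hypothetical, and the helper does no escaping, so it should only be fed trusted input.

```python
def select_request(bucket, key, columns, where=None):
    """Build keyword arguments for an S3 Select query over a CSV object."""
    projection = ", ".join(f"s.{column}" for column in columns)
    expression = f"SELECT {projection} FROM S3Object s"
    if where:
        # The filter is pasted in verbatim: trusted input only.
        expression += f" WHERE {where}"
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Treat the first CSV line as column names.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"JSON": {}},
    }


if __name__ == "__main__":
    request = select_request("example-raw-data", "events.csv",
                             ["user_id", "event_type"], "s.event_type = 'click'")
    print(request["Expression"])
```

With credentials in place, the dictionary could be passed to `boto3.client("s3").select_object_content(**request)`, so that the filtering runs inside S3 and only matching rows travel to the application.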
Amazon Kinesis Video Streams
• Fully managed video ingestion and storage service
• Secure SDKs for devices to stream video to AWS
• APIs to access and retrieve indexed video fragments based on tags and timestamps
AWS Lambda updates…
• Added the ability to shift traffic between two AWS Lambda versions based on pre-assigned weights
• Doubled the available memory for a function from 1536MB to 3008MB
• Added the ability to set a concurrency limit on a Lambda function
• The AWS Lambda console has been updated with enhancements: Cloud9-based editor, improved monitoring, and more…
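Weighted traffic shifting is configured on a function alias. The sketch below builds the arguments for such an update; to my knowledge they correspond to boto3's Lambda `update_alias` call with its `RoutingConfig` parameter. The function name `etl-transform`, the alias `live` and the version numbers are hypothetical.

```python
def weighted_alias_request(function_name, alias, stable_version,
                           canary_version, canary_weight):
    """Build keyword arguments that send a share of traffic to a second version."""
    if not 0.0 <= canary_weight <= 1.0:
        raise ValueError("canary_weight must be between 0 and 1")
    return {
        "FunctionName": function_name,
        "Name": alias,
        # The alias keeps pointing at the stable version by default.
        "FunctionVersion": stable_version,
        # The given fraction of invocations goes to the other version.
        "RoutingConfig": {
            "AdditionalVersionWeights": {canary_version: canary_weight},
        },
    }


if __name__ == "__main__":
    # Hypothetical rollout: 10% of traffic to version 2.
    print(weighted_alias_request("etl-transform", "live", "1", "2", 0.1))
```

Raising the weight step by step gives a gradual rollout. The concurrency limit mentioned above is a separate setting; in boto3 it is, as far as I know, applied with `put_function_concurrency`.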
AWS IoT Analytics Preview
• Built-in IoT Analytics SQL query engine
• Stores the processed device data in a time-series data store
• Scales automatically to support up to petabytes of IoT data
• Apply machine learning to your IoT data with hosted Jupyter notebooks, right from the IoT console