Date posted: 24-May-2015
Category: Technology
Uploaded by: aws-marketplace
Big Data Analytics with Amazon Web Services
Dr. Matt Wood
An Online Seminar. Tuesday 16th October.
Hello, and thank you.
Big Data Analytics
An introduction
The story of analytics on AWS
AWS Marketplace
Success story: Brightcove
1. INTRODUCING BIG DATA
Data for competitive advantage.
Customer segmentation, financial modeling, system analysis, line-of-sight, business intelligence.
Using data
Generation
Collection & storage
Analytics & computation
Collaboration & sharing
Cost of data generation is falling.
Generation: lower cost, increased throughput.
HIGHLY CONSTRAINED
Very high barrier to turning data into information.
Move from a data generation challenge to an analytics challenge.
Enter the Cloud.
Remove the constraints.
Enable data-driven innovation.
Move to a distributed data approach.
Maturation of two things.
Software for distributed storage and analysis
Infrastructure for distributed storage and analysis
Frameworks for data-intensive workloads.
Software
Distributed by design.
Platform for data-intensive workloads.
Infrastructure
Distributed by design.
Support the data timeline.
Lower the barrier to entry.
Accelerate time to market and increase agility.
Enable new business opportunities.
Washington Post
NASA
“AWS enables Pfizer to explore difficult or deep scientific questions in a timely, scalable manner and helps us make better decisions more quickly”
Michael Miller, Pfizer
2. THE STORY OF ANALYTICS
EC2
Utility computing. 6 years young.
Embarrassingly parallel problems.
Scale-out systems.
Queue-based distribution.
Small, medium and high scale.
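Queue-based distribution of an embarrassingly parallel problem can be sketched on a single machine with Python's standard library: a shared work queue feeds independent workers, and adding workers scales the job out. This is a minimal local analogue, assuming threads stand in for EC2 instances and a hypothetical `process_item` function stands in for the real per-item work.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def process_item(item: int) -> int:
    # Hypothetical unit of work: each item is independent of every other item.
    return item * item


def run_workers(items: Iterable[int], num_workers: int = 4,
                task: Callable[[int], int] = process_item) -> Dict[int, int]:
    work: "queue.Queue[int]" = queue.Queue()
    for item in items:
        work.put(item)  # fill the queue before any worker starts

    results: Dict[int, int] = {}
    lock = threading.Lock()

    def worker() -> None:
        # Pull items until the queue is drained; workers never coordinate
        # with each other, so adding workers simply adds throughput.
        while True:
            try:
                item = work.get_nowait()
            except queue.Empty:
                return
            value = task(item)
            with lock:
                results[item] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_workers(range(10), num_workers=3))
```

In a distributed setting the in-process queue would be replaced by a networked queue and the threads by separate machines; the pattern (independent tasks pulled from a shared queue) stays the same.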
Cost optimization.
[Figure: Achieving economies of scale. Capacity (up to 100%) plotted over time, split into reserved capacity, on-demand capacity and unused capacity.]
Bid on unused EC2 capacity.
Spot Instances
Very large discount.
Perfect for batch runs.
Balance cost and scale.
<$1000 per hour
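The capacity mix in the economies-of-scale chart (reserved capacity for the steady baseline, on-demand capacity for peaks, Spot capacity for deferrable batch runs) can be expressed as a simple cost model. This is a minimal sketch: the `Rates` values and the demand curve are hypothetical placeholders, not AWS prices, and real Spot prices vary with the market.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Rates:
    # Hypothetical effective hourly prices per instance (not real AWS prices).
    reserved: float
    on_demand: float
    spot: float


def blended_cost(hourly_demand: Sequence[int], baseline: int,
                 spot_instance_hours: float, rates: Rates) -> float:
    """Cost of a capacity mix over len(hourly_demand) hours.

    - `baseline` instances run on reserved capacity for every hour.
    - Demand above the baseline in a given hour is served on-demand.
    - Deferrable batch work (`spot_instance_hours`) runs on Spot capacity.
    """
    hours = len(hourly_demand)
    reserved_cost = baseline * hours * rates.reserved
    burst_hours = sum(max(0, demand - baseline) for demand in hourly_demand)
    on_demand_cost = burst_hours * rates.on_demand
    spot_cost = spot_instance_hours * rates.spot
    return reserved_cost + on_demand_cost + spot_cost


if __name__ == "__main__":
    rates = Rates(reserved=0.5, on_demand=1.0, spot=0.2)  # hypothetical
    print(blended_cost([2, 2, 5, 3], baseline=2, spot_instance_hours=10, rates=rates))
```

Reserved capacity is charged whether or not it is used, so in this model it only pays off for the part of demand that is present every hour; bursts and batch work are cheaper on the other two tiers.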
Pattern for distributed computing.
Map/reduce
Software frameworks such as Hadoop.
Write two functions. Scale up.
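"Write two functions" refers to the map and reduce functions. The sketch below is a single-process simulation of the pattern using word count as the example, not Hadoop itself: a framework such as Hadoop runs the same two functions across many machines and performs the grouping (shuffle) step between them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(record: str) -> Iterator[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    for word in record.lower().split():
        yield word, 1


def reduce_fn(key: str, values: Iterable[int]) -> int:
    # Reduce: combine all values emitted for the same key.
    return sum(values)


def map_reduce(records: Iterable[str]) -> Dict[str, int]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:                 # map phase
        for key, value in map_fn(record):
            groups[key].append(value)      # shuffle: group values by key
    # Reduce phase: one call per distinct key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


if __name__ == "__main__":
    print(map_reduce(["the cat sat", "The dog sat"]))
```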
Complex cluster configuration and management.
Managed Hadoop clusters.
Amazon Elastic MapReduce
Easy to provision and monitor.
Write two functions. Scale up.
Optimized for S3 access.
[Figure: Elastic MapReduce workflow. Input data is stored in S3 and code is submitted to Elastic MapReduce, which runs a name node and an elastic cluster backed by HDFS; queries and BI tools connect via JDBC, Pig and Hive; output is written to S3 and SimpleDB.]
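One common way to supply the "two functions" to a managed Hadoop cluster is Hadoop Streaming, where the mapper and reducer are ordinary programs that read lines on stdin and write tab-separated key/value lines on stdout, and the framework sorts mapper output by key before the reducer sees it. The script below is a minimal sketch of that convention for word count; the file name `wordcount.py` is hypothetical, and the Elastic MapReduce step configuration needed to run it is not shown.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit one "word<TAB>1" record per word (tab-separated key/value).
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    # Input is sorted by key, so all records for one word are adjacent.
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines if line.strip())
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    # Usage: python wordcount.py map|reduce  (reads stdin, writes stdout)
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

Locally, `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` emulates the map, sort-by-key and reduce stages (`input.txt` is a hypothetical input file).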
Performance
Compute performance
Intel Xeon E5-2670
Cluster Compute
10 Gigabit Ethernet, non-blocking network
Placement groups
60.5 GB of memory
+ GPU-enabled instances
IO performance
NoSQL
Unstructured data storage.
Predictable, consistent performance
DynamoDB
Unlimited storage
No schema for unstructured data
Single-digit millisecond latencies
Backed by solid-state drives
...and SSDs for all. New Hi1 storage instances.
2 x 1 TB SSDs
hi1.4xlarge
10 GigE network
HVM: 90k IOPS read, 9k to 75k write
PV: 120k IOPS read, 10k to 85k write
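IOPS figures translate into bandwidth once an I/O size is fixed: throughput is operations per second multiplied by bytes per operation. The slide does not state the block size behind these numbers, so the 4 KiB default below is an assumption for illustration only.

```python
def iops_to_mib_per_s(iops: float, block_size_kib: float = 4.0) -> float:
    # Throughput = operations per second x KiB per operation, converted to MiB/s.
    return iops * block_size_kib / 1024


if __name__ == "__main__":
    # Read figures quoted on the slide; the 4 KiB block size is an assumption.
    for label, iops in [("HVM read", 90_000), ("PV read", 120_000)]:
        print(f"{label}: {iops_to_mib_per_s(iops):.1f} MiB/s")
```

Under that assumption the quoted read rates correspond to roughly 350 to 470 MiB/s; larger I/O sizes would give proportionally higher bandwidth at the same operation rate.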
Netflix
“The hi1.4xlarge configuration is about half the system cost for the same throughput.”
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
Performance + ease of use
3. AWS MARKETPLACE
Extend platform with partners
Innovate on behalf of customers
Remove undifferentiated heavy lifting
AWS Marketplace: aws.amazon.com/marketplace
Collection & storage
Acunu Reflex: Apache Cassandra NoSQL database
MongoDB: with and without EBS RAID storage
Couchbase: Community and Enterprise editions
ScaleArc: MySQL load balancing
Analytics & computation
KarmaSphere Analytics for Amazon Elastic MapReduce
MapR M5: Hadoop distribution
Metamarkets: event-based data processing
StackIQ Rocks+: HPC clusters with MPI and Grid Engine
Univa Grid Engine: one-click cluster deployment
Quantivo: data association analytics
Collaboration & sharing
Aspera Faspex: 20 Mbps data transfer
4. SUCCESS STORY