AWS Webcast - Introducing Amazon Redshift

Date post: 15-Jan-2015
Description:
This webinar is aimed at older portfolio companies that may have started when AWS wasn't as strong as it is today. Redshift is a great way to use the cloud and bring data to the cloud, where other cloud services (such as EMR) can consume it.
Transcript
Page 1: AWS Webcast - Introducing Amazon Redshift

© 2011 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified or distributed in whole or in part without the express consent of Amazon.com, Inc.

Introducing Amazon Redshift

Amazon’s Data Warehouse as a Service

Ben Butler, Solutions Architect

Worldwide Public Sector

[email protected]

Page 2: AWS Webcast - Introducing Amazon Redshift

What is Amazon Web Services?

AWS Global Infrastructure

Application Services

Networking

Deployment & Administration

Database Storage Compute

Page 4: AWS Webcast - Introducing Amazon Redshift

AWS Database Services

• Fully managed SQL database service for OLTP workloads

• Fully managed NoSQL service for massively scalable, high throughput, low latency workloads

• Fully managed, fast and powerful, petabyte-scale data warehouse service

• Fully managed Memcached-compliant in memory caching service

Page 6: AWS Webcast - Introducing Amazon Redshift

Traditional data warehousing is expensive and complicated

Expensive Hardware and Software

Complex Tuning and Admin

Enterprises average between 3 and 4 DBAs per data warehouse

Source: Oracle technology global price list 11/1/2012

Gartner: Critical factors in calculating the data warehouse TCO, July 2009

Page 7: AWS Webcast - Introducing Amazon Redshift

Customers Aren’t Happy with Today’s Solutions

Large Companies

• Expensive

• Hard to scale

Small Companies

• Can’t afford to have a data warehouse

Page 8: AWS Webcast - Introducing Amazon Redshift

Data warehousing done the AWS way

• Pay as you go, no up front costs

• Fast, cheap, easy to use

• SQL

• Provision in minutes

Page 9: AWS Webcast - Introducing Amazon Redshift

Introducing Amazon Redshift

Data Warehousing the AWS Way

Easily and rapidly analyze petabytes of data

1/10 the cost of traditional data warehouses

Automated deployment & administration

Compatible with popular BI tools

Page 10: AWS Webcast - Introducing Amazon Redshift

Most data never makes it to a data warehouse

[Chart: The Data Analysis Gap, 1990 to 2020; series: Enterprise Data, Data in Warehouse]

Enterprise Data is growing at over 50% yearly

Data Warehousing growing at less than 10% yearly

Most data is left on the floor

Sources:

Gartner: User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011

IDC: Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
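
As a rough illustration of how quickly the gap on this slide widens, the sketch below compounds the two growth rates quoted above. The 50% and 10% figures come from the slide (stated there as "over" and "less than"); the ten-year horizon and the equal starting volumes are assumptions made only for the arithmetic.

```python
def compound(start, yearly_rate, years):
    """Volume after compounding `yearly_rate` for `years` years."""
    return start * (1 + yearly_rate) ** years


YEARS = 10  # assumed horizon, not taken from the slide

enterprise = compound(1.0, 0.50, YEARS)  # enterprise data at 50% per year
warehouse = compound(1.0, 0.10, YEARS)   # warehoused data at 10% per year

print(f"enterprise data grows {enterprise:.1f}x")
print(f"warehoused data grows {warehouse:.1f}x")
print(f"warehoused share of the total after {YEARS} years: {warehouse / enterprise:.1%}")
```

Under these assumptions enterprise data grows roughly 58-fold while warehoused data grows less than 3-fold, which is the gap the chart depicts.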

Page 11: AWS Webcast - Introducing Amazon Redshift

We set out to build…

A fast and powerful, petabyte-scale data warehouse that is:

• A Lot Faster

• A Lot Cheaper

• A Lot Simpler

Amazon Redshift

Delivered as a Managed Service

Page 12: AWS Webcast - Introducing Amazon Redshift

Data warehousing performance is all about I/O

Page 13: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift dramatically reduces I/O

Column storage

Data compression

Zone maps

Direct-attached storage

Large data block sizes

ID  | Age | State | Amount
----+-----+-------+-------
123 | 20  | CA    | 500
345 | 25  | WA    | 250
678 | 40  | FL    | 125
957 | 37  | WA    | 375

• With row storage you do unnecessary I/O

• To get total amount, you have to read everything

Page 14: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift dramatically reduces I/O

Column storage

Data compression

Zone maps

Direct-attached storage

Large data block sizes

ID  | Age | State | Amount
----+-----+-------+-------
123 | 20  | CA    | 500
345 | 25  | WA    | 250
678 | 40  | FL    | 125
957 | 37  | WA    | 375

• With column storage, you only read the data you need
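
The difference between the two layouts can be sketched with the four rows from the table above. This is only an in-memory illustration of the idea, not Redshift's storage format; the variable names are made up for the example.

```python
# The four rows from the slide's example table: (ID, Age, State, Amount).
rows = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]

# Row storage: totalling Amount means reading every field of every row.
row_values_read = sum(len(row) for row in rows)
total_from_rows = sum(row[3] for row in rows)

# Column storage: each column is kept together, so only Amount is read.
columns = {
    name: [row[i] for row in rows]
    for i, name in enumerate(("id", "age", "state", "amount"))
}
col_values_read = len(columns["amount"])
total_from_columns = sum(columns["amount"])

print("total amount:", total_from_rows, total_from_columns)
print("values read:", row_values_read, "vs", col_values_read)
```

Both layouts give the same total, but the columnar version touches a quarter of the values in this four-column table.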

Page 15: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift dramatically reduces I/O

Column storage

Data compression

Zone maps

Direct-attached storage

Large data block sizes

• Columnar compression saves space & reduces I/O

• Amazon Redshift analyzes and compresses your data

analyze compression listing;

Table   | Column         | Encoding
--------+----------------+----------
listing | listid         | delta
listing | sellerid       | delta32k
listing | eventid        | delta32k
listing | dateid         | bytedict
listing | numtickets     | bytedict
listing | priceperticket | delta32k
listing | totalprice     | mostly32
listing | listtime       | raw
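
To give a feel for why an encoding such as the `delta` one suggested for `listid` saves space, here is a minimal sketch of delta encoding in general: keep the first value and then store only the differences between neighbours. It illustrates the idea only and is not Redshift's on-disk format; the sample IDs are hypothetical.

```python
def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded):
    """Rebuild the original values with a running sum."""
    values, current = [], 0
    for delta in encoded:
        current += delta
        values.append(current)
    return values


list_ids = [100001, 100002, 100003, 100005, 100008]  # hypothetical sample values
encoded = delta_encode(list_ids)

print(encoded)                    # small differences instead of large IDs
print(delta_decode(encoded))      # round-trips to the original values
```

Columns whose values change slowly from row to row produce small differences, which need fewer bytes than the original values.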

Page 16: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift dramatically reduces I/O

Column storage

Data compression

Zone maps

Direct-attached storage

Large data block sizes

• Keep track of the minimum and maximum value for each block

• Skip over blocks that don’t contain the data needed for a given query

• Minimize unnecessary I/O
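
The min/max bookkeeping described in these bullets can be sketched in a few lines. The block size, the sample column, and the function names are assumptions chosen for illustration; real blocks are far larger and the actual implementation is not described in the slides.

```python
BLOCK_SIZE = 4  # tiny block for illustration only


def build_zone_map(values, block_size=BLOCK_SIZE):
    """Split a column into blocks and record (min, max) for each block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    zone_map = [(min(block), max(block)) for block in blocks]
    return blocks, zone_map


def scan_equal(blocks, zone_map, target):
    """Return matching values and the number of blocks actually read."""
    matches, blocks_read = [], 0
    for block, (low, high) in zip(blocks, zone_map):
        if target < low or target > high:
            continue  # the target cannot be in this block, so skip the read
        blocks_read += 1
        matches.extend(v for v in block if v == target)
    return matches, blocks_read


ages = [20, 21, 22, 25, 31, 33, 37, 38, 40, 41, 45, 52]  # hypothetical sorted column
blocks, zone_map = build_zone_map(ages)
matches, blocks_read = scan_equal(blocks, zone_map, 37)

print(matches, f"(read {blocks_read} of {len(blocks)} blocks)")
```

The skipping works best when the column is sorted, because each block then covers a narrow, non-overlapping range of values.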

Page 17: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift dramatically reduces I/O

Column storage

Data compression

Zone maps

Direct-attached storage

Large data block sizes

• Use direct-attached storage to maximize throughput

• Hardware optimized for high performance data processing

• Large block sizes to make the most of each read

• Amazon Redshift manages durability for you

Page 18: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift architecture

Leader Node

• SQL endpoint

• Stores metadata

• Coordinates query execution

Compute Nodes

• Local, columnar storage

• Execute queries in parallel

• Load, backup, restore via Amazon S3

• Parallel load from Amazon DynamoDB

Single node version available

[Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion; Backup; Restore]

Page 19: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift runs on optimized hardware

HS1.8XL: 128 GB RAM, 16 Cores, 24 Spindles, 16 TB compressed user storage, 2 GB/sec scan rate

HS1.XL: 16 GB RAM, 2 Cores, 3 Spindles, 2 TB compressed customer storage

Optimized for I/O intensive workloads

High disk density

Runs in HPC - fast network

HS1.8XL available on Amazon EC2

Page 20: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift parallelizes and distributes everything

Query

Load

Backup/Restore

Resize

Page 21: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift parallelizes and distributes everything

Query | Load | Backup/Restore | Resize

• Load in parallel from Amazon S3 or Amazon DynamoDB

• Data automatically distributed and sorted

• Scales linearly with number of nodes
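
One way to picture "distributed and sorted" is the toy sketch below: each row is assigned to a node by its key and each node keeps its share in key order. The modulo rule stands in for a real hash function, and the rows and function name are hypothetical; the slides do not describe the actual placement algorithm.

```python
from collections import defaultdict


def distribute(rows, key_index, node_count):
    """Assign each row to a node by its key, then sort each node's rows by that key."""
    nodes = defaultdict(list)
    for row in rows:
        # Modulo on an integer key is a stand-in for a real hash function.
        nodes[row[key_index] % node_count].append(row)
    return {n: sorted(part, key=lambda r: r[key_index]) for n, part in nodes.items()}


rows = [(957, 375), (123, 500), (678, 125), (345, 250)]  # (id, amount), hypothetical
placement = distribute(rows, key_index=0, node_count=2)

for node in sorted(placement):
    print(node, placement[node])
```

Because every node holds an independent share of the rows, each can load and scan its part at the same time as the others, which is what makes adding nodes pay off.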

Page 22: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift parallelizes and distributes everything

Query | Load | Backup/Restore | Resize

• Backups to Amazon S3 are automatic, continuous and incremental

• Configurable system snapshot retention period

• Take user snapshots on-demand

• Streaming restores enable you to resume querying faster

Page 23: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift parallelizes and distributes everything

Query | Load | Backup/Restore | Resize

• Resize while remaining online

• Provision a new cluster in the background

• Copy data in parallel from node to node

• Only charged for source cluster

Page 24: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift parallelizes and distributes everything

Query | Load | Backup/Restore | Resize

• Automatic SQL endpoint switchover via DNS

• Decommission the source cluster

• Simple operation via AWS Console or API

Page 25: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift lets you start small and grow big

Extra Large Node (HS1.XL): 3 spindles, 2 TB, 16 GB RAM, 2 cores

• Single Node (2 TB)

• Cluster 2-32 Nodes (4 TB – 64 TB)

Eight Extra Large Node (HS1.8XL): 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE

• Cluster 2-100 Nodes (32 TB – 1.6 PB)

Note: Nodes not to scale
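
The capacity ranges on this slide are simply nodes multiplied by storage per node. The sketch below reproduces them and adds a hypothetical helper for picking a node count; it ignores headroom and compression ratios, which a real sizing exercise would need to consider.

```python
import math

# Per-node storage and multi-node cluster limits as listed on the slide.
NODE_TYPES = {
    "HS1.XL": {"tb_per_node": 2, "min_nodes": 2, "max_nodes": 32},
    "HS1.8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}


def capacity_range_tb(node_type):
    """Smallest and largest multi-node cluster capacity in TB."""
    spec = NODE_TYPES[node_type]
    return (spec["tb_per_node"] * spec["min_nodes"],
            spec["tb_per_node"] * spec["max_nodes"])


def nodes_needed(node_type, data_tb):
    """Smallest multi-node cluster that holds `data_tb`, or None if it does not fit."""
    spec = NODE_TYPES[node_type]
    needed = max(spec["min_nodes"], math.ceil(data_tb / spec["tb_per_node"]))
    return needed if needed <= spec["max_nodes"] else None


print(capacity_range_tb("HS1.XL"))    # range in TB for HS1.XL clusters
print(capacity_range_tb("HS1.8XL"))   # range in TB for HS1.8XL clusters
print(nodes_needed("HS1.XL", 10))     # nodes for a hypothetical 10 TB data set
```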

Page 26: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift is priced to let you analyze all your data

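                   | Price Per Hour for HS1.XL Single Node | Effective Hourly Price Per TB | Effective Annual Price per TB
-------------------+---------------------------------------+-------------------------------+------------------------------
On-Demand          | $ 0.850                               | $ 0.425                       | $ 3,723
1 Year Reservation | $ 0.500                               | $ 0.250                       | $ 2,190
3 Year Reservation | $ 0.228                               | $ 0.114                       | $ 999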

Simple Pricing

Number of Nodes x Cost per Hour

No charge for Leader Node

No upfront costs

Pay as you go
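
The effective per-TB figures in the table follow from the hourly node price, the 2 TB of storage on an HS1.XL node, and 8,760 hours in a year. The sketch below reproduces that arithmetic along with the "Number of Nodes x Cost per Hour" rule; the function and constant names are made up for the example.

```python
HOURS_PER_YEAR = 24 * 365      # 8,760
TB_PER_HS1_XL_NODE = 2         # from the HS1.XL node specification

# Hourly price per HS1.XL node, taken from the table.
HOURLY_PRICE = {"on_demand": 0.850, "1_year": 0.500, "3_year": 0.228}


def effective_prices(hourly_per_node):
    """Return (price per TB per hour, price per TB per year)."""
    per_tb_hour = hourly_per_node / TB_PER_HS1_XL_NODE
    return per_tb_hour, per_tb_hour * HOURS_PER_YEAR


def cluster_cost_per_hour(node_count, hourly_per_node):
    """Number of nodes x cost per hour; the leader node is not charged."""
    return node_count * hourly_per_node


for plan, price in HOURLY_PRICE.items():
    per_tb_hour, per_tb_year = effective_prices(price)
    print(f"{plan}: ${per_tb_hour:.3f} per TB-hour, ${per_tb_year:,.0f} per TB-year")
```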

Page 27: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift is easy to use

Provision in minutes

Monitor query performance

Point and click resize

Built in security

Automatic backups

Page 28: AWS Webcast - Introducing Amazon Redshift

Provision a data warehouse in minutes

Page 29: AWS Webcast - Introducing Amazon Redshift

Monitor query performance

Page 30: AWS Webcast - Introducing Amazon Redshift

Deep dive analysis

Page 31: AWS Webcast - Introducing Amazon Redshift

Point and click resize

Page 32: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift has security built-in

SSL to secure data in transit

Encryption to secure data at rest

• AES-256; hardware accelerated

• All blocks on disks and in Amazon S3 encrypted

No direct access to compute nodes

Amazon VPC support

[Diagram labels: JDBC/ODBC; Customer VPC; Internal VPC; 10 GigE (HPC); Ingestion; Backup; Restore]

Page 33: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift continuously backs up your data and recovers from failures

Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times

Backups to Amazon S3 are continuous, automatic, and incremental

• Designed for eleven nines of durability

Continuous monitoring and automated recovery from failures of drives and nodes

Able to restore snapshots to any Availability Zone within a region

Page 34: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift integrates with multiple data sources

[Diagram: data sources shown around Amazon Redshift]

• Amazon DynamoDB

• Amazon Elastic MapReduce

• Amazon Simple Storage Service (S3)

• Amazon Elastic Compute Cloud (EC2)

• AWS Storage Gateway Service

• Corporate Data Center

• Amazon Relational Database Service (RDS)

Page 35: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift provides multiple data loading options

Upload to Amazon S3

AWS Import/Export

AWS Direct Connect

Work with a partner

Data Integration

Systems Integrators

Page 36: AWS Webcast - Introducing Amazon Redshift

Amazon Redshift works with your existing analysis tools

JDBC/ODBC

Amazon Redshift

Page 38: AWS Webcast - Introducing Amazon Redshift

Pilot results have been dramatic

Tested 2 Billion row data set, 6 representative queries on a 2-node Amazon Redshift cluster

Queries ran between 12x and 150x faster

Current environment: 32 nodes, 128 CPUs, 4.2TB RAM, 1.6 PB disk

Page 39: AWS Webcast - Introducing Amazon Redshift

Reporting Warehouse

Accelerated operational reporting

Support for short-time use cases

Data compression, index redundancy

[Diagram labels: RDBMS (OLTP, ERP); Redshift (Reporting and BI)]

Page 40: AWS Webcast - Introducing Amazon Redshift

Data Integration Partners*

On-Premises Integration

[Diagram labels: RDBMS (OLTP, ERP); Redshift (Reporting and BI)]

* as of 3/14/2013

Page 41: AWS Webcast - Introducing Amazon Redshift

Live Archive for (Structured) Big Data

Direct integration with copy command

High velocity data ages into Redshift

Low cost, high scale option for new apps

[Diagram labels: DynamoDB (OLTP, Web Apps); Redshift (Reporting and BI)]

Page 42: AWS Webcast - Introducing Amazon Redshift

Cloud ETL for Big Data

Maintain online SQL access to historical logs

Transformation and enrichment with EMR

Longer history ensures better insight

[Diagram labels: S3; EMR; Redshift; Reporting and BI]

Page 43: AWS Webcast - Introducing Amazon Redshift

Resources & Questions

Ben Butler | [email protected]

Redshift on AWS - http://aws.amazon.com/redshift

Marketplace - https://aws.amazon.com/marketplace/redshift/

Documentation/User Guide - http://aws.amazon.com/documentation/redshift/

Best Practices

• http://docs.aws.amazon.com/redshift/latest/dg/c_designing-tables-best-practices.html

• http://docs.aws.amazon.com/redshift/latest/dg/c_loading-data-best-practices.html

Page 44: AWS Webcast - Introducing Amazon Redshift

Introducing Amazon Redshift

Amazon’s Data Warehouse as a Service

http://aws.amazon.com/resources/databaseservices/webinars

Ben Butler, Solutions Architect

Worldwide Public Sector

[email protected]

