© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Amazon Redshift Overview & What’s Next
Rahul Pathak, Redshift PM
Anurag Gupta, Redshift GM
November 13, 2013
Amazon Redshift
Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year
Amazon Redshift dramatically reduces I/O
• Data compression
• Zone maps
• Direct-attached storage
• With row storage you do unnecessary I/O
• To get total amount, you have to read everything
ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
• With column storage, you only read the data you need
ID Age State Amount
123 20 CA 500
345 25 WA 250
678 40 FL 125
957 37 WA 375
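As a rough illustration of the two layouts above, the sketch below stores the same four sample rows both ways and counts how many values have to be read to total the Amount column. The function names and the "values read" counter are assumptions made for this example; a real engine reads whole disk blocks rather than individual values.

```python
# The four sample rows from the slide: (ID, Age, State, Amount).
ROWS = [
    (123, 20, "CA", 500),
    (345, 25, "WA", 250),
    (678, 40, "FL", 125),
    (957, 37, "WA", 375),
]


def total_amount_row_store(rows):
    """Row layout: every field of every row is read just to reach Amount."""
    values_read = 0
    total = 0
    for row in rows:
        values_read += len(row)  # the whole row is stored (and read) together
        total += row[3]
    return total, values_read


def to_column_store(rows):
    """Column layout: one list per column, each stored separately."""
    names = ("id", "age", "state", "amount")
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def total_amount_column_store(columns):
    """Column layout: only the Amount column is touched."""
    amounts = columns["amount"]
    return sum(amounts), len(amounts)


if __name__ == "__main__":
    print(total_amount_row_store(ROWS))
    print(total_amount_column_store(to_column_store(ROWS)))
```

Both functions return the same total; the row layout reads all 16 values, while the column layout reads only the 4 Amount values.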
analyze compression listing;

 Table   | Column         | Encoding
---------+----------------+----------
 listing | listid         | delta
 listing | sellerid       | delta32k
 listing | eventid        | delta32k
 listing | dateid         | bytedict
 listing | numtickets     | bytedict
 listing | priceperticket | delta32k
 listing | totalprice     | mostly32
 listing | listtime       | raw
Slides not intended for redistribution.
• COPY compresses automatically on load
• You can analyze and override
• More performance, less cost
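To give a feel for why column encodings shrink data, here is a minimal sketch of two general ideas behind encodings such as delta and bytedict: storing differences between consecutive values, and replacing repeated values with small dictionary indexes. This is a simplified illustration under our own assumptions, not Redshift's on-disk format; the function names are hypothetical.

```python
def delta_encode(values):
    """Keep the first value, then only the difference from the previous value."""
    if not values:
        return []
    encoded = [values[0]]
    for prev, cur in zip(values, values[1:]):
        encoded.append(cur - prev)
    return encoded


def delta_decode(encoded):
    """Rebuild the original values by accumulating the differences."""
    values = []
    running = 0
    for i, item in enumerate(encoded):
        running = item if i == 0 else running + item
        values.append(running)
    return values


def dict_encode(values):
    """Replace each value with an index into a dictionary of distinct values."""
    dictionary = []
    index_of = {}
    codes = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index_of[value])
    return dictionary, codes


def dict_decode(dictionary, codes):
    """Look each index back up in the dictionary."""
    return [dictionary[code] for code in codes]


if __name__ == "__main__":
    ids = [1001, 1002, 1003, 1005]        # hypothetical, nearly sequential IDs
    states = ["CA", "WA", "FL", "WA"]     # State column from the earlier slide
    print(delta_encode(ids))
    print(dict_encode(states))
```

Nearly sequential IDs turn into small differences that fit in fewer bytes, and a low-cardinality column turns into short codes plus a small dictionary.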
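[Figure: zone map over three sorted blocks, with each block's minimum and maximum]
Block 1: 10 | 13 | 14 | 26 | … | 100 | 245 | 324 (min 10, max 324)
Block 2: 375 | 393 | 417 | … | 512 | 549 | 623 (min 375, max 623)
Block 3: 637 | 712 | 809 | … | 834 | 921 | 959 (min 637, max 959)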
• Track the minimum and maximum value for each block
• Skip over blocks that don’t contain relevant data
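The two bullets above can be sketched in a few lines: record each block's minimum and maximum, then consult that map before reading a block. The blocks below use only the values visible on the slide (the elided values are simply omitted), and the function names are assumptions for illustration.

```python
# Three sorted blocks, using only the values shown on the slide.
BLOCKS = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]


def build_zone_map(blocks):
    """Track the minimum and maximum value for each block."""
    return [(min(block), max(block)) for block in blocks]


def scan_range(blocks, zone_map, low, high):
    """Return values with low <= v <= high, plus how many blocks were read."""
    matches = []
    blocks_read = 0
    for block, (block_min, block_max) in zip(blocks, zone_map):
        if block_max < low or block_min > high:
            continue  # the block cannot contain relevant data, so skip it
        blocks_read += 1
        matches.extend(v for v in block if low <= v <= high)
    return matches, blocks_read


if __name__ == "__main__":
    zone_map = build_zone_map(BLOCKS)
    print(zone_map)
    print(scan_range(BLOCKS, zone_map, 400, 600))
```

A query for values between 400 and 600 reads only the second block; the other two are skipped on the strength of their min/max entries alone.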
DW.HS1.8XL:
• > 2 GB/s scan rate
• Optimized for data processing
• High disk density
DW.HS1.XL:
Amazon Redshift architecture
• Leader Node
– SQL endpoint
– Stores metadata
– Coordinates query execution
• Compute Nodes
– Local, columnar storage
– Execute queries in parallel
– Load, backup, restore via Amazon S3
– Parallel load from Amazon DynamoDB
• Single node version available
[Diagram labels: JDBC/ODBC; 10 GigE (HPC); Ingestion, Backup, Restore]
Amazon Redshift parallelizes and distributes everything
• Load
• Backup/Restore
• Resize
• Load in parallel from Amazon S3 or Amazon DynamoDB
• Data automatically distributed and sorted according to DDL
• Scales linearly with number of nodes
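One way to picture "distributed and sorted according to DDL" is a hash of a distribution key choosing the node for each row, with each node keeping its share sorted by a sort key. The sketch below works under that assumption; the `distribute` function, the use of CRC32, and the sample rows are illustrative choices, not Redshift's actual hashing or storage logic.

```python
import zlib

# Hypothetical sample rows, reusing the columns from the earlier slide.
SAMPLE_ROWS = [
    {"id": 123, "state": "CA", "amount": 500},
    {"id": 345, "state": "WA", "amount": 250},
    {"id": 678, "state": "FL", "amount": 125},
    {"id": 957, "state": "WA", "amount": 375},
]


def distribute(rows, dist_key, sort_key, num_nodes):
    """Assign rows to nodes by hashing dist_key, then sort each node by sort_key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        # CRC32 is used only because it is deterministic across runs.
        digest = zlib.crc32(str(row[dist_key]).encode("utf-8"))
        nodes[digest % num_nodes].append(row)
    for node_rows in nodes:
        node_rows.sort(key=lambda r: r[sort_key])
    return nodes


if __name__ == "__main__":
    for i, node_rows in enumerate(distribute(SAMPLE_ROWS, "state", "amount", 2)):
        print(i, node_rows)
```

Rows that share a distribution key value always land on the same node, so each node can load and scan its own share independently of the others.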
• Backups to Amazon S3 are automatic, continuous and incremental
• Configurable system snapshot retention period
• Take user snapshots on-demand
• Streaming restores enable you to resume querying faster
• Resize while remaining online
• Provision a new cluster in the background
• Copy data in parallel from node to node
• Only charged for source cluster
• Automatic SQL endpoint switchover via DNS
• Decommission the source cluster
• Simple operation via Console or API
Amazon Redshift lets you start small and grow big
Extra Large Node (DW.HS1.XL)
• 3 spindles, 2 TB, 16 GB RAM, 2 cores
• Single Node (2 TB)
• Cluster 2-32 Nodes (4 TB – 64 TB)
Eight Extra Large Node (DW.HS1.8XL)
• 24 spindles, 16 TB, 128 GB RAM, 16 cores, 10 GigE
• Cluster 2-100 Nodes (32 TB – 1.6 PB)
Note: Nodes not to scale
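The cluster size ranges quoted for the two node types follow from multiplying node count by per-node storage. A small sketch of that arithmetic, using only the figures on the slide; the dictionary layout and function names are assumptions, and the numbers are raw storage that ignores compression.

```python
import math

# Per-node storage and cluster size limits as listed on the slide.
# The single-node DW.HS1.XL option (2 TB) sits outside these cluster ranges.
NODE_TYPES = {
    "DW.HS1.XL": {"tb_per_node": 2, "min_nodes": 2, "max_nodes": 32},
    "DW.HS1.8XL": {"tb_per_node": 16, "min_nodes": 2, "max_nodes": 100},
}


def cluster_capacity_tb(node_type, num_nodes):
    """Raw storage of a multi-node cluster, in TB."""
    spec = NODE_TYPES[node_type]
    if not spec["min_nodes"] <= num_nodes <= spec["max_nodes"]:
        raise ValueError(f"{node_type} clusters have "
                         f"{spec['min_nodes']}-{spec['max_nodes']} nodes")
    return num_nodes * spec["tb_per_node"]


def nodes_needed(node_type, data_tb):
    """Smallest cluster of this node type whose raw storage holds data_tb."""
    spec = NODE_TYPES[node_type]
    nodes = max(spec["min_nodes"], math.ceil(data_tb / spec["tb_per_node"]))
    if nodes > spec["max_nodes"]:
        raise ValueError(f"{data_tb} TB does not fit in a {node_type} cluster")
    return nodes


if __name__ == "__main__":
    print(cluster_capacity_tb("DW.HS1.XL", 32))    # top of the XL range, in TB
    print(cluster_capacity_tb("DW.HS1.8XL", 100))  # 1,600 TB, i.e. 1.6 PB
    print(nodes_needed("DW.HS1.8XL", 100))
```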
Amazon Redshift is priced to let you analyze all your data
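                    | Price Per Hour for HS1.XL Single Node | Effective Hourly Price per TB | Effective Annual Price per TB
--------------------+---------------------------------------+-------------------------------+-------------------------------
 On-Demand          | $ 0.850                               | $ 0.425                       | $ 3,723
 1 Year Reservation | $ 0.500                               | $ 0.250                       | $ 2,190
 3 Year Reservation | $ 0.228                               | $ 0.114                       | $ 999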
Simple Pricing
• Number of Nodes x Cost per Hour
• No charge for Leader Node
• No upfront costs
• Pay as you go
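The per-TB figures in the pricing table can be reproduced from the hourly node price, assuming the 2 TB of storage per HS1.XL node listed earlier and a 365-day year. The constant and function names below are our own; the prices are the ones on the slide.

```python
HOURS_PER_YEAR = 24 * 365     # 8,760 hours, assuming a 365-day year
TB_PER_HS1_XL_NODE = 2        # storage per DW.HS1.XL node, from the node slide

# Price per hour for one HS1.XL node, as listed in the table above.
HOURLY_NODE_PRICE = {
    "On-Demand": 0.850,
    "1 Year Reservation": 0.500,
    "3 Year Reservation": 0.228,
}


def effective_hourly_per_tb(node_price_per_hour):
    """Hourly node price spread over the node's 2 TB."""
    return node_price_per_hour / TB_PER_HS1_XL_NODE


def effective_annual_per_tb(node_price_per_hour):
    """Hourly per-TB price multiplied by the hours in a year."""
    return effective_hourly_per_tb(node_price_per_hour) * HOURS_PER_YEAR


def cluster_cost_per_hour(num_nodes, node_price_per_hour):
    """Simple pricing: number of nodes x cost per hour (leader node is free)."""
    return num_nodes * node_price_per_hour


if __name__ == "__main__":
    for plan, price in HOURLY_NODE_PRICE.items():
        print(plan, round(effective_annual_per_tb(price)))
```

Rounded to whole dollars, the three plans come out to the table's annual figures, and the 3-year reservation is the one that lands just under the $1,000/TB/year headline.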
Amazon Redshift has security built in
• SSL to secure data in transit
• Encryption to secure data at rest
– AES-256; hardware accelerated
– All blocks on disk and in Amazon S3 encrypted
• No direct access to compute nodes
• Amazon VPC support
[Diagram labels: JDBC/ODBC; Customer VPC; Internal VPC; 10 GigE (HPC); Ingestion, Backup, Restore]
Amazon Redshift automatically manages data replication and hardware failures
• Replication within the cluster and backup to Amazon S3 to maintain multiple copies of data at all times
• Backups to Amazon S3 are continuous, automatic, and incremental
– Designed for eleven nines of durability
• Continuous monitoring and automated recovery from failures of drives and nodes
• Able to restore snapshots to any Availability Zone within a region
Growing ecosystem
AWS Marketplace
• Find software to use with Amazon Redshift
• One-click deployments
• Flexible pricing options
http://aws.amazon.com/marketplace
Over 40 new features since launch on Feb 14
• Regions
– N. Virginia, Oregon, Dublin, Tokyo, Singapore, Sydney
• Certifications
– PCI, SOC 1/2/3
• Security
– Load/unload encrypted files, Resource-level IAM, Temporary credentials
• Manageability
– Snapshot sharing, backup/restore progress indicators
• Query
– Regex, Cursors, MD5, SHA1, Time zone, workload queue timeout
• Ingestion
– S3 Manifest, LZOP/LZO, JSON built-ins, UTF-8 4byte, invalid character substitution, CSV, auto datetime format detection, epoch
Amazon Redshift – What’s Next
Security, visibility and control
• Audit logging
• SNS Alerts
[Diagram labels: Amazon Redshift; Amazon S3; AWS CloudTrail]
Database Activity: Logins, Login failures, Queries, Loads
System Activity: Creates, Changes, Deletes, Resizes
[Diagram labels: Amazon Redshift; SNS Topic; Monitoring, Security, Maintenance, Errors]
Batch operations
• Cluster Creation
• Faster Resize
[Diagram labels: Amazon Redshift; Amazon S3; Amazon EMR; Amazon EC2; Corporate Data Center]
[Figure: Cluster Creation, 15-20 min vs. 3 min]
[Figure: Faster Resize, 29 hours vs. 7 hours]
Performance & Concurrency
[Chart values: 692.8s and 34.9s; < 2%]
[Chart values: 5,951.7s and 2,151.9s]
[Chart values: 15 and 50]
Feature Delivery
Service Launch (2/14)
PDX (4/2)
Temp Credentials (4/11)
Unload Encrypted Files
DUB (4/25)
NRT (6/5)
JDBC Fetch Size (6/27)
Unload logs (7/5)
4 byte UTF-8 (7/18)
Statement Timeout (7/22)
SHA1 Builtin (7/15)
Timezone, Epoch, Autoformat (7/25)
WLM Timeout/Wildcards (8/1)
CRC32 Builtin, CSV, Restore Progress (8/9)
UTF-8 Substitution (8/29)
JSON, Regex, Cursors (9/10)
Split_part, Audit tables (10/3)
SIN/SYD (10/8)
HSM Support (11/11)
EMR/HDFS/SSH copy, Distributed Tables, Audit Logging/CloudTrail, Concurrency, Resize Perf., Approximate Count Distinct, SNS Alerts, WLM Memory Management (11/13)
SOC1/2/3 (5/8)
Sharing snapshots (7/18)
Resource Level IAM (8/9)
PCI (8/22)
6 weeks left
Redshift Customers at re:Invent
• BDT 101: Big Data ‘State of the Union’
– Earlier today
• DAT 305: Getting Maximum Performance from Amazon Redshift
– Wednesday 11/13: 3pm in Murano 3303
• DAT 306: How Amazon.com is Leveraging Amazon Redshift
– Thursday 11/14: 3pm in Murano 3303
• DAT 205: Amazon Redshift in Action: Enterprise, Big Data, SaaS
– Friday 11/15: 9am in Lido 3006
Please give us your feedback on this presentation
As a thank you, we will select prize winners daily for completed surveys!
DAT 103