AWS Webcast - Attunity Couchsurfing

Post on 02-Dec-2014

description

Learn how Couchsurfing transformed 3 months of data loading into 3 hours.

transcript

Big Data Solution Showcase

Learn how Couchsurfing transformed 3 months of data loading into 3 hours

Here is the URL for the webinar:

https://connect.awswebcasts.com/p2j3kh6nmx3/

Presenters today

Brad Helicher

Director of Cloud Business

Attunity

Chris Keyser

Partner Solution Architect

Amazon Web Services

Charlie Killian

Founder & Owner

Bytecode IO

Thiago Bruch

Systems Engineer

Attunity

Agenda

• Introduction to AWS and big data services

• Attunity CloudBeam: moving the data that moves your business into the cloud

• Customer case study: Couchsurfing

• Attunity CloudBeam for Amazon Redshift: how it works & demo

• Q & A

What is AWS

AWS Global Infrastructure

Application Services

Networking

Deployment & Administration

Database, Storage, Compute

Technologies and techniques for working

productively with data, at any scale.

Big Data

Creating Value from Data Assets

Recommendations, Collective Intelligence

Machine Learning

Visualization

Dashboards, Business Intelligence

Measuring Functionality and Services

Ad Hoc Queries, A/B Testing

Hypothesis Testing & Predictions

Statistical Analysis

Learning from Social Media Conversations

Sentiment Analysis

SOCIAL

BIG DATA

Machine Learning, Dashboards, Business Intelligence

Ad Hoc Queries, A/B Testing

Statistical Analysis

Sentiment Analysis

Big Data | AWS Cloud

Potentially massive data sets | Massive, virtually unlimited capacity

Iterative, experimental style of data manipulation and analysis | Iterative, experimental style of infrastructure deployment/usage

Frequently not a steady-state workload; peaks and valleys | Efficient with highly variable workloads

Time to results is key | Parallel compute clusters from single data source

Hard to configure/manage | Managed services for data storage and analysis

AWS Data Services

Data

Variety: Structured, Unstructured, Text, Binary

Volume: Gigabytes, Terabytes, Petabytes

Velocity: Millisecond, Second, Minute, Hour, Day

EC2, EBS: Instance Storage

Redshift, RDS: SQL Stores

EMR: Hadoop

DynamoDB: NoSQL

Kinesis: Stream

S3, CloudFront, Glacier: Storage Services

ElastiCache: Caching

Data Pipeline: Orchestrate

Storage Services – Object Store

Amazon S3

Designed for 99.999999999% durability

Stores anything

Lifecycle and Versioning

Fine Grained Access Control

Reduced Redundancy Storage

Amazon Redshift

Fast, simple, petabyte-scale data warehousing for less than $1,000/TB/Year

Amazon RDS

MySQL, Oracle, SQL Server, PostgreSQL

Backup/Restore, High Availability

Push Button Scalability

Up to 3 TB and 30K IOPS

Nokia: 50% Cost Savings with 2x Faster Queries

Hadoop Tools Improving Rapidly

On-demand, Flexible, Big Data Technologies

Cheaper and Faster

Redshift & Hadoop Price-performance Advantage over RDBMS

>50% platform cost savings

>2x faster queries

Minimal DBA support

Redshift, Hadoop, S3, EMR, Data Pipeline for ETL

Cost-effective for 10s of TB data sets

AMI-based Services

Internet Speed Report Authoring

Hypothesis testing vs. waterfall

http://aws.amazon.com/marketplace

Big Data Case Studies

Learn from other AWS customers

aws.amazon.com/solutions/case-studies/big-data

AWS Marketplace

AWS Online Software Store

aws.amazon.com/marketplace

Moving the data that moves your business in the cloud.

Data Transfer throughout On-Premises, Amazon RDS, S3, & EC2

Brad Helicher

Director of Cloud Business

Brad.Helicher@attunity.com

+1-954-946-2274, ext. 1105

Thiago Bruch

Services Support Engineer

We Move the Data that Moves Your Business

Data Load/Replication

Change Data Capture

Data Access/Federation

Release Automation

Managed File Transfer

Enterprise File Replication

On-Premises to Cloud

Cloud to On-Premises

Cloud to Cloud

Customer Support

Professional Services and Consulting

Cloud & Data Transfer: Use Cases

» Amazon Redshift:

    » Data Loading for BI / Analytics

» Amazon RDS / DBs on EC2:

    » Migrations from legacy / on-prem systems to AWS

    » System Cutovers

    » Disaster Recovery (on-prem and across AWS regions)

» Amazon S3:

    » Content Availability

    » Disaster Recovery

    » Hadoop EMR

» Amazon EC2 (File Replication):

    » Migrations from legacy / on-prem systems to AWS

    » System Cutovers

    » Disaster Recovery (on-prem and across AWS regions)

www.attunity.com/cloud

Data Replication: Common Challenges

1. Complexity

2. Takes too long

3. Costs too much

4. Not real-time

5. Lack of Developer Resources

CloudBeam – Value Proposition

» Key Features:

    » Move any data, anytime, anywhere

    » Accelerated transfers

    » Quick & Easy Setup

    » Intuitive Administration

    » Extensive Automation

    » Real-time Monitoring

    » Affordable Pay-per-Use

» Business Benefits:

    » Ensure Information Availability across the Enterprise and the Cloud

    » Overcome Data Transfer Bottlenecks

    » Quick Time-to-Value

Attunity CloudBeam

Customer Success Story: Couchsurfing

High-Performance Information Availability Solutions. Made Radically Simple.

Who is Couchsurfing?

Enables a global network of travelers to exchange cultural

experiences via hosted stays and more.

10 million Members

120,000 Cities

100,000 Event Attendees

Who is Bytecode IO?

Data and system integrators specializing in

ETL, APIs, Data Warehousing and Big Data.

charlie.killian@bytecode.io • http://www.bytecode.io/

Problem #1

No data insight.

Millions of hosting requests sent.

But what does it mean?

Rely on data instead of mythology.

Problem #2

Too much data for current architecture.

Solution

Near real-time ODS (Operational Data Store)

Benefits: Short-Term

ODS online

Analyst

Benefit: Long-Term

Data Warehouse

Technology Selection

• ODS

• Data Warehouse

• Data Integration

Amazon Redshift

Diagram labels: Data Warehouse, ODS

• No capital outlay

• Offered as a service

• Can be implemented with limited internal resources

• Little operational overhead

• Data sources already in AWS RDS MySQL

• Combine ODS and Data Warehouse

Data Integration Challenges

• Aggregate large amounts of disjointed data

• Map and convert data types

• Build destination tables

• Cleanse data

• Perform Change Data Capture (CDC)

• Limit ongoing maintenance
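
Two of the challenges above, mapping data types and building destination tables, can be sketched in a few lines. This is only an illustration: the type table, the function names, and the `hosting_requests` table are assumptions made for the example, not Attunity's mapping rules or Couchsurfing's schema.

```python
import re

# Illustrative MySQL -> Amazon Redshift type mapping (an assumption for this
# sketch, not the mapping any particular tool uses).
TYPE_MAP = {
    "tinyint": "SMALLINT",      # Redshift has no 1-byte integer type
    "smallint": "SMALLINT",
    "int": "INTEGER",
    "bigint": "BIGINT",
    "float": "REAL",
    "double": "DOUBLE PRECISION",
    "date": "DATE",
    "datetime": "TIMESTAMP",
    "timestamp": "TIMESTAMP",
    "text": "VARCHAR(65535)",   # Redshift VARCHAR tops out at 65535 bytes
}


def convert_type(mysql_type: str) -> str:
    """Translate one MySQL column type into a Redshift column type."""
    match = re.fullmatch(r"(\w+)(\(([\d,\s]+)\))?", mysql_type.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized type: {mysql_type!r}")
    base, _, args = match.groups()
    if base in ("varchar", "char", "decimal"):
        # Length/precision is carried over unchanged here. Redshift counts
        # VARCHAR length in bytes, so multi-byte text may need more room.
        return f"{base.upper()}({args})" if args else base.upper()
    if base in TYPE_MAP:
        return TYPE_MAP[base]   # display widths such as int(11) are dropped
    raise ValueError(f"no mapping for type: {mysql_type!r}")


def build_create_table(table: str, columns: list[tuple[str, str]]) -> str:
    """Build a CREATE TABLE statement for the destination table."""
    cols = ",\n".join(f"    {name} {convert_type(t)}" for name, t in columns)
    return f"CREATE TABLE {table} (\n{cols}\n);"


# Hypothetical source table definition.
ddl = build_create_table(
    "hosting_requests",
    [("id", "bigint(20)"), ("message", "text"), ("created_at", "datetime")],
)
print(ddl)
```

Writing and maintaining this kind of mapping by hand for every source table is part of what the manual estimate accounts for.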

Simplification and the AWS Marketplace

• Manual development estimated at 3 months

• Development cycle needed to be shortened

• Wanted the whole process to be simplified

• Searched the AWS Marketplace for solutions

Attunity CloudBeam for Amazon Redshift

ODS (Amazon Redshift)

CloudBeam

• End-to-End database to Amazon Redshift loading

• Near real-time

• Incremental CDC

• Easy setup and configuration

• Simple job management

• Affordable
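
As a rough illustration of what incremental CDC means, the sketch below replays an ordered list of row changes against a target keyed by primary key. The `Change` record and `apply_changes` function are hypothetical names for this example; the slides do not describe how CloudBeam represents changes internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    op: str                      # "insert", "update", or "delete"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row image; None for deletes


def apply_changes(target: dict, changes: list) -> dict:
    """Apply captured changes, in order, to a target keyed by primary key."""
    for change in changes:
        if change.op in ("insert", "update"):
            target[change.key] = change.row
        elif change.op == "delete":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op!r}")
    return target


# Only the rows that changed are touched; the target is never fully reloaded.
target = {1: {"status": "pending"}}
log = [
    Change("update", 1, {"status": "accepted"}),
    Change("insert", 2, {"status": "pending"}),
    Change("delete", 1),
]
apply_changes(target, log)
print(target)  # {2: {'status': 'pending'}}
```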

Attunity Solution

Diagram components (within one AWS Region): Amazon RDS MySQL; EC2 machine (Windows Server); EC2 machine, M3.Large (Windows Server) running the Attunity AMI (data pass through); Amazon S3; Amazon Redshift

Full Stack Solution

Results

Relying on data instead of mythology.

Manual process: 3 months vs. Attunity: 3 hours

Data insight!

Attunity CloudBeam for Amazon Redshift

Solutions for On-Prem Sources

Diagram components: On-Premises Source DB; within the AWS Region: EC2 machine (M3.Large) running the Attunity AMI, Amazon S3, Amazon Redshift

» How it’s Offered:

    » Available on-demand via the Amazon Marketplace

    » Licensing: Hourly or Bring Your Own License

» Supported Source DBs:

    » Oracle

    » SQL Server

    » MySQL… and many others

Attunity CloudBeam for Amazon Redshift

Solutions for Data Sources within AWS

Diagram components (within one AWS Region): Source DB on EC2; EC2 machine (Windows Server) running Attunity Replicate software; EC2 machine, M3.Large (Windows Server) running the Attunity AMI (data pass through); Amazon S3; Amazon Redshift

Attunity CloudBeam for Amazon Redshift

What’s Needed to Get Started?

Diagram components: On-Premises Source DB; within the AWS Region: EC2 machine (M3.Large) running the Attunity AMI, Amazon S3, Amazon Redshift

» Windows 64-bit Server

    » Located in same local network as source DB(s)

    » Hosts ‘Attunity Replicate’ software

    » Amazon Redshift Driver & Database Clients installed

    » Ports: 5746 & 5439

» Amazon S3 & Amazon Redshift instances

» ‘Attunity CloudBeam for Amazon Redshift’

    » Via Amazon Marketplace

    » Hourly or BYOL

Attunity CloudBeam - Replicate-for-Redshift

Method & Processes

Diagram components: Source Database (Transaction Log); Replication Server with Bulk Reader, CDC, In Memory Processing (Transform, Filter), Persistent Store, Bulk Loader, and Stream Loader; Target Database; Web-based Designer and Management Console. Connections are labeled Data / Metadata.

Attunity CloudBeam for Amazon Redshift

How it Works

Diagram components: On-Premises Source DB; within the AWS Region: EC2 machine (M3.Large) running the Attunity AMI, Amazon S3, Amazon Redshift

1. Attunity Replicate extracts data from source database

2. Attunity Replicate applies any transformations & filters

3. Data files generated

4. Optimized Transfer from Attunity Replicate server to EC2 machine.

5. Data passes through, staging in Amazon S3.

6. ‘Instruction Channel’ used to execute COPY command. Data loaded into Amazon Redshift.

7. Incremental Loading via CDC, applied inside Redshift using merge commands
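
The load and merge steps (COPY from the Amazon S3 staging area, then applying incremental changes inside Redshift) might look roughly like the SQL this sketch generates. The slides do not show the statements CloudBeam issues, so this follows the general staging-table merge pattern for Redshift. The table names, bucket, IAM role, and the CSV/GZIP file format are all assumptions for illustration.

```python
def copy_statement(table: str, s3_uri: str, iam_role: str) -> str:
    """Build a Redshift COPY that bulk-loads staged files from Amazon S3.

    Assumes gzip-compressed CSV files; the real file format is not stated
    in the slides.
    """
    return (
        f"COPY {table} FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role}' FORMAT AS CSV GZIP;"
    )


def merge_statements(target: str, staging: str, key: str) -> list[str]:
    """Build a delete-then-insert merge from a staging table into the target.

    Rows in staging replace rows in the target that share the same key, and
    new keys are appended. Source-side deletes would need separate handling.
    """
    return [
        "BEGIN;",
        f"DELETE FROM {target} USING {staging} "
        f"WHERE {target}.{key} = {staging}.{key};",
        f"INSERT INTO {target} SELECT * FROM {staging};",
        "COMMIT;",
    ]


# Hypothetical names throughout.
statements = [
    copy_statement(
        "hosting_requests_staging",
        "s3://example-bucket/cdc/hosting_requests/",
        "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
    ),
    *merge_statements("hosting_requests", "hosting_requests_staging", "id"),
]
for sql in statements:
    print(sql)
```

Running the delete and insert inside one transaction keeps readers from seeing the target with changed rows removed but not yet reinserted.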

Attunity CloudBeam - Performance

» Optimized transfer protocol

» Data transfer technologies:

» Leverages Amazon multi-part transfers

» Concurrent Sessions / Transfers

» Compression

» Recoverability, Guaranteed Delivery

» SSL Encryption

» Performance Gains:

» 10-12x over standard copy
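
To make the ideas of multi-part transfers, concurrent sessions, and compression concrete, here is a minimal sketch using only the Python standard library. It is not Attunity's transfer protocol: `send_part` is a hypothetical callback standing in for whatever actually uploads a part, and the part size and session count are arbitrary defaults chosen for the example.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor


def split_parts(data: bytes, part_size: int) -> list[bytes]:
    """Cut the payload into fixed-size parts (the last one may be shorter)."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


def transfer(data: bytes, send_part, part_size: int = 8 * 1024 * 1024,
             sessions: int = 4) -> int:
    """Compress and send parts concurrently; return compressed bytes sent."""
    def work(item):
        number, part = item
        payload = gzip.compress(part)   # each part is compressed on its own
        send_part(number, payload)      # hypothetical upload hook
        return len(payload)

    parts = split_parts(data, part_size)
    with ThreadPoolExecutor(max_workers=sessions) as pool:
        return sum(pool.map(work, enumerate(parts, start=1)))


# Stand-in receiver: collect parts by number, then reassemble in order.
received = {}
data = b"couchsurfing," * 1000
sent = transfer(data, lambda n, p: received.__setitem__(n, p),
                part_size=4096, sessions=3)
restored = b"".join(gzip.decompress(received[n]) for n in sorted(received))
```

Numbering the parts lets the receiver reassemble them correctly even when concurrent sessions finish out of order.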

Thank You

On to the Demo …

Thiago Bruch

Senior Engineer

Get in touch with us…

Attunity Contact: www.attunity.com/cloud

Learn.attunity.com/cloudbeam

Meet us at re:Invent! Attunity – Booth 639

Thank you!