
Data Integration for Big Data

Date post: 25-Feb-2016
Upload: overton
Description:
Data Integration for Big Data. Pierre Skowronski, Prague, 23.04.2013. IT is struggling with the cost of Big Data. Growing data volume is quickly consuming capacity. Need to onboard, store, & process new types of data. High expense and lack of big data skills.
Transcript
Page 1: Data Integration for Big Data

Data Integration for Big Data

Pierre Skowronski
Prague, 23.04.2013

Page 2: Data Integration for Big Data

Informatica Corporation Confidential – Do Not Distribute

IT is struggling with the cost of Big Data

• Growing data volume is quickly consuming capacity

• Need to onboard, store, & process new types of data

• High expense and lack of big data skills

Page 3: Data Integration for Big Data

Delivery: Innovate Faster With Big Data

(onboard, discover, operationalize)

Risk: Minimize Risk of New Technologies

(design once, deploy anywhere)

Cost: Lower Big Data Project Costs

(helps self-fund big data projects)

Prove the Value with Big Data: Deliver Value Along the Way

Page 4: Data Integration for Big Data

Page 5: Data Integration for Big Data

INTRODUCING THE INFORMATICA POWERCENTER BIG DATA EDITION

Page 6: Data Integration for Big Data

PowerCenter Big Data Edition: Lower Costs

[Diagram labels: Transactions, OLTP, OLAP; Social Media, Web Logs; Machine Device, Scientific; Documents and Emails; EDW; ODS; MDM; Traditional Grid]

Optimize processing with low cost commodity hardware

Increase productivity up to 5X

Page 7: Data Integration for Big Data

Hadoop complements existing infrastructure on low-cost commodity hardware

Page 8: Data Integration for Big Data

5x better productivity for similar performance

Project domain | Cluster size | Processing                      | Compared to expert hand-coding
Finance        | 3            | Cleanse, transform, sort, group | 40% faster than Pig
               |              | Extract, process, load          | 50% faster than Pig
Finance        | 10           | Extract, process, load          | 20% slower than Pig

In the worst case, only 20% slower than hand-coding

Mostly equal or faster

Informatica: 1 week vs. hand-coding: 5-6 weeks

Page 9: Data Integration for Big Data

Traditional Grid

PowerCenter Big Data Edition: Minimize Risk

Deploy On-Premise or in the Cloud

Pushdown to RDBMS or DW Appliance

Quickly staff projects with trained data integration experts

Design once and deploy anywhere

Page 10: Data Integration for Big Data

Graphical Processing Logic: Test on Native, Deploy on Hadoop

Partial records only

Separate partial records from completed records

Completed records only

Separate incomplete and complete partial records

Select incomplete partial records

Aggregate all completed and partial-completed records

Sort records by Calling number
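
The record-handling steps labeled on this slide can be sketched in plain Python. This is only an illustration of the flow, not the PowerCenter mapping itself: the `CallRecord` fields, the `is_last` flag used to decide whether a partial call is complete, and the `process_records` function are all assumptions introduced here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """Hypothetical call record; the field names are assumptions."""
    call_id: str
    calling_number: str
    duration: int     # seconds covered by this record
    is_partial: bool  # True if the record covers only part of a call
    is_last: bool     # for partial records: True on the final segment


def process_records(records):
    """Return (aggregated totals sorted by calling number, incomplete partials)."""
    # Separate partial records from completed records.
    partial = [r for r in records if r.is_partial]
    completed = [r for r in records if not r.is_partial]

    # Group partial records by call.
    by_call = defaultdict(list)
    for r in partial:
        by_call[r.call_id].append(r)

    # Separate incomplete and complete partial records. Assumption: a partial
    # call is complete once its final segment has arrived.
    complete_partial, incomplete_partial = {}, {}
    for call_id, segments in by_call.items():
        if any(s.is_last for s in segments):
            complete_partial[call_id] = segments
        else:
            incomplete_partial[call_id] = segments

    # Aggregate all completed and partial-completed records.
    totals = [(r.calling_number, r.call_id, r.duration) for r in completed]
    for call_id, segments in complete_partial.items():
        total = sum(s.duration for s in segments)
        totals.append((segments[0].calling_number, call_id, total))

    # Sort records by calling number.
    totals.sort(key=lambda t: t[0])
    return totals, incomplete_partial
```

Incomplete partial records are returned separately rather than dropped, so a later run could pick them up once the remaining segments arrive.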

Page 11: Data Integration for Big Data

Run it simply on Hadoop

Choose execution environment

View hive query

Press Run

Page 12: Data Integration for Big Data

Minimize Risk with Informatica Partners and Certified Developer Community

Global Systems Integrators Informatica Developers

• 45,000+ developers in Informatica TechNet

• 3x more developers than any other vendor*

[Bar chart comparing Ab Initio, Business Objects, IBM, and Informatica; axis scale 0 to 1200]

9,000+ trained developers

* Source: U.S. resume search on dice.com, December 2008

[Diagram: Achieving Operational Efficiency with Informatica; labels: People, Technology, Best practices & reusability, Expertise & best practices]

Page 13: Data Integration for Big Data

WHAT ARE CUSTOMERS DOING WITH INFORMATICA AND BIG DATA?

Page 14: Data Integration for Big Data

The Challenge: Data warehouse exploding with over 200TB of data. User activity generating up to 5 million queries a day, impacting query performance.

The Solution

The Result

• Saved 100TB of space over the past 2½ years

• Reduced rearchitecture project from 6 months to 2 weeks

• Improved performance by 25%

• Return on investment in less than 6 months

Lower Costs of Big Data Projects: Saved $20M + $2-3M Ongoing by Archiving & Optimization

Large Global Financial Institution

[Diagram labels: ERP; CRM; Custom; Business Reports; EDW; Archived Data; Interaction Data; Phase 2]

Page 15: Data Integration for Big Data

Web Logs

Traditional Grid

Near Real-Time

The Challenge: Increasing demand for faster data-driven decision making and analytics as data volumes and processing loads rapidly increase

The Solution

The Result

• Cost-effectively scale performance

• Lower hardware costs

• Increased agility by standardizing on one data integration platform

[Diagram labels: RDBMS (x3); Datamarts (x2); Data Warehouse; Phase 2]

Large Global Financial Institution: Lower Costs of Big Data Projects

Page 16: Data Integration for Big Data

Large Government Agency: Flexible Architecture to Support Rapidly Changing Business Needs

The Challenge: Data volumes growing 3-5 times over the next 2-3 years

The Solution

The Result

• Manage data integration and load of 10+ billion records from multiple disparate data sources

• Flexible data integration architecture to support changing business requirements in a heterogeneous data management environment

[Diagram labels: EDW; DW; DW; Mainframe; Data Virtualization; RDBMS; Unstructured Data; Business Reports; Traditional Grid; Phase 2]

Page 17: Data Integration for Big Data

Why PowerCenter Big Data Edition

• Repeatability
  • Predictable, repeatable deployments and methodology

• Reuse of existing assets
  • Apply existing integration logic to load data to/from Hadoop
  • Reuse existing data quality rules to validate Hadoop data

• Reuse of existing skills
  • Enable ETL developers to leverage the power of Hadoop

• Governance
  • Enforce and validate data security, data quality and regulatory policies
  • Manageability

Page 18: Data Integration for Big Data
