Post on 07-Mar-2015
Big Data Analytics Platform
Beyond Traditional Enterprise Data Warehouse
Recorded version available at http://www.impetus.com/webinar_registration?event=archived&eid=45
Outline
• What Is A Traditional Enterprise Data Warehouse?
• What Is Required From A Big Data Warehouse?
• Building Big Data Analytics Platform
• How To Re-use Existing Investments?
• Real-world Examples
The Answers We Seek
• What kind of customer will spend the most with us next year?
• What kind of products are my customers interested in?
• Which customers are we likely to lose?
• How much does my service impact my margin?
• In which area should we open our new store next year?
• What is the most effective distribution channel?
Traditional EDW
EDW Components
Extraction, Transformation and Loading - ETL
• Data is extracted from heterogeneous data sources
• Transformed to match the data warehouse schema
• Loaded into the data warehouse database
Analyze and Query - OLAP Tools
• Active analysis - user queries
• User guided data analysis
• OLAP
Automated Analysis - Data Mining
• Machine learning / NLP
• Recommendations & forecasting
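The ETL flow described on this slide can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the two sample sources, their field names, the `sales` table, and the use of an in-memory SQLite database as a stand-in for the warehouse. It is not the tooling discussed in the webinar.

```python
import csv
import io
import sqlite3

# Hypothetical heterogeneous sources: a CSV export and a list of web records.
CRM_CSV = "customer_id,amount_usd\n1,10.50\n2,4.25\n"
WEB_ROWS = [{"cust": "1", "cents": "399"}, {"cust": "3", "cents": "1200"}]


def extract():
    """Extract: read the raw records from each source unchanged."""
    crm_rows = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return crm_rows, WEB_ROWS


def transform(crm_rows, web_rows):
    """Transform: map both layouts onto one (customer_id, amount_cents) schema."""
    rows = [(int(r["customer_id"]), round(float(r["amount_usd"]) * 100))
            for r in crm_rows]
    rows += [(int(r["cust"]), int(r["cents"])) for r in web_rows]
    return rows


def load(conn, rows):
    """Load: insert the unified rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id INTEGER, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def run_etl():
    """Run extract, transform and load against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(*extract()))
    return conn


def spend_per_customer(conn):
    """A simple user query of the kind an OLAP tool would issue."""
    return conn.execute(
        "SELECT customer_id, SUM(amount_cents) FROM sales "
        "GROUP BY customer_id ORDER BY customer_id"
    ).fetchall()
```

Calling `spend_per_customer(run_etl())` returns one total per customer, which is the kind of aggregate the "Analyze and Query" layer would then slice further.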
Enter Big Data
The Gap Area - Big Data vs. EDW
• Large data volumes
• Complex unstructured data
• Deeper insights
• Storing images, videos
• The bottom line - $/TB
Big Data in EDW
Key Characteristics - Big Data Platform
Highly scalable
Works on massive data sets
Support for multiple data sources
Easy deployment/ seamless integration
Deep analytics
Canned & customized reports as well as valuable BI
Support for real time analytics
Building Big Data Analytics Platform
Commercial
• Teradata/Netezza
• Greenplum/Vertica/Aster
• Informatica
• SAS/MicroStrategy/Business Objects
• Pentaho/Jasper
Open source
• CloverETL/Kettle/Talend
• Jaspersoft/Pentaho Reporting
• Hadoop, Apache, Cassandra
Hybrid
• ETL - Open Source and Commercial
• Analytics - Open Source or Commercial
• Commercial Hadoop Versions
Our Key Learnings
Open source yields better results for larger volumes of data
Parallel processing or faster mechanisms can be used for import/export of data
Real time is a myth in big data – needs careful design
Hadoop is the most cost effective option for big data
Reuse of existing EDW investments possible
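The point about parallel import/export can be sketched as follows. This is only an illustration under stated assumptions: the partitioning scheme, the function names, and the CSV-style output are hypothetical, and `export_partition` stands in for whatever bulk writer the target store actually provides.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts roughly equal slices (round-robin)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def export_partition(part):
    """Stand-in for writing one partition to the target store.

    Here it only serializes each (customer_id, amount) pair to a CSV line.
    """
    return [f"{customer_id},{amount}" for customer_id, amount in part]


def parallel_export(rows, n_parts=4):
    """Export all partitions concurrently and collect the written lines."""
    parts = partition(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(export_partition, parts))
    return [line for chunk in results for line in chunk]
```

Threads suit this sketch because a real export is usually I/O bound; the output order follows the partitions, not the original row order, so downstream consumers should not rely on ordering.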
Impetus Big Data Analytics Platform - iLaDaP
iLaDaP - Technologies Used
Plug and Play Service Oriented Architecture
Workflow and ETL
Underlying PB Scale Store
BI and Analytics Query Engine
Real Time Analytics
Application Integration/ Development
Reusing EDW Investments
• Infrastructure
• Code – logic and algorithm
• Traditional data warehouse
• RDBMS engine
• Reporting tools
• ETL tools
• Development and testing strategy
Case Study - 1
The Client
• Leaders in internet services and media in Europe
Key Challenge
• Very high volumes of data recorded each month
• Near real time reporting engine needed
• How much infrastructure needed?
Impetus Solution
• Proposed Cloud for POC
• Usage of Flume for collecting streaming data
• Usage of HBase/Hive for analysis
Benefits Realised
• Highly scalable
• Near real time analytics
Web Analytics
Case Study - 2
The Client
• One of the key players in Telecom industry
Key Challenge
• CDR Data Conversion
• Customer churn analysis
Impetus Solution
• Workflow based CDR data conversion
• Canned reports for CDR data
• Used Intellicus to generate customer churn analysis reports
Benefits Realised
• Predefined canned reports for customer churn analysis
• Better customer management
Case Study - 3
The Client
• Leading online product retailer
Key Challenge
• Recommendation engine
• Cross product customer analysis
• Provide ‘Big Picture’ across business units
Impetus Solution
• Proposed iLaDaP based solution
• Apache Mahout based recommendation engine
• Clickstream, Server log and OLTP cross analysis
Benefits Realised
• Better product recommendations
• True centralized business overview across product and business lines
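To give a feel for what a recommendation engine computes, here is a minimal item co-occurrence sketch in plain Python. It does not use Apache Mahout or its API, and it is not the solution built for this client; the baskets, item names, and function names are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical purchase baskets, one set of items per order.
BASKETS = [
    {"laptop", "mouse", "bag"},
    {"laptop", "mouse"},
    {"phone", "case"},
    {"laptop", "bag"},
]


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, owned, top_n=2):
    """Rank items the customer does not own by co-occurrence with owned items."""
    scores = Counter()
    for item in owned:
        for other, count in co.get(item, {}).items():
            if other not in owned:
                scores[other] += count
    # Sort by score (descending), then by name so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]
```

With these sample baskets, `recommend(build_cooccurrence(BASKETS), {"mouse"})` ranks "laptop" ahead of "bag", because "mouse" and "laptop" were bought together more often. A production engine would add normalization and run over far larger data, which is where a distributed library becomes relevant.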
Summing up…
• Big Data Analytics needs a well-thought-out strategy
• Any single vendor technology may not be sufficient to build a Big Data Analytics Platform
• Hybrid solutions are effective due to their flexible cost model
• Selecting the right tools is the key to building a successful Big Data Analytics Platform
• Easy extension of the existing EDW infrastructure is possible
Impetus Technologies
We offer innovative product engineering
and technology R&D services
Questions
Please send in your questions
using the chat panel
Thank you
Mail us at inquiry@impetus.com or visit bigdata.impetus.com