
Big Data Analytics v3 - DAMA NY– That is the “sine qua non” of Big Data Analytics –...

Date post: 27-May-2020
Big Data Analytics

DAMA NY DAMA Day
October 17, 2013

IBM
590 Madison Avenue, 12th floor
New York, NY

Tom Haughey
InfoModel, LLC
868 Woodfield Road
Franklin Lakes, NJ 07417
201 755 3350
[email protected]


© InfoModel, LLC. 2013 2

Agenda
• Definition
• Types of data
  – Structured
  – Semi-structured
  – Unstructured
• Why Big Data is important
• Sources of Big Data
• Levels of Big Data
• Use cases for Big Data
• Big Data analytics
  – Data mining
  – Predictive analysis
• NOSQL and Big Data Landscape
• The new business intelligence architecture
• How to prepare for Big Data
• Pitfalls
• Conclusions


Big Data Definition
• Big data consists of high-volume, high-velocity, high-variety and high-value data and processes that demand cost-effective, innovative forms of information processing for enhanced insight and decision making

Source: modified from Gartner Glossary


Big Data in the Past
• A decade ago, Big Data was:
  – A scalability problem
  – A performance problem
• Added to that was the difficulty of making sense of it
• That is where today’s Big Data and Big Data Analytics come into play


Why Now?
• Why can we achieve this now?
• The four-minute mile syndrome
  – Nobody could do it till Roger Bannister did it
  – Now lots of us can do it (!)
• Before, we didn’t have:
  – The hardware technology
  – The software systems
  – The data management systems
  – The thought processes


Sources of Big Data
• Streaming data (e.g., stock market)
• Video archives
• Large-scale e-commerce
• Social and professional networks
• Internet text and documents
• Internet search indexing
• Call detail records
• Web logs
• RFID
• Medical records
• Sensor networks
• Social networks
• Military surveillance
• Astronomy
• Video and music archives
• Atmospheric science
• Genomics, biogeochemical
• Biological & other complex data
• Interdisciplinary scientific research


Big Data Support Technologies
• The emergence of commodity servers
• NOSQL file management systems and Hadoop
• Inverted column databases
• In-memory databases and analytics
• Convergence of machine learning and data mining
• Management of structured and unstructured content
• Support of Hadoop and MapReduce by major RDBMS vendors


Types of Data
• Structured – having a fixed and external structure (external to the data structure itself)
• Semi-structured – having a structure embedded in the data. Instances contain data values and metadata. The structure may vary but still needs to be planned and modeled
• Unstructured – having no known structure. Often transformed to structured or semi-structured data for processing. The structure may vary but still needs to be planned and modeled
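
The three types can be illustrated with a minimal Python sketch. It uses made-up records: a CSV row whose column names live in a separate schema (structured), a JSON document that carries its own field names (semi-structured), and a free-text note that is transformed into a structured record with a regular expression (unstructured). The field names, sample values, `NOTE_PATTERN` and the `parse_note` helper are illustrative assumptions, not part of the original slides.

```python
import csv
import io
import json
import re

# Structured: the schema (column names) is defined outside the data itself.
SCHEMA = ["customer_id", "product", "amount"]
csv_data = "1001,widget,19.99\n"
row = next(csv.reader(io.StringIO(csv_data)))
structured = dict(zip(SCHEMA, row))

# Semi-structured: each instance carries its own metadata (the keys),
# and the set of keys may vary from one instance to the next.
json_data = '{"customer_id": 1001, "product": "widget", "tags": ["gift"]}'
semi_structured = json.loads(json_data)

# Unstructured: free text with no known structure. A common first step is
# to transform it into a structured record (hypothetical pattern below).
NOTE_PATTERN = re.compile(
    r"customer (\d+) bought (?:a|an) (\w+) for \$(\d+\.\d{2})"
)


def parse_note(text):
    """Return a structured dict extracted from a free-text note, or None."""
    match = NOTE_PATTERN.search(text)
    if match is None:
        return None
    customer_id, product, amount = match.groups()
    return {"customer_id": customer_id, "product": product, "amount": amount}


note = "Called today: customer 1001 bought a widget for $19.99 and was happy."
print(structured)
print(semi_structured)
print(parse_note(note))
```

Even in this toy case the pattern for the free-text note has to be designed in advance, which is the slide's point that varying structure still needs to be planned and modeled.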


Big Data and Business Analytics
• These levels affect data structure, data access and scale

[Figure: four data levels, Level 1 through Level 4. Labels recovered from the figure, in the order extracted:]
  – Roll Your Own
  – Unknown, unstructured or semi-structured. Processed directly or at small to large scale
  – Unknown, unstructured. Transformed to structured. Processed at small to large scale
  – Known, structured. Processed at small to large scale

Adapted from McKinsey


Sample Use Cases by Data Level
• Level 1:
  – Pricing: targeted price setting
  – Campaign lead generation
  – Customer experience
  – Pricing based on customer value
• Level 2:
  – Impact of marketing on sales
  – Market basket to determine risk
  – Next product to buy (NPTB)
  – Cross channel integration
• Level 3:
  – Fraud prevention
  – Discount targeting based on location, likelihood-to-leave, web analytics
• Level 4:
  – Targeted advertising, right landing page
  – Pricing and targeted advertising, right price and landing page
  – Credit line management

Adapted from McKinsey


Business Analytics
• Solutions used to build analytical, historical models and simulations to create scenarios, understand current status and predict future states
• Business analytics includes:
  – Data mining, predictive analytics, applied analytics and statistics, and is delivered as an application suitable for a business user.
  – Big Data Analytics is the convergence of Big Data and Business Analytics

[Figure labels: Big Data, Big Data Analytics, Business Analytics]

• Without Big Data Analytics, big data is “just a lot of data”


Should You Kill Your Data Warehouse?

See Forbes, 8/24/2011 [Maybe don’t see!]


Try This Query
• Try this query on semi-structured or unstructured data on NOSQL or other multi-structured data environment
  – “Give me a breakdown of sales revenue and volume by household by month, order it by the org unit that sold the product and the org unit that owned the product, summarize it from product type, to product subgroup and product group, for the last 5 years”
• In a DW containing this data, this query can be run efficiently and fairly easily coded in SQL
• How do you do this on enormous quantities of semi-structured or unstructured data using existing technologies?
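
To make the "fairly easily coded in SQL" point concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The two-table schema (`sales`, `product`), the column names, the sample rows and the `as_of` date are all hypothetical. Because SQLite does not support `GROUP BY ROLLUP`, the sketch returns only the finest grain with the product hierarchy columns attached. On a warehouse RDBMS that supports `ROLLUP`, the subtotals by subgroup and group could probably be produced in the same statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    product_type TEXT, product_subgroup TEXT, product_group TEXT
);
CREATE TABLE sales (
    household_id INTEGER, product_id INTEGER,
    selling_org TEXT, owning_org TEXT,
    sale_date TEXT,          -- ISO date, e.g. '2013-03-05'
    revenue REAL, quantity INTEGER
);
INSERT INTO product VALUES
    (1, 'Checking', 'Deposits', 'Retail'),
    (2, 'Savings',  'Deposits', 'Retail');
INSERT INTO sales VALUES
    (10, 1, 'Branch A', 'Retail Bank', '2013-03-05', 100.0, 1),
    (10, 1, 'Branch A', 'Retail Bank', '2013-03-20',  50.0, 2),
    (10, 2, 'Branch B', 'Retail Bank', '2013-03-11',  75.0, 1),
    (11, 2, 'Branch B', 'Retail Bank', '2005-01-15',  20.0, 1);
""")

# Revenue and volume by household and month, ordered by selling and owning
# org unit, carrying the product type / subgroup / group hierarchy,
# restricted to the five years before the (hypothetical) as_of date.
QUERY = """
SELECT s.household_id,
       strftime('%Y-%m', s.sale_date) AS sale_month,
       s.selling_org, s.owning_org,
       p.product_group, p.product_subgroup, p.product_type,
       SUM(s.revenue)  AS revenue,
       SUM(s.quantity) AS volume
FROM sales AS s
JOIN product AS p ON p.product_id = s.product_id
WHERE s.sale_date >= date(:as_of, '-5 years')
GROUP BY s.household_id, sale_month, s.selling_org, s.owning_org,
         p.product_group, p.product_subgroup, p.product_type
ORDER BY s.selling_org, s.owning_org, s.household_id, sale_month
"""

rows = conn.execute(QUERY, {"as_of": "2013-10-17"}).fetchall()
for r in rows:
    print(r)
```

The query depends on known keys, a date column and a product hierarchy. Those are the things that semi-structured or unstructured sources do not provide up front.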


Forms of Analytics
• Traditional BI and OLAP
  – Will stay
  – Consumers already use these
  – Consumers will add to them
• Big Data Analytics
  – Discovery oriented
  – Shows value in Big Data
  – Can leverage new platforms: e.g., Analytics DB
  – Undergoing strong acceptance by consumers

• Traditional BI and OLAP
  – Well known and required.
  – Works well with most EDWs.
  – Many levels and styles of BI
• Advanced SQL
  – Well-known SQL-based tools/techniques.
  – Can result in long, complex SQL statements to gather, aggregate and model data
• Predictive Analytics
  – Data mining/statistics to understand the past and predict future events.
  – Requires special tools and rock stars.
• New Analytic Methods/Tools
  – Visualization, artificial intelligence, natural language processing.
  – Analytical DB functions: inverted column DBs, DW appliances, MapReduce, etc.


Data Mining
• The use of mathematical algorithms to find hidden relationships in the data
• It can be used to:
  – Find rules or approaches that worked well in the past
  – Identify dependencies or relationships between things
  – Segment or classify customers based on how well they match something you care about
  – Group and cluster things that are similar to each other
  – Spot and identify anomalies buried in the data

Text Source: James Taylor
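
As one small illustration of the last use, spotting anomalies, the sketch below flags values that lie more than a chosen number of standard deviations from the mean (a z-score test). This is only one simple technique among many and is not taken from the slides. The `find_anomalies` helper, the threshold of 2.0 and the transaction amounts are assumptions made for the example.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return the values whose z-score exceeds `threshold`."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical transaction amounts with one value that stands out.
amounts = [20, 22, 19, 21, 23, 20, 250, 22, 21, 19]
print(find_anomalies(amounts))
```

A large outlier inflates the standard deviation it is measured against, so on small samples a modest threshold, or a more robust measure such as the median absolute deviation, tends to work better.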


Techniques Used to Mine the Data
• Just as the popularity of new tools is exploding, so are the capabilities in data mining
• Data-mining techniques fall into four major categories:
  – Classification – such as targeted marketing
  – Association – such as market basket analysis
  – Sequencing – those who bought this bought that
  – Clustering – developing conclusions using space and distance
• NOTE: In Hadoop, querying and mining can be done through Hive, Mahout and Pig
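
The association category (market basket analysis) can be sketched in a few lines of plain Python. The sketch counts how often pairs of items appear together and derives two standard measures: support (the share of baskets containing both items) and confidence (the share of baskets with the left-hand item that also contain the right-hand item). The baskets, the thresholds and the `pair_rules` function are hypothetical. A production system would more likely use a dedicated library or a Hadoop-based tool.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (lhs, rhs, support, confidence) rules for item pairs."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```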


Predictive Analytics
• Applies mathematical techniques to historical data to build a future analytic model.
• It predicts:
  – How likely something is to be true
  – Its likely value
  – The likely sequence
• For instance, instead of:
  – Finding dependencies true in historical data, find dependencies likely to be true in the future
  – Grouping customers based on historical similarities, group them on likelihood that they will behave similarly in the future
• Some regard data mining as the first step in predictive analytics
• Some use the terms synonymously

Source: James Taylor
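
A minimal sketch of the "likely value" case: fit a straight line to historical observations by ordinary least squares, then use it to estimate the next value. This is a deliberately simple stand-in for the mathematical techniques the slide refers to. The monthly sales figures and the `fit_line` and `predict` helpers are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) model to a new x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: sales for months 1 to 6.
months = [1, 2, 3, 4, 5, 6]
sales = [102, 108, 121, 129, 141, 148]

model = fit_line(months, sales)
print(f"Estimated sales for month 7: {predict(model, 7):.1f}")
```

The model is built from the past but is judged by how well it anticipates the future, which is the distinction the slide draws between mining historical dependencies and predicting likely ones.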


Uses of Predictive Analysis
• Its major uses are to:
  – Improve efficiency
  – Reduce risk
  – Increase profitability
• Examples:
  – First case: Professional sports
    • “Moneyball”
    • Who should guard LeBron?
    • What are individual players really worth to the team?
  – Second case: Banking
    • Customers are using a new free business checking system for personal checks as well, increasing the cost of those accounts
    • Will it be more profitable to pay them to leave?
  – Third case:
    • 7% of customers account for 43% of revenue
    • What should we offer them?
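
A concentration figure like the one in the third case can be computed directly from per-customer revenue. The sketch below ranks customers by revenue and reports the share contributed by the top fraction. The revenue list and the `top_share` helper are hypothetical, and the numbers are not the ones behind the slide's 7% and 43%.

```python
import math


def top_share(revenues, fraction):
    """Share of total revenue contributed by the top `fraction` of customers."""
    ranked = sorted(revenues, reverse=True)
    top_n = math.ceil(fraction * len(ranked))
    return sum(ranked[:top_n]) / sum(ranked)


# Hypothetical revenue per customer.
revenues = [50, 20, 10, 10, 5, 5]
print(f"Top half of customers: {top_share(revenues, 0.5):.0%} of revenue")
```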


Who Uses Big Data
• Data Scientists / Data Teams
• Knowledge workers
• BI consumers
• Decision makers at all levels of the business


The Big Data Analytics Environment

[Figure: architecture diagram. Labels recovered from the figure, in the order extracted:]
  – Big Data Analytics: complex analysis of structured data; analysis of irregularly structured data in Hadoop; social sentiment and social network analysis
  – Enterprise Data Warehouse Environment: traditional reporting and analysis
  – Appliance, Hadoop, NOSQL, DW, Mart
  – Data Integration
  – Files, Cloud, RYO data, Web Logs, OLAP, Tables
  – Consumers
  – Docs, Sensors, Events, XML/JSON
  – Big Data; Traditional Data Warehouse Environment
  – RDBMSs, Streams
  – Real-time Analytics


Means to Achieving Value in Big Data
• Create integrated, analytic sandboxes
• Use Hadoop as a complement to previous systems, not a replacement
• Derive data from Big Data as it is needed
  – Less emphasis on pre-aggregation and pre-summarization
  – As has been said since the opening days of client-server, send the function to the data
  – Not the data to the function (as in some vertical DBMSs today)
  – Learn to use parallel, distributed, commodity servers
• Use Big Data for staging as well as a live archive
• Virtualize Big Data
  – For reuse across multiple analytical applications
  – For easy access to the data when it is needed
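
The "send the function to the data" idea is the pattern behind MapReduce-style processing, and it can be simulated in one process. In the sketch below, each list in `partitions` stands in for the web-log lines held on one node. A small function summarizes each partition where it sits, and only the compact partial results are merged. The log lines and the `count_locally` and `merge` helpers are hypothetical. A real cluster would run the local step on separate machines through a framework such as Hadoop.

```python
from collections import Counter
from functools import reduce

# Each inner list stands in for the log lines stored on one node.
partitions = [
    ["GET /home", "GET /cart", "GET /home"],
    ["GET /home", "POST /order"],
    ["GET /cart", "GET /cart"],
]


def count_locally(lines):
    """Function shipped to a node: count requests per path in its partition."""
    return Counter(line.split()[1] for line in lines)


def merge(left, right):
    """Combine two partial results; only these small summaries move."""
    return left + right


# Derive the summary when it is needed instead of pre-aggregating it.
partial_counts = [count_locally(lines) for lines in partitions]
total = reduce(merge, partial_counts, Counter())
print(total.most_common())
```

Only the per-partition counters cross the simulated network, never the raw lines. That is why the approach scales on parallel, distributed, commodity servers.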


Preparing for Big Data (BD)
• Define the business objectives
  – Big Data (BD) will yield business advantages
  – But not without business involvement
• Understand and prepare the data for BD as in any environment
  – It is NOT just about slamming the data to some humongous staging area
  – Data modeling is here to stay, but new methods are needed
  – Costs and technology frustrations will increase
  – But business advantages will go up as well
• Get the right staff
  – Both BD and Analytics are new skills
  – Organizations will need to hire, train, and learn accordingly
• Source the right suppliers and technology
  – BD Analytics will be mainstream; not just for giant web firms
  – Tools and platforms will improve so there will be less coding
  – Plus improvement in scalability, performance, real time availability
  – Expect Hadoop and other Big Data infrastructure to become common
    • Hadoop will not replace anything
    • Data Warehousing and BI will continue


Pitfalls
• Potential pitfalls that can trip up organizations on big data analytics initiatives include:
  – Absence of clear business purpose
  – Jettisoning data management principles and practices
  – Absence of internal analytics skills (you need rock stars)
  – The high cost of hiring experienced analytics professionals
  – High costs of the new infrastructure (hardware and software)
  – Challenges in integrating Hadoop systems and data warehouses
  – Selecting the right vendors who offer software connectors across and to Big Data technologies


Conclusions
• Big Data must deliver Business Value
  – That is the “sine qua non” of Big Data Analytics
  – Reporting, analysis and OLAP will stay
  – You also need “discovery” analytics, predictive analysis and data mining
• Plan your entry into big data and implement it in sensible increments
  – Be clear up front on the business goals
  – Select key sources (data from Web, other systems, social networks)
• You will have to make some upgrades:
  – Add new BI/DW technologies
  – Train your staff
  – Change is inevitable
• Give the business what it needs
  – Discovery analytics to understand change, find opportunities
  – Broader, more complete views of all relevant entities (e.g., customer)
  – Analytics targeting your industry and your organization’s specific needs and unique collection of big data

