Post on 03-Jul-2015
Data Science: A Practitioner’s Perspective
Mass Technology Leadership Council Panel Discussion
David Menninger, Formerly VP & Research Director, Ventana Research
David.Menninger@emc.com
©2012, Ventana Research
David Menninger
Former Vice President – Ventana Research
Now head of business development and strategy for EMC Greenplum.
Until last week, covered analytics, business intelligence and information management for Ventana Research. Over two decades of experience developing and bringing to market leading-edge technologies that help organizations analyze data to support a range of action-taking and decision-making processes.
Prior to joining Ventana Research, served as VP of Marketing and Product Management at Vertica Systems, Oracle, Applix, InforSense and IRI Software. Helped create over three-quarters of a billion dollars of shareholder value while serving in these roles.
Email: david.menninger@emc.com
Some Recent Relevant Research
Volume and Velocity of Data Are Most Important In Evaluating Big Data Technology
Data volume:
less than 1 TB: 10%
1-10 TB: 29%
11-100 TB: 31%
101 TB-1 PB: 13%
more than 1 PB: 11%
Don't know: 7%
Data velocity:
less than 10 GB per day: 26%
11-100 GB per day: 33%
101 GB-1 TB per day: 20%
1-10 TB per day: 4%
More than 10 TB per …: 6%
Don't know: 12%
Source: Ventana Research, The Challenge of Big Data Benchmark Research
Hadoop Is Being Adopted or Considered by 54% of Enterprises
Production: 22%
Planned: 15%
Evaluating: 17%
Source: Ventana Research, Hadoop Information Management Analytics Research
…but the Vast Majority Use a Variety of Big Data Technologies
Values are listed as: Currently in production / Plan to use within 12 months / Plan to use in 12-24 months / Still evaluating / No plans to use
An RDBMS (for example, IBM DB2, Microsoft SQL Server, MySQL, Oracle) on standard hardware: 89% / 2% / 2% / 3% / 3%
Flat files: 70% / 7% / 1% / 4% / 18%
A DW appliance (for example, Netezza, Exadata, EMC Greenplum, Teradata): 34% / 11% / 3% / 21% / 31%
In-memory databases: 33% / 13% / 4% / 17% / 33%
Hadoop: 22% / 12% / 3% / 17% / 45%
Other: 26% / 4% / 4% / 10% / 57%
A specialized DBMS (for example, Aster Data, Infobright, Kognitio, ParAccel, Sybase IQ, Vertica): 15% / 9% / 5% / 19% / 51%
Source: Ventana Research, The Challenge of Big Data Benchmark Research
What Types of Applications?
What types of large-scale data applications is your organization running?
Values are listed as: Hadoop / Non-Hadoop
Query and reporting: 60% / 89%
Consolidation of multiple data sources for analysis: 63% / 71%
Custom/production application: 65% / 68%
Data preparation: 56% / 60%
Advanced analyses: 69% / 47%
Analysis or indexing of unstructured data: 46% / 32%
Data sandbox / data experimentation: 44% / 32%
Hadoop is most often used for advanced analyses and is more likely to be used to analyze unstructured data and for data sandboxing than other technologies. It is less likely to be used for query and reporting.
Source: Ventana Research, Hadoop Information Management Analytics Research
Predictive Analytics Still Emerging
Despite its potential, predictive analytics remains a specialist tool, ranking 10th among BI capabilities with only 13% using it
Spreadsheets: 60%
Business Intelligence: 49%
Analytic Databases: 41%
Custom-built systems: 34%
Data warehouse: 28%
Planning and forecasting: 26%
Application server: 20%
LOB analytics: 18%
RDB: 14%
Predictive Analytics: 13%
… yet 80% ranked predictive analytics capabilities as important or very important
Source: Ventana Research, Business Analytics Benchmark Research
Forecasting and Marketing are the Most Common Uses of Predictive Analytics
Values are listed as: Current / Future
Forecasting …: 72% / 24%
Marketing analyses …: 70% / 22%
Customer service or support …: 45% / 34%
Product recommendations or offers: 43% / 22%
Fraud detection: 34% / 31%
Intelligence or surveillance analysis: 28% / 28%
Social network analysis: 27% / 38%
Logistics analysis: 26% / 27%
Predicting product development …: 18% / 34%
Predicting prices in the supply chain: 17% / 36%
Scientific or clinical research: 17% / 27%
Healthcare decisions: 16% / 29%
Predicting mechanical failures: 9% / 33%
Other: 17% / 24%
Source: Ventana Research, Predictive Analytics Benchmark Research
Organizations Employ a Variety of Predictive Analytics Algorithms
Classification and regression trees / decision trees and Linear Regression are the most popular predictive analytics techniques used.
Values are listed as: Frequently / Occasionally / Not at all
Classification and regression trees / …: 69% / 25% / 6%
Linear Regression: 66% / 33% / (no value shown)
Logistic regression or other discrete choice …: 61% / 29% / 10%
Association rules: 49% / 37% / 14%
K-nearest neighbors: 36% / 42% / 21%
Neural networks: 30% / 36% / 34%
Box-Jenkins, Autoregressive …: 30% / 35% / 35%
Exponential smoothing / double exponential …: 22% / 43% / 34%
Naïve Bayes: 21% / 43% / 36%
Support vector machines: 20% / 23% / 57%
Survival analysis: 15% / 41% / 44%
Monte Carlo Simulations: 13% / 47% / 40%
Source: Ventana Research, Predictive Analytics Benchmark Research
Who Designs and Deploys Predictive Analytics?
… but who should be performing these tasks?
Data Scientist / Data Mining Resources: 32%
Bus. Intelligence / Data Warehouse Team: 31%
Line-of-Business Analysts: 19%
Source: Ventana Research, Predictive Analytics Benchmark Research
Who Does the Best Job?
Satisfaction vs. Project Team (legend: Overall Average)
Specialized data scientist, statistical or data mining resources: 70%
Line of business analysts: 65%
Business intelligence and data warehouse team: 59%
Source: Ventana Research, Predictive Analytics Benchmark Research
Real-Time Scoring of New Records
More than half the organizations perform real-time scoring infrequently or not at all.
Regularly: 30%
Occasionally: 18%
Infrequently: 22%
Not at all: 30%
Source: Ventana Research, Predictive Analytics Benchmark Research
Organizations Need More Timely Results from Predictive Analytics
Satisfaction vs. Use of Real-time Scoring (legend: Overall Average)
Regularly: 88%
Occasionally: 73%
Infrequently or Not at all: 47%
Source: Ventana Research, Predictive Analytics Benchmark Research
Frequency of Updating Predictive Models
Constantly: 12%
Hourly: 2%
Daily: 6%
Weekly: 11%
Monthly: 14%
Quarterly: 22%
Less often than quarterly: 17%
Don't know: 16%
Most organizations don’t update their analytic models frequently enough. Nearly four in 10 update their models quarterly or less frequently.
Source: Ventana Research, Predictive Analytics Benchmark Research
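The "nearly four in 10" figure on this slide can be checked by adding up the relevant pie-chart slices. The sketch below is only an illustration: the names `update_frequency` and `combined_share` are hypothetical, and reading "quarterly or less frequently" as the Quarterly and Less often than quarterly slices is an assumption.

```python
# Shares of organizations by model-update frequency, copied from the slide's pie chart.
update_frequency = {
    "Constantly": 12,
    "Hourly": 2,
    "Daily": 6,
    "Weekly": 11,
    "Monthly": 14,
    "Quarterly": 22,
    "Less often than quarterly": 17,
    "Don't know": 16,
}


def combined_share(shares, categories):
    """Sum the percentage shares of the named categories."""
    return sum(shares[name] for name in categories)


# Assumption: "quarterly or less frequently" means these two slices.
quarterly_or_less = combined_share(
    update_frequency, ["Quarterly", "Less often than quarterly"]
)
# Assumption: "at least daily" means the three most frequent slices.
at_least_daily = combined_share(update_frequency, ["Constantly", "Hourly", "Daily"])

print(f"Quarterly or less often: {quarterly_or_less}%")
print(f"At least daily: {at_least_daily}%")
print(f"Total of all slices: {sum(update_frequency.values())}%")
```

Under these assumptions the quarterly-or-less group comes to 39%, which matches "nearly four in 10", while only 20% update at least daily.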
Organizations that Update Models More Frequently Have Higher Satisfaction
Satisfaction vs. Model Updates (legend: Overall Average)
At Least Daily: 81%
At Least Monthly: 74%
Less Frequently: 48%
Source: Ventana Research, Predictive Analytics Benchmark Research
Most Organizations Are Not Providing Adequate Support and Training
Values are listed as: Adequately / Only somewhat adequately / Inadequately
Training in predictive analytics concepts and techniques: 44% / 32% / 24%
Product training: 42% / 33% / 26%
Training in the application of predictive analytics to business problems: 39% / 38% / 23%
Specialized consulting resources (internal or external): 31% / 39% / 31%
Help desk resources: 24% / 34% / 42%
Source: Ventana Research, Predictive Analytics Benchmark Research
What Types of Training and Support Are Most Effective?
Satisfaction vs. Training and Support (legend: Overall Average)
Training in predictive analytics concepts and techniques: 89%
Help desk resources: 89%
Training in the application of predictive analytics to business problems: 86%
Product training: 79%
Specialized consulting resources (internal or external): 77%
Source: Ventana Research, Predictive Analytics Benchmark Research