
Disruptive Data Science - How Data Science and Big Data are Transforming Business, IT and People (...

Date post: 01-Dec-2014
Upload: emc-academic-alliance
Description:
An examination of the trends in Big Data and advanced analytics, and of the technology, services, and education needed to thrive in this new field. This session explores examples of truly industry-disruptive, analytics-driven transformations and the catalysts behind them. It also examines the role of people, which is paramount to developing a high-performing data science team, starting today.
Transcript
Page 1: Disruptive Data Science - How Data Science and Big Data are Transforming Business, IT and People  ( EMC World 2012 )

1 © Copyright 2012 EMC Corporation. All rights reserved.

Disruptive Data Science

Annika Jimenez David Dietrich

Page 2

Agenda

CareCore National’s Evolution

Data-Driven Transformations

EMC’s Assist

Page 3

CareCore’s Role in Health Care

Clinics, Hospitals, HC Centers, Providers

Insurance Companies (Carriers)

• Providers’ Performance • Fraud Detection • Reduced Liability • Sources of Variability • Industry Statistics

• Optimized Compensation • Legal Actions • Accurate Payments

• Cost-Effective Treatment • Knowledge Dissemination • Reduced Liability • Sources of Variability • Industry Statistics • Providers’ Performance

Patients (Members)

• Optimized Price • Legal Action • Customized Plans • Better Services

• Better Selection of Providers • Better Selection of Clinics • Improved Expenses Planning

• Industry’s Statistics • Clinic’s Performance • Providers’ Performance

• Claims • Diagnosis • Treatments • Procedures • Cost • Patient’s Profiles

Plan Customers

• Employees Analysis • Industry’s Statistics • Potential of Savings

• Legal Action • Optimal Site Redirection

• Improved Benefits and Value

• Better Services

• Legal Action • Improved Employee’s Satisfaction

• Better prices and options

Page 4

Pathway Traversal Prediction

Patient Profile

Provider Profile

Plan Details

Claims History

Prior Pathway Traversals

Patient Provider CPT Plan

Pathway Traversals

Predicted

Actual

Page 5

95% CI

Predictive Modeling In the Call Center

Patient profile

Provider profile

Plan details

Actual traversal

Predicted traversal

Traversal anomaly: Triage the call
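
The call-center idea on this slide (compare the actual traversal with the predicted one and triage when it falls outside the 95% band) can be sketched roughly as follows. This is a minimal illustration, not CareCore's model: it assumes a traversal can be summarized as a single number (for example, steps along the pathway), that past prediction errors are available, and that a normal approximation is adequate. The names `confidence_band` and `needs_triage` are hypothetical.

```python
from statistics import stdev

Z_95 = 1.96  # normal approximation for a two-sided 95% band (assumption)


def confidence_band(predicted, residuals):
    """Return (low, high) around a prediction, sized from past prediction errors."""
    spread = Z_95 * stdev(residuals)
    return predicted - spread, predicted + spread


def needs_triage(actual, predicted, residuals):
    """True when the actual traversal falls outside the 95% band."""
    low, high = confidence_band(predicted, residuals)
    return not (low <= actual <= high)


# Hypothetical history of (actual - predicted) errors for similar cases.
past_errors = [-1, 0, 1, 0, -1, 1, 0]

routine_call = needs_triage(actual=6, predicted=5, residuals=past_errors)
anomalous_call = needs_triage(actual=9, predicted=5, residuals=past_errors)
```

With these made-up numbers the band is roughly 3.4 to 6.6, so the first call passes through and the second is flagged for triage.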

Page 6

CareCore National Analytics Phases

Data Value

1.0 CCN Analytics

2.0 Analytics as a Service

3.0 AaaS @ Scale

1.5 CCN Analytics (BI + Predictive)

• Greenplum Platform • Data Source = CCN+ • BI-Centric • Operationally Oriented • Customer is CCN

• New Customer Orientation • New Solutions Development • Data Source = Carrier+ • 2.0 Batch 2.5 Data Exchange

• Data Exchange • Workflow Management

Time

Customer = External Customer = CCN

Page 7

Big Data

• Large Volumes

• New Sources

• Low Latencies

Key Characteristics

• New Platforms

• New Roles

• New Techniques

Implications for the Enterprise

Page 8

High

Future Past TIME

BUSINESS VALUE

Business Intelligence

Predictive Analytics & Data Mining (Data Science)

Typical Techniques & Data Types

• Optimization, predictive modeling, forecasting, statistical analysis

• Structured/unstructured data, many types of sources, very large data sets

Common Questions

• What if...?
• What’s the optimal scenario for our business?
• What will happen next? What if these trends continue? Why is this happening?

Business Intelligence

Typical Techniques & Data Types

• Standard and ad hoc reporting, dashboards, alerts, queries, details on demand

• Structured data, traditional sources, manageable data sets

Common Questions

• What happened last quarter?
• How many did we sell?
• Where is the problem? In which situations?

Data Science

Low

Big Data Requires New Approaches to Analytics Data Science & Big Data Analytics
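
To make the contrast on this slide concrete, the sketch below answers one question of each kind on the same data: a BI-style total ("How many did we sell?") and a predictive estimate ("What will happen next if these trends continue?"). The sales figures are invented, and a straight-line least-squares trend is only one simple stand-in for the forecasting techniques the slide lists.

```python
def total_units(sales_by_month):
    """BI-style question: how many did we sell in the period?"""
    return sum(sales_by_month)


def linear_trend_forecast(values):
    """Fit y = a + b*t by ordinary least squares and return the next value."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    slope_num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope_den = sum((t - t_mean) ** 2 for t in range(n))
    slope = slope_num / slope_den
    intercept = y_mean - slope * t_mean
    return intercept + slope * n  # extrapolate one step past the data


last_quarter = [100, 110, 120]  # hypothetical monthly unit sales

sold = total_units(last_quarter)                    # looks backward
next_month = linear_trend_forecast(last_quarter)    # looks forward
```

The first function only summarizes what already happened; the second fits a model and extrapolates, which is the shift in questions the slide describes.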

Page 9

Industries Are Broadly Embracing Data Science

Retail

•CRM – Customer Scoring

•Store Siting and Layout

•Fraud Detection / Prevention

•Supply Chain Optimization

Advertising & Public Relations

•Demand Signaling

•Ad Targeting

•Sentiment Analysis

•Customer Acquisition

Financial Services

•Algorithmic Trading

•Risk Analysis

•Fraud Detection

•Portfolio Analysis

Media & Telecommunications

•Network Optimization

•Customer Scoring

•Churn Prevention

•Fraud Prevention

Manufacturing

•Product Research

•Engineering Analytics

•Process & Quality Analysis

•Distribution Optimization

Energy

•Smart Grid

•Exploration

Government

•Market Governance

•Counter-Terrorism

•Econometrics

•Health Informatics

Healthcare & Life Sciences

•Pharmaco-Genomics

•Bio-Informatics

•Pharmaceutical Research

•Clinical Outcomes Research

Page 10

Data-Driven Transformations

Page 11

Transformation Catalysts

Time

Data Availability

Distributed Computing

Analytics Tools & Platform

DS Skill Availability

Org Alignment

Process & Change Mgt

Page 12

Transformation Catalysts

Time

Data Availability

Distributed Computing

Analytics Tools & Platform

DS Skill Availability

Org Alignment

Process & Change Mgt

Page 13

Big Data Analytics Is Now Central to Enterprise Strategy

Big Data Analytics COO CMO

CIO

CFO CSO

CPO

CEO

• Unified User Profiling
• Segmentation
• Churn Prediction
• Lifecycle Management
• Purchase Funnel Analysis
• Brand Analytics
• Campaign Analytics
  o Media Mix Modeling
  o SEM Optimization
  o Behavioral/Social Targeting
  o Attribution
  o ROI Optimization
  o Social Effects

• Pricing
• Demand Forecasting

• Online Behavioral Analyses:
• Other product-sourced data & behavioral analytics
  o Mobile devices
  o DVRs
  o Smart meters
  o Other electronics
• Vertically Specific Product Development
  o Genetic mining
  o Imaging
  o Oil/gas exploration

• Unauthorized User Access Detection
• Web Server Attack Detection
• Malware Protection
• Advanced Persistent Threat Detection

• Fraud Detection
• Error Log Analysis
• Complaint Data Analysis
• Call Center Data Analysis
• Demand Planning/Forecasting
• Quality/Reliability Analysis
• Fault/Service Failure (Detection/Prediction)

• IT Log Analytics
• Error/Event Logs Analytics
• Network Analytics

• KPI Definition
• Risk Modeling
• Compliance

Page 14

Transformation Catalysts

Time

Data Availability

Distributed Computing

Analytics Tools & Platform

DS Skill Availability

Org Alignment

Process & Change Mgt

Page 15

Big Data Analytics Is Different from Traditional BI

“TRADITIONAL BI”
• GBs to 10s of TBs
• Operational
• Structured
• Repetitive

“BIG DATA ANALYTICS”
• 10s of TBs to PBs
• External + Operational
• Mostly Semi-Structured
• Experimental, Ad Hoc

Page 16

Data Science Team as Your New Source of Innovation

Page 17

Traditional Analytics Process

sample.csv

Time-to-Insights

Data Prep DB Extract DB Import spec.docx scores.csv

Not a Scalable Process!

Page 18

Data Prep

Time-to-Insights

Analytics with Greenplum Marketing optimization with MADlib

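-- Draw a random sample of 100,000 households.
SELECT householdID, variables
FROM households
ORDER BY RANDOM()
LIMIT 100000;

-- Keep variables that pass the univariate screen. Assumes
-- run_univariate_analysis is a set-returning helper exposing pvalue and r2.
SELECT *
FROM run_univariate_analysis(
       'households_training',
       'variables')
WHERE pvalue < .01 AND r2 > .01;

-- Fit the regression on the screened variables.
SELECT run_regression(
       'univariate_results',
       'households_training');

-- Score every household: dot product of coefficients and features.
SELECT householdID,
       madlib.array_dot(
         coef::REAL[],
         xmatrix::REAL[])
FROM coefficients, households;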

MADlib
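
The last query on this slide scores each household by taking the dot product of the fitted coefficient vector with that household's feature vector. A rough Python sketch of the same arithmetic, outside the database, is shown here; the coefficients, feature layout, and household IDs are all hypothetical, and `dot` and `score_households` are illustrative helpers rather than MADlib functions.

```python
def dot(coef, features):
    """Dot product of two equal-length numeric sequences."""
    if len(coef) != len(features):
        raise ValueError("coefficient and feature vectors differ in length")
    return sum(c * x for c, x in zip(coef, features))


def score_households(coef, households):
    """Map each household ID to its linear-model score."""
    return {hid: dot(coef, x) for hid, x in households.items()}


# Hypothetical fitted coefficients: intercept, income (in $10k), household size.
coef = [0.5, 0.25, -0.5]

# Hypothetical feature vectors; the leading 1.0 multiplies the intercept.
households = {
    "H1": [1.0, 4.0, 2.0],
    "H2": [1.0, 8.0, 1.0],
}

scores = score_households(coef, households)
```

Running the scoring inside the database, as the slide does, avoids extracting the feature rows to a separate tool; the sketch only shows what the per-row computation amounts to.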

Page 19

People & Skills Three Key Roles of the New Data Ecosystem

Role

• Deep Analytical Talent (Data Scientists): projected U.S. talent gap of 140,000 to 190,000
• Data Savvy Professionals: projected U.S. talent gap of 1.5 million
• Technology & Data Enablers

Note: Figures above reflect a projected talent gap in the US in 2018, as shown in the McKinsey May 2011 article "Big Data: The next frontier for innovation, competition, and productivity".

Page 20

Profile of a Data Scientist

Curious & Creative

Technical

Quantitative

Communicative & Collaborative

Skeptical

Page 21

Specific Data Science Skills & Traits

[Figure: five numbered skills and traits; the only surviving labels are "EDW" and "Apply data science methods in their current roles"]

Page 22

The Greenplum Data Science Team

Senior Director, Data & Insights Services, Yahoo! (MIA, UCSD)

Principal Scientist at RSA, Fraud Detection, Speech and Language Processing (M.S. in Signal Processing)

Marketing optimization (Ph.D. in Operations Research)

Stochastic machine learning (Ph.D., Australian National University)

Director of Analytics at M-Factor, DemandTec (M.S., Berkeley)

Data Mining in Healthcare (Ph.D. and Postdoctoral Fellow, Australian National University)

• Biomedical Informatics (Ph.D., Stanford)

• Research Engineer at Fox Interactive Media, eHarmony (M.S. in Applied Mathematics)

• Quantitative modeling and risk management in trading and finance (Ph.D. in Economics, Princeton, M.S. in Mathematics, Courant Institute)

• Mechanical Engineering (Ph.D., Stanford)

• Statistician, Bayesian Analysis (M.S., Statistics, Stanford)

Page 23

Biggest Obstacles to Data Science Adoption

Q25 : The biggest obstacle to Data Science adoption in our organization is: (Coded for Total)

Page 24

The Question of Org

A. Central Model
• Data Science sits within IT
• Close proximity to data platform
• Cross-initiative scalability
• Establishment of best practices
• Jump-start “data driven” culture
• Centrally defined prioritization
• Lack of domain expertise

B. Decentralized Model
• Data Science sits with LOB
• Distanced from data platform
• Limited scalability across org
• Lack of best practices
• Slower “data driven” culture
• LOB defined prioritization
• Strong domain expertise

Page 25

The Question of Org

Central Model

Decentralized Model

A B

Hybrid Model

• Data Science in IT with physical LOB placement
• Close proximity to data platform
• Cross-initiative scalability
• Establishment of best practices
• Jump-start “data driven” culture
• LOB driven prioritization
• Strong domain expertise

Page 26

Process & Change Management

Project Workspaces

Data Analysis

Publish and Iterate

Explore the Data

Collaboration

Translate Vision into Concrete Projects

Prioritize with Stakeholders

Roadmap and Resource

Solidify Integration Points with Dependencies

Socialize Progress

Deliver Wins!

Page 27

New Roles Required Beyond Core DS Team

+ Analytics Executive: “Chief Analytics Officer”?

+ Engagement Managers

Page 28

EMC’s Assist

Page 29

Greenplum Unified Analytic Platform

Agile Big Data Analytics

Rich Capabilities for Complex Analytics

Extends Leading Tools

Fully Customizable for Analytics

Increases Analytical Productivity & Results

Extends Insight By Combining All Data

Augmented by the Greenplum Data Science Team

Developed, Packaged, Supported by EMC

Page 30

Analytics Solutions: Goals

Overcome the analytics gap

Generate continuous insights by developing re-usable models on massive data sets

Produce actionable, ready-to-deploy models

Build collaborative relationships among data stakeholders

Educate users on the development of tools and best practices

Establish a strategic vision for on-going analytics development

GREENPLUM ANALYTICS LABS

Page 31

Analytics Lab: Packages

LAB PRIMER (1-Day Workshop)

• Analytics Roadmap

• Prioritized Opportunities

• Architectural Recommendations

LAB 600 (6-Week Lab)

• Analytics Roadmap

• Prof. services on GPDB*

• Ready-to-deploy model(s)

LAB 1200 (12-WEEK LAB)

• Analytics Roadmap

• Prof. services on GPDB*

• Ready-to-deploy model(s)

LAB 100 (Analytics Bundle)

• On-site MPP Analytics Training

• Analytics tool-kit

• Quick insight (2 weeks)

*GPDB priced separately

GREENPLUM ANALYTICS LABS

Page 32

Skills Matrix, Based on Recent Students

Technical Ability

Recent STEM Grads

Business Intelligence Professionals, IT

Quantitative Analysts, Statisticians, Business and Data Analysts

Quantitative Skills

Data Scientists

Page 33

Data Science and Big Data Analytics Course and EMCDSA Certification

Course Overview

• “Open” curriculum
• Practitioner’s approach
• Enables immediate participation on analytics projects
• Prepares for EMC Proven Professional Data Science Associate (EMCDSA) Certification

Details

Page 34

Summary – Transformation Success Criteria

Establish a clear vision for the role of Big Data Analytics

Understand end-to-end platform dependencies

Embrace the UAP paradigm

Educate & build your Data Science Dream Team

Organize to your contextual reality

Initiate smart process

Deliver one concrete “win”

Socialize, socialize, socialize

Page 35

Q & A

Page 36

Other Relevant Greenplum Sessions

Session | Presenter | Times
Unified Analytics Platform Introduction | Brian Wilson | Tues 10:00-11:00, Thurs 1:00-2:00
Greenplum Database Overview | Michael Crutcher | Mon 8:30-9:30, Wed 10:00-11:00
Greenplum Hadoop Overview | Susheel Kaushik | Mon 10:00-11:00, Wed 4:15-5:15
Greenplum DCA Overview | Hanxi Chen | Mon 4:00-5:00, Thurs 10:00-11:00
Greenplum Analytics Workbench | Apurva Desai | Wed 8:30-9:30, Thurs 10:00-11:00
Analytics on Hadoop | Don Miner | Tues 11:30-12:30, Thurs 8:30-9:30
Optimizing Greenplum Database on VMware Virtualized Infrastructure | Kevin O’Leary | Mon 4:00-5:00, Tues 4:15-5:15
Big Data Driven Businesses in Action: Creating Real Business Value Using Greenplum UAP (Panel w/4 Customers) | Mike Maxey | Wed 4:15-5:15, Thurs 11:30-12:30
Analytics for Business Value: Collaboration | Josh Klahr | Mon 10:00-11:00, Wed 2:45-3:45
Disruptive Data Science - How Data Science and Big Data are Transforming Business, IT and People | Annika Jimenez, David Dietrich | Tues 4:15-5:15, Thurs 11:30-12:30

Page 37

Thank You
