Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

Date post: 06-Aug-2015
Upload: rajesh-nambiar
© Copyright 2015 EMC Corporation. All rights reserved.
Transcript
Page 1: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

Page 2: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

INTERNET OF THINGS (IOT) TRENDS & IMPLICATIONS FOR ENTERPRISE & DATA SCIENCE

Page 3: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

INTERNET OF THINGS (IOT) TRENDS & IMPLICATIONS FOR ENTERPRISE & DATA SCIENCE

Rashmi Raghu, Principal Data Scientist, Pivotal

Acknowledgments: Regunathan Radhakrishnan, Kaushik Das

Page 4: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

What can the Internet of Things do in the real world?

Page 5: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

CAN WE PREVENT ACCIDENTS LIKE THE MACONDO DISASTER?

Page 6: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

HOW DO WE KNOW WHEN SOMEONE BECOMES LIKELY TO DEFAULT OR CHURN?

Page 7: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

HOW DO WE KNOW A TREE HAS FALLEN ON A POWER LINE EVEN BEFORE THE RESIDENTS COMPLAIN?

Page 8: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

“MASHING BIG DATA WITH BIG MACHINES IS ‘BEAUTIFUL, DESIRABLE, INVESTABLE’ - IT COULD TRANSFORM GE'S BUSINESS - AND THE ECONOMY.”

JEFF IMMELT, CEO, GE

Page 9: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

THE POWER OF 1

Driving Outcomes That Matter: One Percent Improvement Equals

• Rail (Increasing Freight Utilization): $27B Industry Value by Reducing System Inefficiency
• Healthcare (Predictive Maintenance): $63B Industry Value by Reducing Process Inefficiency
• Power (Predictive Diagnostics): $66B Industry Value with Efficiency Improvements in Gas-fired Power Plant Fleets

Source: General Electric

Page 10: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

Creating A Smart System

Page 11: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

How does a human react to an external event?

Page 12: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

THE HUMAN BRAIN BRINGS IT ALL TOGETHER

The Brain:
1. Takes in the input from the eyes
2. Analyzes it to compute the trajectory of the ball
3. Tells the body what action to take to hit the ball

Page 13: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

HOW CAN AN ORGANIZATION BE SMART?

Human Being → Organization

• Sense Organs (eyes, ears …) → Sensors
• Limbs … → Actuators
• Nervous System → Network
• Brain → Digital Brain

Page 14: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

BUT WHERE IS THE BRAIN?

Sensors

Actuators

Page 15: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

LET’S BUILD A DIGITAL BRAIN

The brain brings it all together
1. Takes in the input from a large number of sensors
2. Builds a model and uses it to analyze incoming data
3. Tells the actuators what action to take
Over a network of machines
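
The three steps of the digital brain can be sketched as a small sense-analyze-act loop. This is only an illustration under stated assumptions: the "model" is a plain mean and standard deviation, and the function names, the 3-sigma limit and the `shut_valve` action are hypothetical, not taken from the talk.

```python
from statistics import mean, stdev

def build_model(history):
    """Step 2a: summarize normal sensor behaviour as a mean and standard deviation."""
    return {"mean": mean(history), "std": stdev(history)}

def analyze(model, reading, z_limit=3.0):
    """Step 2b: a reading is anomalous when it lies more than z_limit std devs from the mean."""
    if model["std"] == 0:
        return reading != model["mean"]
    return abs(reading - model["mean"]) / model["std"] > z_limit

def decide_action(is_anomalous):
    """Step 3: map the analysis result to an actuator command (hypothetical names)."""
    return "shut_valve" if is_anomalous else "no_op"

def digital_brain(history, incoming):
    """Step 1 feeds sensor readings in; the result is one actuator command per reading."""
    model = build_model(history)
    return [decide_action(analyze(model, reading)) for reading in incoming]

# Hypothetical temperature readings from one sensor.
history = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]
print(digital_brain(history, [20.1, 35.0]))  # ['no_op', 'shut_valve']
```

In a real deployment the model would be trained on data from many sensors and the loop would run over a network of machines, as the slide notes.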

Page 16: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

JOURNEY TO A DATA-DRIVEN ENTERPRISE

Deploy analytic apps and automate at scale

Perform advanced analytics Discover insights

Modernize data infrastructure

Page 17: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

SMART SYSTEMS = SENSORS + DIGITAL BRAIN + ACTUATORS

Problem Formulation

Modeling Step

Data Step Application Step

Data Science for Building Models

Sensors & Actuators

Data Lake

Page 18: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

PIVOTAL BIG DATA SUITE

• OPEN: World’s First Open Sourced, Enterprise-Class Data Portfolio + Open Data Platform
• AGILE: Modern Data Infrastructure + Advanced Analytics + Apps at Scale
• CLOUD-READY: Multiple Cloud Deployment Models + Big Data Suite on Pivotal Cloud Foundry

Page 19: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

THE INTERNET OF THINGS JOURNEY

STORE
• Structured
• Unstructured
• High Volume
• High Velocity

ANALYZE
• Predictive Analytics
• Machine Learning
• Advanced Data Science
• Real-time Analytics

DEVELOP
• Advanced Analytic Pipelines
• Real-time Analytical Applications
• Global Scale Data-Driven Applications
• Enterprise, Consumer, IoT, and Mobile

INNOVATE
• Agile Dev Expertise
• DevOps
• Hybrid Cloud
• Continuous Delivery
• Closed Loop Applications

AGILE DEVELOPMENT

BIG DATA PREDICTIVE ANALYTICS

ENTERPRISE PAAS

Page 20: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

Data Science for IoT

Page 21: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

DATA SCIENCE?

[Word cloud: App Development, Analytics, Business Intelligence, Reporting, Visualization, Dashboards, Insights, Big Data, Machine Learning, Statistics, Mathematics, Time Series, Algorithms, Databases, Software, Modeling, Queries, Real-Time, Sensors, Predictive Models, ETL, Research, Hadoop, Distributed Computing, MapReduce, SQL, In-Memory, OLAP, Text Mining, Unstructured Data, Open Source, Decision Science, Ad Hoc Queries, Hacking, In-Database Analytics, Internet of Things, Data Cleansing, Sentiment]

Page 22: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

WHAT IS DATA SCIENCE?

The use of statistical and machine learning techniques on big multi-structured data in a distributed computing environment to identify correlations and causal relationships, classify and predict events, identify patterns and anomalies, and infer probabilities, interest, and sentiment.

DRIVE AUTOMATED, LOW-LATENCY ACTIONS IN RESPONSE TO EVENTS OF INTEREST

Page 23: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

THE EIGHTFOLD PATH OF DATA SCIENCE: FOUR PHASES AND FOUR DIFFERENTIATING FACTORS

Four phases
• Phase 1: Problem Formulation
– Make sure you formulate a problem that is relevant to the goals and pain points of the stakeholders
• Phase 2: Data Step
– Build the right feature set making full use of the volume, variety and velocity of all available data
• Phase 3: Modeling Step
– This is where you move from answering what, where and when to answering why and what if?
• Phase 4: Application
– Create a framework for integrating the model with decision making processes and taking action using the Internet of Things

Four differentiating factors
• Technology Selection
– Select the right platform and the right set of tools for solving the problem at hand
• Iterative Approach
– Perform each phase in an agile manner, team up with domain experts and SMEs, and iterate as required
• Creativity
– Take the opportunity to innovate at every phase
• Building a Narrative
– Create a fact-based narrative that clearly communicates insights to stakeholders

http://blog.pivotal.io/data-science-pivotal/p-o-v/the-eightfold-path-of-data-science

Page 24: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

• Gene Sequencing: COST TO SEQUENCE ONE GENOME HAS FALLEN FROM $100M IN 2001 TO $10K IN 2011 TO $1K IN 2014
• Smart Grids: READING SMART METERS EVERY 15 MINUTES IS 3000X MORE DATA INTENSIVE
• Social Media: FACEBOOK UPLOADS 250 MILLION PHOTOS EACH DAY
• Oil Exploration: OIL RIGS GENERATE 25,000 DATA POINTS PER SECOND
• Stock Market
• Video Surveillance
• Medical Imaging
• Mobile Sensors

BILLIONS OF DATA POINTS
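
The "3000X" figure for smart meters on this slide can be sanity-checked with simple arithmetic. The sketch below assumes the baseline is one meter reading per month, which the slide does not state; the variable names are illustrative only.

```python
# Assumption (not stated on the slide): the baseline is one reading per meter per month.
MINUTES_PER_DAY = 24 * 60
READ_INTERVAL_MINUTES = 15
AVERAGE_DAYS_PER_MONTH = 365 / 12

reads_per_day = MINUTES_PER_DAY // READ_INTERVAL_MINUTES
reads_per_month = reads_per_day * AVERAGE_DAYS_PER_MONTH

print(reads_per_day)           # 96 readings per meter per day
print(round(reads_per_month))  # 2920, roughly 3000 times one monthly reading
```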

Page 25: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

DATA SCIENCE TOOLKIT

PLATFORM

TOOLS

LANGUAGES

SQL

Page 26: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

ANALYTICS WITH PIVOTAL: A SINGLE ADDRESS FOR EVERYTHING ANALYTICS

FORECASTING

CLUSTERING

REGRESSION

CLASSIFICATION

OPTIMIZATION

Page 27: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

IoT Trends & Implications

Page 28: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

IOT UBIQUITY ACROSS VERTICALS

Healthcare & Lifestyle

Agriculture & Farming

Energy

Retail

Manufacturing & Heavy Industry

Transportation

Financial Services

Communications

Industrial Internet

Smart Cities

Smart Homes

Connected Cars

Connected Wearables …

Page 29: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

COMMON USE CASES ACROSS VERTICALS

• Predictive Maintenance
• Predict Quality of Product / System
• Security Analytics
• Demand Modeling
• Anomaly Detection
• Recommendation Systems

Page 30: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

COMMON USE CASES ACROSS VERTICALS

Predictive Maintenance
• Energy: Predict drilling equipment function and failure. Optimize drilling efficiency and maintenance schedules
• Manufacturing: Predict failure of disk drives in servers to help in planning replacement schedule

Security Analytics
• Healthcare: Network intrusion detection of covert threats and malware in medical devices
• Energy: Detection of malware and threats in electricity distribution grid

Predict Quality of Product / System
• Manufacturing: Predict vaccine potency based on manufacturing sensor data
• Transportation: Predict duration of road traffic incidents. Helps improve quality of commute

Page 32: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

COMMON USE CASES ACROSS VERTICALS

Recommendation Systems
• Retail: Recommend products / services at point of sale or while browsing
• Manufacturing: Recommend parts and service methods for post-sale product maintenance

Demand Modeling
• Retail: Enhanced and granular modeling of consumer demand using large historical data repositories and more accurate inventory / point-of-sale records
• Energy: Smart meter data enables improved electricity demand modeling

Anomaly Detection
• Energy: Smart meter data enables detection of anomalous power usage patterns related to theft, vegetation management issues, meter malfunction
• Manufacturing: Identifying die failures or anomalies from wafer bin map images

Page 33: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

CHALLENGES IN IOT USE CASES

• Data Location
• Data Integration
• Data Cleansing
• Labels, Labels, Labels
• Normal vs. Anomaly
• False Alarms vs. No Alarms
• Value Creation

Page 34: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

CHALLENGES IN IOT USE CASES

Data Location
– Data from ‘things’ and business processes on different platform(s)
– Locating data centrally in Data Lake alleviates access issues

Data Integration
– Integration of IoT data with business data and external data can be non-trivial
– Tools from statistics and machine learning can play an important role here too, e.g. algorithms to recommend natural join mechanisms for different data sources

Data Cleansing
– Missing values, dirty data and the presence of sensor/device malfunctions require analytical techniques to resolve

Page 35: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

CHALLENGES IN IOT USE CASES

Labels, Labels, Labels
– Labeled data for training models hard to find
– Problems can be approached in two stages: Unsupervised techniques to learn ‘labels’ from the data and subsequent Supervised Learning efforts

Normal vs. Anomaly
– New types of data collected from IoT devices might be poorly understood
– Unsupervised learning techniques can help uncover normal and anomalous patterns

False Alarms vs. No Alarms
– How tolerant a user is to false alarms, or to no alarm when there should be one, needs to be taken into account in predictive models
– Ensemble models can improve the predictive power of models and reduce issues
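
One way to take a user's tolerance for false alarms versus missed alarms into account is to attach a cost to each and pick the alarm threshold that minimizes the total. The sketch below is a minimal illustration, not the speaker's method; the scores, labels and cost values are made up, and the function names are hypothetical.

```python
def alarm_cost(scores, labels, threshold, cost_false_alarm, cost_missed_alarm):
    """Total cost when an alarm is raised for every score >= threshold."""
    false_alarms = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    missed_alarms = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return false_alarms * cost_false_alarm + missed_alarms * cost_missed_alarm

def best_threshold(scores, labels, cost_false_alarm, cost_missed_alarm):
    """Try each observed score (plus 'never alarm') and keep the cheapest threshold."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: alarm_cost(scores, labels, t, cost_false_alarm, cost_missed_alarm),
    )

# Hypothetical model scores and true outcomes (1 = a real event occurred).
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

print(best_threshold(scores, labels, 1, 10))  # 0.4: missed alarms are costly, so alarm early
print(best_threshold(scores, labels, 10, 1))  # 0.7: false alarms are costly, so alarm late
```

The same scored data yields different operating points depending on which kind of error the user tolerates less.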

Page 36: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

IoT Use Cases

Page 37: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

Data: The New Oil

Producing Value for the Oil & Gas Industry

Page 38: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

DATA: THE NEW OIL

• Oil and gas exploration and production activities generate large amounts of data from sensors, logistics, business operations and more

• Rise of cost-effective data collection, storage and computing devices is giving an established industry a new boost

• The promise of Data as “the new oil” is realized when we can tap into its value in a meaningful, cross-functional way to enhance decision-making, which provides the competitive advantage

Image: http://commons.wikimedia.org/wiki/File:Rig_wind_river.jpg

Page 39: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

PREDICTING EQUIPMENT FUNCTION AND FAILURE

• Predictive Maintenance - Take steps towards zero unplanned downtime

• Business Problem: Predict drilling equipment function and failure

• Motivation: Drilling wells and equipment failure during the process are expensive
– Drilling motor damage could account for 35% of rig non-productive time (NPT) and can cost $150,000 per incident [1]
– Given the presence of over 800,000 oil & gas wells in the US (as of 2009 [2]) the total cost of such incidents could amount to billions of dollars

• Goals:
– Provide early warning system
– Provide insights into prominent features impacting operation and failure
– Reduction of non-productive drill time (and costs)
– Reduction of failure incidents

[1] The American Oil & Gas Reporter, April 2014 Cover Story
[2] data.gov

Page 40: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

PREDICTIVE ANALYTICS FOR DRILLING OPERATIONS

• Predicting drill rate-of-penetration (ROP) / drilling equipment failure

• Primary data sources:
– Drill Rig Sensor Data: Depth, Rate of Penetration (ROP), RPM, Torque, Weight on Bit, etc. (>billions of records)
– Operator Data: Drill Bit details, Failure details, Component details, etc. (>thousands of records)

Data Integration → Feature Building → Modeling

Page 41: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

TECHNOLOGY SELECTION

• Platform for all phases of the analytics cycle

• Support development of complex and extensible predictive models to predict equipment function and failure

• Provide framework for integrating data from multiple sources across data warehouses and rig operators

• Ability to analyze both structured and unstructured data in a unified manner. For instance:
– Support fast computation of hundreds of features over time windows within 100s of millions (or billions / trillions) of records of time-series data
– Natural language processing pipeline for analysis of operator comments to identify failures from unstructured text

Tools: PL/Python, PL/R

Page 42: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

COMPREHENSIVE DATA INTEGRATION FRAMEWORK

• Need a comprehensive framework for data integration at scale

• Data cleansing
– Removing NULLs and outliers
– Missing value imputation
– Manually entered data (operator data) is prone to errors
– Invalid values for sensor measurements

• Standardizing columns – data sources do not use consistent entries in features / columns that link them, e.g. well names across different data sources

[Diagram labels: Drill Rig Sensor data, Operator data, Integrated]
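
The cleansing and standardization steps on this slide can be illustrated with a small sketch. Everything here is an assumption made for illustration: the valid RPM range, the well-name formats and the choice of median imputation are hypothetical, and a production pipeline would run in the database rather than in plain Python.

```python
import re
from statistics import median

VALID_RPM = (0.0, 400.0)  # hypothetical plausible range for a drill RPM sensor

def canonical_well_name(raw):
    """Upper-case and strip punctuation/whitespace so name variants link to one key."""
    return re.sub(r"[^A-Z0-9]", "", raw.upper())

def clean_rpm(values, valid=VALID_RPM):
    """Replace NULLs (None) and out-of-range readings with the median of valid readings."""
    low, high = valid
    kept = [v for v in values if v is not None and low <= v <= high]
    fill = median(kept)  # raises StatisticsError if no valid reading exists
    return [v if (v is not None and low <= v <= high) else fill for v in values]

# Two spellings of the same (made-up) well name from different data sources.
print(canonical_well_name("Well A-12"))  # WELLA12
print(canonical_well_name("well_a 12"))  # WELLA12

# NULL, negative and implausibly large readings are imputed with the median (125.0).
print(clean_rpm([120.0, None, 130.0, -5.0, 9999.0, 125.0]))
```

Once both sources share a canonical well name, sensor rows and operator rows can be joined on that key.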

Page 43: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

COMPLEX FEATURE SET ACROSS MULTIPLE DATA SOURCES

Drill Rig Sensor data
• Depth
• Rate of Penetration
• Torque
• Weight on Bit
• RPM
• …

Operator data
• Drill Bit details
• Component details etc.
• Failure events
• …

Features on Time Windows
• Mean
• Median
• Standard Deviation
• Range
• Skewness
• …

Final Set of Features on Time Windows

Leverage GPDB / HAWQ (+ MADlib, PL/R and PL/Python as needed) for fast computation of hundreds of features over time windows within millions or billions of rows of time-series data

• Pivotal GPDB has built in support for dealing with time series data
• SQL window functions: e.g. lead, lag, custom windows
• More details in Pivotal’s Time Series Analysis blogs: http://blog.pivotal.io/tag/time-series-analysis
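
To make the time-window features concrete, the sketch below computes the listed statistics over trailing windows of a single sensor series. It is plain Python for illustration only; the talk describes doing this with SQL window functions inside GPDB / HAWQ, and the function names and sample torque values are hypothetical.

```python
from statistics import mean, median, pstdev

def skewness(values):
    """Population skewness: the mean cubed z-score (0.0 for a constant window)."""
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return 0.0
    return sum(((v - centre) / spread) ** 3 for v in values) / len(values)

def window_features(series, size):
    """One feature row per trailing window of `size` consecutive readings."""
    rows = []
    for end in range(size, len(series) + 1):
        window = series[end - size:end]
        rows.append({
            "mean": mean(window),
            "median": median(window),
            "std": pstdev(window),
            "range": max(window) - min(window),
            "skewness": skewness(window),
        })
    return rows

# Hypothetical torque readings; the last value is a sudden jump.
torque = [1, 2, 3, 4, 10]
features = window_features(torque, size=3)
print(len(features))         # 3 windows: [1,2,3], [2,3,4], [3,4,10]
print(features[0]["range"])  # 2
print(features[2]["median"]) # 4
```

Each window yields one row of features, so a long series with hundreds of such statistics per sensor quickly becomes the wide feature set the slide describes.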

Page 44: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

DRILLING OPERATIONS: EXAMPLES OF PREDICTIVE MODELS

• Predict Rate-of-Penetration
– Linear Regression
– Elastic Net Regularized Regression (Gaussian)
– Support Vector Machines

• Predict occurrence of equipment failure in a chosen future time window
– Logistic Regression
– Elastic Net Regularized Regression (Binomial)
– Support Vector Machines

• Predict remaining life of equipment
– Cox Proportional Hazards Regression
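
As an illustration of the second model family, the sketch below fits a logistic regression by batch gradient descent to predict whether a failure occurs in the next time window. It is a toy, pure-Python stand-in for an in-database implementation; the single feature (a normalised torque-variability score), the data, the learning rate and the epoch count are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, features):
    """Probability of failure; weights[0] is the intercept."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)

def fit_logistic(rows, labels, learning_rate=0.5, epochs=5000):
    """Batch gradient descent on the average log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            error = predict_proba(weights, features) - label
            gradient[0] += error
            for j, x in enumerate(features):
                gradient[j + 1] += error * x
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights

# Hypothetical training data: one window feature per row, label 1 = failure followed.
rows = [[0.1], [0.2], [0.3], [0.4], [0.7], [0.8], [0.9], [1.0]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights = fit_logistic(rows, labels)
low_risk = predict_proba(weights, [0.15])
high_risk = predict_proba(weights, [0.95])
print(low_risk < 0.5 < high_risk)
```

The predicted probability can then be compared against an alarm threshold chosen for the operator's tolerance of false alarms.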

Page 45: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

BIG DATA MACHINE LEARNING IN SQL
http://madlib.net/

Predictive Modeling Library

Linear Systems
• Sparse and Dense Solvers

Matrix Factorization
• Singular Value Decomposition (SVD)
• Low-Rank

Generalized Linear Models
• Linear Regression
• Logistic Regression
• Multinomial Logistic Regression
• Cox Proportional Hazards Regression
• Elastic Net Regularization
• Sandwich Estimators (Huber-White, clustered, marginal effects)

Machine Learning Algorithms
• Principal Component Analysis (PCA)
• Association Rules (Affinity Analysis, Market Basket)
• Topic Modeling (Parallel LDA)
• Decision Trees
• Ensemble Learners (Random Forests)
• Support Vector Machines
• Conditional Random Field (CRF)
• Clustering (K-means)
• Cross Validation

Descriptive Statistics

Sketch-based Estimators
• CountMin (Cormode-Muthukrishnan)
• FM (Flajolet-Martin)
• MFV (Most Frequent Values)

Correlation

Summary

Support Modules
• Array Operations
• Sparse Vectors
• Random Sampling
• Probability Functions
• PMML Export

Page 46: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

ONE STEP CLOSER TO ZERO UNPLANNED DOWNTIME …

• Every incident identified prior to failure potentially saves hundreds of thousands of dollars [1]

• Ability to fully utilize big data – volume, variety and velocity

• Comprehensive data integration framework for multiple complex data sources

• Learn and implement best practices for:
– Data capture techniques, flow, and curation
– Platform and toolset for data fabric

• Build and operationalize complex and extensible predictive models

• Improve efficiency, reduce costs and risks

[1] The American Oil & Gas Reporter, April 2014 Cover Story

Page 47: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

Automatic Clustering of IT Infrastructure Alerts

Page 48: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

MOTIVATION

• Enterprise network is complex
– Different technology components with lots of dependency

• Role of Reliability Engineering
– Ensure 24x7 uptime, network monitoring and quick resolution

• Alerts are high volume; eyes-on-glass operation
– Event logs, performance metric log, incident tickets

• What intelligence can we mine from data to improve operational efficiency for Reliability Engineering?

Page 49: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

DATA SOURCES

[Diagram labels: Business Service (examples: Debit Processing, Fraud Detection); IT infrastructure (Network, Storage, Servers, Middleware); Alert Management Tool; Rules based Engine; Operations Support personnel; Incident tickets; Manually created incidents; Incident Resolution]

• Alerts (semi-structured)
• Incidents (unstructured)
• Incident Work Info

Page 50: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

OPPORTUNITIES FOR DATA-SCIENCE

• What are the typical failure patterns?
– Top 20 incident types that operations support is busy with

• Given an incident and historical resolution data can we recommend a set of resolutions?
– Incident A is similar to incident B in the past => maybe the resolution for B applies to A

• Can we predict failures before they occur?
– Collect component-wise logs to predict failures
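
The resolution-recommendation idea ("incident A is similar to incident B") can be sketched as a nearest-neighbour lookup over past incidents. This is an illustrative sketch only: the similarity measure (Jaccard on word sets), the incident texts and the resolutions are all hypothetical examples, not data from the talk.

```python
import re

def tokens(text):
    """Lower-cased set of alphanumeric words in an incident description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard_similarity(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def recommend_resolutions(new_incident, history, top_k=1):
    """Return the resolutions of the top_k most similar past incidents."""
    new_tokens = tokens(new_incident)
    ranked = sorted(
        history,
        key=lambda item: jaccard_similarity(new_tokens, tokens(item[0])),
        reverse=True,
    )
    return [resolution for _, resolution in ranked[:top_k]]

# Hypothetical (incident text, resolution) pairs from past tickets.
history = [
    ("LDAP connectivity issue on server", "Restart LDAP service"),
    ("Hard disk free space issue", "Purge old log files"),
    ("CPU utilization issue", "Kill runaway process"),
]

print(recommend_resolutions("LDAP connectivity issue", history))  # ['Restart LDAP service']
```

Returning more than one candidate (`top_k` > 1) leaves the final choice to the operations engineer.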

Page 51: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

CHALLENGES

Data
– Large volume: 10 million alerts and incidents in 6 months from just one business service. There are numerous services – debit processing, fraud detection, etc.
– Multi-structured: Semi-structured and unstructured text
– No labeled data

Analytics
– Alerts / incidents have short text
– Clustering techniques at scale for alerts / incidents
– Cluster interpretability for qualitative evaluation

Page 52: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

FRAMEWORK TO CLUSTER ALERTS: WHAT ARE THE TYPICAL FAILURE PATTERNS?

1. Token Normalization: Alert text includes specific dates, times and IP addresses; such details are not important for clustering
2. Stopword Removal: Words that are common across alerts are not important for clustering
3. Compute Distance Metric: Compute string distance metrics (such as Jaccard distance) to compare alerts
4. Clustering Method: Build a graph with alerts as nodes and “high” similarity between the nodes as edges. Detect Connected Components from the graph to identify clusters
5. Visualize Results
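
The clustering framework on this slide can be sketched end to end in a few lines. The sketch below is a small in-memory illustration under assumptions: the regular expressions, the stopword list, the 0.5 distance cut-off and the sample alerts are hypothetical, and the pairwise comparison is quadratic, so the 10 million alerts mentioned earlier would need a distributed implementation.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "is", "on", "at", "for"}  # hypothetical list

def normalize(alert):
    """Steps 1-2: drop IPs, dates and times, then remove stopwords."""
    text = alert.lower()
    text = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", " ", text)    # IP addresses
    text = re.sub(r"\b\d{4}-\d{2}-\d{2}\b", " ", text)          # dates like 2015-08-06
    text = re.sub(r"\b\d{1,2}:\d{2}(?::\d{2})?\b", " ", text)   # times like 10:15:02
    return {t for t in re.findall(r"[a-z0-9]+", text) if t not in STOPWORDS}

def jaccard_distance(a, b):
    """Step 3: 1 - |A ∩ B| / |A ∪ B| on token sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def cluster_alerts(alerts, max_distance=0.5):
    """Step 4: link alerts whose distance is small, then find connected components."""
    token_sets = [normalize(a) for a in alerts]
    neighbours = defaultdict(set)
    for i in range(len(alerts)):
        for j in range(i + 1, len(alerts)):
            if jaccard_distance(token_sets[i], token_sets[j]) <= max_distance:
                neighbours[i].add(j)
                neighbours[j].add(i)
    seen, clusters = set(), []
    for start in range(len(alerts)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first walk over one connected component
            node = stack.pop()
            component.append(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(sorted(component))
    return clusters

alerts = [
    "2015-08-06 10:15:02 CPU utilization issue on 10.0.0.12",
    "2015-08-07 23:01:44 CPU utilization issue on 10.0.3.7",
    "Hard disk free space issue on 10.0.0.12",
    "2015-08-06 Hard disk free space issue on 10.0.9.1",
]
print(cluster_alerts(alerts))  # [[0, 1], [2, 3]]
```

Connected components make the number of clusters an output of the similarity threshold rather than a parameter chosen in advance, which suits data with no labels.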

Page 53: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

CLUSTER VISUALIZATION

[Figure: graph of alert clusters. Recoverable cluster labels: Alerts: 15,722, INCs: 462; Alerts: 1,556; Alerts: 1,795; Alerts: 1,537; Alerts: 1,772; Alerts: 1,420; Alerts: 5,630, INCs: 93; Alerts: 2,778, INCs: 65; Alerts: 13,899, INCs: 264]

Page 54: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

MEAN-TIME-TO-RESOLVE FOR CLUSTERS

[Figure: two bar-chart panels of alert clusters with columns Alert Counts, MTTR* and Total MTTR*; the per-cluster values did not survive extraction in a recoverable order.]

Windows Open Systems
• Health Service heartbeat failure
• Process X is not running, please restart
• Web session emulator process hung issue
• Hyperion Foundation Services is not running
• Website / URL unavailable
• Symantec critical system protection service is not running
• Server booted, ensure critical apps are running
• Web Service / Probe URL unavailable
• CPU Utilization Issue
• Playspan Issue
• Agent is not running
• Hard disk free space issue

Unix Open Systems
• Space utilization issue
• AIX Hardware error
• Process X is not running, please restart
• Process X may have missed an execution interval
• CPU Utilization Issue
• Server booted, ensure critical apps are running
• Splunk Agent unavailable
• Connection failed issue
• LDAP connectivity issue
• Net Backup: History file process failure

Page 55: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

CLUSTER VISUALIZATION – AN ALTERNATIVE TO WORD CLOUDS

Page 56: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

OPERATIONALIZATION

• Perform clustering on a 3-6 month window of data

• Assign incoming alerts to one of the existing clusters according to a distance criterion
– May find emerging patterns

• Monitor cluster statistics (Mean Time To Resolve, number of incidents etc.) on a dashboard

• Perform clustering again every few months on new data
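
Assigning an incoming alert to an existing cluster can be sketched as a nearest-cluster lookup with a distance cut-off. The sketch assumes each cluster is summarised by a representative token set and uses Jaccard distance; the cluster names, token sets and the 0.5 cut-off are hypothetical.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| on token sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def assign_alert(alert_tokens, cluster_representatives, max_distance=0.5):
    """Return the id of the closest cluster, or None when nothing is close enough.

    A None result marks the alert as a candidate emerging pattern.
    """
    best_id, best_distance = None, None
    for cluster_id, representative in cluster_representatives.items():
        distance = jaccard_distance(alert_tokens, representative)
        if best_distance is None or distance < best_distance:
            best_id, best_distance = cluster_id, distance
    if best_distance is not None and best_distance <= max_distance:
        return best_id
    return None

# Hypothetical representatives learned from the last clustering run.
representatives = {
    "cpu": {"cpu", "utilization", "issue"},
    "disk": {"hard", "disk", "free", "space", "issue"},
}

print(assign_alert({"cpu", "utilization", "issue", "high"}, representatives))  # cpu
print(assign_alert({"ldap", "connectivity", "issue"}, representatives))        # None
```

Alerts that return `None` can be collected and reviewed; a growing pile of them is a signal that it is time to re-run clustering on new data.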

Page 57: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

TECHNOLOGY

Platform

SQL PL/Python PL/R

Data Science

Visualization

Page 58: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

TAKEAWAYS AND LESSONS LEARNED

• Understanding where your alerts/incidents are coming from is an important step in improving infrastructure support
– Profiling classes of alerts for business intelligence
– Resolution recommendation
– Application or hardware failure prediction
– Reduction in time to resolve alerts and incidents (reduce costs)

• Distributed computing architecture + proper algorithm choice are needed to deal with scalability

• Tuning clustering results requires good cluster visualization techniques

Page 59: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

FOR FURTHER INFO, CHECK OUT…

• Pivotal Data Product Info, Docs and Downloads @ http://pivotal.io/big-data

• Pivotal Blog @ http://blog.pivotal.io

• Pivotal Data Science Blog @ http://blog.pivotal.io/data-science-pivotal

• Oil & Gas Use Case Webinar:
– Video: https://www.youtube.com/watch?v=dhT-tjHCr9E
– Slides: http://www.slideshare.net/Pivotal/data-as-thenewoil

• IT Operations Use Case Webinar:
– Video: https://www.youtube.com/watch?v=2goBoBp1klg
– KDD 2014 paper: “Unveiling Clusters of Events for Alert and Incident Management in Large-Scale Enterprise IT”, http://dl.acm.org/citation.cfm?id=2623360

Page 60: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science
Page 61: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

APPENDIX

Page 62: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

Smart Meter Analytics

Page 63: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

The Digital Brain: Making a Smart Grid Smarter!

Input: Data from smart meters

The Digital Brain: Uses Fourier transform, extracts patterns and flags outliers/anomalies

Action: Where (and when) to send trucks, preventive maintenance
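
The role of the Fourier transform here can be illustrated with a small sketch: find the dominant usage cycle, rebuild a smooth version of the signal from its lowest frequencies, and flag readings that sit far from it. This is an assumption-laden illustration rather than the production method; it uses a naive O(n²) transform, and the hourly load values, the `keep` setting and the 2.0 limit are hypothetical.

```python
import cmath
import math

def dft(values):
    """Naive discrete Fourier transform (fine for short series)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]

def dominant_period(values):
    """Length, in samples, of the strongest repeating cycle."""
    n = len(values)
    spectrum = dft(values)
    strongest = max(range(1, n // 2 + 1), key=lambda k: abs(spectrum[k]))
    return n / strongest

def smooth(values, keep=3):
    """Rebuild the series from the `keep` lowest frequencies (and their mirrors)."""
    n = len(values)
    spectrum = dft(values)
    kept = [c if (k <= keep or k >= n - keep) else 0 for k, c in enumerate(spectrum)]
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(kept)) / n).real
        for t in range(n)
    ]

def flag_outliers(values, keep=3, limit=2.0):
    """Indices where the reading differs from the smooth pattern by more than `limit`."""
    fitted = smooth(values, keep)
    return [t for t, (v, f) in enumerate(zip(values, fitted)) if abs(v - f) > limit]

# Hypothetical hourly load for two days with a clean daily cycle...
load = [10 + 3 * math.sin(2 * math.pi * t / 24) for t in range(48)]
print(dominant_period(load))  # 24.0 hours

# ...and the same series with an unexplained spike at hour 30.
spiked = list(load)
spiked[30] += 8
print(flag_outliers(spiked))  # [30]
```

Flagged hours can then feed the action step, for example deciding where and when to send a truck.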

Page 64: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

SMART METER ANALYTICS – SIGNIFICANT USE CASES

• Load profiling
• Theft prevention
• Demand prediction
• Load forecasting
• Root cause of power failures
• Black-out warning
• Anomaly detection
• Network topology error detection

Page 65: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

ELECTRICITY NETWORK LOAD PROFILING AND OUTLIER DETECTION

CUSTOMER
A major smart grid infrastructure provider

BUSINESS PROBLEM
Profile power consumption patterns based on smart meter data and flag anomalous usage

CHALLENGES
• Large volume of smart meter data (several months of data from 100s of thousands of meters) could not be analyzed effectively by legacy system
• Timely business insights on large scale smart grid infrastructure demand fast processing of data

SOLUTION
• Analyze smart meter power data using unsupervised clustering techniques and detect anomalies based on distance metric in clusters
• Reduce time required to monitor and improve grid efficiencies
• Leveraged the MPP architecture of Pivotal GPDB and MADlib in-database machine learning library for fast computation at scale
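
The clustering-plus-distance idea on this slide can be sketched in a few lines. The slide does not name the clustering algorithm or the distance rule, so k-means with Euclidean distance is swapped in here as one common choice, and "more than `factor` times the cluster's median distance" is an assumed flagging rule. The load profiles, seed centroids, and function names are hypothetical; the actual work ran in-database with MADlib rather than in plain Python.

```python
import math
import statistics


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, initial_centroids, iterations=20):
    """Plain Lloyd's k-means started from caller-supplied centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = [nearest(p, centroids) for p in points]
    for _ in range(iterations):
        for i in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


def flag_anomalies(points, centroids, labels, factor=3.0):
    """Indices of points farther than factor x the median distance in their cluster."""
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flagged = []
    for i in range(len(centroids)):
        cluster = [d for d, lab in zip(dists, labels) if lab == i]
        if not cluster:
            continue
        med = statistics.median(cluster)
        flagged += [idx for idx, (d, lab) in enumerate(zip(dists, labels))
                    if lab == i and med > 0 and d > factor * med]
    return sorted(flagged)


# Hypothetical 4-period daily load profiles (kW): five mid-morning-peak meters,
# five evening-peak meters, and one meter with unusually high load all day.
profiles = [
    [1.0, 3.0, 1.0, 1.0], [1.1, 3.0, 1.0, 1.0], [0.9, 3.0, 1.0, 1.0],
    [1.0, 3.1, 1.0, 1.0], [1.0, 2.9, 1.0, 1.0],
    [1.0, 1.0, 1.0, 3.0], [1.1, 1.0, 1.0, 3.0], [0.9, 1.0, 1.0, 3.0],
    [1.0, 1.0, 1.0, 3.1], [1.0, 1.0, 1.0, 2.9],
    [6.0, 7.0, 6.0, 6.0],
]
seeds = [[1.0, 3.0, 1.0, 1.0], [1.0, 1.0, 1.0, 3.0]]  # assumed starting centroids

centroids, labels = kmeans(profiles, seeds)
anomalous = flag_anomalies(profiles, centroids, labels)
```

Using the cluster median rather than the mean keeps the threshold from being dragged upward by the very outlier it is meant to catch.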

Page 66: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

ELECTRICITY NETWORK LOAD PROFILING AND OUTLIER DETECTION

Dashboards for navigating clusters and outliers

Page 67: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

Network Topology Error Detection

CUSTOMER

A major utility

BUSINESS PROBLEM

Use load and voltage meter readings to determine errors in transformer network topologies

CHALLENGES

• Time-consuming process to detect network topology errors across the entire network in the legacy system

• Timely detection of network topology errors requires big data infrastructure and analytical capabilities

SOLUTION

• For each transformer network in parallel, solve an LP to determine scale of topology error, which can be used to flag and rank anomalous network topologies

• Reduce time for topology error detection from several days/weeks to a few minutes!

Page 68: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

Security and Fraud

Page 69: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

Advanced Persistent Threat (APT): APT Kill Chain

1. Phishing & Zero Day Attack: A handful of users are targeted by two phishing attacks; one user opens a zero-day payload (CVE-2011-0609)

2. Back Door: The user machine is accessed remotely by the Poison Ivy tool

3. Lateral Movement: Attacker elevates access to important user, service and admin accounts, and specific systems

4. Data Gathering: Data is acquired from target servers and staged for exfiltration

5. Exfiltrate: Data is exfiltrated via encrypted files over FTP to an external, compromised machine at a hosting provider

Page 70: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

ANOMALOUS USER-TO-RESOURCE ACCESS DETECTION

BUSINESS PROBLEM
Detect anomalous user behaviors in the global enterprise computer network

SUMMARY
Given local-to-local communication data, identify anomalous users within an enterprise.
• Reduce malware dwell time, typically 243 days
• Signature-based approaches cannot detect such behavior

CHALLENGES
10 billion events in 6 months; 15K+ network devices; no existing SIEM solution can model a baseline of user resource-access behavior and enable anomaly detection in an adaptive and scalable architecture.

SOLUTION
An innovative graph-mining-based algorithmic framework with advanced machine learning. Network topology and temporal behaviors are both modeled (patent pending). Implemented in MPP and PL/R, enabling parallel model training and behavior risk scoring. Successfully identified DLP-violating anomalous users.
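
The framework on this slide is described only at a high level (and as patent pending), so the sketch below is not that method. It is a deliberately simple stand-in that illustrates what a user-to-resource access baseline could look like: build a bipartite graph from historical access events, then score a new access by how little the user's history overlaps with users who already reach that resource. All names, the scoring rule, and the sample events are assumptions.

```python
from collections import defaultdict


def build_baseline(events):
    """Build a bipartite user/resource graph from (user, resource) access events."""
    user_resources = defaultdict(set)
    resource_users = defaultdict(set)
    for user, resource in events:
        user_resources[user].add(resource)
        resource_users[resource].add(user)
    return user_resources, resource_users


def access_risk(user, resource, user_resources, resource_users):
    """Hypothetical risk score in [0, 1] for a user touching a resource.

    0.0 if the access was already seen in the baseline; otherwise one minus
    the best Jaccard overlap between this user's resources and those of any
    user who is already known to access the resource.
    """
    own = user_resources.get(user, set())
    if resource in own:
        return 0.0
    best_overlap = 0.0
    for other in resource_users.get(resource, set()):
        theirs = user_resources[other]
        union = own | theirs
        if union:
            best_overlap = max(best_overlap, len(own & theirs) / len(union))
    return 1.0 - best_overlap


# Hypothetical baseline period: who accessed what.
baseline_events = [
    ("alice", "wiki"), ("alice", "git"),
    ("bob", "wiki"), ("bob", "git"), ("bob", "build"),
    ("carol", "hr_db"), ("carol", "payroll"),
]
user_resources, resource_users = build_baseline(baseline_events)

known = access_risk("alice", "git", user_resources, resource_users)
peer_like = access_risk("alice", "build", user_resources, resource_users)
unrelated = access_risk("alice", "payroll", user_resources, resource_users)
```

Here an access already in the baseline scores zero, an access shared with a close peer scores low, and an access to a resource used only by unrelated users scores highest. This version ignores the temporal behavior that the slide says is also modeled.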

Page 71: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

Financial Services

Page 72: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

IDENTIFYING AND PRICING CROSS-SELL OPPORTUNITIES

CUSTOMER
A global financial services provider

BUSINESS PROBLEM
Identify cross-sell opportunities between two business arms of a financial institution.

CHALLENGES
• Integration of large-scale data originating from multiple data warehouses.
• Developing predictive models to identify novel cross-sell opportunities within the financial institution.
• Evaluate the identified cross-sell opportunities by their revenue potential.

SOLUTIONS
• Fast integration of data in Pivotal Greenplum Database.
• Predictive models and evaluation of profitability:
  – Association rule.
  – Logistic regression for each product offered.
  – Estimation of revenue opportunity.
• On-demand reporting and visualization via custom dashboards connected to in-database models.
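
To make the "association rule" step concrete, here is a minimal sketch that mines pairwise rules of the form "customers holding A also hold B" and ranks them by confidence and support. The product names, thresholds, and function name are hypothetical, and the slide's own models ran in-database rather than in plain Python.

```python
from itertools import permutations


def pair_rules(holdings, min_support=0.3, min_confidence=0.5):
    """Return (antecedent, consequent, support, confidence) for product pairs.

    support    = share of customers holding both products
    confidence = share of antecedent holders who also hold the consequent
    """
    n = len(holdings)
    if n == 0:
        return []
    products = sorted(set().union(*holdings))
    rules = []
    for a, b in permutations(products, 2):
        count_a = sum(1 for h in holdings if a in h)
        count_ab = sum(1 for h in holdings if a in h and b in h)
        support = count_ab / n
        confidence = count_ab / count_a if count_a else 0.0
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Strongest rules first; ties broken by support, then by name.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


# Hypothetical product holdings, one set per customer.
customer_holdings = [
    {"checking", "mortgage"},
    {"checking", "mortgage", "brokerage"},
    {"checking"},
    {"checking", "brokerage"},
    {"checking", "mortgage"},
]

rules = pair_rules(customer_holdings)
```

A rule such as checking → mortgage points at checking customers without a mortgage as cross-sell candidates; pricing them would then rely on the per-product models and revenue estimates listed on the slide.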

Page 73: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

CREDIT RISK ASSESSMENT AND STRESS TESTING

CUSTOMER
A global financial services provider

BUSINESS PROBLEM
Speed up the process of compliance reporting and stress testing for Basel III.

CHALLENGES
Running the calculation procedures on the customer’s legacy database was time-consuming and therefore had to be done in overnight batch mode.

SOLUTION
• Implement risk asset calculation and stress testing on Pivotal Greenplum Database.
• Three years of data was processed in well under 2 minutes, significantly faster than the customer’s current procedures.
• Connect an “in-database” visualization tool to Pivotal Greenplum Database via ODBC for on-demand reporting and visualization.

Page 74: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

FINANCIAL COMPLIANCE

BUSINESS PROBLEM
• Ensure compliance with Dodd-Frank and Basel Committee regulations
• Identify underlying risk and fraud while reducing the compliance department’s overburdened

SOLUTION
• A data lake platform coupled with cutting edge data science techniques
• Flexible user interface to promote an adaptive, continuously learning compliance framework

Diagram: Financial compliance Data Lake
Data sources: Emails, Chats, Trades, Transactions, Policy, Securities, Phone Calls, Watch Lists, …
Analytics stages: Data integration, Data cleanup, Modeling, Classification and ranking
Analyst user interfaces, with analyst feedback returned to the analytics

Data integration: e.g., append trade information with email and chat communications

Data cleanup: e.g., identify newsletters and spam emails

Modeling:
• Predictive modeling to flag messages and trades
• Graph and cohort analysis

Analyst feedback: Reviewed fraud instances included in periodic model refreshes

Page 75: Internet of Things (IoT): Trends & Implications For Enterprise & Data Science

PIVOTAL TOPIC & SENTIMENT ANALYSIS ENGINE

Components shown in the architecture diagram:
• Twitter Decahose (~55 million tweets/day)
• Spring XD (Source: http, Sink: hdfs)
• HDFS
• PXF External Tables
• HAWQ
• Parallel Parsing of JSON (PL/Python)
• Nightly Cron Jobs
• Topic Analysis through MADlib pLDA
• Unsupervised Sentiment Analysis (PL/Python)
• D3.js

