Challenges in Analytics for BIG Data

Date post: 23-Jan-2017
Upload: prasant-misra
Transcript
Page 1: Challenges in Analytics for BIG Data

Challenges in Analytics for BIG Data

Dr. Prasant Misra W: https://sites.google.com/site/prasantmisra

Disclaimer:

The opinions expressed in this presentation and on the following slides are solely those of the presenter and not necessarily those of the organization that he works for.

Page 2: Challenges in Analytics for BIG Data

A simple narrative to BIG DATA

8/26/2016 2

DATA whose characteristics exceed the capabilities of conventional algorithms, systems and techniques to derive useful value is considered BIG

datascience.berkeley.edu

The term is very fuzzy and means different things to different groups of people ...

Page 3: Challenges in Analytics for BIG Data

How did we arrive at this stage ?

Page 4: Challenges in Analytics for BIG Data

History of Computing

[Figure: Size versus Year, with eras 1960-70, 1980-90, and 2000-10 and beyond]

Accessibility of cyber endpoints has increased drastically ...

Page 6: Challenges in Analytics for BIG Data

DATA Proliferation

Web & Social Media

Enterprises

Gov.

Page 8: Challenges in Analytics for BIG Data

Local Search - I

Context Service Example

Current Location

Local business

Page 9: Challenges in Analytics for BIG Data

Local Search - II

Context Service Example

Current Location

Local business and directions

+ Time Tracks

Businesses in driving direction

Page 10: Challenges in Analytics for BIG Data

Local Search - III

Context Service Example

Current Location

Local business and directions

+ Time Tracks

Businesses in driving direction

+ History

Personalized directions

Take 520 East

Page 11: Challenges in Analytics for BIG Data

Local Search - IV

Context Service Example

Current Location

Local business and directions

+ Time Tracks

Businesses in driving direction

+ History

Personalized directions

+ Community

Tourist recommendation

35% people pick the scenic route

Page 12: Challenges in Analytics for BIG Data

Local Search - V

Alert: Bad Traffic

Consider alternate route

Context Service Example

Current Location

Local business and directions

+ Time Tracks

Businesses in driving direction

+ History

Personalized directions

+ Community

Tourist recommendation

+ Push

alerts, triggers, reminders

BIG Data for Location Analytics …

Page 13: Challenges in Analytics for BIG Data

Analytics: Span across Verticals & Horizontals

Depending on the type and quality of analytics, systems could manifest themselves as:

User-centric Systems — Systems That Know/Aware

Adaptive Systems — Systems That Learn

Cognitive Systems — Systems That Reason

Verticals: Energy, Water, Retail, Telecom, Health

Horizontals:

Time, Location Management

Sensor, Device Management

Network Management

Cloud Infra Management

Customer Management

Page 14: Challenges in Analytics for BIG Data

The flavor of Data that is BIG

Page 15: Challenges in Analytics for BIG Data

The 4 Dimensions of BIG Data

Page 16: Challenges in Analytics for BIG Data

Analytics

Page 17: Challenges in Analytics for BIG Data

Analytics : Category

[Figure: axes Value and Skill, ranging from Information to Optimization]

Hindsight and Insight / Insights into the PAST

Descriptive: “WHAT has happened ?”

Diagnostic: “WHY did this happen ?”

Foresight / Insights into the FUTURE

Predictive: “WHAT could happen ?”

Prescriptive: “WHAT should we do ?”

Outputs: DASHBOARD; FORECAST; ACTIONS, RULES, RECOMMENDATIONS

Page 18: Challenges in Analytics for BIG Data

Example: Energy Analytics for a PV Microgrid

Descriptive: What is the total energy, instantaneous energy and power, etc., …?

Diagnostic: Why is the panel temperature decreasing when the solar irradiance is high and the wind speed is very low ?

Predictive: Can I forecast the plant output for tomorrow, or can I generate 4kWh net energy ?

Prescriptive: What actions should be undertaken for the plant to reach 4 kW generation capacity from its current 2 kW ?
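
As a rough illustration of the descriptive question (total energy from power readings), the sketch below integrates sampled power over time with the trapezoidal rule and compares the result with the 4 kWh figure mentioned above. The function name, the sample times and the power values are hypothetical and are not taken from the presentation.

```python
from typing import Sequence


def total_energy_kwh(times_h: Sequence[float], power_kw: Sequence[float]) -> float:
    """Integrate power (kW) over time (hours) using the trapezoidal rule."""
    if len(times_h) != len(power_kw):
        raise ValueError("times_h and power_kw must have the same length")
    energy = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]          # interval length in hours
        avg_power = 0.5 * (power_kw[i] + power_kw[i - 1])
        energy += avg_power * dt                  # kW * h = kWh
    return energy


# Hypothetical hourly readings from a small PV plant (assumed values).
times_h = [0.0, 1.0, 2.0, 3.0, 4.0]
power_kw = [0.0, 1.0, 2.0, 1.0, 0.0]

daily_energy = total_energy_kwh(times_h, power_kw)
meets_target = daily_energy >= 4.0  # the 4 kWh target from the slide
print(f"energy = {daily_energy:.2f} kWh, meets 4 kWh target: {meets_target}")
```

The same function could feed a dashboard (descriptive); forecasting tomorrow's output would need a separate predictive model.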

Page 19: Challenges in Analytics for BIG Data

Analytics : Methodology

Reason and Plan with Uncertain Knowledge

Quantify uncertainty & Probabilistic reasoning: Bayesian networks, Conditional distributions

Probabilistic reasoning over time:

Hidden Markov models, Kalman filters, Dynamic Bayesian networks

Simple decisions: Utility theory, Decision networks

Complex decisions: Partially observable Markov decision processes (POMDP), Game theoretic models

Planning graphs

Learning and Data Mining:

[Supervised | Semi-supervised | Unsupervised | Reinforcement] learning – Classification, Clustering

Different types of ANN | Deep Learning Networks | Support Vector Machines
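
To make "probabilistic reasoning over time" a little more concrete, here is a minimal sketch of a one-dimensional Kalman filter, one of the methods listed above. It assumes a constant-state model with scalar Gaussian noise; the function name, parameter names and default values are illustrative assumptions, not something specified in the slides.

```python
from typing import Iterable, List, Tuple


def kalman_1d(
    measurements: Iterable[float],
    process_var: float,
    meas_var: float,
    x0: float = 0.0,
    p0: float = 1.0,
) -> List[Tuple[float, float]]:
    """Scalar Kalman filter for a (nearly) constant hidden state.

    Returns a list of (estimate, estimate_variance) pairs, one per measurement.
    """
    x, p = x0, p0
    history = []
    for z in measurements:
        # Predict: the state is assumed constant, so only uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        history.append((x, p))
    return history


# Hypothetical noisy readings of a quantity whose true value is about 5.
readings = [5.2, 4.9, 5.1, 4.8, 5.0, 5.3, 4.7]
for estimate, variance in kalman_1d(readings, process_var=1e-4, meas_var=0.1):
    print(f"estimate={estimate:.3f} variance={variance:.4f}")
```

The estimate variance shrinks as evidence accumulates, which is the quantified uncertainty the methodology slide refers to.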

Page 20: Challenges in Analytics for BIG Data

Challenges

Page 21: Challenges in Analytics for BIG Data

Data to Knowledge Pipeline

Cyber & Physical Space Entities

Edge

Global Infra

Data Ingestion

Data Analysis

Applications

Data source

“Big” data Infra

“Little” data Infra

Decision making with Knowledge

NATURE of INGESTED DATA

DATA @ REST (VOLUME): Archival/static data (TBs) in data stores

DATA @ MOTION (VELOCITY): Streaming data

DATA @ MANY FORMS (VARIETY): Structured/unstructured, text, multimedia, audio, video

DATA @ DOUBT (VERACITY): Data with uncertainty that may be due to incompleteness, missing points, etc.

NATURE of ANALYSIS

COGNITIVE: Learn dynamically ?

PRESCRIPTIVE: What are the best outcomes ?

PREDICTIVE: What could happen ?

DESCRIPTIVE: What has happened ?

DISCOVERY: What do we have ?

Page 22: Challenges in Analytics for BIG Data

A first list of challenges derived from the V’s

Volume: How much data is really relevant to the problem solution & what is the cost of processing ?

Can you really afford to store and process all that data ?

Velocity: A lot of data is coming in at high speed

Need for streaming versus block approach to data analysis

How to analyze data in-flight and combine with data at-rest

Variety:

A small fraction is in structured formats (e.g., relational, XML, etc.)

A fair amount is semi-structured (e.g., web logs, etc.)

The rest of the data is unstructured (e.g., text, photographs, etc.)

No single data model can currently handle the diversity

Veracity: Cover term for: Accuracy, Precision, Reliability, Integrity

What is it that you don’t know about the data ?
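
One possible way to approach the velocity question of analyzing data in-flight and combining it with data at-rest is to keep running statistics that can be updated one sample at a time and merged with statistics computed over a stored batch. The sketch below uses Welford's online update and the standard pairwise merge formula for mean and variance; the class and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Welford's online update: consume one streaming sample."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine two summaries without revisiting the raw data."""
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        """Sample variance (0.0 when fewer than two samples)."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def summarize(values: Iterable[float]) -> RunningStats:
    stats = RunningStats()
    for v in values:
        stats.update(v)
    return stats


at_rest = summarize([1.0, 2.0, 3.0, 4.0])    # archived batch (assumed values)
in_flight = summarize([5.0, 6.0, 7.0, 8.0])  # streamed samples (assumed values)
combined = at_rest.merge(in_flight)
print(combined.n, combined.mean, combined.variance)
```

Only the three numbers per summary need to be stored or transmitted, which also touches on the volume question of whether all raw data must be kept.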

Page 23: Challenges in Analytics for BIG Data

Top Challenges

Data acquisition

Is raw data of interest in totality ?

Challenge: design efficient filters and compression techniques in a manner that does not discard useful information; automatically generate the right metadata to describe it

Data reduction

Will traditional data reduction approaches (via compression) be overwhelmed ?

Challenge: introduction of new data collection practices and models as per analytical needs; compact (space, time) representations/dictionary/basis; parsimonious models (low-dimensionality, compressed sensing and sparse data capture models)

“Big-Little” Data

Device cloud vs. Conventional cloud; Distributed data and Peer-to-Peer Federation

Challenge: how to combine Big and Little data for meaningful analytics (often in real time)

Analytics from the Edge to the Cloud

Will the current model of pushing all data to a central cloud for analytics scale, remain efficient, and alleviate privacy concerns ?

Challenge: how to automate distributed analytics and decision making on subsets of “Little” and “Big” data, within the constraints of device capability, privacy needs, energy and network costs, and application QoS
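
For the data acquisition challenge above, one simple filter that could run on an edge device is a dead-band (send-on-delta) filter: a sample is forwarded only when it differs from the last forwarded value by more than a threshold, and a small metadata record describes what was dropped. This is a minimal sketch under that assumption; the function name, metadata fields and readings are hypothetical and not taken from the presentation.

```python
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[float, float]  # (timestamp, value)


def deadband_filter(
    samples: Iterable[Sample], threshold: float
) -> Tuple[List[Sample], Dict[str, float]]:
    """Keep a sample only if it moved more than `threshold` from the last kept one."""
    raw = list(samples)
    kept: List[Sample] = []
    last_value = None
    for timestamp, value in raw:
        if last_value is None or abs(value - last_value) > threshold:
            kept.append((timestamp, value))
            last_value = value
    # Metadata generated automatically alongside the reduced stream.
    metadata = {
        "threshold": threshold,       # dropped values are within this of the last kept one
        "n_raw": len(raw),
        "n_kept": len(kept),
    }
    return kept, metadata


# Hypothetical temperature readings as (second, degrees Celsius).
readings = [(0, 20.0), (1, 20.1), (2, 20.05), (3, 21.0), (4, 21.02), (5, 19.5)]
kept, meta = deadband_filter(readings, threshold=0.5)
print(kept)
print(meta)
```

Whether a bounded-error filter like this discards useful information depends on the analysis that follows, which is exactly the trade-off the challenge describes.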

Page 24: Challenges in Analytics for BIG Data

Top Challenges

Handling inconsistent/incomplete/missing data and outliers

Is this critical ?

Challenge: design robust imputation algorithms

Heterogeneous Data Fusion

Is there a need to analyze the relationship between heterogeneous data objects/streams ?

Challenge: extract the right amount of semantics; sequential data fusion via transform spaces

Scalability with multi-level hierarchy

Will traditional methods of data navigation and search in deep hierarchies be scalable ?

Challenge: design newer alternatives

Data summarization for interactive Query

Will examination of datasets (all at once) become difficult ?

Data summarization lets users request data with particular characteristics

Data summarization: organize data based on the presence/type of feature

Scientific data features: geometrical, topological, statistical

Non-scientific data features: related to semantic/syntactic components of the data

Challenge:

extraction of meaningful features from both high- and low-dimensional data

data storage and indexing in an I/O efficient format for rapid runtime retrieval
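
As a baseline for the first challenge on this slide (missing data and outliers), the sketch below fills missing readings with the median of the observed values and flags outliers with a modified z-score based on the median absolute deviation (MAD). It is only a simple starting point, not the robust imputation algorithm the slide calls for; the function names, the 3.5 cutoff and the sample data are assumptions.

```python
import statistics
from typing import List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def flag_outliers(values: List[float], cutoff: float = 3.5) -> List[bool]:
    """Flag values whose MAD-based modified z-score exceeds `cutoff`."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # All deviations are (mostly) zero: nothing can be flagged reliably.
        return [False] * len(values)
    return [abs(0.6745 * (v - median) / mad) > cutoff for v in values]


# Hypothetical sensor readings with one gap and one spike.
raw = [10.0, None, 11.0, 9.0, 100.0, 10.0]
filled = impute_median(raw)
flags = flag_outliers(filled)
print(filled)
print(flags)
```

Median and MAD are used here because, unlike the mean and standard deviation, a single extreme reading barely shifts them.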

Page 25: Challenges in Analytics for BIG Data

Top Challenges

Analytics of temporally/spatially evolving features

Do data features occur at different spatial and temporal scales ?

Challenge: effective visual techniques that are computationally practical and that can take advantage of humans' unique cognitive ability to track those feature changes

Representation of evidence and uncertainty

Interpretation of evidence is subjective: it depends on the person performing the task and on their prior knowledge, subjective settings and viewpoint

Uncertainty quantification models the consequence based on the presented evidence and then predicts the qualities of the corresponding outcome

Challenge: how to represent evidence and uncertainty clearly and without bias through visualization

Sense making to users/decision makers

Involves examining all the assumptions made and retracing the analysis

There can be many sources of error: computer systems can have bugs, models almost always have assumptions, and results can be based on erroneous data. For all of these reasons, users will try to understand, and verify, the results produced by the computer.

Challenge: what should the man-machine interface for this look like ?

Page 26: Challenges in Analytics for BIG Data

Platforms and Tools

Page 27: Challenges in Analytics for BIG Data

Page 28: Challenges in Analytics for BIG Data

Scale and Size DOES matter !!!

Page 29: Challenges in Analytics for BIG Data

References

Stephen H. Kaisler et al., “Big data and analytics: challenges and issues”

Pak Chung Wong, Han-Wei Shen, Chaomei Chen, “Top Ten Interaction Challenges in Extreme-Scale Visual Analytics”

http://link.springer.com/chapter/10.1007/978-1-4471-2804-5_12#page-1

Other info graphics from the web !!!

