
Big Data for Gov 2012 Grady

Date post: 18-Apr-2015
Upload: gradybiz
Transcript
Page 1: Big Data for Gov 2012 Grady

ENERGY & ENVIRONMENT • NATIONAL SECURITY • HEALTH • CYBERSECURITY

© SAIC. All rights reserved.

Big Data and Analytics Capabilities and Challenges
Nancy Grady, SAIC Technical Fellow, Data Science
September 18-19, 2012

Page 2: Big Data for Gov 2012 Grady

SAIC.com © SAIC. All rights reserved.

How to Approach Big Data Opportunities

•  Computing Context
•  Data Life Cycle
•  Big Data Engineering
•  Data Science

Page 3: Big Data for Gov 2012 Grady

The Fifth Wave of Computing … meets …

•  Servers and dumb terminals (1950s-1960s)
•  Personal computers (1970s)
•  Internet (1990s)
•  Cloud (2000s)
   –  Infrastructure, applications, data, and analytics
•  Mobile devices (2010s)
   –  Carry them everywhere
   –  Connectivity
   –  Sensors
   –  Location
   –  Books
   –  Oh, …and phone service

Page 4: Big Data for Gov 2012 Grady

The Fourth Paradigm

1.  Experiments
2.  Theory
3.  Simulation
    –  Big Iron
4.  Data Analytics
    –  Big Data
    –  Sensors

In The Fourth Paradigm: Data-Intensive Scientific Discovery, pioneering computer scientist Jim Gray refers to discovery based on data-intensive science:

•  Business Intelligence
   –  tell me what’s happening
•  Data Analytics
   –  tell me what’s working
•  Data Mining
   –  tell me what’s unusual
•  Modeling and Prediction
   –  tell me what’s about to happen
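The four questions on this slide can be illustrated on a toy data set. The sketch below is only an illustration: the campaign names, the daily counts, the two-standard-deviation threshold, and the straight-line forecast are all assumptions chosen for brevity, not methods taken from the talk.

```python
from statistics import mean, pstdev

# Hypothetical daily counts for two campaigns (made-up numbers, not from the talk).
DAILY = {
    "A": [10, 12, 11, 13, 12, 40, 14],
    "B": [8, 9, 9, 10, 10, 11, 11],
}

def whats_happening(series):
    """Business intelligence: report the totals as they stand."""
    return {name: sum(values) for name, values in series.items()}

def whats_working(series):
    """Data analytics: compare groups, here by average daily count."""
    return max(series, key=lambda name: mean(series[name]))

def whats_unusual(values, threshold=2.0):
    """Data mining: indices of points more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]

def whats_next(values):
    """Modeling and prediction: extend a least-squares straight line by one step."""
    n = len(values)
    x_bar, y_bar = (n - 1) / 2, mean(values)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    s_xx = sum((x - x_bar) ** 2 for x in range(n))
    return y_bar + (s_xy / s_xx) * (n - x_bar)
```

Each function answers one question with the simplest technique that fits; a real system would substitute richer reporting, testing, mining, and modeling methods for each.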

Page 5: Big Data for Gov 2012 Grady

Data Scaling

We’re feeling the disruption of powers-of-ten scaling

•  Compute power – growing according to Moore’s Law
   –  Data volumes are growing faster
•  Storage capacity – growing according to Moore’s Law
   –  Seek times are improving more slowly

[Figure: Powers of Ten]
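The slide gives no numbers, so the following is a worked-arithmetic sketch under stated assumptions: capacity is taken to double every 2 years (a commonly quoted Moore's Law cadence) and data volume every 1.5 years (a purely hypothetical "faster" rate). It shows how long one power of ten takes at a fixed doubling period, and how quickly a gap opens between two exponential curves.

```python
import math

# Assumed doubling periods (illustrative only; the slide gives no figures).
CAPACITY_DOUBLING_YEARS = 2.0   # commonly quoted Moore's Law cadence
DATA_DOUBLING_YEARS = 1.5       # hypothetical "faster than Moore's Law" rate

def growth_factor(years, doubling_years):
    """Multiplicative growth after `years` for a quantity that doubles every `doubling_years`."""
    return 2 ** (years / doubling_years)

def years_per_power_of_ten(doubling_years):
    """Time needed to grow by a factor of ten at a fixed doubling period."""
    return doubling_years * math.log2(10)

def gap_after(years):
    """Ratio of data growth to capacity growth after `years`, under the assumed rates."""
    return (growth_factor(years, DATA_DOUBLING_YEARS)
            / growth_factor(years, CAPACITY_DOUBLING_YEARS))
```

Under these assumptions a tenfold increase in capacity takes roughly six and a half years, and after six years the data has grown twice as much as the capacity available to hold it.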

Page 6: Big Data for Gov 2012 Grady

Concepts

•  Data Science
   –  Data used as evidence through hypothesis and experiment
   –  Opportunistic experiment versus designed experiment
•  Data Engineering
   –  Design and construction of software systems
•  Data Product
   –  Actionable result driven by data
•  Big Data
   –  Volume, Velocity, Variety
   –  Complexity, Latency, Cleanliness, Completeness, Provenance

Page 7: Big Data for Gov 2012 Grady

Data Life Cycle

Page 8: Big Data for Gov 2012 Grady

Data Analysis Life Cycle (Data View)

[Figure: life cycle diagram. Labels: Need, Collect, Curate, Analyze, Act & Monitor, Evaluate, Data, Information, Knowledge, Benefit, Goal.]

Page 9: Big Data for Gov 2012 Grady

Data Analysis Life Cycle (Practitioner View)

[Figure: SAIC Data Science Life Cycle. Labels: Mission, Collect, Curate, Analyze, Visualize, Monitor, Act.]

Page 10: Big Data for Gov 2012 Grady

Data Analysis Life Cycle (Practitioner View)

[Figure: SAIC Data Science Life Cycle. Labels: Mission, Collect, Curate, Analyze, Visualize, Monitor, Act, with Hypothesis and Store added and markers numbered 1 and 2.]

Page 11: Big Data for Gov 2012 Grady

Mission Drives Everything

[Figure: chart with axes Value and Complexity.]

•  Categories: Description, Confirmation, Prediction, Prescription, Discovery, Optimization
•  Perspectives: Hindsight, Insight, Foresight
•  Questions
   –  What has just happened?
   –  What is happening now?
   –  Why is this happening?
   –  What am I missing?
   –  What will happen next?
   –  What will happen if I take this action?
   –  What’s the best that can happen?
   –  How do I make it happen?
•  Techniques: Reporting, Alerting, Correlation, Mining, Forecasting, Modeling, Maximization, Recommendation

Page 12: Big Data for Gov 2012 Grady

Big Data Engineering

Page 13: Big Data for Gov 2012 Grady

Big Data Characteristics

Engineering
•  Volume
•  Velocity
•  Variety

Science
•  Complexity
•  Latency
•  Cleanliness
•  Completeness
•  Provenance

Page 14: Big Data for Gov 2012 Grady

Traditional Data Life Cycle

[Figure: pipeline diagram. Phases: Capture, Curate, Analyze, Deploy. Labels: Domain, Staging, ETL (Cleanse, Transform), Warehouse, Summarized Data, Analytic Mart, Algorithm, Action.]
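The stages named on this slide (staging, cleanse and transform, warehouse, summarized data in an analytic mart) can be sketched as a small pipeline. This is a minimal sketch and not SAIC's implementation: the `sales` table, the record fields, and the sample rows are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw records as they might land in a staging area.
STAGING = [
    {"region": " east ", "amount": "10.5"},
    {"region": "West", "amount": "7"},
    {"region": "east", "amount": "n/a"},   # unparseable amount, dropped by cleansing
    {"region": "WEST", "amount": "3.5"},
]

def cleanse(rows):
    """Drop rows whose amount cannot be parsed as a number."""
    for row in rows:
        try:
            yield {"region": row["region"], "amount": float(row["amount"])}
        except ValueError:
            continue

def transform(rows):
    """Normalize region names to one convention (trimmed, lower case)."""
    for row in rows:
        yield (row["region"].strip().lower(), row["amount"])

def load(conn, rows):
    """Load cleansed, transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

def build_mart(conn):
    """Summarize warehouse data into a small analytic mart (total per region)."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(cur.fetchall())

def run_pipeline():
    """Run staging -> cleanse -> transform -> warehouse -> mart end to end."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(cleanse(STAGING)))
    mart = build_mart(conn)
    conn.close()
    return mart
```

The point of the sketch is the ordering: data is cleansed and transformed before it is stored, and analysis runs only on the summarized mart.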

Page 15: Big Data for Gov 2012 Grady

Big Volume Engineering

[Figure: pipeline diagram. Phases: Capture, Curate, Analyze, Deploy. Labels: Domain, Volume, Complexity, Raw Data Cluster, Shard, Map/Reduce (Cleanse, Transform, Analyze), Summarized Data, Mart, Data Product.]
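The shard and Map/Reduce labels on this slide can be illustrated with a word count, the customary small example of the pattern. This is a sketch under assumptions: a thread pool stands in for a cluster, round-robin slicing stands in for a real sharding scheme, and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def shard(records, n_shards):
    """Split records into n_shards partitions by round-robin slicing."""
    return [records[i::n_shards] for i in range(n_shards)]

def map_shard(records):
    """Map step: cleanse (lower-case) and count words within one shard."""
    counts = Counter()
    for line in records:
        counts.update(word.lower() for word in line.split())
    return counts

def reduce_counts(left, right):
    """Reduce step: merge two partial counts."""
    return left + right

def word_count(records, n_shards=3):
    """Count words by mapping over shards in parallel, then reducing the partial results."""
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partials = list(pool.map(map_shard, shard(records, n_shards)))
    return reduce(reduce_counts, partials, Counter())
```

Here the raw records are stored first and cleansed inside the map step, which is the reverse of the traditional cleanse-then-store ordering.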

Page 16: Big Data for Gov 2012 Grady

Big Data Analytics

[Figure: Scale2Insight Big Data Analytics architecture on public or private cloud infrastructure. Labels: NoSQL Store, Model Index, Model Building …ƒ(n)…, Models, model cache, Scalable Model Analytics, Software API, Search Tools (Lucene), Batch, Stream. Analytics: Mining, Spatial, Graph, Pattern, Learning, Classification, Prediction, Anomaly Detection, A ⇒ B. Outputs: Patterns, Relationships, Identity, Tactical Views, Alerts, Anomalies.]

Page 17: Big Data for Gov 2012 Grady

Big Velocity Engineering

[Figure: pipeline diagram. Phases: Capture, Curate, Analyze, Deploy. Labels: Domain, Velocity, Volume, Complexity, Cleanse, Transform, Alerting, Shard, Enriched Data Cluster.]
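For high-velocity data, the figure places cleansing and alerting on the data as it arrives. A minimal sketch of that idea follows; the sliding window, the "twice the recent mean" rule, and the function names are assumptions made for illustration, not the alerting logic of any SAIC product.

```python
from collections import deque

def cleanse(stream):
    """Drop malformed readings as they arrive; pass numeric ones through as floats."""
    for item in stream:
        if isinstance(item, (int, float)):
            yield float(item)

def alerts(stream, window=5, factor=2.0):
    """Yield (index, value) when a value exceeds `factor` times the mean of the previous `window` values."""
    recent = deque(maxlen=window)
    for i, value in enumerate(stream):
        # Only alert once a full window of history is available.
        if len(recent) == window and value > factor * (sum(recent) / window):
            yield (i, value)
        recent.append(value)
```

Both functions are generators, so each reading is processed once as it arrives and only the last `window` values are held in memory. The reported index counts readings after cleansing.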

Page 18: Big Data for Gov 2012 Grady

Big Data Ingestion – Real Time Information Gateway

[Figure: ingestion architecture on public or private cloud infrastructure. Data Sources (Source 1, Source 2, Source 3, … Source X) and Enrichment Sources feed Fusion and Enrichment plus Real Time Analysis. Labels: Data Storage, Alerting, Query, Analysis, Alerting Tools, Query Tools, Analysis Tools.]

Page 19: Big Data for Gov 2012 Grady

Big Data Adds to Existing Infrastructure

[Figure: architecture diagram. Components: RTIG Big Data Ingestion (Ingestion, Fusion, Enrichment, Alerting), NoSQL Big Data Storage (Enriched Data Storage), S2i Big Data Analytics (Query, Modeling, Characterization, Prediction). Labels: Current Environment, Data, External Data, Enrichment Sources, Custom Enrichment, Custom Analytics, Models, Analyst Tools, Browsing Modules, Query, Analysis. Progression labels: From Data, To Information, To Knowledge, To Insight, To Exploration.]

Page 20: Big Data for Gov 2012 Grady

SAIC Big Data Platform

[Figure: platform architecture on public or private cloud infrastructure (scalable, multi-tenant), with a Component Repository.]

•  Big Data Ingestion (RTIG)
   –  Sources: xml, csv, others, custom
   –  Parse (name/val pairs)
   –  Translate (to data model)
   –  Process (add enrichment)
   –  Index, Alert Engine, Alert Consumers, Search Tools (e.g., Solr), Lucene, Custom
•  NoSQL Store
•  Big Data Analytics (Scale2Insight)
   –  Model Index, Model Building …ƒ(n)…, Models, model cache, Scalable Model Analytics
   –  Mining, Spatial, Graph, Pattern, Learning, Classification, Prediction, Anomaly Detection, A ⇒ B, Others
   –  Software API, Search Tools (Lucene), Batch, Stream
   –  Outputs: Patterns, Relationships, Identity, Tactical Views, Alerts, Anomalies
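The ingestion steps named in the figure (parse into name/value pairs, translate to a data model, process by adding enrichment) can be sketched as three small functions. Everything specific here is an assumption for illustration: the `Event` model, its field names, the `<record>` element layout, and the lookup-table enrichment are hypothetical and do not describe the actual RTIG interfaces.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Event:
    """Hypothetical common data model; field names are illustrative."""
    source_id: str
    reading: float
    site: str = "unknown"

def parse_csv(text):
    """Parse CSV text into a list of name/value dictionaries."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def parse_xml(text):
    """Parse <record> elements into name/value dictionaries built from their child tags."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in record} for record in root.iter("record")]

def translate(pairs):
    """Translate name/value pairs into the common data model."""
    return Event(source_id=pairs["id"], reading=float(pairs["value"]))

def process(event, lookup):
    """Enrich an event with a site name taken from a lookup table."""
    event.site = lookup.get(event.source_id, "unknown")
    return event

def ingest(csv_text, xml_text, lookup):
    """Run parse -> translate -> process over one CSV source and one XML source."""
    pairs = parse_csv(csv_text) + parse_xml(xml_text)
    return [process(translate(p), lookup) for p in pairs]
```

Once every source format is parsed into the same name/value form, the translate and process steps do not need to know where a record came from, which is what lets new source types be added as separate parsers.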

Page 21: Big Data for Gov 2012 Grady

Data Science

Page 22: Big Data for Gov 2012 Grady

Data Analysis Life Cycle (Practitioner View)

[Figure: SAIC Data Science Life Cycle. Labels: Mission, Collect, Curate, Analyze, Visualize, Monitor, Act, with Hypothesis and Store added and markers numbered 1 and 2.]

Page 23: Big Data for Gov 2012 Grady

Big Data Analysis Life Cycle

[Figure: SAIC Data Science Life Cycle. Labels: Mission, Collect, Curate, Analyze, Visualize, Monitor, Act, with Store, Explore, and Hypothesize added and markers numbered 1, 2, and 3.]

Page 24: Big Data for Gov 2012 Grady

Data Science

[Figure: diagram of the components of Data Science. Labels: Statistics, Data Mining, Domain Expertise, Programming Skills, Research, Analytic Systems, Algorithms.]

Page 25: Big Data for Gov 2012 Grady

Data Science Emphasis

[Figure: three stacks, each listed here in slide order.]

•  Traditional Emphasis: Algorithms, System, Data
•  Data Engineering: System, Data, Algorithms
•  Data Science: Data, Algorithms, System

Page 26: Big Data for Gov 2012 Grady

Agile Analytics

[Figure: agile analytics diagram. Labels: Data, Summarize, Explore, Display, Hypothesize, Features.]

Page 27: Big Data for Gov 2012 Grady

Summary

•  The “Science” is what needs to be done
•  The “Engineering” is how to do it
•  Big Data is often re-purposed from data collected for other reasons
•  Engineering enables Big Data Analysis to move into a rapid Science hypothesis-testing cycle for greater value

Page 28: Big Data for Gov 2012 Grady

Recommendations

•  Focus on building your team to cover all the Data Science skills
•  Determine your dominant engineering characteristics to design your approach
   –  Volume, Velocity, Variety
•  Be aware of the other characteristics of the data
•  Plan to spend 80% of the time on curation
•  Work to add a Big Data capability to your existing infrastructure

Page 29: Big Data for Gov 2012 Grady

Contact Information

Nancy Grady, Ph.D.
SAIC Technical Fellow, Data Science
Homeland and Civilian Solutions

[email protected] 865-604-6733

