
Slides: EMC Data Value Tutorial

Date posted: 08-Feb-2017
Uploaded by: john-furrier

Transcript
Page 1: Slides: EMC Data Value Tutorial

© Copyright 2016 EMC Corporation. All rights reserved.

INTRODUCTION TO DATA VALUE
STEVE TODD, EMC FELLOW, VP OF STRATEGY AND INNOVATION
JULY 6, 2016, BIS 2016 LEIPZIG GERMANY

Page 2: Slides: EMC Data Value Tutorial

1. Introduction to Data Value
2. Data Value IT Architectures

SESSIONS

Page 3: Slides: EMC Data Value Tutorial

1. Introduction to Data Value
- Introductions: What is Data Value?
- San Diego Supercomputer Center
- Product to Data Value
- Industry Challenge
- Emerging Use Cases
- Audience Discussion

2. Data Value IT Architectures

SESSION 1

Page 4: Slides: EMC Data Value Tutorial

DR. JIM SHORT, SAN DIEGO SUPERCOMPUTER CENTER
ARCHITECTING FOR VALUE

Valuation IT Architectures
Valuation Business Processes

Page 5: Slides: EMC Data Value Tutorial

Analysis of the vast collections of GIS data we have or have access to is not merely generating new exploration, it has become a salable service on its own right.

— Australia Oil & Gas

CAPGEMINI EMC BIG DATA REPORT
FROM PRODUCT VALUE TO DATA VALUE

Source: Forbes https://www.capgemini.com/news/new-global-study-by-capgemini-and-emc-shows-big-data-driving-market-disruption-leaving-many

Among our respondents, 63% consider that the monetization of data could eventually become as valuable to their organizations as their existing products and services.

— CapGemini EMC Big Data Report

Page 6: Slides: EMC Data Value Tutorial

BABOLAT EXAMPLE: PRODUCT TO DATA VALUE

Page 7: Slides: EMC Data Value Tutorial

ADIDAS SMART BALL EXAMPLE: PRODUCT TO DATA VALUE

Page 8: Slides: EMC Data Value Tutorial

LANTMATERIET EXAMPLE: PRODUCT TO DATA VALUE

Page 9: Slides: EMC Data Value Tutorial

Source: WSJ http://www.wsj.com/article_email/whats-all-that-data-worth-1413157156-lMyQjAxMTE1OTE1NDUxMDQ5Wj

companies have better accounting for their office furniture than their information assets

ARE WE GETTING A GOOD DEAL?

Page 10: Slides: EMC Data Value Tutorial

VALUATION BUSINESS PROCESSES

M&A

CREDITOR VALUATION

DATA INSURANCE

DATA MONETIZATION

DATA SALE

Page 11: Slides: EMC Data Value Tutorial

DATA VALUE AND ACQUISITIONS

Source: https://press.linkedin.com/site-resources/news-releases/2015/linkedin-to-acquire-lyndacom

CALCULATIONS
$1,500,000,000 × 90% = $1,350,000,000
Acquisition cost per tutorial: $9,269.24
Acquisition cost per GB (uncompressed): $112.88

Lynda.com’s extensive library of premium video content helps empower people to develop the skills needed to accelerate their careers.

— Jeff Weiner, CEO of LinkedIn
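The per-unit figures on this slide are simple divisions. The sketch below reproduces them, assuming (the slide does not say so) that the 90% factor is the share of the $1.5 billion price attributed to the content library. The slide gives no tutorial count or library size, so the hypothetical `implied_units` helper only works backwards from the published per-unit costs; the counts it returns are derived arithmetic, not figures from the source.

```python
# Sketch of the per-unit acquisition cost arithmetic on the Lynda.com slide.
# Assumption: the 90% factor is the share of the price attributed to the
# content library; the slide itself does not label it.

def attributed_value(price_usd: int, share_percent: int) -> int:
    """Part of a purchase price attributed to the data asset (whole dollars)."""
    return price_usd * share_percent // 100


def cost_per_unit(value_usd: float, units: float) -> float:
    """Acquisition cost per unit of data (per tutorial, per GB, ...)."""
    return value_usd / units


def implied_units(value_usd: float, unit_cost_usd: float) -> int:
    """Work backwards from a published per-unit cost to the count it implies."""
    return round(value_usd / unit_cost_usd)


value = attributed_value(1_500_000_000, 90)
print(f"Attributed value: ${value:,}")  # $1,350,000,000
print(f"Implied tutorials: {implied_units(value, 9_269.24):,}")
print(f"Implied GB (uncompressed): {implied_units(value, 112.88):,}")
```

The same two functions apply to any data-driven acquisition once a price share and a unit count are agreed on; choosing that share is the hard, non-arithmetic part.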

Page 12: Slides: EMC Data Value Tutorial

UPDATE: MICROSOFT ACQUISITION

Source: http://www.wsj.com/articles/microsoft-to-acquire-linkedin-in-deal-valued-at-26-2-billion-1465821523

Sales representatives using Microsoft’s Dynamics software for managing customer relationships could pick up useful tidbits of background on potential customers from LinkedIn data. Microsoft also sees opportunities in Lynda.com, a channel for training videos that LinkedIn bought for $1.5 billion last year. Microsoft will be able to offer Lynda’s videos inside its own software, such as Excel spreadsheets.

— Wall Street Journal, June 14 2016

Page 13: Slides: EMC Data Value Tutorial

BANKRUPTCY

Source: http://www.wsj.com/articles/in-caesars-fight-data-on-players-is-real-prize-1426800166

DEFENDANT’S LOOTING OF CEOC’S VALUABLE OPERATING ASSETS

Date of Transfer | Asset Transferred | Conservative Estimated Equity Value | Equity Value Attributed | Equity Valuation Shortfall ($) | Equity Valuation Shortfall (%)
May 2014 | Total Rewards | $1.0BN | None | $1.0BN | 100%
Total | | $5.9BN | $2.4BN | $3.6BN | 60%
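The two shortfall columns follow from the value columns: the dollar shortfall is the estimated value minus the value attributed, and the percentage divides that gap by the estimated value. A minimal sketch, assuming an attributed value of "None" counts as zero:

```python
# Sketch: equity valuation shortfall for a transferred asset (values in $BN).
# Assumption: an attributed value of "None" is treated as 0.

def shortfall(estimated: float, attributed: float) -> tuple:
    """Return (shortfall in $BN, shortfall as a percentage of the estimate)."""
    gap = estimated - attributed
    return gap, 100.0 * gap / estimated


# Total Rewards row: estimated $1.0BN, nothing attributed.
gap, pct = shortfall(1.0, 0.0)
print(f"Total Rewards shortfall: ${gap:.1f}BN ({pct:.0f}%)")  # $1.0BN (100%)

# Total row, recomputed from the rounded figures shown on the slide.
gap, pct = shortfall(5.9, 2.4)
print(f"Total shortfall: ${gap:.1f}BN ({pct:.0f}%)")
```

Recomputing the Total row from the rounded $5.9BN and $2.4BN gives $3.5BN (about 59%) rather than the $3.6BN and 60% shown, presumably because the underlying figures were rounded for the slide.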

Page 14: Slides: EMC Data Value Tutorial

• 23andMe
  – 800,000 customer DNA kits since 2006
  – $99 per test ($79.2 million)

• Genentech
  – Upfront payment of $10 million
  – Further milestone payments of as much as $50 million

DATA MONETIZATION

Source: Forbes http://www.forbes.com/sites/matthewherper/2015/01/06/surprise-with-60-million-genentech-deal-23andme-has-a-business-plan/

…this single deal with one large drug company could generate almost as much revenue as doubling 23andMe’s customer base.

— Forbes Article
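The Forbes comparison can be checked with the slide's own numbers. This sketch assumes every milestone is paid in full and that doubling the customer base would bring in another 800,000 tests at $99; both are simplifications, not claims from the article.

```python
# Compare 23andMe kit revenue with the Genentech deal (figures from the slide).
KITS_SOLD = 800_000
PRICE_PER_TEST_USD = 99
kit_revenue = KITS_SOLD * PRICE_PER_TEST_USD  # revenue from all kits to date

UPFRONT_USD = 10_000_000
MAX_MILESTONES_USD = 50_000_000
deal_max = UPFRONT_USD + MAX_MILESTONES_USD  # assumes all milestones are met

# Assumption: doubling the customer base at the same price adds another kit_revenue.
print(f"Kit revenue:            ${kit_revenue:,}")
print(f"Genentech deal (max):   ${deal_max:,}")
print(f"Deal vs. doubling:      {deal_max / kit_revenue:.0%}")
```

A single data deal worth roughly three quarters of what doubling the customer base would bring is what "almost as much revenue" refers to.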

Page 15: Slides: EMC Data Value Tutorial

DATA SALE

Source: http://adexchanger.com/ecommerce-2/tesco-eyes-sale-of-dunnhumby-its-nearly-1-billion-shopper-data-business/

Tesco said it has appointed Goldman Sachs as its adviser to explore “strategic options” for the US$756 million business

[dunnhumby]…has a unique frame of reference on the purchase habits of 770 million shoppers

Page 16: Slides: EMC Data Value Tutorial

DATA INSURANCE

Liberty Mutual
• 30% increase in primary data insurance policies between 2013 and 2014

Source: Boston Globe http://www.bostonglobe.com/business/2014/02/17/more-companies-buying-insurance-against-hackers-and-privacy-breaches/9qYrvlhskcoPEs5b4ch3PP/story.html

TJX
• 46 million credit/debit cards
• Estimated cost $180 million
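The TJX figures give a rough per-record breach cost, one crude input when sizing data or cyber insurance. The sketch assumes cost scales linearly with the number of records exposed, which is a strong simplification for real incidents; `estimated_exposure` and the 2-million-record data set are hypothetical.

```python
# Sketch: per-record breach cost from the TJX figures on the slide.
# Assumption: cost scales linearly with the number of records exposed.

def cost_per_record(total_cost_usd: float, records: int) -> float:
    return total_cost_usd / records


def estimated_exposure(records: int, per_record_usd: float) -> float:
    """Hypothetical helper: scale a per-record cost to another data set."""
    return records * per_record_usd


tjx_per_card = cost_per_record(180_000_000, 46_000_000)
print(f"TJX cost per card: ${tjx_per_card:.2f}")  # $3.91
# Hypothetical data set of 2 million records:
print(f"Exposure: ${estimated_exposure(2_000_000, tjx_per_card):,.0f}")
```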

Page 17: Slides: EMC Data Value Tutorial

• What data-related business processes add revenue to the bottom line?

• What data-related business processes subtract costs from the bottom line?

• What characteristics of data can be used to calculate value?

AUDIENCE DISCUSSION

Page 18: Slides: EMC Data Value Tutorial

COMPARE YOUR ANSWERS

PLUS
• Sell
• Rent (or provide data services)
• Monetize (analyze to increase revenue or cut costs)
• Data/Cyber Insurance claim

MINUS
• Cost to process data
• Cost to store data
• Cost of premiums for data/cyber insurance
• Cost to purchase a data asset
• Cost to acquire a company’s data
• Regulatory fines for data violations
• Data science staff
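One way to make the PLUS/MINUS comparison concrete is a ledger that nets revenue-side items against cost-side items. The class below is a hypothetical sketch; the item names come from the lists above, but every amount is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataValueLedger:
    """Hypothetical ledger: PLUS items add to data value, MINUS items subtract."""
    plus: dict = field(default_factory=dict)   # revenue-side items
    minus: dict = field(default_factory=dict)  # cost-side items

    def credit(self, item: str, amount: float) -> None:
        self.plus[item] = self.plus.get(item, 0.0) + amount

    def debit(self, item: str, amount: float) -> None:
        self.minus[item] = self.minus.get(item, 0.0) + amount

    def net(self) -> float:
        return sum(self.plus.values()) - sum(self.minus.values())


# Invented amounts, for illustration only.
ledger = DataValueLedger()
ledger.credit("Rent (or provide data services)", 250_000)
ledger.credit("Monetize (analyze to increase revenue or cut costs)", 400_000)
ledger.debit("Cost to store data", 120_000)
ledger.debit("Data science staff", 300_000)
ledger.debit("Cost of premiums for data/cyber insurance", 30_000)
print(f"Net data value: ${ledger.net():,.0f}")
```

Keeping items by name, rather than as a single running total, lets the same ledger answer the three discussion questions separately.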

Page 19: Slides: EMC Data Value Tutorial

1. Introduction to Data Value
2. Data Value IT Architectures
– Introduction to Data Lakes
  • Data Lake Architecture
  • Data Lake Industry use cases
– Adding Valuation to Data Lake Architectures

SESSION 2

Page 20: Slides: EMC Data Value Tutorial

MORE DATA MEANS MORE COMPLEX RELATIONSHIPS TO ANALYZE IN REAL-TIME AT A LARGE SCALE

[Figure: Amount of data grows from 4.4 ZB in 2013 to 44 ZB in 2020 (16+ ZB hot data). Source: IDC 2014. The Data Multiplier Effect: Business (database data, 1X), Human (enterprise/external data, 10X), Machine (sensor/external data, 100X), annotated with volume, variety, and velocity. Example sources: satellite imaging, sensors, video recording, M2M log files, bio-informatics, email, documents, web logs, social.]

• More Data Needs To Be Captured Faster

• Real-time Analytics For Business Insights

• Existing Applications Are Taxed

• Evolving New Applications & Architectures
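The IDC figures imply a compound annual growth rate. Taking 4.4 ZB in 2013 and 44 ZB in 2020 as the endpoints (a tenfold increase over seven years), the rate is (44 / 4.4)^(1/7) − 1, roughly 39% per year. The sketch assumes smooth compound growth between the two endpoints, which the forecast does not promise.

```python
# Sketch: compound annual growth rate implied by the IDC figures on the slide.

def cagr(start: float, end: float, years: float) -> float:
    """Constant yearly growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1


rate = cagr(4.4, 44.0, 2020 - 2013)
print(f"Implied growth: {rate:.0%} per year")
```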

Page 21: Slides: EMC Data Value Tutorial

BRING TOGETHER DATA, ANALYTICS, & APPS

ANALYZE ANYTHING
All of the data
More sophisticated analyses
New combinations and correlations

STORE EVERYTHING
Structured, unstructured, dark
Generated by the enterprise, imported from outside
Historic & real-time

SPEED

ANALYTICS

APPS

DATA

BUILD THE RIGHT THING
Deliver data consistently & in a standardized way
Get at the data quickly
Build views and applications each user really needs

Page 22: Slides: EMC Data Value Tutorial

DATA LAKE - ATTRIBUTES

INGEST: Ingest data in real time, near real time, or in batch/micro-batch.

STORE: Open HDFS storage allows access from the full stack of analytics tools.

ANALYZE: Apply the latest machine learning and data science techniques.

SURFACE: An open platform for visualization of results and data products.

ACT: An application development platform to act on findings.
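The five attributes form a pipeline, each stage consuming the previous stage's output. The sketch below is a toy, in-memory stand-in for that flow (a dict plays the role of HDFS, a mean plays the role of a model); every function name and the `sensor,value` input format are hypothetical.

```python
from statistics import mean

# Hypothetical five-stage flow; in-memory stand-ins for real ingest,
# HDFS storage, analytics, visualization, and application layers.


def ingest(raw_lines):
    """INGEST: parse a batch of 'sensor,value' lines into records."""
    records = []
    for line in raw_lines:
        sensor, value = line.split(",")
        records.append({"sensor": sensor.strip(), "value": float(value)})
    return records


def store(records, lake):
    """STORE: append values to an in-memory lake keyed by sensor."""
    for rec in records:
        lake.setdefault(rec["sensor"], []).append(rec["value"])
    return lake


def analyze(lake):
    """ANALYZE: a deliberately simple model, the mean value per sensor."""
    return {sensor: mean(values) for sensor, values in lake.items()}


def surface(results):
    """SURFACE: render results for people to read."""
    return [f"{sensor}: {avg:.1f}" for sensor, avg in sorted(results.items())]


def act(results, threshold):
    """ACT: return the sensors an application should respond to."""
    return [sensor for sensor, avg in results.items() if avg > threshold]


lake = store(ingest(["s1, 10", "s2, 30", "s1, 20"]), {})
results = analyze(lake)
print(surface(results), act(results, threshold=20.0))
```

Because `store` appends rather than overwrites, later batches accumulate in the lake, which is the property that lets analysis run over historic and new data together.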

Page 23: Slides: EMC Data Value Tutorial

Page 24: Slides: EMC Data Value Tutorial

Data Streaming Reference Architecture

Data Feeds | Transactional Apps | Analytic Apps

Data Stream Pipeline

Distributed Computing | Real-Time Data | Expert Systems & Machine Learning | Advanced Analytics

HDFS Data Lake

Page 25: Slides: EMC Data Value Tutorial

FEDERATION BUSINESS DATA LAKE PLATFORM

DATA & ANALYTICS CATALOG

(THIRD PARTY APPLICATIONS)

HADOOP

OPEN DATA PLATFORM

PIVOTAL BIG DATA SUITE

ADVANCED ANALYTICS

DATA PROCESSING

APPS AT SCALE

GREENPLUM DATABASE, HAWQ

PIVOTAL HD, SPARK, SPRING XD

DATA SERVICES MANAGEMENT

ANALYTICS TOOLBOX

REDIS

RABBITMQ

GEMFIRE

BDS ON PIVOTAL

DATA MANAGER

DATA GOVERNOR

INGEST

INDEX & SEARCH

POLICY MGMT

SECURITY & ACCESS CONTROL

VIRTUALIZATION PIVOTAL CLOUD FOUNDRY

EMC II STORAGE

DATA LAKE FOUNDATION: ISILON | ECS

VCE VBLOCK | XTREMIO

Page 26: Slides: EMC Data Value Tutorial

DATA LAKE ACCESS METHODS

FILE

HPC

Backup/Archive

Analytics

Mobile

File Shares

Cloud Apps

Page 27: Slides: EMC Data Value Tutorial

DATA: ANALYTICS-READY STORAGE CHOICES

ISILON, ECS
• Scale compute & storage independently
• HDFS-enable existing data
• No single point of failure
• Easily import & export via next-gen communication, including HDFS, S3, Swift, and Atmos API support
• Fault-tolerant, end-to-end data protection
• Self-service provisioning
• Storage hardware choice: enterprise, commodity

CONSOLIDATE DATA STORAGE THROUGH MULTI-PROTOCOL ACCESS

Real-time | Batch | Hadoop | Analytics | Surface | Act | Cloud | Archive | Mobile | HPC | Shares

Page 28: Slides: EMC Data Value Tutorial

DATA LAKE ARCHITECTURE/USE CASES

8 Petabytes of Sequenced Genome Data

• 20,000 Oncology Samples per Year
• 40-50 GB per Sequencing Session
• Historical Data for Cancer Analysis
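The bullet figures translate into a yearly storage growth rate. This sketch assumes one 40-50 GB sequencing session per sample, decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB), and no compression or deletion; it does not claim that the 8 PB on the slide was accumulated this way.

```python
# Sketch: yearly data growth implied by the sequencing figures on the slide.
# Assumptions: one sequencing session per sample, decimal units,
# no compression or deletion.

SAMPLES_PER_YEAR = 20_000
GB_PER_SESSION = (40, 50)  # low and high estimates from the slide
CURRENT_PB = 8


def yearly_growth_tb(samples: int, gb_per_session: float) -> float:
    """New data per year in TB."""
    return samples * gb_per_session / 1_000


low_tb, high_tb = (yearly_growth_tb(SAMPLES_PER_YEAR, gb) for gb in GB_PER_SESSION)
print(f"New data per year: {low_tb:,.0f}-{high_tb:,.0f} TB")

# How many years of intake at that rate 8 PB represents.
years_fast = CURRENT_PB * 1_000 / high_tb
years_slow = CURRENT_PB * 1_000 / low_tb
print(f"8 PB is about {years_fast:.0f}-{years_slow:.0f} years of intake")
```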

Page 29: Slides: EMC Data Value Tutorial

Mission statement

“To provide knowledge, services and solutions to fulfil Radboudumc research ICT needs, in a way that fits the individual study. The solutions include a Digital Research Environment (DRE) which allows researchers to import, merge, optimize, store, analyse, archive and share data from various sources (local and (inter)national) in a single scalable digital environment per study. The use of this digital environment increases research study efficiency and output, thereby increasing scientific impact of the Radboudumc. The sustainable, secure, law-compliant infrastructure places the Radboudumc in a key position as scientific partner.”

The mission

Page 30: Slides: EMC Data Value Tutorial

DRE positioning

Education

Digital Learning Environment (DLE)

Care

Electronic Medical Record (EMR)

Research

Digital Research Environment (DRE)

Page 31: Slides: EMC Data Value Tutorial

DRE-mandate: valorize the new EMR ‘Epic’

EMR

DRE

‘Paddy Field’

Page 32: Slides: EMC Data Value Tutorial

DRE

Import

Merge

Optimize

Store

Analyse

Archive

Share

DRE-mandate: increase efficiency and output

Page 33: Slides: EMC Data Value Tutorial

Data management at this moment

© Caspar Terheggen, Radboud Universiteit

& LAW

DRE-mandate: modernize

Page 34: Slides: EMC Data Value Tutorial

© Caspar Terheggen, Radboud Universiteit

Virtual research workspace

DRE-mandate: modernize

Page 35: Slides: EMC Data Value Tutorial

Local key management

Exchange with colleagues (Sharing)

High performance computing

Data ponds

Secure Research Environment

Standards

Analysis & Reporting

Archiving

Multi-center sources

Source disclosure

Pseudonymisation Merging

DRE: study example

Page 36: Slides: EMC Data Value Tutorial

Hybrid Cloud: Private, Public

Platform as a Service
Light open-source frameworks, services, data and analytics

Orchestration
Automated provisioning and IT infrastructure portal

Converged Infrastructure
IT transformations to service delivery

Standardize, Virtualize, Automate

Research as a Service
Building new apps for competitive advantage in the market, the new business

Research Self-Service Portal
End-user self-service with measured SLAs

Vendor inventory: architecture

Page 37: Slides: EMC Data Value Tutorial

BITBW (State SP for the State of Baden-Württemberg): EMC ECS electronic record archiving in the Criminal Justice Department
– Cooperation with an ISV and its software (PDV Systeme)
– Starting 1 January 2018, communication between lawyers and justice courts will be fully electronic, without any media break
– Justice courts are fully digital

Page 38: Slides: EMC Data Value Tutorial

I. Traditional / Horizontal Use Cases
– Email Archives
– File Shares / Home Directories
– VDI (user data)
– vCAD Workstation Virtualization
– Backup + Archive
– Video Surveillance

II. Engineering Use Cases
– Computer-Aided Design
– Computer-Aided Engineering
– Advanced Driver Assistance Systems (ADAS)

III. Emerging Use Cases
– Hardware in the Loop / Simulation
– Connected Cars
– Analytics

IT IN AUTOMOTIVE

Source: Pictures by Bosch

Page 39: Slides: EMC Data Value Tutorial

Car Supplier - Typical Workflow

Labeling

SIL-HIL computing

Data Ingest

Label-data

Validation

Developer

Tape library

Import, Cut & Compress

144 × X400

SIL = Software in the Loop
HIL = Hardware in the Loop

Page 40: Slides: EMC Data Value Tutorial

CAR SUPPLIER INFRASTRUCTURE (GLOBAL)

144 × X400

3 × X400

Plymouth

Germany

IP

California

Replication of selected data over Aspera *

3 × X400

Replication of selected data over Aspera *

Page 41: Slides: EMC Data Value Tutorial

Typical HiL Environment for ADAS Development

SMB / NFS

Write (Ingest)

Read (simulation)

2 PB – 20 PB HiL Server

Page 42: Slides: EMC Data Value Tutorial

VALUATION APPROACHES

Application Agility

Content Workflow

Content Processing

Data Protection Ecosystem

Content Ingest

Page 43: Slides: EMC Data Value Tutorial

CONTENT PROCESSING

NLP, Translation, Stemming, Tokenization

Domain A Domain B Domain C Domain D

Valuation Algorithms
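The slide chains text normalization (tokenization, stemming) into per-domain valuation algorithms. The sketch below is a toy stand-in: a crude suffix stripper in place of a real stemmer, hypothetical domain vocabularies, and a "value" score that is simply the share of tokens matching each domain. It illustrates the shape of the flow, not any valuation algorithm the presenters used.

```python
import re
from collections import Counter

# Hypothetical domain vocabularies (stemmed forms); a real system would
# curate or learn these per business domain.
DOMAIN_TERMS = {
    "medical": {"patient", "sample", "sequenc"},
    "finance": {"price", "revenue", "cost"},
}

SUFFIXES = ("ing", "ed", "s")  # toy suffix stripper, not a real stemmer


def tokenize(text: str) -> list:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def domain_scores(text: str) -> dict:
    """Share of stemmed tokens that belong to each domain's vocabulary."""
    counts = Counter(stem(t) for t in tokenize(text))
    total = sum(counts.values()) or 1
    return {d: sum(counts[t] for t in terms) / total for d, terms in DOMAIN_TERMS.items()}


print(domain_scores("Patients' samples were sequenced; costs rose."))
```

Swapping the toy stemmer for a proper one, or the keyword share for a trained classifier, would not change the pipeline's structure: normalize once, then score against each domain.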

Page 44: Slides: EMC Data Value Tutorial

DATA PROTECTION ECOSYSTEM

Backup Schedule/Catalog

Backup Data

Valuation Algorithms

Mappings Between Primary/Protection System

P1 B1

P2 B2

Schedule Num Copies

(x)

Catalog V1

V2

V3
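The diagram feeds backup schedules, copy counts, and primary-to-protection mappings (P1 to B1, P2 to B2) into valuation algorithms. One plausible reading, sketched below under that assumption, is that the protection effort an owner already spends is a signal of how much the data is worth. The record fields and the copies-times-frequency score are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProtectionRecord:
    primary: str            # primary volume, e.g. "P1"
    backup_system: str      # protection target, e.g. "B1"
    copies: int             # copies retained, from the backup catalog
    backups_per_day: float  # from the backup schedule


def protection_score(rec: ProtectionRecord) -> float:
    """Hypothetical value signal: more copies, taken more often, suggest more value."""
    return rec.copies * rec.backups_per_day


def rank_by_protection(records):
    """Order primaries from most to least protected."""
    return sorted(records, key=protection_score, reverse=True)


records = [
    ProtectionRecord("P1", "B1", copies=7, backups_per_day=1),
    ProtectionRecord("P2", "B2", copies=30, backups_per_day=4),
]
print([r.primary for r in rank_by_protection(records)])
```

The score only reflects how data is currently protected, so it is an input to valuation, not a valuation on its own.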

Page 45: Slides: EMC Data Value Tutorial

CONTENT INGEST

Spout 1

Spout 2

Bolt 2

Bolt 1

Bolt 3

Bolt 4
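The spout/bolt terminology matches Apache Storm, where spouts emit streams and bolts transform them. The sketch below is plain Python and does not use Storm's API: two spouts feed a linear chain of three hypothetical bolts (normalize, drop empty payloads, tag with a size). A real topology can fan out and in, as the diagram's four bolts suggest.

```python
from itertools import chain
from typing import Callable, Iterable, Iterator, Optional

Record = dict


def spout(name: str, items: Iterable[str]) -> Iterator[Record]:
    """Emit one record per incoming item, tagged with its source."""
    for seq, payload in enumerate(items):
        yield {"source": name, "seq": seq, "payload": payload}


def bolt(fn: Callable[[Record], Optional[Record]]):
    """Wrap a per-record function as a stream stage; returning None drops the record."""
    def stage(stream: Iterable[Record]) -> Iterator[Record]:
        for rec in stream:
            out = fn(rec)
            if out is not None:
                yield out
    return stage


# Hypothetical bolts.
normalize = bolt(lambda r: {**r, "payload": r["payload"].strip().lower()})
drop_empty = bolt(lambda r: r if r["payload"] else None)
tag_size = bolt(lambda r: {**r, "chars": len(r["payload"])})


def topology(*streams):
    """Merge the spouts' streams, then pass them through each bolt in turn."""
    stream = chain(*streams)
    for stage in (normalize, drop_empty, tag_size):
        stream = stage(stream)
    return list(stream)


out = topology(spout("spout-1", ["  Alpha ", ""]), spout("spout-2", ["BETA"]))
print(out)
```

Because every stage is a generator, records flow through one at a time rather than batch by batch, which is the property a streaming ingest layer needs.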

Page 46: Slides: EMC Data Value Tutorial

APPLICATION AGILITY

APPS

DATA

SPEED

ANALYTICS

1. Commit Code Change
2. Automate Build & Test (Unit Test, Static Code Analysis)
3. Store Binaries & Build Artifacts
4. Automated Integration Testing
5. Acceptance, Performance & Load
6. Zero Downtime Upgrade to Production
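The six steps are a gated pipeline: each stage runs only if the previous one passed, so a failure stops the change before it reaches production. A minimal sketch, with stage names taken from the list and hypothetical pass/fail checks standing in for real build and test tools:

```python
from typing import Callable, List, Optional, Tuple

# A stage is a name plus a check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]

STAGE_NAMES = [
    "Commit code change",
    "Automate build & test",
    "Store binaries & build artifacts",
    "Automated integration testing",
    "Acceptance, performance & load",
    "Zero downtime upgrade to production",
]


def run_pipeline(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order; stop at the first failure and report where."""
    completed: List[str] = []
    for name, check in stages:
        if not check():
            return completed, name
        completed.append(name)
    return completed, None


def demo_stages(failing: Optional[str] = None) -> List[Stage]:
    """Hypothetical stand-ins: every stage passes except `failing`."""
    return [(name, lambda n=name: n != failing) for name in STAGE_NAMES]


done, failed_at = run_pipeline(demo_stages("Automated integration testing"))
print(done, failed_at)
```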

Page 47: Slides: EMC Data Value Tutorial

CONTENT WORKFLOW

[Figure: content workflow diagram of Sources, Drivers, and End Users]

Page 48: Slides: EMC Data Value Tutorial

DATA SCIENCE EXAMPLE

Final Report & Business Recommendation: $29M

Labeled Disk Array Data

Labeled Disk Array Data: Disk array data where each disk drive is assigned a label indicating failure or activity

Disk Array Data

Product Tables

Customer History Table

Genealogy

GPO Genealogy: Provides a listing of all items in ven… with tracking of their…

TCE History Table: Provides history tracking for drives & other parts in vendor equipment

Product Tables: The product table contains information at the part item number level, i.e., configuration parameters & metadata

Disk Array Data: Data collected from arrays installed at customer sites. Contains many pieces of information regarding configuration & parts, as well as error information for disk drives

Disk Array Enriched w/ EMC SN

Data Scientist: John Smith; Tool: Greenplum DB; Actions: Identify failed from non-failed drives

Disk Array Data Enriched with EMC SN: Disk array data where each row's disk drive serial number is mapped to the EMC internal disk drive serial number

Data Scientist: John Smith; Tool: Greenplum DB; Actions: Join the product metadata with the drive data

Data Scientist: John Smith; Tool: Greenplum DB; Actions: Map serial numbers

Data Scientist: John Smith; Tool: Greenplum DB; Actions: Identify disk drives & map raw drive serial numbers to serial numbers
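The slide's steps (map raw serial numbers, join product metadata, label failed versus non-failed drives) amount to two lookups and a label. The work on the slide was done in Greenplum DB; the sketch below mirrors the same join logic in Python over hypothetical sample rows. Field names, serial numbers, and the source of the failure set are all assumptions.

```python
# Hypothetical sample rows; field names are illustrative, not the real schema.
disk_array_data = [
    {"raw_sn": "RAW-01", "error_count": 0},
    {"raw_sn": "RAW-02", "error_count": 17},
    {"raw_sn": "RAW-03", "error_count": 2},
]
serial_map = {"RAW-01": "SN-100", "RAW-02": "SN-200", "RAW-03": "SN-300"}
product_table = {
    "SN-100": {"part_number": "PN-A"},
    "SN-200": {"part_number": "PN-B"},
    "SN-300": {"part_number": "PN-A"},
}
failed_serials = {"SN-200"}  # e.g. drives recorded as replaced in a history table


def build_labeled_dataset(rows, serial_map, product_table, failed):
    """Map serials, join product metadata, and label each drive."""
    labeled = []
    for row in rows:
        sn = serial_map.get(row["raw_sn"])
        if sn is None:
            continue  # unmapped drive: cannot be joined, so it is skipped
        meta = product_table.get(sn, {})
        labeled.append({**row, "sn": sn, **meta,
                        "label": "failed" if sn in failed else "active"})
    return labeled


for r in build_labeled_dataset(disk_array_data, serial_map, product_table, failed_serials):
    print(r)
```

Dropping unmapped drives keeps the labeled set clean but silently shrinks it; in practice the count of skipped rows is worth reporting alongside the result.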

Page 49: Slides: EMC Data Value Tutorial

• Async replication is never perfect
  – Inherent data lag between production and replica
  – This unprotected data will be lost in a disaster
  – Data loss drives monetary loss
  – We aim to minimize monetary loss in case of disaster

• Optimize replication resources to minimize the cost of data loss

VALUE-DRIVEN DISASTER RECOVERY
MINIMIZE MONETARY LOSS IN DISASTER

[Figure: Business damage in disaster vs. time (secs, roughly 0 to 2,900), assuming data A is x times more important than app B. Curves: non-optimized, and business-optimized for x = 2, 4, and 10.]

Peleg Yiftachel, Udi Shemer, Omer Sagi
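The idea behind the chart can be reproduced with a toy model, which is not the method of the slide's authors. Assume two apps write at constant rates, a replication link of fixed bandwidth drains the unreplicated backlog each second, and damage at the moment of disaster is weight_a × backlog_a + backlog_b with weight_a = x ≥ 1. "Non-optimized" caps app A at half the link; "business-optimized" drains the more valuable app A first. All parameters are hypothetical.

```python
# Toy model of value-driven asynchronous replication (hypothetical).
# Assumes weight_a >= 1, i.e. app A's data is at least as valuable as app B's.

def damage_after(seconds: int, write_a: float, write_b: float,
                 bandwidth: float, weight_a: float, prioritize: bool) -> float:
    backlog_a = backlog_b = 0.0
    for _ in range(seconds):
        backlog_a += write_a
        backlog_b += write_b
        if prioritize:
            # Business-optimized: drain the more valuable app first.
            sent_a = min(backlog_a, bandwidth)
        else:
            # Non-optimized: app A gets at most half the link.
            sent_a = min(backlog_a, bandwidth / 2)
        sent_b = min(backlog_b, bandwidth - sent_a)  # B gets what is left
        backlog_a -= sent_a
        backlog_b -= sent_b
    return weight_a * backlog_a + backlog_b


for x in (2, 4, 10):
    base = damage_after(100, 6, 6, 10, x, prioritize=False)
    opt = damage_after(100, 6, 6, 10, x, prioritize=True)
    print(f"x={x}: non-optimized {base:.0f}, business-optimized {opt:.0f}")
```

The total unreplicated data is the same either way; prioritizing only shifts which app's data is exposed, so the gain grows with x and disappears when both apps are equally valuable.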

Page 50: Slides: EMC Data Value Tutorial

Learn about the value, opportunity, and insights that Big Data provides. Get introduced to the Federation Business Data Lake solution to leverage the full power of big data to drive major business strategies.
https://educast.emc.com/learn/data-lakes-for-big-data-archive-2015
http://stevetodd.typepad.com

EMC HELPS WITH MOOC, BLOG
MASSIVE OPEN ONLINE CURRICULUM

Page 51: Slides: EMC Data Value Tutorial
