The Logical Data Warehouse and Big Data

Date post: 16-Apr-2017
Upload: cisco-data-center
David Besemer CTO, Cisco Data Virtualization The Logical Data Warehouse and Big Data Jan Afridi Director, MapR
Transcript
Page 1: The Logical Data Warehouse and Big Data

David Besemer CTO, Cisco Data Virtualization

The Logical Data Warehouse and Big Data

Jan Afridi Director, MapR

Page 2: The Logical Data Warehouse and Big Data

2 © 2015 Cisco and/or its affiliates. All rights reserved. Cisco Confidential

Source: Cisco Consulting Services

Data is Massive, Messy, and Everywhere

Page 3: The Logical Data Warehouse and Big Data

Business users are dissatisfied with the traditional data warehouse

New analytic requirements have driven new analytic tools

Big data analytics have driven Hadoop and other specialized databases

Data is no longer owned and managed only by IT; it is dispersed across many departments

Data virtualization’s ability to easily access and federate data

The reign has ended for the enterprise data warehouse as the singular best practice for large scale information management

Page 4: The Logical Data Warehouse and Big Data

Enterprise Data Sources

Business Intelligence

Traditional Data Sources

Traditional Data Warehouse

•  One place to go for data
•  Suitable for business use

Page 5: The Logical Data Warehouse and Big Data

Enterprise Data Sources

Business Intelligence

Traditional Data Sources Cloud Data Sources Big Data / IoT Sources

Page 6: The Logical Data Warehouse and Big Data

Enterprise Data Sources

Business Intelligence Analytics Self-Service ESBs and Apps

Traditional Data Sources Cloud Data Sources Big Data / IoT Sources

How can the business effectively leverage all of its data?

Page 7: The Logical Data Warehouse and Big Data

Enterprise Data Sources

Business Intelligence Analytics Self-Service ESBs and Apps

Abstraction Caching Directory Federation Security Governance Transformation

Cisco Data Virtualization

Traditional Data Sources Cloud Data Sources Big Data / IoT Sources

Page 8: The Logical Data Warehouse and Big Data

Business Intelligence Analytics Self-Service ESBs and Apps

Abstraction Caching Directory Federation Security Governance Transformation

Cisco Data Virtualization

Logical Data Warehouse

•  Suitable for business use
•  One place to go for data

Enterprise Data Sources

Traditional Data Sources Cloud Data Sources Big Data / IoT Sources

Page 9: The Logical Data Warehouse and Big Data

Inspired by Gartner’s Logical Data Warehouse

Taxonomy/Ontology Resolution

And / Or And / Or

SLA Requirements

Auditing and Management

Statistics

Repositories

<>/=/~

Metadata

DQ, MDM, Gov.

Use-Case Access Semantics

DQ = Data Quality, MDM = Master Data Management, Gov. = Governance

Virtualization Distributed Processing

Page 10: The Logical Data Warehouse and Big Data

Data Warehouse Optimization (DWO)

1. As data volume increases, cost of warehousing grows substantially

2. Offload data to less expensive Hadoop cluster to save on data management costs

3. Combine Hadoop data with DW data for a more comprehensive view of history

4. Add operational data for greater insight and agility in analytics and BI

Data Virtualization Platform

HDFS HDFS HDFS
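
The idea of combining offloaded Hadoop data with warehouse data behind a single view can be illustrated with a small sketch. This is not Cisco Data Virtualization itself: it uses Python's built-in `sqlite3` module, with two in-memory databases standing in for the warehouse and the Hadoop archive. The table name `sales`, the schema name `archive`, and all sample rows are assumptions made up for illustration.

```python
import sqlite3

def build_logical_view():
    """Return a connection exposing one view over two separate stores."""
    # "main" stands in for the data warehouse (recent data).
    conn = sqlite3.connect(":memory:")
    # A second in-memory database stands in for data offloaded to Hadoop.
    conn.execute("ATTACH DATABASE ':memory:' AS archive")

    conn.execute("CREATE TABLE main.sales (year INTEGER, amount REAL)")
    conn.execute("CREATE TABLE archive.sales (year INTEGER, amount REAL)")
    conn.executemany("INSERT INTO main.sales VALUES (?, ?)",
                     [(2014, 120.0), (2015, 150.0)])
    conn.executemany("INSERT INTO archive.sales VALUES (?, ?)",
                     [(2012, 80.0), (2013, 95.0)])

    # A TEMP view is used because SQLite only lets temporary views
    # reference tables in more than one attached database.
    conn.execute("""
        CREATE TEMP VIEW all_sales AS
        SELECT year, amount, 'warehouse' AS source FROM main.sales
        UNION ALL
        SELECT year, amount, 'archive' AS source FROM archive.sales
    """)
    return conn

conn = build_logical_view()
# Consumers query one place and see the full history.
rows = conn.execute(
    "SELECT year, amount, source FROM all_sales ORDER BY year"
).fetchall()
for row in rows:
    print(row)
```

The consumer only ever names `all_sales`; where each row physically lives is hidden behind the view, which is the property the slide attributes to the virtualization layer.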

Page 11: The Logical Data Warehouse and Big Data

Redefining Risk Analysis

$1.7 Million Productivity Savings

Quickly completed integration not possible any other way

Overcome Complex Data Challenges

Improve Data Access with Data Virtualization

•  Integrated online data (cloud), market data (DW) and historical data (tapes transferred to Hadoop)
•  Improved performance of data access to over 50 sources
•  Increased availability and accessibility of data for analysis

Over $1 Million Business Value Acceleration

Page 12: The Logical Data Warehouse and Big Data

Cisco’s Offer Detail

Quick Start (Timeframe: Up to 8 weeks)

•  Identify small BI/analytics project
•  Assess DW and data environment
•  Implement DV across 2-3 data sources
•  Deliver best practices operational guidelines
•  MapR + Cisco DWO Quick Start Solution

Phase 1 (Timeframe: 3-6 months)

•  Implement Quick Start if not already done
•  Identify BI/analytics deployments that could benefit from additional data sources; and/or
•  Identify where DV views could be used in additional BI/analytics apps
•  Deploy and optimize SW, HW, and networking infrastructure
•  MapR + Cisco DWO Quick Start Solution

Phase 2 (Timeframe: Depends on Customer Environment)

•  Implement Phase 1 as needed
•  Extend DV with Business Directory to provide data access to business users
•  Implement workload management (TES) to move towards automation
•  Deploy and optimize SW, HW, and networking infrastructure
•  MapR deployment with multi-tenancy and advanced security

Phase 3 (Timeframe: Depends on Customer Environment)

•  Implement Phases 1 and 2 as needed
•  Implement data prep capability to provide closed-loop data collaboration and management between IT and business users
•  Deploy and optimize SW, HW, and networking infrastructure
•  MapR deployment with backup and business continuity configured

Page 13: The Logical Data Warehouse and Big Data

Summary

The Future

The DW’s future is not a single physical location, but rather a hybrid of physical and virtual data stores.

Gartner Target

Gartner’s LDW is only an ideal target. It is an aspiration, not a recipe for every organization.

Cisco and MapR’s Phased Approach

Provides a pragmatic phased approach to achieving the end goal.

Status Quo Too Costly

Perpetuating the status quo costs the enterprise both time and money. Taking small steps toward an LDW will have long-term benefits.

More Than One Finish Line

There is not just one finish line. Achieving milestones along the way to the Gartner target can reap measurable benefits for the enterprise.

Page 14: The Logical Data Warehouse and Big Data
Page 15: The Logical Data Warehouse and Big Data

®© 2014 MapR Technologies 15

MapR & Outcomes/Use Cases

•  Cost Reduction (2)
•  Threat Prevention (1)
•  Fraud Detection (5)
•  Revenue Generation (6)

Jan Afridi

October 22, 2015

Page 16: The Logical Data Warehouse and Big Data

Implications of Data Doubling Every Two Years

[Chart: Total Data Stored versus IT Budgets, 1980 to 2020, showing structured and semi-structured data]

Source: Human-Computer Interaction & Knowledge Discovery in Complex Unstructured, Big Data

•  “Data is the new bullet in business”
•  Biggest re-platforming since RDB in 80s
•  Data Scientists are the new Rock Stars
•  Machine intelligence driven by data will dominate productivity gains
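
To make the slide's premise concrete: if stored data doubles every two years, it grows by a factor of 2^(t/2) after t years. A minimal sketch of that arithmetic follows; the function name and the ten-year horizon are illustrative assumptions, not figures from the presentation.

```python
def growth_factor(years, doubling_period_years=2.0):
    """Multiplicative growth after `years` given a fixed doubling period."""
    return 2.0 ** (years / doubling_period_years)

# Ten years of doubling every two years is five doublings: 2 ** 5.
decade_growth = growth_factor(10)
print(decade_growth)  # 32.0
```

A budget that stays roughly flat over the same decade would therefore have to cover about 32 times as much data, which is the gap the chart is drawing attention to.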

Page 17: The Logical Data Warehouse and Big Data

Cost Reduction: Push Messaging Platform

Enabling the “smartest, most aware, precise, easy-to-use, scalable, secure and powerful push messaging platform on the planet”

OBJECTIVES

•  Enable organizations to build one-on-one brand relationships
•  Push messaging and geo-location targeting that

CHALLENGES

•  Support large numbers of customers in a multi-tenant platform
•  Target specific consumers in real time with relevant offers
•  Increase reliability of push messaging while lowering data center costs

SOLUTION

•  MapR Distribution for Hadoop with Apache HBase for operational workloads
•  Data placement control enables efficient cluster resource management

Business Impact

•  Increasing engagement and customer loyalty for 100’s of leading brands
•  Reduced hardware footprint by 50%
•  Consolidated 8 Hadoop clusters into 1 MapR cluster

Page 18: The Logical Data Warehouse and Big Data

Cost Reduction: Internet Analytics and Ad Optimization

comScore delivers insights about online consumer behavior

OBJECTIVES

•  Provide digital analytics services: syndicated and custom solutions in audience measurement, e-commerce, advertising, search, video & mobile

CHALLENGES

•  Keeping up with the growing volume of data. In the past 5 years, comScore’s volume of new data per month has grown from 100 billion to 1.7 trillion records

SOLUTION

•  comScore chose MapR for NFS, performance, and operational efficiency
•  MapR processes over 1.7 trillion Internet and mobile records/month, reaching more than 90% of the Internet population

Business Impact

•  MapR Streaming writes eliminated Cassandra staging costs completely
•  “To get data in and out of Hadoop, you have to do some kind of HDFS export. With MapR, you can just mount [HDFS] as NFS and then use native tools whether they’re in Windows, Unix, Linux or whatever.” Mike Brown, comScore CTO
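
The comScore quote on this slide describes reading cluster data through an NFS mount with ordinary tools. A minimal sketch of what that looks like from a script follows, using only standard Python file I/O. The `*.log` naming, the record format, and the `count_records` helper are assumptions for illustration, and a temporary directory stands in for the real mount point so the example runs anywhere.

```python
import tempfile
from pathlib import Path

def count_records(directory):
    """Count newline-delimited records in every *.log file in a directory."""
    total = 0
    for path in sorted(Path(directory).glob("*.log")):
        # Plain file I/O: no Hadoop client or export step is involved.
        with path.open("r", encoding="utf-8") as handle:
            total += sum(1 for _ in handle)
    return total

# A temporary directory stands in for the NFS mount point (hypothetical);
# on a real system this would be wherever the cluster is mounted.
with tempfile.TemporaryDirectory() as mount_point:
    Path(mount_point, "part-0.log").write_text("a\nb\nc\n", encoding="utf-8")
    Path(mount_point, "part-1.log").write_text("d\ne\n", encoding="utf-8")
    record_count = count_records(mount_point)

print(record_count)  # 5
```

Because the mount looks like a local file system, the same function would work unchanged against a mounted cluster directory, which is the operational convenience the quote is describing.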

Page 19: The Logical Data Warehouse and Big Data

Threat Prevention: Cisco: Threat Intrusion Prevention (TIP)

Cisco uses MapR Data Lake for Internet Threat Analytics

OBJECTIVES

•  Create shared view of customer & operations across 75,000 employees
•  Increase revenue opportunities with sales partners

CHALLENGES

•  Security logs are siloed in different divisions
•  Storage for longer periods of analysis is too expensive
•  Hidden “low and slow” threats could not be made visible over time

SOLUTION

•  Use MapR to collect device information over long periods of time
•  Use Hadoop analytic tools to
•  Generate new sales leads internally and for partners

Business Impact

Cisco was able to store and analyze years of data at a fraction of the cost and across all network, application, and edge security logs. Found many net new threats in context across silos.

Page 20: The Logical Data Warehouse and Big Data

Fraud Detection: Reducing Payment Waste and Abuse

F100 health provider captures incremental $$ in payment fraud and waste

HEALTH CARE PROVIDER

OBJECTIVES

•  Capture incremental revenue by reducing payment fraud in claims data
•  Reduce claim payment errors

CHALLENGES

•  Inability to recover monies distributed due to lack of pre-adjudication
•  Issues with performing claim pre-adjudication
•  Current process to identify fraud, waste and abuse (FWA) is manual

SOLUTION

•  Seamless ingestion of various data sources into a data lake: NFS
•  Near real-time adjudication of claims: Machine learning
•  Support multiple programs and applications: Multi-tenancy

Business Impact

•  Ability to capture approximately $200 million of fraud, waste and abuse
•  Transition from a pay-first, ask-later scenario to a more optimal pre-adjudication process
•  Multiple business units supported through Big Data Platform as a Service

Page 21: The Logical Data Warehouse and Big Data

Fraud Detection: Zions Bank

Cost effective security analytics and fraud detection on one platform

OBJECTIVES

•  Fraud Operations and Security Analytics team at Zions maintains data stores, builds statistical models to detect fraud, and then uses these models to data mine and evaluate suspicious activity

CHALLENGES

•  Existing technology infrastructure could not scale
•  Timeliness of reports degraded over the last several years

SOLUTION

•  Chose MapR and cut storage costs by 50%
•  Querying time reduced from 24 hours to 30 min on 1.2 PB of data
•  Leverage MapR scale for increased model accuracy and deeper insights

Business Impact

“We initially got into centralizing all of our data from an information security perspective. We then saw that we could use this same environment to help with fraud detection” Michael Fowkes, SVP Fraud Operations and Security Analytics

Page 22: The Logical Data Warehouse and Big Data

Fraud Detection: American Express

Real-time Analysis to Limit Fraudulent Purchases: Race against Time

OBJECTIVES

•  Limit purchases at point of sale and minimize continued fraudulent use per event

SOLUTION

Business Impact

Page 23: The Logical Data Warehouse and Big Data

Fraud Detection: e-AADHAR World’s Largest Biometric Database

Indian government creates biometric identification system for all citizens

OBJECTIVES

•  Increase % of citizens who have bank accounts and can access benefits
•  Reduce corruption and fraud in government aid programs

CHALLENGES

•  Issues with data replication and loss across clusters in competing distribution
•  Weak disaster recovery strategy in competitive distribution
•  Complicated upgrade process and high availability issues

SOLUTION

•  Complete data backup: Snapshots and mirroring
•  Lower maintenance overhead: Rolling upgrades
•  Fingerprints and retina scans with 200 ms response: MapR-DB tables

Business Impact

•  Approximately 20% reduction in fraud and leakage of govt aid programs ($50B)
•  Average citizen’s life is transformed as they can get access to various stipulated benefits
•  845 million citizens currently enrolled, providing identity for approx. 60% of the population

Page 24: The Logical Data Warehouse and Big Data

Revenue Generation: Merchandising and Retail Operations

F5 retailer accelerates revenue growth from existing customers

TOP 5 RETAILER WORLDWIDE

OBJECTIVES

•  Increase spend per customer and customer visit, online and offline
•  Increase efficiency in replenishment and overall merchandising operations

CHALLENGES

•  Competitive distribution stores everything in JVM; not stable and safe
•  Separate clusters for each team and application being developed
•  Ability to support large volume of files

SOLUTION

•  Seamless Hadoop file movement & management: MapR NFS
•  Scaling across the business without losing data: MapR volumes
•  Support 50 different teams and applications: Multi-tenancy

Business Impact

•  2% = $1B in new revenue by re-targeting customers after leaving a website/shopping cart
•  Improved merchandising operations by optimizing local inventory and pricing
•  50 different applications supported including recommendations, clickstream and SCM

Page 25: The Logical Data Warehouse and Big Data

Revenue Generation: Recommendation Engine & Real-time Targeting

Targeting credit card customers with personalized real-time offers

OBJECTIVES

•  Increase revenue and customer loyalty through real-time personalized offers

CHALLENGES

•  Developers and analysts are unable to access all customer data
•  Many different CRM tools and siloed targeting engines
•  Required better reliability, performance, and ability to stream real-time data
•  Want to increase speed and true personalization of recommendations

SOLUTION

•  MapR Enterprise DB Edition centralizes analytics and operational apps on one platform
•  Integrates all customer online and offline data into one repository in real-time: card member spend graph, merchant data, location, and feedback
•  Uses Mahout machine learning to provide real-time personalized offers

Business Impact

•  Increases revenue and improves customer experience through real-time targeting
•  A more flexible, scalable platform that’s a fraction of the cost of traditional technologies
•  Ensures reliability with MapR high availability on 135 applications in production on 1 cluster

Page 26: The Logical Data Warehouse and Big Data

Revenue Generation: Clickstream Analysis

Digital media provider improves targeting and relevancy

DIGITAL MEDIA & COMMUNICATIONS PROVIDER

OBJECTIVES

•  Expand advertising programs to accommodate over 65 million devices
•  Roll out new national services across local cable networks

CHALLENGES

•  Complex campaign rules, demographics, device proliferation
•  Track user interactions (skip, forwards, channel changes)
•  Perform millions of ad insertions in parallel within milliseconds
•  Oracle unable to meet the needs of this program

SOLUTION

•  MapR Hadoop to collect all user demographic interaction data
•  MapR M7 with Storm to handle real-time requirements for ad service and event streaming
•  Scalable unified platform provides lower TCO and operational overhead

Business Impact

“We can break down data silos and store all interactions in one data store. This allows our customers and businesses to mine that data and do inter-application cross-trend analysis.” IT Director, Digital Media & Communications Provider

Page 27: The Logical Data Warehouse and Big Data

Revenue Generation: NASCAR Recommendation Engine & Real-time Targeting

Making personalized real-time offers to fans through social media data

OBJECTIVES

•  Increase revenue and customer loyalty with real-time personalized offers

CHALLENGES

•  Insights with Data Overload - $6K social posts/minute = 500K+/race
•  18 million mentions and 8 million conversations/weekend
•  50K Tweets/race over 85 races/year

SOLUTION

•  MapR M7 centralizes analytics and operational apps on one platform
•  Captures crowd-sourced social media, allowing call to action on insights
•  Centralized season-long circuit history providing more accurate insights
•  Leverages existing Data Warehouse offering in place

Business Impact

•  Increases revenue and improves customer experience through real-time targeting
•  A more flexible, scalable platform that’s a fraction of the cost of traditional technologies

Page 28: The Logical Data Warehouse and Big Data

Revenue Generation: Improving the Customer Experience

Large brokerage increases customer acquisition rate and improves customer service

LARGE BROKERAGE

OBJECTIVES

•  Increase conversion rate of online visitors to customers
•  Provide real-time response for call center agents to service customers

CHALLENGES

•  Inability to provide customers with right content through right channel
•  Lack of real-time 360 view into customers
•  Operational issues with prior Hadoop distribution

SOLUTION

•  Easy ingestion of customer, application and clickstream data with NFS
•  Deeper insights to customer preferences across web, mobile, call center
•  Real-time contextual offers to clients using MapR-DB
•  Real-time streaming application for customer support with Storm on MapR

Business Impact

•  Increase in Website engagement and conversion with more relevant offers
•  Improved customer satisfaction and reduced churn
•  Higher performance (35-60% faster) & reliability for customer-facing applications

Page 29: The Logical Data Warehouse and Big Data

Revenue Generation: Rubicon Project: Ad Optimization

Rubicon Project runs a real-time automated advertising platform

OBJECTIVES

•  Create open ad platform for over 100K global advertising brands and over 500 of the world’s premium publishers

CHALLENGES

•  To keep up with their rapid growth, they needed to move to a fault-tolerant, high-availability Hadoop production system
•  Hadoop had become central to their operations but they were having problems with instability

SOLUTION

•  Their 330-node Hadoop cluster processes 1M records/second
•  They chose MapR for enterprise features such as high availability, data protection and recoverability, disaster recovery, redundancy, and support

Business Impact

“Our company cannot run without Hadoop and MapR. We rely on MapR’s self-healing HA, disaster recovery and advanced monitoring features to conduct 100 billion real-time auctions on our global transaction platform.” Jan Gelin, VP of Engineering, Rubicon Project

