https://www.youtube.com/watch?v=-Gj93L2Qa6c
Page 2: watch?v=-Gj93L2Qa6ccis4397.chibana500.com/slides/BigDataOverview.pdf · Relational Data( Tables/Transaction/Legacy Data) Text Data (Web) Semi-structured Data (XML) Graph Data Social

• Topics:

– Foundation of Data Analytics and Data Mining

– Data Volume, Velocity, & Variety

– Harnessing Big Data

– Enabling technologies: Cloud Computing


No single definition.

From Wikipedia:

Term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

◦ Challenges include capture, curation, storage, search, sharing, transfer, analysis, and visualization.


Data sets are trending larger because analyzing a single large set of related data yields more information than analyzing separate smaller sets with the same total amount of data, allowing correlations to be found to:

◦ Spot business trends

◦ Determine quality of research

◦ Prevent diseases

◦ Link legal citations

◦ Combat crime

◦ Determine real-time roadway traffic conditions


Data Volume

◦ 44x increase from 2009 to 2020

◦ From 0.8 zettabytes to 35 ZB

Data volume increasing exponentially


Exponential increase in collected/generated data
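
The growth figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the growth compounds at a constant yearly rate, which the slide does not state; the variable names are illustrative only.

```python
# Figures quoted on the slide: 0.8 ZB in 2009 and a 44x increase by 2020.
start_zb = 0.8
growth_factor = 44
years = 2020 - 2009

# Volume in 2020 implied by the 44x factor.
end_zb = start_zb * growth_factor

# Constant yearly multiplier that compounds to 44x over the period
# (assumption: the same growth rate every year).
annual_multiplier = growth_factor ** (1 / years)

print(f"2020 volume: {end_zb:.1f} ZB")
print(f"Implied annual growth: {(annual_multiplier - 1) * 100:.0f}% per year")
```

Under that assumption the 44x figure corresponds to roughly 41% growth per year, and 0.8 ZB times 44 gives about 35 ZB, matching the slide.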


12+ TBs of tweet data every day

25+ TBs of log data every day

? TBs of data every day

2+ billion people on the Web by end 2011

30 billion RFID tags today (1.3B in 2005)

4.6 billion camera phones world wide

100s of millions of GPS enabled devices sold annually

76 million smart meters in 2009… 200M by 2014


Maximilien Brice, © CERN

CERN’s Large Hadron Collider (LHC) generates 15 PB/yr


World's largest science project.

Tracks North America's geological evolution.

Observes and records data over 3.8 million square miles.

Amasses 67 terabytes of data. Analyzes seismic slips in the San Andreas fault, sure, but also the plume of magma underneath Yellowstone and much, much more.

• http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI


Relational Data (Tables/Transaction/Legacy Data)

Text Data (Web)

Semi-structured Data (XML)

Graph Data

◦ Social Network, Semantic Web (RDF), …

Streaming Data

◦ You can only scan the data once

A single application can be generating/collecting many types of data

Big Public Data (online, weather, finance, etc)


To extract knowledge, all these types of data need to be linked together
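
The "scan the data once" constraint on streaming data can be illustrated with a single-pass computation. The sketch below uses Welford's method for a running mean and variance; it is one common one-pass technique, not something the slides prescribe, and `sensor_stream` is a hypothetical data source.

```python
from typing import Iterator


class RunningStats:
    """Mean and variance computed in a single pass (Welford's method)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; 0.0 until at least one value has been seen.
        return self._m2 / self.count if self.count else 0.0


def sensor_stream() -> Iterator[float]:
    """Hypothetical stream; a generator can be consumed only once."""
    yield from (2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0)


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)  # each value is seen exactly once, then discarded

print(stats.count, stats.mean, stats.variance)
```

Only three numbers are kept in memory regardless of how long the stream runs, which is the property that makes this style of algorithm suitable when the data cannot be stored and rescanned.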


A Single View to the Customer

[Diagram labels: Customer; Social Media; Gaming; Entertain; Banking; Finance; Our Known History; Purchase]


Data is generated fast and needs to be processed fast

Online Data Analytics

Late decisions mean missed opportunities

Examples

◦ E-Promotions: based on your current location, your purchase history, and what you like, send promotions right now for the store next to you

◦ Healthcare monitoring: sensors monitor your activities and body; any abnormal measurement requires an immediate reaction
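
The healthcare example can be sketched as a detector that reacts to each reading as it arrives rather than waiting for a batch. Everything below is an illustrative assumption: the function name, the window size, the tolerance, and the heart-rate samples are made up, and a real monitoring system would use a clinically validated rule.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def detect_abnormal(
    readings: Iterable[float], window: int = 5, tolerance: float = 20.0
) -> Iterator[Tuple[int, float]]:
    """Yield (index, value) as soon as a reading strays from the recent baseline."""
    recent = deque(maxlen=window)  # only the last `window` normal readings are kept
    for index, value in enumerate(readings):
        if len(recent) == window:
            baseline = sum(recent) / window
            if abs(value - baseline) > tolerance:
                yield index, value  # react immediately, do not wait for a batch
                continue  # keep the abnormal value out of the baseline
        recent.append(value)


# Hypothetical heart-rate samples in beats per minute.
heart_rate = [72, 74, 71, 73, 75, 74, 140, 76, 73]
alerts = list(detect_abnormal(heart_rate))
print(alerts)
```

Because the function is a generator, a caller can act on the first alert while later readings are still arriving.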


No longer hindered by the ability to collect data

But by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data

◦ In a timely manner and scalable fashion


Social media and networks (all of us are generating data)

Scientific instruments (collecting all sorts of data)

Mobile devices (tracking all objects all the time)

Sensor technology and networks (measuring all kinds of data)


Real-Time Analytics/Decision Requirement

Customer: Influence Behavior

Product Recommendations that are Relevant & Compelling

Friend Invitations to join a Game or Activity that expands business

Preventing Fraud as it is Occurring & preventing more proactively

Learning why Customers Switch to competitors and their offers; in time to Counter

Improving the Marketing Effectiveness of a Promotion while it is still in Play


OLTP: Online Transaction Processing (DBMSs)

OLAP: Online Analytical Processing (Data Warehousing)

RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
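
The three processing styles can be contrasted in a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for a DBMS and a plain dictionary as a stand-in for a real-time analytics engine; the table, the events, and the variable names are hypothetical, and production systems would use separate, purpose-built components for each role.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

# Hypothetical order events: (region, amount).
events = [("east", 20.0), ("west", 35.0), ("east", 15.0), ("west", 5.0)]

live_totals = defaultdict(float)  # RTAP-style state, updated per event

for region, amount in events:
    # OLTP: one short transaction per business event.
    with conn:
        conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)", (region, amount)
        )
    # RTAP: the answer is refreshed the moment the event arrives.
    live_totals[region] += amount

# OLAP: an aggregate query that scans the accumulated history after the fact.
olap_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM orders GROUP BY region").fetchall()
)

print(dict(live_totals))
print(olap_totals)
```

Both approaches end with the same totals; the difference is when the answer becomes available: continuously during ingestion for RTAP, and only when the query is run for OLAP.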


Old Model: Few companies generating data, all others consuming data

New Model: All of us generating data. All of us consuming data


- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets

- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More of a real-time


Big Data: Batch Processing & Distributed Data Store (Hadoop/Spark; HBase/Cassandra)

BI Reporting, OLAP & Data Warehouse (Business Objects, SAS, Informatica, Cognos, other SQL reporting tools)

Interactive Business Intelligence & In-memory RDBMS (QlikView, Tableau, HANA)

Big Data: Real Time & Single View (Graph Databases)

[Diagram axes: timeline 1990’s, 2000’s, 2010’s; Speed; Scale]


Big data is more real-time in nature than traditional DW applications

Traditional DW architectures (e.g. Exadata, Teradata) are not well-suited for big data apps

Shared nothing, massively parallel processing, scale out architectures are well-suited for big data apps
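
The shared-nothing, scale-out idea can be sketched as: route each record to exactly one node, let every node work only on its own shard, then merge the partial results. In the sketch below threads stand in for separate machines, which is a simplification; the function names and the sample data are hypothetical.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partition(records: List[str], num_nodes: int) -> List[List[str]]:
    """Route each record to exactly one node by a stable hash of its key."""
    shards: List[List[str]] = [[] for _ in range(num_nodes)]
    for record in records:
        shards[zlib.crc32(record.encode()) % num_nodes].append(record)
    return shards


def count_local(shard: List[str]) -> Counter:
    """Work done by one node using only its own shard (nothing is shared)."""
    return Counter(shard)


def scale_out_count(records: List[str], num_nodes: int) -> Counter:
    shards = partition(records, num_nodes)
    # Threads play the role of independent nodes in this sketch.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(count_local, shards))
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # merge step; a given key lives on only one shard
    return total


clicks = ["home", "cart", "home", "search", "cart", "home"]
result = scale_out_count(clicks, num_nodes=3)
print(result)
```

Adding nodes changes only `num_nodes`; the per-node work and the merge step stay the same, which is what "scale out" refers to.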


IT as a service

◦ Compute, storage, databases, queues

Clouds leverage economies of scale of commodity hardware

◦ Cheap storage, high bandwidth networks & multicore processors

◦ Geographically distributed data centers

Offerings from Microsoft, Amazon, Google, …


wikipedia:Cloud Computing


Cost & management

◦ Economies of scale, “out-sourced” resource management

Reduced Time to deployment

◦ Ease of assembly, works “out of the box”

Scaling

◦ On demand provisioning, co-locate data and compute

Reliability

◦ Massive, redundant, shared resources

Sustainability

◦ Hardware not owned
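
On-demand provisioning, listed under Scaling above, amounts to sizing capacity to the current load instead of to the peak. The sketch below is a deliberately simple rule with made-up numbers; the function name, the per-instance capacity, and the load trace are all assumptions, and real autoscalers add smoothing and cool-down periods.

```python
import math


def instances_needed(
    requests_per_sec: float, capacity_per_instance: float, min_instances: int = 1
) -> int:
    """Smallest number of instances that can absorb the current load."""
    return max(min_instances, math.ceil(requests_per_sec / capacity_per_instance))


# Hypothetical load trace (requests per second) over four time steps.
load_trace = [40, 120, 450, 90]
plan = [instances_needed(load, capacity_per_instance=100) for load in load_trace]
print(plan)
```

With fixed hardware the deployment would have to be sized for the 450 requests per second peak at all times; provisioning on demand lets the instance count follow the load down again.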


Infrastructure as a service (IaaS)

◦ Offering hardware related services. Could include storage services (database or disk storage) or virtual servers.

◦ Amazon EC2, Amazon S3, Rackspace Cloud Servers and Flexiscale.

Platform as a Service (PaaS)

◦ Development platform.

◦ Google’s App Engine, Microsoft’s Azure, Salesforce.com’s force.com.

Software as a service (SaaS)

◦ Software offering on the cloud. Users access a software application hosted by the cloud vendor on a pay-per-use basis. Well-established.

◦ Salesforce.com’s offering in the online Customer Relationship Management (CRM) space, Google’s Gmail and Microsoft’s Hotmail, Google Docs.


Storage-as-a-service

Database-as-a-service

Information-as-a-service

Process-as-a-service

Application-as-a-service

Platform-as-a-service

Integration-as-a-service

Security-as-a-service

Management/Governance-as-a-service

Testing-as-a-service

Infrastructure-as-a-service

InfoWorld Cloud Computing Deep Dive


Service-Oriented Architecture (SOA)

Utility Computing (on demand)

Virtualization (P2P Network)

SaaS (Software as a Service)

PaaS (Platform as a Service)

IaaS (Infrastructure as a Service)

Web Services in Cloud


Traditional Stack (bottom to top): Hardware; Operating System; App, App, App

Virtualized Stack (bottom to top): Hardware; Hypervisor; OS, OS, OS; App, App, App


Utility computing = Infrastructure as a Service (IaaS)

◦ Why buy machines when you can rent cycles?

◦ Examples: Amazon’s EC2, Rackspace

Platform as a Service (PaaS)

◦ Give me nice API and take care of the maintenance, upgrades, …

◦ Example: Google App Engine

Software as a Service (SaaS)

◦ Just run it for me!

◦ Example: Gmail, Salesforce
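
The "why buy machines when you can rent cycles?" question can be framed as a break-even calculation. The sketch below is illustrative only: the function name is hypothetical and all three prices are made-up placeholders, not quotes from any provider.

```python
def breakeven_hours(
    purchase_cost: float, owned_hourly_cost: float, rental_hourly_rate: float
) -> float:
    """Hours of use beyond which owning a machine is cheaper than renting one."""
    if rental_hourly_rate <= owned_hourly_cost:
        # Renting never costs more per hour, so the purchase is never recovered.
        return float("inf")
    return purchase_cost / (rental_hourly_rate - owned_hourly_cost)


# Placeholder prices: purchase price, hourly cost of running an owned machine
# (power, space, administration), and the hourly rental rate.
hours = breakeven_hours(
    purchase_cost=3000.0, owned_hourly_cost=0.05, rental_hourly_rate=0.25
)
print(f"Owning pays off after about {hours:,.0f} hours of use")
```

With these placeholder numbers the break-even point is about 15,000 hours of use, so a machine that would sit idle most of the time is cheaper to rent.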


Amazon Elastic Compute Cloud

Google App Engine

Microsoft Azure

GoGrid

AppNexus


[Timeline diagram labels: COBOL, Edsel; ARPANET; Internet; Darkness; Web Awareness; Dot-Com Bubble; Amazon.com; Web as a Platform; Web 2.0; Web Services, Resources Eliminated; Web Scale Computing]


Elastic Compute Cloud – EC2 (IaaS)

Simple Storage Service – S3 (IaaS)

Elastic Block Store – EBS (IaaS)

SimpleDB (SDB) (PaaS)

Simple Queue Service – SQS (PaaS)

CloudFront (S3 based Content Delivery Network – PaaS)

Consistent AWS Web Services API


Google AppEngine vs. Amazon EC2/S3 (June 3, 2008)

AppEngine:

Higher-level functionality (e.g., automatic scaling)

More restrictive (e.g., respond to URL only)

Proprietary lock-in

EC2/S3:

Lower-level functionality

More flexible

Coarser billing model

[Diagram labels: VMs, Flat File Storage; Python, BigTable, Other APIs]
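
The "respond to URL only" restriction describes a request-driven programming model: application code runs only when a URL is requested. The sketch below shows that model with a plain WSGI callable from the Python standard library. It is not App Engine's actual API; `app`, `call`, and the `/hello` route are hypothetical names chosen for illustration.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Request-driven handler: code runs only in response to a URL request."""
    path = environ.get("PATH_INFO", "/")
    if path == "/hello":
        status, body = "200 OK", b"hello"
    else:
        status, body = "404 Not Found", b"not found"
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]


def call(path):
    """Invoke the handler directly with a synthetic request, no server needed."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(call("/hello"))
```

By contrast, a virtual machine on EC2 can run any long-lived process, which is the flexibility the comparison above attributes to the lower-level offering.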


Slide deck evolved from an original by Professor Ruoming Jin at Kent State…

Further viewing: “Big Data Revolution” – PBS Documentary https://www.youtube.com/watch?v=bIY3LUZ7i8Y

