
Big Data Analysis Starts with R

Date post: 10-May-2015
Upload: revolution-analytics
Transcript
Page 1: Big Data Analysis Starts with R

The Big Data Analytics Revolution Starts with R

Revolution Analytics

December 20, 2011

Page 2: Big Data Analysis Starts with R

In Today’s Webinar:

• About Revolution Analytics
• Getting Value with Advanced Analytics
• Implementing The Advanced Analytics Stack
• Resources and Further Reading

Page 3: Big Data Analysis Starts with R

The professor who invented analytic software for the experts now wants to take it to the masses

Most advanced statistical analysis software available

Half the cost of commercial alternatives

2M+ Users

4,000+ Applications

Statistics

Predictive Analytics

Data Mining

Visualization

Finance

Life Sciences

Manufacturing

Retail

Telecom

Social Media

Government

Power

Productivity

Enterprise Readiness

Page 4: Big Data Analysis Starts with R

What is R?

An open-source software project

A community

Data analysis software

A programming language

An environment

Page 5: Big Data Analysis Starts with R

What’s the Difference Between R and Revolution R Enterprise?

Revolution R is 100% R and More®

R Engine Language Libraries

4,000+ Community Packages

Technical Support

Web-Based GUI

Web Services API

Big Data Analysis

IDE / Developer GUI

Build Assurance

Parallel Tools

Multi-Threaded Math Libraries

Page 6: Big Data Analysis Starts with R

Let’s Talk about Big Data

Page 7: Big Data Analysis Starts with R

Extracting Value with Advanced Analytics

• Missing the potential value of the data that is being collected
• Need more than counts and averages
• Advanced Analytics with Big Data

• Predict the Future
• Understand Risk and Uncertainty
• Embrace Complexity
• Identify the Unusual
• Think Big

Page 8: Big Data Analysis Starts with R

R: A Unique Platform for Extracting Value from Data

Data Exploration and Visualization

• R is superior at exploring data to find unexpected trends and relationships, finding the best predictive models, and identifying critical “outliers”, such as clusters of customers who are particularly profitable (or unprofitable!).

Data Science

• Google, LinkedIn, and Facebook rely on R and on the skills of data scientists who are accustomed to hacking together large data sets from disparate sources, visualizing and exploring data to identify novel modeling techniques, and combining the results of several modeling strategies to optimize predictive power.

Modeling Innovation

• Other commercial programs push users through a pre-programmed procedure and discourage modeling innovation. R was created as a 4GL with the needs of modern data scientists in mind, with an interactive language that promotes data exploration, data visualization, and flexible data modeling.

Talent

• R is creating a massive amount of talent because it is now the dominant tool of choice at universities.

Page 9: Big Data Analysis Starts with R

Making It Work

Use Cases for Big Data Analytics Deployment

Page 10: Big Data Analysis Starts with R

The Advanced Analytics Stack

Deployment / Consumption

Advanced Analytics

ETL

Data / Infrastructure

“Open Analytics Stack” White Paper: bit.ly/lC43Kw

Page 11: Big Data Analysis Starts with R

Best Practices for Implementing an Advanced Analytics Stack for Big Data

• Limit sampling
• Reduce data movement and replication
• Bring the analytics as close as possible to the data
• Optimize computation speed – parallel algorithms

Page 12: Big Data Analysis Starts with R

Big Data Computations

• Computations are data intensive
• To be effective, must rely on data parallelism

• Data is distributed across compute nodes
• Same task is run in parallel on each of the data partitions

Examples of distributed computing frameworks that support data parallelism

• Traditional file based analytics using on-premise clusters
• Hadoop and MapReduce
• In-Database Analytics using parallel hardware architectures
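The data-parallel idea on this slide can be sketched in plain R. This is a minimal illustration and not how any of the frameworks listed is implemented: the data are simulated, the partitions live in one R session, and the names `delays`, `parts`, and `partial` are hypothetical. `lapply` runs the partitions one after another; a parallel backend could take its place.

```r
# Simulated stand-in for a large data set (hypothetical values).
set.seed(42)
delays <- rnorm(1e5, mean = 7, sd = 30)

# Distribute the data across four partitions.
n_parts <- 4
parts <- split(delays, rep_len(seq_len(n_parts), length(delays)))

# The same task runs on every partition and returns a small summary.
partial <- lapply(parts, function(x) c(sum = sum(x), n = length(x)))

# Combine the per-partition summaries into the overall mean.
totals <- Reduce(`+`, partial)
chunked_mean <- totals[["sum"]] / totals[["n"]]
```

Each partition returns only two numbers, so the combining step stays cheap however large the partitions are.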


Page 13: Big Data Analysis Starts with R

Revolution R Enterprise: Big Data Statistics in R

www.revolutionanalytics.com/bigdata

Every US airline departure and arrival, 1987-2008

File: AirlineData87to08.xdf
Rows: 123.5 million
Variables: 29
Size on disk: 13.2 GB

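# F() treats the numeric CRSDepTime as a factor; the data file named above is assumed to be in the working directory
arrDelayLm2 <- rxLinMod(ArrDelay ~ DayOfWeek:F(CRSDepTime), data = "AirlineData87to08.xdf", cube = TRUE)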
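The rxLinMod call above needs RevoScaleR and the airline file. As a rough base-R analogue, assuming the formula is read as one term per day-of-week by departure-hour cell, the sketch below fits the same kind of interaction-only model with `lm` on simulated data. The `flights` data frame and its values are hypothetical, and nothing here reproduces what `cube = TRUE` does internally.

```r
# Simulated flights (hypothetical values, three days and four departure hours).
set.seed(7)
n <- 5000
flights <- data.frame(
  DayOfWeek  = factor(sample(c("Mon", "Tue", "Wed"), n, replace = TRUE)),
  CRSDepTime = sample(6:9, n, replace = TRUE)
)
flights$ArrDelay <- rnorm(n, mean = 5 + 2 * flights$CRSDepTime, sd = 20)

# Interaction-only model without an intercept: one coefficient per cell.
fit <- lm(ArrDelay ~ DayOfWeek:factor(CRSDepTime) - 1, data = flights)

# The same quantities computed directly as cell means.
cell_means <- with(flights, tapply(ArrDelay, list(DayOfWeek, CRSDepTime), mean))
```

In this setup each fitted coefficient equals the mean arrival delay of its cell, which is why such a model can be estimated from per-cell sums and counts.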

Page 14: Big Data Analysis Starts with R

RevoScaleR – Distributed Computing

[Diagram: a master node (RevoScaleR) and four compute nodes (RevoScaleR), each compute node with its own data partition]

• Portions of the data source are made available to each compute node

• RevoScaleR on the master node assigns a task to each compute node

• Each compute node independently processes its data, and returns its intermediate results back to the master node

• The master node aggregates all of the intermediate results from each compute node and produces the final result
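The master and compute-node pattern described on this slide can be imitated in a single R session. The sketch below is an assumption-laden illustration and not RevoScaleR's implementation: it fits a linear regression by having each "node" return cross products for its partition, which the "master" sums and solves. The data are simulated, and `partitions`, `node_task`, and `intermediate` are hypothetical names.

```r
# Simulated data with known coefficients (hypothetical values).
set.seed(1)
n <- 10000
df <- data.frame(x1 = rnorm(n), x2 = runif(n))
df$y <- 2 + 3 * df$x1 - 1.5 * df$x2 + rnorm(n)

# A portion of the data is made available to each of four nodes.
partitions <- split(df, rep_len(1:4, n))

# Task assigned to every node: return small intermediate results.
node_task <- function(part) {
  X <- model.matrix(~ x1 + x2, data = part)
  list(xtx = crossprod(X), xty = crossprod(X, part$y))
}
intermediate <- lapply(partitions, node_task)

# The master aggregates the intermediate results and produces the final fit.
xtx <- Reduce(`+`, lapply(intermediate, `[[`, "xtx"))
xty <- Reduce(`+`, lapply(intermediate, `[[`, "xty"))
beta <- drop(solve(xtx, xty))

# Reference fit on the full data, for comparison.
reference <- coef(lm(y ~ x1 + x2, data = df))
```

Only the 3 x 3 and 3 x 1 cross products travel back to the master, so the raw rows never leave their partition.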


Page 15: Big Data Analysis Starts with R

R and Hadoop

[Diagram labels: R Client, R, Map or Reduce, Job Tracker, Task Node, HDFS, HBASE, Thrift, rmr, rhdfs, rhbase]

Capabilities delivered as individual R packages

• rhdfs - R and HDFS
• rhbase - R and HBASE
• rmr - R and MapReduce

Downloads available from GitHub
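For readers new to MapReduce, the map, group-by-key, and reduce steps can be emulated in plain R. The sketch below does not use rmr, rhdfs, or rhbase and needs no Hadoop cluster; the records and the helper names `map_fn` and `reduce_fn` are hypothetical.

```r
# Hypothetical records: day of week and arrival delay in minutes.
records <- data.frame(
  day   = c("Mon", "Tue", "Mon", "Wed", "Tue", "Mon"),
  delay = c(10, -5, 20, 0, 15, 30),
  stringsAsFactors = FALSE
)

# Map: emit one key-value pair per record.
map_fn <- function(row) list(key = row$day, value = row$delay)
pairs <- lapply(seq_len(nrow(records)), function(i) map_fn(records[i, ]))

# Shuffle: group the emitted values by key.
keys    <- vapply(pairs, function(p) p$key, character(1))
values  <- vapply(pairs, function(p) p$value, numeric(1))
grouped <- split(values, keys)

# Reduce: produce one summary per key.
reduce_fn <- function(key, vals) mean(vals)
mean_delay <- mapply(reduce_fn, names(grouped), grouped)
```

On a real cluster the map and reduce functions would run on task nodes against data in HDFS; here everything runs in one session to show the data flow only.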

Page 16: Big Data Analysis Starts with R

Revolution Analytics with Netezza Appliance

Page 17: Big Data Analysis Starts with R

Deployment with Revolution R Enterprise

[Diagram: RevoDeployR deployment architecture, with the labels below]

R Programmer, Application Developer, End User, Admin

Desktop Applications (e.g. Excel)

Business Intelligence (e.g. QlikView)

Interactive Web Applications

Client libraries (JavaScript, Java, .NET)

HTTP/HTTPS – JSON/XML

RevoDeployR Web Services

Session Management, Authentication, Data/Script Management, Administration

Page 18: Big Data Analysis Starts with R

Three final thoughts

• Now enterprise-ready, R offers innovation and flexibility needed to meet analytics challenges in a changing world
• R-enabled advanced analytics are key to unlocking value in big data
• Revolution Analytics optimizes R to take advantage of multiple data management paradigms and emerging best practices

Page 19: Big Data Analysis Starts with R

Resources

Slides / Replay: bit.ly/r-big-data

“Open Analytics Stack” White Paper: bit.ly/lC43Kw

McKinsey Report on Big Data: bit.ly/jWyrFM

Conway, Data Science Intelligence: bit.ly/myMwak

“Big Analytics” White Paper by Norman H. Nie: bit.ly/biganalytics

Revolution R Enterprise: bit.ly/Enterprise-R

Questions: [email protected]


Page 20: Big Data Analysis Starts with R


www.revolutionanalytics.com 650.330.0553 Twitter: @RevolutionR

The leading commercial provider of software and support for the popular open source R statistics language.

Thank you.

