© 2015 The MathWorks, Inc.

Big Data with MATLAB and Spark

Pierre Harouimi

2

Real-World Example: Sports Analytics

▪ Too much data to capture and handle
▪ Difficult to predict
▪ Real-time dependence

3

Big data workflow: from desktop to production

ACCESS DATA
PROCESS ON DESKTOP
SCALE PROBLEM SIZE

Visualization
Preprocessing
Machine Learning

4

So, what are the big (data) challenges?

▪ Standard tools won’t work
▪ Time-consuming
▪ Need to learn new tools & rewrite algorithms

5

Solution!

Challenges:
▪ Standard tools won’t work
▪ Time-consuming
▪ Need to learn new tools & rewrite algorithms

Solution:
▪ Prototype algorithms quickly
▪ Run directly from MATLAB with tall arrays
▪ Use the same MATLAB code

6

Datastore & tall arrays

1. Use datastore to define the file list
>> ds = datastore('*.csv')

2. Create a tall table from the datastore
>> tt = tall(ds)

3. Work with it like an ordinary table, in parallel
>> model = fitlm(tt.Temp=...)

4. Request the results on the local machine
>> result = gather(tt.result)

[Diagram labels: one or more files; datastore; tall array; single machine memory; cluster of machines memory; process]
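The four steps above can be sketched end to end as follows. This is a minimal illustration, not the presenter's code: the temporary folder, the file names, and the Temp and Load variables are invented so that the example is self-contained, and mapreducer(0) is used only to keep the evaluation in the local MATLAB session.

```matlab
% Sketch only: folder, file names, and the Temp/Load variables are hypothetical.
mapreducer(0);                          % evaluate tall arrays in the local session

% Write three small CSV files to stand in for "one or more files".
dataDir = tempname;
mkdir(dataDir);
rng(0);                                 % reproducible synthetic data
for k = 1:3
    Temp = 20 + 5*randn(100, 1);
    Load = 2*Temp + randn(100, 1);
    writetable(table(Temp, Load), fullfile(dataDir, sprintf('part%d.csv', k)));
end

ds = datastore(fullfile(dataDir, '*.csv'));   % 1. define the file list
tt = tall(ds);                                % 2. tall table backed by the datastore
meanTemp = mean(tt.Temp);                     % 3. ordinary table syntax, evaluation deferred
maxLoad  = max(tt.Load);
[meanTemp, maxLoad] = gather(meanTemp, maxLoad);   % 4. bring results to the local machine
fprintf('mean Temp = %.2f, max Load = %.2f\n', meanTemp, maxLoad);
```

Nothing is read from the files until gather is called. Requesting both results in a single gather call lets MATLAB compute them in a shared pass over the data rather than reading the files twice.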

7

Tall arrays: very small changes

1 file vs. 1000+ files
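The code shown on this slide is not part of the transcript, so the following is only a sketch of the idea under stated assumptions: one analysis function is written once and applied first to a single file read into memory, then to every file through a tall table. The file names, the Temp variable, and the rangeOfTemp helper are hypothetical.

```matlab
% Sketch only: file names, the Temp variable, and rangeOfTemp are hypothetical.
mapreducer(0);                          % evaluate tall arrays in the local session

% Four small log files; file k holds Temp values 10*k+1 ... 10*k+5.
dataDir = tempname;
mkdir(dataDir);
for k = 1:4
    Temp = (1:5)' + 10*k;
    writetable(table(Temp), fullfile(dataDir, sprintf('log%d.csv', k)));
end

% The analysis is written once and does not care how the table is stored.
rangeOfTemp = @(t) max(t.Temp) - min(t.Temp);

% One file: read it into memory and call the function directly.
t1 = readtable(fullfile(dataDir, 'log1.csv'));
r1 = rangeOfTemp(t1);

% Many files: swap readtable for datastore + tall, then gather the result.
tAll = tall(datastore(fullfile(dataDir, '*.csv')));
rAll = gather(rangeOfTemp(tAll));
```

The only lines that change between the two cases are the ones that create the table and the final gather; the analysis function itself is untouched.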

8

Workflow Pattern

Steps:
1. Access out-of-memory data
2. Work with subsets of your data
3. Develop functions for event detection and calculation
4. Apply functions to all of your data
5. Aggregate, summarize, & visualize

Functions:
▪ datastore & tall
▪ findgroups, splitapply
▪ Normal MATLAB code
▪ cellfun
▪ table, histogram, heatmap, boxplot, binScatterPlot
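As a rough illustration of the middle of this pattern, the sketch below develops a small "event detection" function on an in-memory subset and then aggregates per group with findgroups and splitapply. The Vehicle and Speed columns, the threshold of 35, and the fracAbove helper are hypothetical; both functions are also documented to accept tall inputs, which is what would let the same calls run on the full data set.

```matlab
% Sketch only: the Vehicle/Speed columns and the threshold of 35 are hypothetical.
Vehicle = ["A"; "A"; "B"; "B"; "B"];
Speed   = [10; 20; 30; 40; 50];
trips   = table(Vehicle, Speed);        % stands in for a subset of the data

% Plain MATLAB function for a simple "event": share of samples above a threshold.
fracAbove = @(x) mean(x > 35);

[G, vehicleID] = findgroups(trips.Vehicle);          % group number for every row
meanSpeed  = splitapply(@mean, trips.Speed, G);      % one value per vehicle
eventShare = splitapply(fracAbove, trips.Speed, G);  % custom function per vehicle

summary = table(vehicleID, meanSpeed, eventShare);   % aggregate into one table
disp(summary)
```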

9

MATLAB Distributed Computing Server (MDCS)

Local Parallel Computing

Deployed Parallel Computing

10

What is Hadoop/Spark?

Cluster-computing framework

Data parallelism

Fault tolerance

Machine Learning

11

Scaling with Spark: Very small changes too!

Desktop Code vs. Spark + Hadoop Code
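The two code panels on this slide did not survive in the transcript. The sketch below shows one way the switch is commonly written, following the pattern in the MathWorks documentation for tall arrays on a Spark-enabled Hadoop cluster. The install paths, the executor count, and the useSpark flag are placeholders, and the Spark branch assumes Parallel Computing Toolbox and a reachable cluster, so it is left disabled here.

```matlab
% Sketch only: install paths, executor count, and the useSpark flag are
% hypothetical placeholders. The Spark branch needs Parallel Computing Toolbox
% and a Spark-enabled Hadoop cluster, so it is switched off by default.
useSpark = false;

if useSpark
    setenv('HADOOP_HOME', '/path/to/hadoop/install');
    setenv('SPARK_HOME',  '/path/to/spark/install');
    cluster = parallel.cluster.Hadoop;
    cluster.SparkProperties('spark.executor.instances') = '16';
    mapreducer(cluster);        % tall arrays now evaluate on the cluster
else
    mapreducer(0);              % tall arrays evaluate in the local session
end

% The analysis code is the same in both cases.
x = tall((1:1000)');
total = gather(sum(x));
```

Only the execution environment selected by mapreducer differs between the desktop and cluster cases; everything after the if block is unchanged.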

12

Big Data with MATLAB & Spark

ACCESS DATA: Data that don’t fit in memory
PROCESS ON DESKTOP: Enable domain experts
SCALE PROBLEM SIZE: Use the SAME MATLAB code

▪ datastore
▪ tall
▪ MDCS
▪ Tall arrays
▪ Subset of your data
▪ Local parallel computing

13

The MathWorks Fleet Data

Data collected over 1.5 years

21 unique vehicles

1300 trip log files

39 unique channels

CO2

Challenges

Big Data

Testing Ideas

Event Detection

Needle in the Haystack

Objectives

14

Example Setup at MathWorks

Data Warehouse

Server

Engineers

4G LTE

Bluetooth

15

Analyze fleet data with MATLAB

16

Access & Explore Data: MATLAB & Spark

MathWorks Vehicle Fleet

Challenge: Develop and deploy data analytics to run on Spark against vehicle fleet data stored on Hadoop

Solution: Use MATLAB tall arrays to develop analytics on the desktop and then scale out to the Spark cluster

Results:
▪ Developed insight and understanding of over 1300 vehicle trips
▪ Fuel efficiency performance under real-world driving conditions

17

Analysis Domains

Image Processing
• Active Safety

Statistics
• Summary Statistics
• Regression, ANOVA, Machine Learning

Signal Processing
• Sound quality analysis
• LIDAR analysis

Location/Mapping
• Analyzing GPS Data
• Custom Visualizations

[Figure: plot matrix with axes MPG, Acceleration, Displacement, Weight, Horsepower]

18

Key Takeaways

▪ Use the same MATLAB code

▪ Use the new MATLAB data types, datastore and tall arrays, for out-of-memory data sets

▪ Scale your work up with Parallel Computing Toolbox on the desktop or the MATLAB Distributed Computing Server (MDCS) on Spark

© 2018 The MathWorks, Inc. MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.

