Research Methods in a Big Data and Cognitive Era Dr. Alex Liu RMDS Pasadena, CA, USA www.ResearchMethods.org Updated October 8, 2015
Page 1: Research Methods...Decision Tree Bayesian & Causality Time Series ALGORITHMS & COMPUTING MLE ITERATIVE (MapReduce & Spark) R SPSS STATISTICS & Visualization RMSE Confusion Matrix ROC

Research Methods in a Big Data and Cognitive Era

Dr. Alex Liu

RMDS

Pasadena, CA, USA

www.ResearchMethods.org

Updated October 8, 2015

Page 2

Research Process

• Formulate a Question
• Review the Available Literature
• Select an Appropriate Research Design
• Collect & Analyze Data
• Interpret Findings
• Publish Findings

Page 3

RMS

Research Methods are about optimal RM4Es workflows

• Data
– Data Sources
– Data Storage
– Data Cleaning
– Feature Extraction

• MODELS
– Regression
– Decision Tree
– Bayesian & Causality
– Time Series

• ALGORITHMS & COMPUTING
– MLE
– ITERATIVE (MapReduce & Spark)
– R
– SPSS

• STATISTICS & Visualization
– RMSE
– Confusion Matrix
– ROC Curve

• Business Acumen, Subject Knowledge, Communication

Stages: Data, Equation, Estimation, Evaluation, Explanation
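
As a rough illustration of how the stages named on this slide could fit together, here is a minimal Python sketch. It is one reading of the slide, not the author's implementation: the toy data, the function names `fit_line` and `rmse`, and the choice of a straight-line regression are all assumptions made for the example. The model form stands in for the equation stage, ordinary least squares for estimation (for a linear model with Gaussian noise it coincides with MLE), RMSE for evaluation, and the final printed sentence for explanation.

```python
import math

def fit_line(xs, ys):
    """Estimation: ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def rmse(actual, predicted):
    """Evaluation: root mean squared error between observed and fitted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Data: hypothetical toy observations, invented for this sketch.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

# Equation: y = intercept + slope * x (a simple regression model).
intercept, slope = fit_line(xs, ys)

# Evaluation: how far the fitted values are from the observations.
predictions = [intercept + slope * x for x in xs]
error = rmse(ys, predictions)

# Explanation: state the result in plain language for a non-technical reader.
print(f"Each unit increase in x is associated with about {slope:.2f} more y "
      f"(typical error {error:.2f}).")
```

In a real research flow each stage would be swapped for the tools listed above (for example a decision tree as the model, or a confusion matrix as the evaluation), but the sequence of stages stays the same.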

Page 4

Older Gen Research

• Literature Review in the Library (now Google)

• Data in Excel Sheets

• Proprietary Computing with a Nicely Integrated Package
– Stata, SPSS, Mathematica


Page 5

New Gen of Research

• Open Source Computing Languages
– R, Python, Scala, Julia

• Open Source Tools for Processing & Organizing Data and Analytics
– Notebooks: Jupyter, Zeppelin
– Visualization: D3.js, ggplot2
– IDE: RStudio
– Data Prep: OpenRefine

• Open Source Execution Environments
– Spark, Hadoop


Page 6

We live in a moment of accelerated transformation

• 62% of total workflows will be in the cloud by 2016

• 75B devices connected to the internet by 2020

• 90% of the world’s data created in the last two years

Page 7

Big Data Era – Too Much Data to Use

Page 8

Too Many Analytical Steps: Research Flows Are Difficult to Manage

Page 9

Too Many Resources to Coordinate

[Figure: grid architecture diagram. Roles: Grid Operations, Science Review, Production Manager, Researcher. Data Grid: storage elements, replica location service, virtual data catalogs, virtual data index. Shared services: Data Transport, Storage, Resource Mgmt. Computing Grid: workflow planner, request planner, workflow executor (DAGman), request executor (Condor-G, GRAM), request predictor (Prophesy). Grid Monitor. Activity labels: planning, discovery, composition, simulation, analysis, sharing, derivation. Data labels: simulation data, raw data, detector.]

Page 10

And a lot more to care about …

• Safeguard Research Assets

– Control access

– Timely tracking

– Knowledge management

• Regulatory compliance

– Bookkeeping

– Versioning

– Time recording

Page 11

Challenges for Research Methods

• Too much data to import

• Too much data cleaning to complete

• Too many analytical methods to select

• Too many algorithms to select

• Too many computing tools to select

• Too many IT systems to select

Page 12

Many new methods are coming

Traditional Method: Structured Analysis

• Researchers determine what question to ask
• Research Support structures the data to answer that question
• Examples: monthly research reports, profitability analysis, customer surveys

Big Data Method: Iterative Analysis

• IT delivers a platform to enable creative discovery
• Researchers explore what questions could be asked
• Examples: brand sentiment, product strategy, maximum asset utilization

Page 13

Research Methods in a new era

• Great opportunities for utilizing big data, high-speed computing power, and a huge selection of analytical tools (models and algorithms)

• One researcher alone may not be able to solve all the problems faced

• Some intelligent assistance is needed to help every researcher

Page 14

Intelligently Managing Research Flows (RFs) Helps

• Replication (Provenance)

• Knowledge re-use & sharing

• Readiness for auditing

• Readiness for automation

• Removes much of the mundane data management burden, freeing scientists to do science

Replicability is the foundation of scientific research.

RF management facilitates replicability.

Research Flow (RF)
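
To make the replication and auditing points above concrete, here is a minimal Python sketch of a research flow that records provenance for each step. It is an illustration under stated assumptions, not a description of any existing RF management tool: the `ResearchFlow` class, the `fingerprint` helper, and the two toy steps are hypothetical names invented for this example.

```python
import hashlib
import json

def fingerprint(obj):
    """Return a stable SHA-256 hash of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

class ResearchFlow:
    """Hypothetical research flow that logs every step it runs."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def run(self, step_name, func, data, **params):
        """Apply one step and record its parameters plus input/output hashes."""
        result = func(data, **params)
        self.log.append({
            "step": step_name,
            "params": params,
            "input_hash": fingerprint(data),
            "output_hash": fingerprint(result),
        })
        return result

# Two toy processing steps, invented for this sketch.
def drop_missing(rows):
    return [r for r in rows if r is not None]

def scale(rows, factor):
    return [r * factor for r in rows]

raw = [1.0, None, 2.5, 4.0]

flow = ResearchFlow("toy-analysis")
clean = flow.run("drop_missing", drop_missing, raw)
scaled = flow.run("scale", scale, clean, factor=2.0)

# Replication check: re-running the same steps on the same data
# should reproduce the same provenance log.
replay = ResearchFlow("toy-analysis-replay")
replay_clean = replay.run("drop_missing", drop_missing, raw)
replay_scaled = replay.run("scale", scale, replay_clean, factor=2.0)
print("replicated:", replay.log == flow.log)
```

Because each log entry stores the hash of its input and output, an auditor can confirm that the output of one step is exactly what the next step consumed, and a second researcher can verify a replication by comparing logs rather than raw data.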

Page 15

Need AI to automate and augment

• AI to automate some research flows

• AI to augment all researchers

