
Shiva Amiri, Chief Product Officer, RTDS Inc. at MLconf SEA - 5/01/15

Date post: 15-Jul-2015

Shiva Amiri, PhD

Chief Product Officer

MLConf Seattle - May 1st 2015

Incorporating the Real Time Component into Analytics and Machine Learning

The Challenge

One or more structural limitations have significantly constrained successful data mining applications and initiatives

Frequently, these problems are associated with the amount of data, the rate of data generation and the number of attributes (variables) to be processed:

• 1000’s of data variables from which to model (dimensionality)

• 100’s of billions of records of data to model

• Continuously evolving data elements and changing sets of data

• The need to execute and adapt in Real Time

Increasingly, this “big data” environment expands beyond the capabilities of conventional data mining methods and technology


What are the trends?

Source: http://www.informationweek.com/big-data/big-data-analytics/5-analytics-bi-data-management-trends-for-2015/a/d-id/1318551, 09/01/2015

The Market Opportunity

IDC reports the Big Data Analytics market at $125 billion in 2015

Gartner reports the Internet of Things (IoT) will have 25 billion devices with sensors connected by 2020, producing exabytes of data

IoT/E market size by 2020 will exceed $14 trillion

Bioinformatics market is $7.5 billion according to Gartner

Streaming data, Real Time analytics and machine learning remain a significant challenge for multiple sectors

Which verticals are we looking at?

Bioinformatics, Computational Biology – genetics, proteomics, EEG data, fMRI, Molecular Dynamics data, etc.

Financials – behaviour, signals, patterns

Internet of Everything

We are also interested in other sources of fast and massive data


An example: Complexity of Brain Disorders

[Figure labels: Disorder X, Disorder Y]

What kinds of questions do we want to ask?

• How do the genes and proteins in disorders relate to each other – clustering, regression, classification, etc.

• What are the other factors involved in disease onset and progression?

• What about environment data? Quality of Life? Education? Socioeconomic status? – natural language processing (NLP), classification, predictive modeling, etc.

• How can we handle massive amounts of brain sensing and imaging data (EEG, fMRI) and link them to other data (genes and proteins)?

• Integrative analytics

• And questions we don’t know we have

Big Data: The Four V’s

RTDS’ SymetryML™: What have we built?

SymetryML™ is a distributed GPU-implemented predictive analysis and modeling technology for our Massive Data universe…

V3.5 released – real time analytics of large-scale data

Exploration (statistics) and model building, assessment and prediction in real time

Robust security and privacy features

V4.0 being developed – distributed computing capability


How is SymetryML™ addressing these challenges?

The V’s of Big Data

SymetryML™ can handle heavy volumes of data (Volume)

SymetryML™ can handle streaming data (Velocity)

Accelerated hardware with GPUs and distributed computing

REST API – flexibility and modular design, seamless integration into existing systems or development of custom systems

Simplicity of the design

Real Time analytics – exploration and model generation/prediction, handling massive data with unprecedented speed in real time

Privacy and security

Service Oriented Architecture – XaaS


Faster: In minutes, SymetryML™ can utilize 10,000’s+ variables by constructing 1000’s of model combinations and ultimately reducing the variables to a single model; it builds models in real time as it learns

Smarter with Scale: Linearly scalable, with no limit on the length of data sets or the depth of categorical data, which allows for unlimited learning from data

More Agile on-the-fly: Continuous learning, both distributed and parallel

Simply Deployed: SymetryML™ models can be deployed in real time or in the form of scripts (SQL, Java, etc.)

[Diagram labels: Data, Learner, Proprietary Statistical Representation, Modeler, Predictor, Explorer]

SymetryML™: A few key features

• Parallel Processing/Distributed Computing

• Incremental/Decremental Learning (no rescan)

• Automated Variable Selection

• Add variables on-the-fly
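
The "incremental/decremental learning (no rescan)" and parallel processing features can be illustrated with a small sketch. It assumes that a variable is summarized by running sums (count, sum, sum of squares), so rows can be learned, forgotten, or merged across workers without touching the raw data again. The class `RunningStats` and its methods are hypothetical names chosen for illustration; the slides do not describe SymetryML's proprietary representation, and this is not it.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running sums for one numeric variable (hypothetical, for illustration)."""
    n: int = 0              # number of rows currently learned
    total: float = 0.0      # sum of values
    total_sq: float = 0.0   # sum of squared values

    def learn(self, x: float) -> None:
        # Incremental update: fold one new value into the sums.
        self.n += 1
        self.total += x
        self.total_sq += x * x

    def forget(self, x: float) -> None:
        # Decremental update: subtract a previously learned value.
        if self.n == 0:
            raise ValueError("nothing to forget")
        self.n -= 1
        self.total -= x
        self.total_sq -= x * x

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Sums built by independent workers combine by simple addition.
        return RunningStats(self.n + other.n,
                            self.total + other.total,
                            self.total_sq + other.total_sq)

    @property
    def mean(self) -> float:
        return self.total / self.n

    @property
    def stdev(self) -> float:
        # Sample standard deviation; max() guards against a tiny negative
        # variance caused by floating-point rounding.
        var = (self.total_sq - self.total * self.total / self.n) / (self.n - 1)
        return math.sqrt(max(var, 0.0))


# Two "workers" learn different partitions, then their summaries are merged.
a = RunningStats()
for x in (2.0, 4.0, 6.0):
    a.learn(x)
b = RunningStats()
for x in (8.0, 100.0):
    b.learn(x)

combined = a.merge(b)
combined.forget(100.0)            # remove an outlier without rescanning
print(combined.n, combined.mean)  # 4 5.0
```

Raw sums of squares are easy to merge and to decrement, but they can lose precision when values are large relative to their spread; a production system would likely need a more numerically careful update.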

Component Technologies

Component             Project    Language
Web UI                sym-web    JavaScript
REST API              sym-rest   Java
Core functionalities  sym-core   Java
NVIDIA GPU support    sym-core   C/C++

SymetryML™-CORE Basic Functionality:

Learn / Forget data

Univariate Analysis – Mean, StDev, F Test, Z Test, T Test

Bivariate Analysis

Correlation

Hypothesis Testing

Chi-square Testing

ANOVA

Model Selection and Creation

Predictions

Assessment

Persistence
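
As a rough illustration of how bivariate analysis, model creation and prediction could all be served from a learned summary rather than from the raw rows, the sketch below keeps running sums for a pair of variables and derives a correlation and a one-variable least-squares line from them. `PairStats`, `fit_line` and `predict` are hypothetical names; this is a textbook technique shown under that assumption, not sym-core's actual method.

```python
import math


class PairStats:
    """Running sums for two numeric variables x and y (hypothetical)."""

    def __init__(self) -> None:
        self.n = 0
        self.sx = self.sy = 0.0
        self.sxx = self.syy = self.sxy = 0.0

    def learn(self, x: float, y: float) -> None:
        # One pass over the data: only the sums are kept, not the rows.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y

    def _centered(self):
        # Centered sums of squares and cross-products from the raw sums.
        cxx = self.sxx - self.sx * self.sx / self.n
        cyy = self.syy - self.sy * self.sy / self.n
        cxy = self.sxy - self.sx * self.sy / self.n
        return cxx, cyy, cxy

    def correlation(self) -> float:
        # Pearson correlation coefficient.
        cxx, cyy, cxy = self._centered()
        return cxy / math.sqrt(cxx * cyy)

    def fit_line(self):
        # Least-squares fit of y = intercept + slope * x.
        cxx, _, cxy = self._centered()
        slope = cxy / cxx
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept

    def predict(self, x: float) -> float:
        slope, intercept = self.fit_line()
        return intercept + slope * x


stats = PairStats()
for x, y in [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]:
    stats.learn(x, y)

slope, intercept = stats.fit_line()
print(slope, intercept, stats.predict(5.0))  # 2.0 1.0 11.0
```

Because the model is computed from the sums alone, it can be rebuilt at any time after more rows are learned, which is the property the "no rescan" claim depends on.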

Web-UI - exploration


Web-UI - modeling


Web-UI - assessment


RTDS Inc. – Headlines

Team of 6 engineers and Data Scientists in Toronto, Board in NY

Focus on Technology Differentiation

Technology timeline

March ’13 – Launched .NET Based Desktop Version

July ’13 – Launched SymetryML™ Server with REST API

December ’13 – Successfully deployed first GPU-based system

June ’14 – Algorithmic Support Expanded

’15 Roadmap: Aggressive, Attainable and Defensible

Proven technology with successful deployment in advertising

Current Financing: Mogility Capital


Next steps

We’ve been successful with this technology in the mobile advertising space. Now we want to use the power of this technology in other strategic sectors

We are looking for partners as beta users - with unique datasets and use cases - what kinds of questions can we help answer with your data?

We are looking for integration partners where we can both enhance our offering

Develop the next version (v4.0) of SymetryML™ – fully parallel with Apache Spark


SymetryML™ and GPUs

• A native library that uses NVIDIA GPUs is available for:

  • Linux 64 bit (CentOS 5.x and Amazon Linux)

• Use of GPUs for core operations:

  • Learning / Forgetting data

  • Model Building

  • Model Selection

SymetryML™-WEB

• Interactive HTML 5 application

• Direct connection to SYM-REST

• It is de facto a lightweight front end to SYM-REST

• Based on Sencha Ext-JS 4.x

SYM-REST

• Provides a RESTful API to sym-core

• Supported Data Sources:

  • Amazon S3

  • SFTP

  • HTTP/HTTPS

  • Redshift

• Upcoming Data Sources:

  • HDFS

  • ODBC/JDBC

SYM-REST Security

• A user of the REST API needs an access key

• We generate these keys

• The key is a 128-bit AES key

• Every REST request is authenticated with an HMAC (SHA1) code based on part of the request

• If data encryption is needed, HTTPS can be used
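
The request authentication described for SYM-REST can be sketched with Python's standard `hmac` module. The slide says only that the code is based on "part of the request", so the string-to-sign used here (method, path and timestamp joined by newlines), the function names, and the example path are all assumptions made for illustration, not the actual SYM-REST scheme.

```python
import hashlib
import hmac
import secrets


def generate_access_key() -> bytes:
    # 16 random bytes = 128 bits, matching the key size named on the slide.
    return secrets.token_bytes(16)


def sign_request(key: bytes, method: str, path: str, timestamp: str) -> str:
    # Assumed string-to-sign layout; the real layout is not given in the slides.
    message = "\n".join([method.upper(), path, timestamp]).encode("utf-8")
    return hmac.new(key, message, hashlib.sha1).hexdigest()


def verify_request(key: bytes, method: str, path: str,
                   timestamp: str, signature: str) -> bool:
    expected = sign_request(key, method, path, timestamp)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)


key = generate_access_key()
# "/projects/demo" is a made-up path, not a documented SYM-REST endpoint.
sig = sign_request(key, "GET", "/projects/demo", "2015-05-01T12:00:00Z")

ok = verify_request(key, "GET", "/projects/demo", "2015-05-01T12:00:00Z", sig)
tampered = verify_request(key, "GET", "/projects/other", "2015-05-01T12:00:00Z", sig)
print(ok, tampered)  # True False
```

An HMAC proves that the sender holds the key and that the signed fields were not altered; it does not hide the payload, which is why HTTPS is still needed when the data itself must be encrypted.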

Finance data example

• NASDAQ TotalView-ITCH Intraday Data Modeling

175 GB - one month of raw data

55 GB of transactions for NASDAQ100 constituents

12M rows, 400 attributes

Univariate analysis across securities

Covariance and Hypothesis Testing

Model Building: Classification/Regression

Prediction of Price Movement

Full Order Book Analysis
