D3M Trends Analytics - VISHAL SINGH · D3M- NYU STERN Vishal Singh Trends in. Trends in Analytics...

ANALYTICS & TECHNOLOGY

D3M- NYU STERN

Vishal Singh

Trends in Analytics

1. Experimentation (& the use of behavioral economics/psychology)

2. The Deep Learning Revolution

   1. Examples in Practice

   2. Intuition for how it works (Ludwig by Uber)

   3. Examples of specialized use cases

   4. Data as fuel to Ai: strategic consequences

   5. Labelled Data & Biases in Ai

3. Privacy & Accountability

   1. Privacy

   2. Explainable Machine Learning

4. Learning Resources

Randomized Controlled Experiments

Key research tool

o Nobel prize in Economics 2019 (Banerjee, Duflo, Kremer)

   o for bringing experimental approach to alleviating global poverty

o Example: Uber Price Experiments

   o Going from regular pricing to a 1.2x surge: 27% drop in demand (What is the price elasticity?)

   o Going from 1.9x to 2.0x surge, one would observe a six times larger drop in demand than in going from 1.8x to 1.9x surge. WHY?

   o When the surge multiplier moved from 2.0x to 2.1x, people actually took more rides. WHY?

o Why is Uber so consistent in expected wait time for a ride regardless of demand conditions? Surge algorithm filters demand and encourages supply.
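The elasticity question in the first Uber bullet can be worked out directly. A minimal sketch, assuming the simple point definition (percentage change in quantity divided by percentage change in price) and treating the move from regular pricing to a 1.2x surge as a 20% price increase; the function name is illustrative only.

```python
def price_elasticity(pct_change_quantity: float, pct_change_price: float) -> float:
    """Simple point elasticity: % change in quantity / % change in price."""
    if pct_change_price == 0:
        raise ValueError("price change must be non-zero")
    return pct_change_quantity / pct_change_price

# Figures from the slide: 1.0x -> 1.2x surge (+20% price), 27% drop in demand.
surge_elasticity = price_elasticity(pct_change_quantity=-27.0, pct_change_price=20.0)
print(round(surge_elasticity, 2))  # -1.35
```

An absolute value above 1 means demand is elastic at this price point: the percentage drop in rides is larger than the percentage increase in price.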

Moral Machine Experiment

o Classical Moral Puzzle

   o The Trolley Problem

   o Harvard UG Philosophy class

o Moral Machine Experiment (Driverless car)

   o MIT, published in Nature

   o Global Subjects, large number of conditions

   o Video from Nature Journal

   o Data Download

Experiments: Do it if you can

All digital platforms provide automated designs for conducting experiments and evaluating their outcomes (and for optimizing on the best cells)

Analytics is super simple (the only intuition needed is randomization). Use insights from behavioral economics/psychology

New Ideas: (1) Counterfactuals (what-if analysis), (2) Heterogeneous Treatment Effects (Uplift Models)
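Why the analytics of an experiment is simple can be sketched in a few lines. Because assignment is random, the control group stands in for the counterfactual, and the treatment effect is just a difference in means. Computing the same difference within segments gives a first look at heterogeneous treatment effects. Everything below is simulated: the segment names, conversion rates and sample size are made-up assumptions, not data from the course.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical conversion rates, invented for illustration only.
TRUE_RATES = {
    "new": {"control": 0.10, "treatment": 0.20},
    "returning": {"control": 0.30, "treatment": 0.32},
}

def simulate(n_users):
    """Randomly assign each simulated user to an arm and draw an outcome."""
    rows = []
    for _ in range(n_users):
        segment = random.choice(["new", "returning"])
        arm = random.choice(["control", "treatment"])  # the randomization step
        converted = 1 if random.random() < TRUE_RATES[segment][arm] else 0
        rows.append((segment, arm, converted))
    return rows

def treatment_effect(rows):
    """Difference in mean outcome between treatment and control."""
    treated = [y for _, arm, y in rows if arm == "treatment"]
    control = [y for _, arm, y in rows if arm == "control"]
    return mean(treated) - mean(control)

rows = simulate(40_000)
overall_effect = treatment_effect(rows)
effect_by_segment = {
    segment: treatment_effect([r for r in rows if r[0] == segment])
    for segment in TRUE_RATES
}

print(f"overall effect: {overall_effect:.3f}")
for segment, effect in effect_by_segment.items():
    print(f"{segment}: {effect:.3f}")
```

In this simulation the treatment helps new users far more than returning users, which is the kind of pattern an uplift model is built to find and target.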

Current status of Deep Learning / Ai algorithms

Input: huge amounts of data from a specific domain (e.g., images of breast cancer, data on loan repayment histories)

Output: a decision in a specific case (Cancerous or not? Give a loan or not?)

Algorithms are optimized on specified goals (minimize risk of misclassification of cancer, maximize time spent on FB / YouTube).
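The input, output and goal pattern can be illustrated at toy scale. The sketch below is not deep learning: it is a single logistic unit trained by gradient descent, chosen only because it runs without any library. The labelled data, learning rate and step count are hypothetical. The loop has the same shape as in larger models: feed in labelled examples, measure the specified goal (here, average log loss), and adjust the parameters to improve it.

```python
import math

# Hypothetical labelled data: (standardized feature, label), invented for illustration.
data = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (-0.5, 0),
        (0.5, 1), (1.0, 1), (1.5, 1), (2.0, 1)]

def predict(w, b, x):
    """Predicted probability that the label is 1."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def log_loss(w, b):
    """The specified goal: average log loss on the labelled data (lower is better)."""
    total = 0.0
    for x, y in data:
        p = min(max(predict(w, b, x), 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(data)

w, b = 0.0, 0.0
learning_rate = 0.5
initial_loss = log_loss(w, b)

for _ in range(1000):
    # Gradients of the average log loss with respect to w and b.
    grad_w = sum((predict(w, b, x) - y) * x for x, y in data) / len(data)
    grad_b = sum(predict(w, b, x) - y for x, y in data) / len(data)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = log_loss(w, b)
# Output: a yes/no decision per case, using 0.5 as the cut-off.
accuracy = sum((predict(w, b, x) >= 0.5) == (y == 1) for x, y in data) / len(data)
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}, accuracy {accuracy:.2f}")
```

Changing the goal (for example, penalizing a missed cancer more heavily than a false alarm) changes what the algorithm learns, which is why the choice of objective matters as much as the data.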

Example 1: Speech Recognition/Translation

Microsoft Inspire 2019

Example 2: Image Recognition, AR, Leveraging Databases

Google I/O 2019

How do these deep learning algorithms work?

Get intuition from Ludwig, a tool recently released by Uber

Applications in Health Domain

Regina Barzilay, MIT

(visit her page and read popular press articles if interested in topic)

4. Data as fuel for Ai: strategic consequences

The era of Open Source

Note: Most algorithms in this domain are open source (e.g. ONNX). What matters then is implementation, and the key ingredient there is DATA.

The Rise of Open Source Software, CNBC (Dec 11, 2019)

Pattern to observe: 1) We need large amounts of “labeled” data 2) Algorithms are open source

Who has the most data now? Who will control data in the future? Strategic consequences

How do we label data? Pictures, reviews, etc. are human coded. Does it generate Bias?

“In the age of Ai where data is the new fuel, China is the new Saudi Arabia” Kai-Fu Lee (Author, AI Superpowers)

Note: Most algorithms and research in this domain are open source (e.g. ONNX). What matters then is implementation, and the key ingredient there is DATA.

What jobs would be lost? US, China & Ai (Bloomberg 5 min)

We don’t need just data, we need “labelled” data

Where do we get labelled data?

Source 1:

We are generating it every day. Every time we click, like, browse, watch, buy…

What can simple FB likes reveal about us?

Source 2:

Human Coding: Recruit people (for example on Amazon MTurk) to annotate images, classify tweets or reviews as positive or negative, etc.

Both can lead to significant BIASES in Ai algorithms

Next few slides are drawn from here

Classification Task

Human Coder

Many sources of Biases

Where are human annotators coming from?

Deep Learning: Summary

Biggest breakthrough in Ai algorithms over the past several decades. Requires large amounts of labeled, domain-specific data

The 4 V’s of Big Data

Volume & Velocity: more or less handled through AWS/GCP/Azure

Variety: numeric, categorical, unstructured (text, images, videos)

Veracity: how accurate/truthful/representative your input data is

Privacy & Accountability

2 Questions:

1) Who do you trust with your data?

o Between firms (e.g. FB vs. Google)

o (Chinese or Indian or US) Government vs. private enterprise?

2) Would you install a small chip for early detection of health problems? Next revolution is integration of bio-tech & Ai

Highly recommended books..

Yuval Harari: Will Technology Help Us Become Immortal?

Privacy & Accountability:

Interpretable Machine Learning

Growing emphasis on “Interpretable” machine learning

Products by tech giants & startups (Google, Microsoft, H2OAi)

Current examples of interpretability: 1) Variable importance 2) Predicted values 3) Continued & increasing use of Visual Analytics

Need for a product like JMP Profiler with better UI
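One common way to get a variable importance score is permutation importance: shuffle one input column, and see how much the model's error grows. The sketch below assumes that approach; it is not tied to any of the products named above. The "model" is a hypothetical stand-in for an already-fitted black box, and the data is simulated.

```python
import random
from statistics import mean

random.seed(0)

def model(x1, x2, x3):
    """Hypothetical already-fitted model; it ignores x3 by construction."""
    return 3.0 * x1 + 0.5 * x2

# Hypothetical data: outcome depends strongly on x1, weakly on x2, not on x3.
rows = [[random.gauss(0, 1) for _ in range(3)] for _ in range(500)]
y = [3.0 * r[0] + 0.5 * r[1] + random.gauss(0, 0.1) for r in rows]

def mse(data):
    """Mean squared error of the model's predictions on the given inputs."""
    return mean((model(*r) - target) ** 2 for r, target in zip(data, y))

def permutation_importance(column):
    """Increase in error after shuffling one input column."""
    shuffled = [r[column] for r in rows]
    random.shuffle(shuffled)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, shuffled)]
    return mse(permuted) - mse(rows)

importance = {
    name: permutation_importance(i) for i, name in enumerate(["x1", "x2", "x3"])
}
for name, score in importance.items():
    print(f"{name}: {score:.3f}")
```

The ranking (x1 well above x2, x3 at zero) is the kind of summary a variable importance chart shows; it says which inputs the model relies on, not why.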

Final Exam

Part 1: Data Exploration

Basic data literacy: everyone needs this, regardless of field of interest or industry/sector

Part 2: Regression/Prediction

More important if you plan on a more analytical career. Knowing every detail is not important; keep the higher-level intuitions.

Learning Resources

Data & Educational Resources

o Data: Library for good Business data, Kaggle for Prediction type data, Use Google Data Search for everything else

o Tableau/JMP: Online video library by Tableau, Download JMP books from the Help menu

o Analytics is a (foreign) language. Knowing the basics and using the right software will take you far. The “use it or lose it” principle applies

o Key is learning-by-doing
