Getting started with data science and machine learning

Date post: 02-Jan-2022
Getting started with data science and machine learning GradEx Workshop 2019-02-21 Mher Kazandjian
Transcript
Page 1: Getting started with data science and machine learning

Getting started with data science and machine learning

GradEx Workshop 2019-02-21

Mher Kazandjian

Page 2: Getting started with data science and machine learning

About me

Page 3: Getting started with data science and machine learning

Not about me

https://tinyurl.com/y3lsmcnk

Page 4: Getting started with data science and machine learning

Data science and Machine learning

Page 5: Getting started with data science and machine learning

Learning objectives

- Pointers on data science and machine learning from a practitioner's perspective
- How to get started: software and skills requirements
- Data science and machine learning exercise (hands on)

Page 6: Getting started with data science and machine learning

Data science - applications

- Advertisement
- Linguistics
- Astronomy
- Forensics
- Intelligence / security
- Weather forecasts
- Financial/economic forecasts

Page 7: Getting started with data science and machine learning

What data science is not

- Building models
- Data visualizations
- Writing custom programs to process data

Page 8: Getting started with data science and machine learning

What data science is

- Using data to create as much impact as possible in solving a given problem
- Giving insight
- Data products
- Product recommendations

In order to achieve this, it might be necessary to use tools such as:

- Building models
- Data visualizations
- Writing custom programs to process data

Page 9: Getting started with data science and machine learning

Big Data

- Data started to grow dramatically after 2004
- Unstructured datasets
- Traditional techniques were too slow and inadequate to handle the growth
- New tools and paradigms emerged
- The term "data science" was coined around the late 2000s
- Machine learning and AI are among the main beneficiaries of big data
  - e.g., in 2007 a deep neural network beat a traditional model for the first time

Page 10: Getting started with data science and machine learning

Scales

Data volumes:
- Few GB to PB or bigger?

Processors:
- A multi-core machine -> 1000s of servers with 10,000 cores
- Accelerators (GPUs, FPGAs, ASICs)

Memory:
- Several GB to TB

Network:
- Gbit+

Page 11: Getting started with data science and machine learning

New software technologies

- MapReduce
- Hadoop
- Spark
- NoSQL databases
  - Elasticsearch
  - MongoDB

SQL
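
As a rough illustration of the map-reduce idea listed above, here is a minimal single-machine sketch of a word count. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`) and the sample sentences are made up for this example; real frameworks such as Hadoop or Spark distribute these same steps across many servers.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce step: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


# Invented sample input; each string stands in for one document.
documents = [
    "big data needs new tools",
    "new tools for big data",
    "data science",
]

mapped = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle_phase(mapped))
print(word_counts)
```

Because each document is mapped independently and each key is reduced independently, both steps can run in parallel, which is what makes the pattern suitable for large data volumes.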

Page 12: Getting started with data science and machine learning

New software technologies

- Cloud computing
- Data science services
- Pre-deployed software that users can interact with:
  - Process data
  - Run models
  - Find data

Cloud providers save you time when it comes to setting up and configuring hardware and software

- Good for:
  - Fast prototyping
  - Fast testing
  - Fast deployment

Page 13: Getting started with data science and machine learning

Machine learning

- When traditional (algorithmic) methods fail, use machine learning
- The success of machine learning techniques is enabled by the abundance of data
- E.g., instead of writing code such as:

is replaced by

Page 14: Getting started with data science and machine learning

Machine learning

- When traditional (algorithmic) methods fail, use machine learning
- The success of machine learning techniques is enabled by the abundance of data
- E.g., instead of writing code such as:

[Figure: diagram with nodes labeled x, y, z, t, w. Example features: Length = 4.5 m, Width = 2 m, Color = red, Mass = 1800 kg, Has glass = True. Output labels: Car 95%, Tree 0.1%, Door 0.9%, School bus 4%.]
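
The code shown on the original slides is not part of this transcript. As a stand-in, the sketch below contrasts the two approaches using the features from the figure (length, width, mass, has glass). Everything in it is an assumption made for illustration: the thresholds in the rules, the toy training data, and the choice of a nearest-centroid classifier as the "learned" model.

```python
from statistics import mean


def classify_with_rules(length_m, width_m, mass_kg, has_glass):
    """Traditional approach: every threshold is hand-picked by the programmer."""
    if has_glass and 3.0 <= length_m <= 6.0 and mass_kg < 3500:
        return "car"
    if has_glass and length_m > 6.0:
        return "school bus"
    if not has_glass and mass_kg < 100:
        return "door"
    return "tree"


# Invented labeled examples: (length_m, width_m, mass_kg, has_glass) -> label
TRAINING_DATA = [
    ((4.5, 2.0, 1800.0, 1.0), "car"),
    ((4.2, 1.8, 1400.0, 1.0), "car"),
    ((11.0, 2.5, 11000.0, 1.0), "school bus"),
    ((10.5, 2.4, 10500.0, 1.0), "school bus"),
    ((2.0, 0.9, 30.0, 0.0), "door"),
    ((2.1, 0.8, 25.0, 0.0), "door"),
]


def train_centroids(examples):
    """Learned approach: compute one average feature vector per label."""
    rows_by_label = {}
    for features, label in examples:
        rows_by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in rows_by_label.items()
    }


def classify_with_model(centroids, features):
    """Predict the label whose centroid is closest to the input features."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


centroids = train_centroids(TRAINING_DATA)
print(classify_with_rules(4.5, 2.0, 1800.0, True))
print(classify_with_model(centroids, (4.5, 2.0, 1800.0, 1.0)))
```

The features are left unscaled here, so mass dominates the distance; a real model would normalize them first. The point of the contrast is that adding a new class to the second version means adding labeled data, not writing new rules.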

Page 15: Getting started with data science and machine learning

Machine learning

- Logistic regression
- Support vector machines
- Neural networks
  - Convolutional
  - Recurrent neural networks
  - Long Short Term Memory networks (LSTM)
  - Generative adversarial networks (GAN)

Page 16: Getting started with data science and machine learning

Machine learning

Page 17: Getting started with data science and machine learning

Applications

- Object detection, image recognition and classification
  - License plate recognition
  - Object classification
- Speech recognition
  - Siri, Google Assistant, Alexa, …
- Self-driving cars
  - Tesla, Volvo, Uber, Apple (probably soon)
  - Already outperform humans on average (precision and safety)
- AI in every TV (soon)
  - Built-in home assistant
  - Upscale/super resolution
- Face recognition and feature matching
  - Humanless terminals for passport checks
- Games
  - Go, Chess, StarCraft 2

exercise

exercise

Page 18: Getting started with data science and machine learning

This looks quite steep

- No need to be an expert in data science to work in a good company
  - e.g. good analytic skills
  - some courses (Coursera material is more than enough)
  - some (self) training (e.g. or www.kaggle.com)
  - average coding skills
  - contribute to open-source projects (gives you good exposure)
  - write a blog

Page 19: Getting started with data science and machine learning

This looks quite steep

- Companies are just starting to integrate data science and AI into their products
- Lots of opportunities and more to come in the near future
- Many problems that benefit our everyday life use simple models such as logistic regression
- 90% of the effort goes into getting good data, and the rest is e.g. just using that data to produce some visualization or train a model to classify or predict a trend/class
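
To show how small a logistic regression model can be, here is a minimal sketch that fits one by gradient descent using only the Python standard library. The dataset (hours studied versus pass/fail), the learning rate and the number of epochs are all invented for illustration; in practice a library implementation would normally be used instead.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, learning_rate=0.1, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the average log-loss with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Invented toy data: hours studied -> passed the exam (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 5.0]
passed = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic_regression(hours, passed)


def predict_probability(x):
    """Estimated probability of passing after x hours of study."""
    return sigmoid(w * x + b)


print(predict_probability(0.5), predict_probability(5.0))
```

The model itself is two numbers, `w` and `b`. Most of the real work, as the slide says, is in obtaining data like `hours` and `passed` in a clean and trustworthy form.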

Page 20: Getting started with data science and machine learning

Typical workflow

- Download/collect/store/generate data
- Transform/filter/enhance/re-sample/augment data
- Label/classify/sort data
- Train/model data
- Use models to understand the data / solve a problem / extract solution / reduce dimensionality
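
The steps above can be sketched end to end in a few lines. This is only a toy walk-through: the raw readings, the labeling threshold and the "model" (one mean value per label) are hypothetical placeholders, chosen so that each workflow step maps to one function.

```python
from statistics import mean


def collect():
    """Step 1: stand-in for downloading or collecting raw data."""
    return [" 12.5", "13.1", "n/a", "40.2", "39.8", "", "12.9"]


def transform(raw):
    """Step 2: drop entries that cannot be parsed and convert the rest to floats."""
    cleaned = []
    for item in raw:
        try:
            cleaned.append(float(item))
        except ValueError:
            continue
    return cleaned


def label(values, threshold=25.0):
    """Step 3: attach a label to each value (a stand-in for manual labeling)."""
    return [(v, "high" if v > threshold else "low") for v in values]


def train(labeled):
    """Step 4: 'train' a trivial model that stores the mean value of each label."""
    groups = {}
    for value, tag in labeled:
        groups.setdefault(tag, []).append(value)
    return {tag: mean(values) for tag, values in groups.items()}


def predict(model, value):
    """Step 5: use the model by picking the label with the nearest mean."""
    return min(model, key=lambda tag: abs(model[tag] - value))


values = transform(collect())
model = train(label(values))
print(values)
print(predict(model, 15.0), predict(model, 35.0))
```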

Page 21: Getting started with data science and machine learning

Typical workflow

Page 22: Getting started with data science and machine learning

Top languages / packages

- Python
  - NumPy, SciPy, pandas, Keras, TensorFlow, PyTorch
- R
- Java
- Scala
- MATLAB

Page 23: Getting started with data science and machine learning

DS and ML at AUB

- GPUs for deep learning
- On campus:
  - Mid-range cards (8x Nvidia K20m, available now, for mid-range problems)
  - High-end cards (2x V100, fall 2019, for 64 GB GPU RAM problems)
- On-demand Spark cluster (experimental)
- R / RStudio (available / on demand)
- Jupyter notebooks (available / on demand)
- Data processing up to 1 TB on disk and 0.5 TB in RAM
- Aggregated number of cores for research ~ 800

On the cloud:
- Azure (via grant / dept / faculty funding)

Page 24: Getting started with data science and machine learning

Hands on demo

Page 25: Getting started with data science and machine learning

Demo 1: Data science

Official exam grades
https://colab.research.google.com/drive/1qiMUfiSPkR8oVvpyP0HgBnUJwFiS4wcY

Page 26: Getting started with data science and machine learning

Demo 2: Simple but useful neural network (not deep)

A classifier
https://colab.research.google.com/drive/1-Ui0SPgYaYCvdWR6LlnsFtFbBYm9ffqQ

Page 27: Getting started with data science and machine learning

Dataset layout

A huge 3d numpy array

Shape = (3, 3, 4)
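
To make the shape concrete, here is a small sketch of how a NumPy array with shape (3, 3, 4) is indexed. The values are just a running counter and the meaning of each axis is not given in the slides, so the comments describe positions only; the reshape at the end is one common way (an assumption, not necessarily what the demo does) to flatten each top-level block into a single row.

```python
import numpy as np

# 36 consecutive integers arranged as 3 blocks, each with 3 rows and 4 columns.
data = np.arange(3 * 3 * 4).reshape(3, 3, 4)

print(data.shape)           # (3, 3, 4)
print(data[0].shape)        # first block: (3, 4)
print(data[0, 1])           # second row of the first block: [4 5 6 7]
print(data[:, :, 0].shape)  # first column of every block: (3, 3)

# Flatten each block into one row of 12 values.
flat = data.reshape(3, -1)
print(flat.shape)           # (3, 12)
```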

Page 28: Getting started with data science and machine learning

Demo 3: Super resolution out of the box

Adversarial neural network (kind of)

https://github.com/fperazzi/proSR

Page 29: Getting started with data science and machine learning

Thank you for your attention

