
Introduction to Machine Learning with Python and scikit-learn

Date post: 27-Jan-2015
Upload: matt-hagy
Description:
PyATL talk about machine learning. Provides both an intro to machine learning and how to do it with Python. Includes simple examples with code and results.
Introduction to Machine Learning with Python and scikit-learn. Python Atlanta, Nov. 14th 2013. Matt Hagy [email protected]
Transcript
Page 1: Introduction to Machine Learning with Python and scikit-learn

Introduction to Machine Learning with Python and scikit-learn

Python Atlanta, Nov. 14th 2013

Matt Hagy [email protected]

Page 2: Introduction to Machine Learning with Python and scikit-learn

Slide #2 Intro to Machine Learning with Python [email protected]

Machine Learning (ML):

• Finding patterns in data

• Modeling patterns

• Use models to make predictions

Page 3: Introduction to Machine Learning with Python and scikit-learn

ML can be easy*

• You already have ML applications!

• You can start applying ML methods now with Python & scikit-learn

• Theoretical knowledge of ML not needed (initially)*

*Gaining more background, theory, and experience will help


Page 4: Introduction to Machine Learning with Python and scikit-learn

Simple Example


Page 5: Introduction to Machine Learning with Python and scikit-learn

Simple Model


Page 6: Introduction to Machine Learning with Python and scikit-learn


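import numpy as np
from sklearn.linear_model import LinearRegression

# np.load on an .npz file returns a dict-like archive, so read the arrays by name.
# The array names 'x' and 'y' are assumed; the slide does not show them.
with np.load('data.npz') as data:
    x, y = data['x'], data['y']
x_test = np.linspace(0, 200)

# scikit-learn expects 2-D inputs of shape (n_samples, n_features)
model = LinearRegression()
model.fit(x[:, np.newaxis], y)
y_test = model.predict(x_test[:, np.newaxis])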
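The talk's data.npz file is not available here, so the following is a minimal, self-contained sketch of the same fit-and-predict steps on synthetic data. The linear trend, noise level, and random seed are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the talk's data.npz (assumption: a noisy linear trend).
rng = np.random.default_rng(0)
x = rng.uniform(0, 200, size=100)
y = 3.0 * x + 10.0 + rng.normal(scale=5.0, size=100)

# scikit-learn expects a 2-D input of shape (n_samples, n_features).
model = LinearRegression()
model.fit(x[:, np.newaxis], y)

# Predict on an evenly spaced grid (np.linspace defaults to 50 points).
x_test = np.linspace(0, 200)
y_test = model.predict(x_test[:, np.newaxis])

# The fitted slope and intercept should land close to the values used above.
print("slope:", model.coef_[0], "intercept:", model.intercept_)
```

Plotting x_test against y_test on top of the raw points would give the kind of fitted-line picture a simple model slide usually shows.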

Page 7: Introduction to Machine Learning with Python and scikit-learn


Page 8: Introduction to Machine Learning with Python and scikit-learn

Variance/Bias Trade Off


• Need models that can adapt to relationships in our data

• Highly adaptable models can over-fit and will not generalize

• Regularization – Common strategy to address variance/bias trade off
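One way to see the trade-off is to fit the same flexible model with different amounts of regularization and compare training and test error. The sketch below is an illustration only: the synthetic sine data, the degree-9 polynomial features, the use of Ridge regression, and the alpha values are all assumptions and do not come from the talk.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a sine curve (hypothetical data for illustration)."""
    x = rng.uniform(0, 1, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)
    return x[:, np.newaxis], y

X_train, y_train = make_data(30)
X_test, y_test = make_data(200)

alphas = [1e-3, 1.0, 1e3]  # larger alpha means stronger regularization
train_mse, test_mse = [], []
for alpha in alphas:
    # A highly adaptable model: degree-9 polynomial features feeding a linear fit.
    model = make_pipeline(
        PolynomialFeatures(degree=9, include_bias=False),
        StandardScaler(),
        Ridge(alpha=alpha),
    )
    model.fit(X_train, y_train)
    train_mse.append(mean_squared_error(y_train, model.predict(X_train)))
    test_mse.append(mean_squared_error(y_test, model.predict(X_test)))
    print(f"alpha={alpha:g}  train MSE={train_mse[-1]:.3f}  test MSE={test_mse[-1]:.3f}")
```

Training error can only grow as alpha increases. The test error is what indicates whether the extra flexibility was capturing a real pattern or just noise.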

Page 9: Introduction to Machine Learning with Python and scikit-learn


Page 10: Introduction to Machine Learning with Python and scikit-learn


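import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Array names 'x' and 'y' are assumed; the slide does not show them.
with np.load('data.npz') as data:
    x, y = data['x'], data['y']
x_test = np.linspace(0, 200)

# Standardize the input, then fit a support vector regressor with an RBF kernel
model = Pipeline([
    ('standardize', StandardScaler()),
    ('svr', SVR(kernel='rbf', verbose=0, C=5e6, epsilon=20)),
])
model.fit(x[:, np.newaxis], y)
y_test = model.predict(x_test[:, np.newaxis])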

regularization term (slide callout; in scikit-learn's SVR, C is the regularization parameter)
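Since data.npz is not available, here is a self-contained sketch of the same standardize-then-SVR pipeline on synthetic data. The sine-shaped data and the C and epsilon values are assumptions chosen for unit-scale outputs; they are not the values from the slide.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for data.npz (assumption): roughly one noisy sine period on [0, 200].
rng = np.random.default_rng(0)
x = rng.uniform(0, 200, size=200)
y = np.sin(x / 30.0) + rng.normal(scale=0.1, size=200)

# C and epsilon are illustrative values for this unit-scale data, not the slide's values.
model = Pipeline([
    ("standardize", StandardScaler()),
    ("svr", SVR(kernel="rbf", C=100.0, epsilon=0.1)),
])
model.fit(x[:, np.newaxis], y)

x_test = np.linspace(0, 200)
y_test = model.predict(x_test[:, np.newaxis])

# R^2 on the training data, as a rough check that the curve was captured.
r2_train = model.score(x[:, np.newaxis], y)
print(f"training R^2: {r2_train:.3f}")
```

Rerunning with a much smaller C should give a flatter, more heavily regularized curve, which is one way to explore the trade-off by hand.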

Page 11: Introduction to Machine Learning with Python and scikit-learn

Supervised Learning


[Figure: table of samples (rows) with one input column X and one output column Y]

Modeling relationship between inputs and outputs

Page 12: Introduction to Machine Learning with Python and scikit-learn

Multiple Inputs


[Figure: table of samples (rows) with input columns X1, X2, X3, ..., Xn and one output column Y]
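In scikit-learn, multiple inputs are laid out as a 2-D array with one row per sample and one column per input. The sketch below uses made-up, noise-free data with three inputs; the coefficients and sizes are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# X has shape (n_samples, n_features): each row is a sample, each column an input.
n_samples, n_features = 100, 3
X = rng.normal(size=(n_samples, n_features))

# Hypothetical ground truth: the output is a weighted sum of the inputs plus an offset.
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + 4.0

model = LinearRegression()
model.fit(X, y)

# With noise-free data the fit recovers one coefficient per input column.
print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
```

The same fit and predict calls work whether X has one column or thousands, which is why the earlier single-input examples needed the extra np.newaxis dimension.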

Page 13: Introduction to Machine Learning with Python and scikit-learn

Example: Image Classification


• Classify handwritten digits with ML models

• Each input is an entire image

• Output is digit in the image

Page 14: Introduction to Machine Learning with Python and scikit-learn


[Figure: images of handwritten digits (Input, X) paired with their digit labels (Output, Y), here 9 and 2]

Page 15: Introduction to Machine Learning with Python and scikit-learn


import numpy as np
from sklearn.ensemble import RandomForestClassifier

with np.load('train.npz') as data:
    pixels_train = data['pixels']
    labels_train = data['labels']
with np.load('test.npz') as data:
    pixels_test = data['pixels']

# flatten each image into a 1-D feature vector
X_train = pixels_train.reshape(pixels_train.shape[0], -1)
X_test = pixels_test.reshape(pixels_test.shape[0], -1)

model = RandomForestClassifier(n_estimators=50)
model.fit(X_train, labels_train)
labels_test = model.predict(X_test)

Trains on 50,000 images in roughly 20 seconds. 96% accurate!
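The train.npz and test.npz files from the talk are not available, so this sketch runs the same flatten, fit, and predict steps on scikit-learn's small bundled digits dataset (1,797 images of 8x8 pixels). That dataset, the 75/25 split, and the fixed random seeds are substitutions made for illustration, so the accuracy and timing will not match the slide's figures.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Bundled dataset used as a stand-in for the talk's image files.
digits = load_digits()
pixels, labels = digits.images, digits.target  # pixels has shape (1797, 8, 8)

# Flatten each 8x8 image into a 64-element feature vector.
X = pixels.reshape(pixels.shape[0], -1)

# Hold out a quarter of the images to measure accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0
)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.3f}")
```

Measuring accuracy on held-out images rather than the training images is what makes a figure like the one above meaningful.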

Page 16: Introduction to Machine Learning with Python and scikit-learn

Kaggle Data Science Competition

• Given 6 million training questions labeled with tags

• Predict the tags for 2 million unlabeled test questions

www.users.globalnet.co.uk/~slocks/instructions.html
stackoverflow.com/questions/895371/bubble-sort-homework

Predicting the tags of Stack Overflow questions with machine learning


Page 17: Introduction to Machine Learning with Python and scikit-learn

Text Classification Overview

Raw Posts → Feature Extraction & Selection → Vector Space → Model Selection & Training → Machine Learning Model
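The stages above map naturally onto a scikit-learn Pipeline. The following is a toy sketch only: the four titles, the two tags, and the choice of CountVectorizer with LogisticRegression are hypothetical, and the talk does not say which vectorizer or model was used in the competition.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical raw posts and tags, invented for illustration.
titles = [
    "sorted array processing speed",
    "fastest way to search a sorted array",
    "need help with java homework",
    "java homework compile error",
]
tags = ["arrays", "arrays", "java", "java"]

model = Pipeline([
    # Feature extraction: raw text -> term-frequency vectors (the vector space).
    ("features", CountVectorizer()),
    # Model training: a linear classifier on top of the term frequencies.
    ("classifier", LogisticRegression()),
])
model.fit(titles, tags)

predicted = model.predict(["java homework help", "sorted array"])
print(list(predicted))
```

The real task is multi-label, since a question can carry several tags, so a production version would need one classifier per tag or a similar scheme. This sketch predicts a single tag to keep the flow visible.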


Page 18: Introduction to Machine Learning with Python and scikit-learn

Term Frequency Feature Extraction

Characterize text by the frequency of specific words in each text entry

Example Title: “Why is processing a sorted array faster than processing an array that is not sorted?”

Term Frequencies:

why  processing  sorted  array  faster
  1           2       2      2       1

Ignore common words (i.e. stop words)
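Counting terms for a single title takes only a few lines of plain Python. In this sketch the stop-word list is hand-picked for this one example (an assumption); real stop-word lists are much longer.

```python
import re
from collections import Counter

title = ("Why is processing a sorted array faster than "
         "processing an array that is not sorted?")

# Hand-picked stop words for this example only (assumption).
STOP_WORDS = {"is", "a", "an", "than", "that", "not"}

# Lowercase the text, split it into words, and drop the stop words.
tokens = re.findall(r"[a-z]+", title.lower())
term_frequencies = Counter(t for t in tokens if t not in STOP_WORDS)

print(dict(term_frequencies))
# {'why': 1, 'processing': 2, 'sorted': 2, 'array': 2, 'faster': 1}
```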


Page 19: Introduction to Machine Learning with Python and scikit-learn

Frequency of key terms is anticipated to be correlated with the tags of the question

         why  processing  sorted  array  faster  need  help  java  homework
Title 1    1           2       2      2       1     0     0     0         0
Title 2    0           0       0      0       0     1     1     1         1
Title 3    0           0       1      1       0     0     1     0         1
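A term-frequency matrix like this one can be built with scikit-learn's CountVectorizer. In the sketch below the vocabulary is fixed to the nine key terms so the columns come out in the same order. The talk only shows the text of the first title, so the second and third titles here are hypothetical, written to be consistent with the counts in the table.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Fixing the vocabulary pins the column order to the nine key terms.
vocabulary = ["why", "processing", "sorted", "array", "faster",
              "need", "help", "java", "homework"]

titles = [
    "Why is processing a sorted array faster than processing an array that is not sorted?",
    "Need help with Java homework",     # hypothetical title
    "Help with sorted array homework",  # hypothetical title
]

# Words outside the vocabulary are ignored, which acts like stop-word removal here.
vectorizer = CountVectorizer(vocabulary=vocabulary)
term_matrix = vectorizer.fit_transform(titles).toarray()

print(term_matrix)
```

Each row is one title and each column one term, which is exactly the (n_samples, n_features) layout that scikit-learn models expect as input.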


Page 20: Introduction to Machine Learning with Python and scikit-learn

Example Model Coefficients


Page 21: Introduction to Machine Learning with Python and scikit-learn
Page 22: Introduction to Machine Learning with Python and scikit-learn

ML can be easy*

• You already have ML problems!

• You can start applying ML methods now with Python & scikit-learn

• Theoretical knowledge of ML not needed (initially)*

scikit-learn.org

github.com/scikit-learn


Page 23: Introduction to Machine Learning with Python and scikit-learn

Check out: liveramp.com/careers

Helping companies use their marketing data to delight customers

Opportunities

• Backend Engineers

• Data Scientists

• Full-Stack Engineers

Tools

• Java

• Hadoop (Map/Reduce)

• Ruby

Build and work with large distributed systems that process massive data sets.


