
Kaggle Competitions, New Friends, New Skills and New Opportunities

Date post: 06-Jan-2017
Upload: jo-fai-chow
Kaggle Competitions, New Friends, New Skills and New Opportunities Jo-fai (Joe) Chow Data Scientist [email protected] @matlabulus Version 2 – Data Science Exeter Meetup
Transcript

Kaggle Competitions, New Friends, New Skills and New Opportunities

Jo-fai (Joe) Chow
Data Scientist
[email protected]
@matlabulus

Version 2 – Data Science Exeter Meetup

2

Civil Engineer → Data Scientist

• 2005 - 2015
• Water Engineer
  o Consultant for Utilities
    • SEAMS (Sheffield)
  o EngD Research
    • University of Exeter
    • XP Solutions (Newbury)
• 2015 - Present
• Data Scientist
  o UK Telecom
    • Virgin Media
  o Silicon Valley
    • Domino Data Lab
    • H2O.ai

3

About Domino and H2O

4

About This Talk

• What happened
  o Things I did since I started participating in Kaggle competitions.
  o New opportunities – results of new skills and friends.

5

First MOOC Experience

• One of the first Massive Open Online Courses.
  o Met some new friends.
  o Decided to collaborate for fun.
  o “How about Kaggle?”
  o “What is Kaggle?”

6

About Kaggle

• World’s biggest predictive modelling competition platform

• 560k members
• Competition types:
  o Featured (prize)
  o Recruitment
  o Playground
  o 101

7

First Kaggle Experience

• First time in my life
  o Supervised learning
    • Random Forest
    • Support Vector Machine
    • Neural Networks
  o Train, Validate & Predict.
  o “Is it black magic?”
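The "Train, Validate & Predict" loop on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the data is made up, and a tiny nearest-centroid classifier stands in for the Random Forest, SVM and Neural Network models named on the slide so that the example needs no external libraries. The function names are hypothetical.

```python
import random

def fit_centroids(rows, labels):
    """Train: compute one mean vector (centroid) per class label."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, row):
    """Predict: return the label whose centroid is closest to the row."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, rows, labels):
    """Validate: share of held-out rows whose predicted label is correct."""
    hits = sum(predict(centroids, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)

# Toy data (made up for this sketch): two well-separated 2-D clusters.
rng = random.Random(0)
rows = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(50)]
rows += [[rng.gauss(5, 0.5), rng.gauss(5, 0.5)] for _ in range(50)]
labels = ["a"] * 50 + ["b"] * 50

# Shuffle, then hold out 20% of the rows for validation.
order = list(range(len(rows)))
rng.shuffle(order)
split = int(0.8 * len(order))
train_idx, valid_idx = order[:split], order[split:]

# Train on 80%, validate on the held-out 20%, then predict an unseen row.
model = fit_centroids([rows[i] for i in train_idx], [labels[i] for i in train_idx])
valid_acc = accuracy(model, [rows[i] for i in valid_idx], [labels[i] for i in valid_idx])
new_label = predict(model, [4.8, 5.1])
```

The key point is that the validation rows are never used for fitting, so `valid_acc` estimates how the model would do on data it has not seen.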

8

First Kaggle Experience

• Problems
  o “Hey Joe, you are a nice guy but we can’t work together.”
  o “You love MATLAB so much. You even call yourself @matlabulous on twitter!”
  o “We prefer R/Python.”
• Results
  o I kept using MATLAB
  o Lone wolf
  o No collaboration

9

Identifying Skills Gap

• Obvious skills gap:
  o Open-source programming languages
  o Machine learning techniques
  o Collaboration
• Kind of related
  o Data visualisation
  o Handling large datasets
  o Explaining results
• That competition was a good wake-up call.

10

From MATLAB to R/Python

| | MATLAB | Python | R |
| Neural Networks | ✔️ | ✔️ | ✔️ |
| Random Forest | ✔️ | ✔️ | ✔️ |
| SVM | ✔️ | ✔️ | ✔️ |
| Other Machine Learning Libraries | Toolboxes (commercial + open source) | Scikit-learn and many more | CRAN, GitHub (A LOT!) |
| Data Visualisation | I wasn’t good at it anyway … | Matplotlib (plus a lot more since then) | ggplot2 (WOW!) (plus a lot more since then) |

12

Filling the Skills Gap

• More MOOC
  o Machine Learning
    • Andrew Ng (Coursera)
  o Data Analysis
    • Jeff Leek (Coursera)
    • R
  o Intro to Programming
    • Dave Evans (Udacity)
    • Python
• Things I also picked up:
  o Linux (Ubuntu)
  o Git
  o Cloud computing
  o HTML / CSS

13

Learning from other Kagglers

• Continuous learning
  o Kaggle’s forums and blogs.
  o New tools and tricks.
  o Many things you cannot learn from school.
  o I am standing on the shoulders of many Kagglers.

14

Side Project 1 – Crime Data Viz

shiny::runGitHub("rApps", "woobe", subdir = "crimemap")

http://insidebigdata.com/2013/11/30/visualization-week-crimemap/

Before I knew it …
Using R + crime data from data.gov.uk

15

Side Project 2 – Data Viz Contest

https://github.com/woobe/rugsmaps

While I was obsessed with making maps …

http://blog.revolutionanalytics.com/2014/08/winner-for-revolution-analytics-user-group-map-contest.html

16

Side Project 3 – Colour Palette

I am also obsessed with colours …

https://github.com/woobe/rPlotter

http://blog.revolutionanalytics.com/2015/03/color-extraction-with-r.html

#TheDress

17

Side Project 4 – World Cup 2014

• World Cup 2014 Correct Score Prediction
  o ML vs. my friends
  o 10 out of 64 (15.6%)
  o Friends’ avg. = 4 (6.3%)
  o github.com/woobe/wc2014
• Euro 2016
  o Collecting data right now
  o github.com/woobe/euro2016
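The percentages on this slide follow directly from the counts. As a quick check, the sketch below recomputes them from the figures given (10 and 4 correct scores out of 64 matches). The helper name is hypothetical, and rounding half-up to one decimal place is an assumption that happens to reproduce the slide's numbers.

```python
from decimal import Decimal, ROUND_HALF_UP

def hit_rate_percent(correct, total):
    """Percentage of correct predictions, rounded half-up to one decimal place."""
    exact = Decimal(correct) * 100 / Decimal(total)
    return exact.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

TOTAL_MATCHES = 64  # number of matches, as stated on the slide

ml_rate = hit_rate_percent(10, TOTAL_MATCHES)      # exact value 15.625
friends_rate = hit_rate_percent(4, TOTAL_MATCHES)  # exact value 6.25
ratio = Decimal(10) / Decimal(4)                   # ML hits per average friend hit

print(f"ML: {ml_rate}%  Friends: {friends_rate}%  Ratio: {ratio}x")
```

`Decimal` is used because formatting the float `6.25` to one decimal place rounds half-to-even and gives 6.2 rather than the 6.3 shown on the slide.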

18

Open Up Myself

• Before Kaggle/MOOC
  o I was drawing a circle around myself.
  o Fear of change.
  o Domain-specific problem solving.
• After Kaggle/MOOC
  o Data-driven approach.
  o Not a subject matter expert? No worries
  o Free to try new tools, to learn and to create.

19

New Opportunities

• LondonR
  o First presentation outside water industry / academia.
  o Very positive feedback.
  o Led to other projects.
  o bit.ly/londonr_crimemap

20

New Opportunities

• useR! 2014 (UCLA)
  o Presented a poster.
  o Met new friends.
  o Life-changing event.
  o github.com/woobe/useR_2014

21

New Friends

Ramnath Vaidyanathan
htmlwidgets

DataRobot

Nick @ DominoDataLab

H2O.ai & John Chambers!

rOpenSci
RStudio

Matt Dowle
data.table (also at H2O.ai)

22

More Opportunities

• First blog post about H2O
  o Things to try after useR! – Part 1: Deep Learning with H2O

23

More Opportunities

• Blog post about Domino and H2O
  o I did it for fun. I did not have any expectations.
  o It helped attract customers to both Domino and H2O.

24

Becoming a Data Scientist

The leap of faith …

25

London Kagglers Assemble

• London Kaggle Meetup
  o Sep 2015
  o I met my Kaggle buddy Mickael Le Gal
  o He is a product data scientist at Tictrac

Mickael Joe

26

London Kagglers Assemble

• Rossmann Store Sales
  o We got stuck at top 10% for a long period.
  o Mickael had a breakthrough in feature engineering with 48 hours to go.
  o I re-trained all models and completed model stacking just a few hours before the deadline (thanks to Domino Data Lab).
  o Top 2% finish (our best result so far).
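For readers unfamiliar with model stacking, here is a minimal generic sketch. The slides do not say which models or stacking scheme the team used for Rossmann, so everything below is an assumption for illustration: two toy base models, a single blend weight as the meta-model, made-up data, and hypothetical function names.

```python
def fit_mean(xs, ys):
    """Base model A: always predict the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_line(xs, ys):
    """Base model B: ordinary least-squares line through the (x, y) points."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def out_of_fold(fit, xs, ys, k):
    """Predict each row with a model that never saw that row during fitting."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = model(xs[i])
    return preds

def fit_stack(xs, ys, k=5):
    """Meta-model: one blend weight w for A (and 1 - w for B), fitted on out-of-fold predictions."""
    a = out_of_fold(fit_mean, xs, ys, k)
    b = out_of_fold(fit_line, xs, ys, k)
    # Least-squares solution for w in: w * a + (1 - w) * b ~ y, clipped to [0, 1].
    den = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    num = sum((ai - bi) * (yi - bi) for ai, bi, yi in zip(a, b, ys))
    w = min(1.0, max(0.0, num / den)) if den else 0.5
    model_a, model_b = fit_mean(xs, ys), fit_line(xs, ys)  # refit on all rows
    return lambda x: w * model_a(x) + (1 - w) * model_b(x)

# Toy data (made up): y = 2x + 1 exactly, so the line should get nearly all the weight.
xs = [float(i) for i in range(20)]
ys = [2 * x + 1 for x in xs]
stacked = fit_stack(xs, ys)
prediction = stacked(10.0)
```

Fitting the blend weight on out-of-fold predictions, rather than on predictions for rows the base models were trained on, keeps the meta-model from simply rewarding whichever base model overfits most. It also means every base model is refitted several times, which is why re-training everything late in a competition is expensive.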

27

Joining H2O.ai

A call from Sri (CEO) just before Christmas 2015 …

28

More Opportunities

• bit.ly/joe_h2o_talk1
• bit.ly/joe_h2o_talk2
• bit.ly/joe_h2o_talk3
• bit.ly/joe_h2o_talk4
• …

LondonR

PyData Amsterdam

London Kaggle

29

Summary of Benefits

• Direct
  o Identify data science skills gap.
  o Learn quickly from the community.
  o Expand your network.
  o Prepare yourself for real-life data challenges.
• Indirect
  o You also learn non-ML skills along the way.
  o You learn to build small data products (e.g. graph, web app, REST API) and help others gain insight.

30

Big Thank You!

• University of Exeter
  o Prof. Dragan Savic
• Mango Solutions
• RStudio
• Domino Data Lab
• H2O.ai
• London Kaggle Meetup Organisers

1st LondonR Talk
Crime Map Shiny App
bit.ly/londonr_crimemap

2nd LondonR Talk
Domino API Endpoint
bit.ly/1cYbZbF

31

Any Questions?

• Contact
  o [email protected]
  o @matlabulous
  o github.com/woobe
• Links (All Slides)
  o github.com/h2oai/h2o-meetups
• H2O in London
  o Coming soon!
    • Meetups
    • Office
  o We’re hiring!
  o www.h2o.ai/careers

