Date post: | 06-Jan-2017 |
Category: | Data & Analytics |
Upload: | jo-fai-chow |
Kaggle Competitions, New Friends, New Skills and New Opportunities
Jo-fai (Joe) Chow
Data [email protected]
@matlabulous
Version 2 – Data Science Exeter Meetup
Civil Engineer → Data Scientist
• 2005 - 2015
• Water Engineer
  o Consultant for Utilities
    • SEAMS (Sheffield)
  o EngD Research
    • University of Exeter
    • XP Solutions (Newbury)
• 2015 - Present
• Data Scientist
  o UK Telecom
    • Virgin Media
  o Silicon Valley
    • Domino Data Lab
    • H2O.ai
About This Talk
• What happened
  o Things I did since I started participating in Kaggle competitions.
  o New opportunities – results of new skills and friends.
First MOOC Experience
• One of the first Massive Open Online Courses.
  o Met some new friends.
  o Decided to collaborate for fun.
  o “How about Kaggle?”
  o “What is Kaggle?”
About Kaggle
• World’s biggest predictive modelling competition platform
• 560k members
• Competition types:
  o Featured (prize)
  o Recruitment
  o Playground
  o 101
First Kaggle Experience
• First time in my life
  o Supervised learning
    • Random Forest
    • Support Vector Machine
    • Neural Networks
  o Train, Validate & Predict.
  o “Is it black magic?”
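The "Train, Validate & Predict" loop on this slide can be sketched in a few lines. This is only an illustration, not the code used in the competition: the data is synthetic, and a simple nearest-centroid classifier stands in for Random Forest, SVM or a neural network so that the example runs with the Python standard library alone. All function names here are made up for the sketch.

```python
import random
from statistics import mean

def make_toy_data(n_per_class=100, seed=42):
    """Synthetic two-class data: two Gaussian blobs in 2-D (illustrative only)."""
    rng = random.Random(seed)
    rows = []
    for label, centre in ((0, (0.0, 0.0)), (1, (3.0, 3.0))):
        for _ in range(n_per_class):
            point = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
            rows.append((point, label))
    rng.shuffle(rows)
    return rows

def train(rows):
    """Fit a nearest-centroid model: one mean point per class."""
    centroids = {}
    for label in {y for _, y in rows}:
        points = [x for x, y in rows if y == label]
        centroids[label] = tuple(mean(dim) for dim in zip(*points))
    return centroids

def predict(model, x):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda label: sum((a - b) ** 2 for a, b in zip(x, model[label])))

def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)

# 1. Train on 80% of the labelled data.
data = make_toy_data()
split = int(0.8 * len(data))
train_rows, valid_rows = data[:split], data[split:]
model = train(train_rows)

# 2. Validate on the 20% the model has never seen.
valid_acc = accuracy(model, valid_rows)
print(f"validation accuracy: {valid_acc:.2f}")

# 3. Predict the label of a new, unlabelled point.
print("prediction for new point:", predict(model, (2.8, 3.1)))
```

The validation step is what takes the "black magic" out of it: the score is measured on rows held back from training, so it estimates how the model behaves on data it has not seen.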
First Kaggle Experience
• Problems
  o “Hey Joe, you are a nice guy but we can’t work together.”
  o “You love MATLAB so much. You even call yourself @matlabulous on Twitter!”
  o “We prefer R/Python.”
• Results
  o I kept using MATLAB
  o Lone wolf
  o No collaboration
Identifying Skills Gap
• Obvious skills gap:
  o Open-source programming languages
  o Machine learning techniques
  o Collaboration
• Kind of related
  o Data visualisation
  o Handling large datasets
  o Explaining results
• That competition was a good wake up call.
From MATLAB to R/Python
                                 | MATLAB                               | Python                                   | R
Neural Networks                  | ✔️                                   | ✔️                                       | ✔️
Random Forest                    | ✔️                                   | ✔️                                       | ✔️
SVM                              | ✔️                                   | ✔️                                       | ✔️
Other Machine Learning Libraries | Toolboxes (commercial + open source) | Scikit-learn and many more               | CRAN, GitHub (A LOT!)
Data Visualisation               | I wasn’t good at it anyway …         | Matplotlib (plus a lot more since then)  | ggplot2 (WOW!) (plus a lot more since then)
What can people do with R?
James Cheshire, UCL (Link)
Paul Butler, Facebook (Link)
Filling the Skills Gap
• More MOOCs
  o Machine Learning
    • Andrew Ng (Coursera)
  o Data Analysis
    • Jeff Leek (Coursera)
    • R
  o Intro to Programming
    • Dave Evans (Udacity)
    • Python
• Things I also picked up:
  o Linux (Ubuntu)
  o Git
  o Cloud computing
  o HTML / CSS
Learning from other Kagglers
• Continuous learning
  o Kaggle’s forums and blogs.
  o New tools and tricks.
  o Many things you cannot learn from school.
  o I am standing on the shoulders of many Kagglers.
Side Project 1 – Crime Data Viz
shiny::runGitHub("rApps", "woobe", subdir = "crimemap")
http://insidebigdata.com/2013/11/30/visualization-week-crimemap/
Before I knew it …
Using R + crime data from data.gov.uk
Side Project 2 – Data Viz Contest
https://github.com/woobe/rugsmaps
While I was obsessed with making maps …
http://blog.revolutionanalytics.com/2014/08/winner-for-revolution-analytics-user-group-map-contest.html
Side Project 3 – Colour Palette
I am also obsessed with colours …
https://github.com/woobe/rPlotter
http://blog.revolutionanalytics.com/2015/03/color-extraction-with-r.html
#TheDress
Side Project 4 – World Cup 2014
• World Cup 2014 Correct Score Prediction
  o ML vs. my friends
  o 10 out of 64 (15.6%)
  o Friends’ avg. = 4 (6.3%)
  o github.com/woobe/wc2014
• Euro 2016
  o Collecting data right now
  o github.com/woobe/euro2016
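The World Cup 2014 percentages above are just correct scores divided by the 64 matches. A small sketch of that arithmetic follows; the function name is made up for illustration. It uses `Decimal` with half-up rounding because Python's built-in `round` and float formatting round exact halves to even, which would turn 6.25 into 6.2 rather than the 6.3 shown on the slide.

```python
from decimal import Decimal, ROUND_HALF_UP

TOTAL_MATCHES = 64  # matches scored in the comparison, per the slide

def hit_rate_percent(correct, total=TOTAL_MATCHES):
    """Correct-score hit rate as a percentage, rounded half-up to one decimal place."""
    pct = Decimal(correct) / Decimal(total) * 100
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

ml_rate = hit_rate_percent(10)       # machine learning model
friends_rate = hit_rate_percent(4)   # friends' average
print(f"ML: {ml_rate}%  friends' average: {friends_rate}%")
```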
Open Up Myself
• Before Kaggle/MOOC
  o I was drawing a circle around myself.
  o Fear of change.
  o Domain-specific problem solving.
• After Kaggle/MOOC
  o Data-driven approach.
  o Not a subject matter expert? No worries.
  o Free to try new tools, to learn and to create.
New Opportunities
• LondonR
  o First presentation outside water industry / academia.
  o Very positive feedback.
  o Led to other projects.
  o bit.ly/londonr_crimemap
New Opportunities
• useR! 2014 (UCLA)
  o Presented a poster.
  o Met new friends.
  o Life-changing event.
  o github.com/woobe/useR_2014
New Friends
Ramnath Vaidyanathan, htmlwidgets
DataRobot
Nick @ DominoDataLab
H2O.ai & John Chambers!
rOpenSci
RStudio
Matt Dowle, data.table (also at H2O.ai)
More Opportunities
• First blog post about H2O
  o Things to try after useR! – Part 1: Deep Learning with H2O
More Opportunities
• Blog post about Domino and H2O
  o I did it for fun. I had no expectations.
  o It helped attract customers to both Domino and H2O.
London Kagglers Assemble
• London Kaggle Meetup
  o Sep 2015
  o I met my Kaggle buddy Mickael Le Gal
  o He is a product data scientist at Tictrac
[Photo: Mickael and Joe]
London Kagglers Assemble
• Rossmann Store Sales
  o We got stuck at top 10% for a long period.
  o Mickael had a breakthrough in feature engineering with 48 hours to go.
  o I re-trained all models and completed model stacking just a few hours before the deadline (thanks to Domino Data Lab).
  o Top 2% finish (our best result so far).
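For readers new to the term, model stacking means training a second-level model on the predictions of several base models. The sketch below shows the idea only and is not the Rossmann pipeline: the data is synthetic, the two base models are deliberately simple, and the meta-learner is reduced to a single blend weight chosen on out-of-fold predictions. All names are hypothetical, and only the Python standard library is used.

```python
import random
from statistics import mean

rng = random.Random(0)
# Synthetic rows (x, g, y): y depends on a numeric feature x and a binary group g.
data = []
for _ in range(200):
    x, g = rng.uniform(0, 10), rng.randint(0, 1)
    data.append((x, g, 2.0 * x + 5.0 * g + rng.gauss(0, 1.0)))

def fit_linear(rows):
    """Base model A: least-squares line on x only (ignores g)."""
    xs, ys = [r[0] for r in rows], [r[2] for r in rows]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda r: my + slope * (r[0] - mx)

def fit_group_mean(rows):
    """Base model B: mean target per group g (ignores x)."""
    overall = mean(r[2] for r in rows)
    means = {g: mean(r[2] for r in rows if r[1] == g) for g in {r[1] for r in rows}}
    return lambda r: means.get(r[1], overall)

def mse(preds, rows):
    """Mean squared error of predictions against the targets in rows."""
    return mean((p - r[2]) ** 2 for p, r in zip(preds, rows))

def out_of_fold(fit, rows, k=5):
    """Predict each row with a model fitted on the other k-1 folds."""
    preds = [0.0] * len(rows)
    for fold in range(k):
        model = fit([r for i, r in enumerate(rows) if i % k != fold])
        for i, r in enumerate(rows):
            if i % k == fold:
                preds[i] = model(r)
    return preds

def blend(w, a, b):
    """Weighted average of two prediction lists."""
    return [w * pa + (1 - w) * pb for pa, pb in zip(a, b)]

train_rows, test_rows = data[:150], data[150:]
oof_a = out_of_fold(fit_linear, train_rows)
oof_b = out_of_fold(fit_group_mean, train_rows)

# Meta-learner: the blend weight that minimises error on out-of-fold predictions.
best_w = min((w / 20 for w in range(21)), key=lambda w: mse(blend(w, oof_a, oof_b), train_rows))

# Refit the base models on all training rows, then stack their test predictions.
model_a, model_b = fit_linear(train_rows), fit_group_mean(train_rows)
stacked = blend(best_w, [model_a(r) for r in test_rows], [model_b(r) for r in test_rows])
print(f"blend weight for model A: {best_w:.2f}")
```

Fitting the weight on out-of-fold predictions rather than in-sample ones keeps the meta-learner from simply rewarding whichever base model overfits the most. In practice the base models would be much stronger learners, and every change to the features means re-training all of them before stacking again, which is why the final hours mattered.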
More Opportunities
• bit.ly/joe_h2o_talk1
• bit.ly/joe_h2o_talk2
• bit.ly/joe_h2o_talk3
• bit.ly/joe_h2o_talk4
• …
LondonR
PyData Amsterdam
London Kaggle
Summary of Benefits
• Direct
  o Identify data science skills gap.
  o Learn quickly from the community.
  o Expand your network.
  o Prepare yourself for real-life data challenges.
• Indirect
  o You also learn non-ML skills along the way.
  o You learn to build small data products (e.g. graph, web app, REST API) and help others gain insight.
Big Thank You!
• University of Exeter
  o Prof. Dragan Savic
• Mango Solutions
• RStudio
• Domino Data Lab
• H2O.ai
• London Kaggle Meetup Organisers
1st LondonR Talk: Crime Map Shiny App (bit.ly/londonr_crimemap)
2nd LondonR Talk: Domino API Endpoint (bit.ly/1cYbZbF)
Any Questions?
• Contact
  o [email protected]
  o @matlabulous
  o github.com/woobe
• Links (All Slides)
  o github.com/h2oai/h2o-meetups
• H2O in London
  o Coming soon!
    • Meetups
    • Office
  o We’re hiring!
  o www.h2o.ai/careers