
Kaggle The Home of Data Science

Date post: 12-Aug-2015
Upload: odsc
Kaggle: The Home of Data Science. Anthony Goldbloom. Open Data Science Conference, Boston 2015. @opendatasci
Transcript
Page 1: Kaggle The Home of Data Science

KAGGLE THE HOME OF DATA SCIENCE

Anthony Goldbloom

OPEN DATA SCIENCE CONFERENCE

BOSTON 2015

@opendatasci

Page 2: Kaggle The Home of Data Science

Kaggle

The home of data science

Page 3: Kaggle The Home of Data Science

GE Flight Quest 2: Optimize flight routes based on weather & traffic

$250,000 | 122 teams

Hewlett Foundation: Automated Essay Scoring: Develop an automated scoring algorithm for student-written essays

$100,000 | 155 teams

Allstate Purchase Prediction Challenge: Predict a purchased policy based on transaction history

$50,000 | 1,570 teams

Merck Molecular Activity Challenge: Help develop safe and effective medicines by predicting molecular activity

$40,000 | 236 teams

Higgs Boson Machine Learning Challenge: Use the ATLAS experiment to identify the Higgs boson

$13,000 | 1,302 teams

Page 4: Kaggle The Home of Data Science

The Kaggle Approach

Training Data

Age    Income     Default
58     $95,824    True
73     $20,708    False
59     $82,152    False
66     $25,334    True

Test Data

Age    Income     Default
73     $53,445
61     $36,679
47     $90,422
44     $79,040
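
As the slide's two tables suggest, competitors fit a model on the labeled training rows and then fill in the blank Default column for the test rows. The sketch below illustrates that fit-then-predict flow using the slide's own numbers. The choice of a 1-nearest-neighbour rule, the min-max scaling, and the names `make_scaler`, `predict_default`, and `submission` are assumptions made for illustration; the talk does not prescribe any particular model.

```python
# Training rows copied from the slide: (age, income, defaulted)
TRAIN = [
    (58, 95_824, True),
    (73, 20_708, False),
    (59, 82_152, False),
    (66, 25_334, True),
]

# Test rows copied from the slide: the Default column is blank
TEST = [(73, 53_445), (61, 36_679), (47, 90_422), (44, 79_040)]


def make_scaler(rows):
    """Build a min-max scaler from the (age, income) columns of rows."""
    ages = [r[0] for r in rows]
    incomes = [r[1] for r in rows]
    lo = (min(ages), min(incomes))
    # Guard against a zero range so the division below is always defined
    span = (max(ages) - lo[0] or 1, max(incomes) - lo[1] or 1)

    def scale(age, income):
        return ((age - lo[0]) / span[0], (income - lo[1]) / span[1])

    return scale


def predict_default(train, age, income):
    """Return the label of the nearest training row (hypothetical 1-NN rule)."""
    scale = make_scaler(train)
    x = scale(age, income)

    def sq_dist(row):
        y = scale(row[0], row[1])
        return (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2

    return min(train, key=sq_dist)[2]


# A "submission": one predicted label per test row, in the same order
submission = [predict_default(TRAIN, age, income) for age, income in TEST]
```

With only four training rows the predictions mean little; the point is the shape of the task, where the model sees labels only for the training rows and has to fill them in for the test rows.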

Page 5: Kaggle The Home of Data Science
Page 6: Kaggle The Home of Data Science

Mapping Dark Matter

[Chart: Competition Progress. Y-axis: Accuracy (lower is better), .0150 to .0170. X-axis: Week 1, Week 3, Week 5, Week 7, End.]

Labeled on the chart:

Martin O’Leary: PhD student in Glaciology, Cambridge U

Page 7: Kaggle The Home of Data Science

“In less than a week, Martin O’Leary, a PhD student in glaciology, outperformed the state-of-the-art algorithms”

“The world’s brightest physicists have been working for decades on solving one of the great unifying problems of our universe”

Page 8: Kaggle The Home of Data Science

Mapping Dark Matter

[Chart: Competition Progress. Y-axis: Accuracy (lower is better), .0150 to .0170. X-axis: Week 1, Week 3, Week 5, Week 7, End.]

Labeled on the chart:

Martin O’Leary: PhD student in Glaciology, Cambridge U

Marius Cobzarenco: Grad student in computer vision, UC London

Ali Haissaine & Eu Jin Loc: Signature Verification, Qatar U & Grad Student @ Deloitte

deepZot (David Kirkby & Daniel Margala): Particle Physicist & Cosmologist

Other

Page 9: Kaggle The Home of Data Science

EXAMPLE ESSAY QUESTION

We all understand the benefits of laughter. For example, someone once said, “Laughter is the shortest distance between two people.”

Many other people believe that laughter is an important part of any relationship. Tell a true story in which laughter was one element or part.

We can work with difficult data

Page 10: Kaggle The Home of Data Science

The winning model correctly predicted seizures 82% of the time. Until that point, researchers had struggled to develop an algorithm that did better than chance.

Mayo Clinic: Seizure detection from EEG readings

Page 11: Kaggle The Home of Data Science

We’ve worked with many of the world’s largest companies

Healthcare & Pharma
Consumer Internet
Finance
Industrial
Consumer Marketing
Oil & Gas

$50b+ Beverage Co.
Global Bank
Top Credit Card Issuer
Top 5 E&P
Top 20 E&P

Page 12: Kaggle The Home of Data Science

Community of over 320K data scientists

Page 13: Kaggle The Home of Data Science

That submit over 100K machine learning models per month

[Chart: Monthly Submissions to Kaggle Competitions. X-axis: May-10 to May-15. Y-axis: 0 to 160,000.]

Page 14: Kaggle The Home of Data Science

Feature engineering matters most

Page 15: Kaggle The Home of Data Science

Good software engineering practices and robust statistical methods are key

Page 16: Kaggle The Home of Data Science

80% of data science is grunt work and only 20% involves deep thinking

Page 17: Kaggle The Home of Data Science

A good pipeline makes data scientists more productive and their work higher quality and more enjoyable

Page 18: Kaggle The Home of Data Science
Page 19: Kaggle The Home of Data Science
Page 20: Kaggle The Home of Data Science
Page 21: Kaggle The Home of Data Science
Page 22: Kaggle The Home of Data Science
Page 23: Kaggle The Home of Data Science
Page 24: Kaggle The Home of Data Science

Our workflow environment will be the central repository for all data science work in a company

Page 25: Kaggle The Home of Data Science

Anthony Goldbloom
[email protected] 283 9781

