Date post: | 15-Jan-2015 |
Category: | Technology |
Upload: | adrian-cockcroft |
LEARN TO BE A DATA SCIENTIST FOR $1
Hack Kid Conference - April 2014 by Adrian Cockcroft
Battery Ventures
A BIG new problem for a new generation
Now
Your future job as a Data Scientist
WHAT DOES A DATA SCIENTIST DO?
The hive mind map shows popular Twitter hashtags for the last 7 days and how they are connected
http://hivemindmap.com/?#
HIVE MIND MAP
A mind-map of what’s happening on Twitter
Thanks to Mark Harwood for these slides and the Hive Mind Map
http://www.infoq.com/presentations/elasticsearch-revealing-uncommonly-common
Connections
The thickness of a line between hashtags is based on the strength of connection
Tip: Strength of connection is the number of tweets with both tags divided by the number of tweets with either tag - see “Jaccard similarity coefficient”
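A minimal sketch of the Jaccard similarity coefficient, assuming each hashtag is represented as a set of tweet IDs. The function name and the sample IDs are made up for illustration; this is not the Hive Mind Map's actual code.

```python
def jaccard(tweets_with_a, tweets_with_b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = tweets_with_a | tweets_with_b
    if not union:
        # Neither tag appears in any tweet, so there is no connection to measure.
        return 0.0
    return len(tweets_with_a & tweets_with_b) / len(union)


# Made-up tweet IDs, for illustration only.
bigdata = {1, 2, 3, 4}
cloud = {3, 4, 5}

# 2 tweets carry both tags, 5 tweets carry at least one of them.
strength = jaccard(bigdata, cloud)
print(strength)
```

A value near 1 means the two tags almost always appear together (a thick line); a value near 0 means they rarely do.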
Top tweets
The most popular tweets for a tag are sorted based on the number of “retweets”
When?
The rise and fall of each hashtag’s popularity can be shown over time
Calendar summary
Tags that “peak” together are grouped into events on a calendar
Tip: Peaks are detected using standard deviations. Only tags with a single peak are chosen as events
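One way to read "detected using standard deviations" is to flag any day whose tweet count is far above the average for that tag. The sketch below assumes a threshold of 2 standard deviations above the mean and treats one unbroken run of flagged days as a single peak; both choices are assumptions, and the slides do not give the exact rule.

```python
from statistics import mean, pstdev


def find_peaks(daily_counts, k=2.0):
    """Return the indexes of days whose count exceeds mean + k standard deviations."""
    mu = mean(daily_counts)
    sigma = pstdev(daily_counts)
    if sigma == 0:
        # A perfectly flat series has no peaks.
        return []
    return [i for i, count in enumerate(daily_counts) if count > mu + k * sigma]


def has_single_peak(daily_counts, k=2.0):
    """True when the flagged days form exactly one unbroken run."""
    peaks = find_peaks(daily_counts, k)
    if not peaks:
        return False
    return peaks[-1] - peaks[0] == len(peaks) - 1


# Made-up daily tweet counts for one hashtag.
counts = [5, 6, 5, 7, 50, 6, 5, 6, 5, 6]
print(find_peaks(counts), has_single_peak(counts))
```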
Tip: Tags that rise and fall in popularity at the same time are detected using Pearson’s Correlation
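Pearson's correlation can be sketched in a few lines, assuming each tag is represented by a list of daily tweet counts of the same length. The sample counts are made up, and returning 0.0 for a flat series is a convenience chosen here, since the coefficient is undefined in that case.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of the same length, at least 2 points each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Undefined for a flat series; treated as "no correlation" in this sketch.
        return 0.0
    return covariance / sqrt(var_x * var_y)


# Made-up daily counts for two hashtags that rise and fall together.
tag_a = [2, 3, 40, 35, 4, 2]
tag_b = [1, 2, 22, 18, 3, 1]
print(pearson(tag_a, tag_b))
```

A result close to +1 means the two tags peak together, which is what would place them in the same calendar event.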
What makes this possible?
• Free software (Lucene, Java, Eclipse, Gephi, Tomcat, d3, Google Analytics…)
• Free data (millions of users’ tweets from Twitter’s 1% sample feed)
• “Cloud” computing (rented server)
• Smarter web browsers (visualizations using HTML5’s SVG/Canvas)
• All the friendly folks on the internet (e.g. http://stackoverflow.com/questions/14799842)
• Some imagination…
Opportunities in Data Science
• We are all generating volumes of data never seen before
• You can recycle the behaviors of billions of people into more intelligent systems:
  • Customer purchases can be used for product recommendations
  • User searches can be used for spelling corrections
  • Reader clicks can influence the trending news
  • Spotify activity is used to make music recommendations
• The tools have never been cheaper
• It has never been easier to find help in developing systems
…one more thing…
I’m writing these slides for you while on my annual snowboarding trip to Canada. Data science pays well ;-)
Wish you were here…
HOW CAN A KID LEARN BIG DATA
FOR $1?
BIG DATA IN THE CLOUD WITH AMAZON EMR
https://www.youtube.com/watch?v=S6Ja55n-o0M
LESS THAN $1
After running two of the EMR examples, creating 6 computers in the cloud to do the analysis for up to an hour each
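The arithmetic behind a bill like this can be sketched as instances × hours × hourly rate. The 6 computers and the one-hour ceiling come from the slide; the hourly rate below is a placeholder chosen only for illustration, not Amazon's published price, so check the current EMR pricing before relying on it.

```python
def max_cost_cents(instances, max_hours_each, cents_per_instance_hour):
    """Upper bound on the bill, in cents, if every instance runs its full time."""
    return instances * max_hours_each * cents_per_instance_hour


INSTANCES = 6              # from the slide: 6 computers in the cloud
MAX_HOURS_EACH = 1         # from the slide: up to an hour each
HYPOTHETICAL_RATE = 10     # ASSUMPTION: 10 cents per instance-hour, a made-up rate

total = max_cost_cents(INSTANCES, MAX_HOURS_EACH, HYPOTHETICAL_RATE)
print(f"worst case: {total} cents")

# Highest whole-cent hourly rate that keeps 6 instance-hours strictly under $1.
highest_rate = (100 - 1) // (INSTANCES * MAX_HOURS_EACH)
print(f"stays under $1 up to {highest_rate} cents per instance-hour")
```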
MEASURING KIDS
How good are you at Math and Science? Is it getting better or worse?
SCHOOL DATA
https://www.data.gov/
http://eddataexpress.ed.gov/state-report.cfm/state/CA/
ACHIEVEMENT SCORES
Download results into Excel to analyze and draw graphs
DOWNLOADED DATA
Needed some clean-up. Made sure grade was consistent (4, 8, HS) for all results, and created a short Subject column
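The slides do this clean-up by hand in Excel; the same two steps could be sketched in code as below. The column names ("Grade", "Assessment") and the raw labels are hypothetical, since the slides do not show what the downloaded file actually contained.

```python
import re


def normalize_grade(raw):
    """Map a raw grade label to one of the three consistent values: 4, 8 or HS."""
    text = str(raw).strip().lower()
    if text == "hs" or "high" in text:
        return "HS"
    match = re.search(r"\d+", text)
    if match and match.group() in ("4", "8"):
        return match.group()
    raise ValueError(f"unrecognized grade label: {raw!r}")


def short_subject(raw):
    """Reduce a long assessment name to a short Subject value."""
    text = str(raw).lower()
    if "math" in text:
        return "Math"
    if "sci" in text:
        return "Science"
    raise ValueError(f"unrecognized subject: {raw!r}")


def clean_row(row):
    """Return a copy of the row with a consistent Grade and a new Subject column."""
    cleaned = dict(row)
    cleaned["Grade"] = normalize_grade(row["Grade"])
    cleaned["Subject"] = short_subject(row["Assessment"])
    return cleaned


# Hypothetical raw row, for illustration only.
print(clean_row({"Grade": "4th Grade", "Assessment": "Mathematics proficiency"}))
```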
SCORES 2004-2012
Elementary - 4th Grade, Middle School - 8th Grade, High School
About half of high school students in
California are proficient at Math and Science
CALIFORNIA SCHOOLS
Science and Math Scores at Elementary, Middle and High School Level
Scores have been getting better. Good!
Maybe the Math tests were harder for everyone that year?
4th Grade “cohort” in 2004 was 8th Grade in 2008
DATA SCIENCE WITH EXCEL
Pivot tables let you rearrange data and trend lines measure the slope
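For readers without Excel, the same two ideas can be sketched in code: a pivot regroups flat rows into one series per category, and a trend line's slope is the least-squares fit of score against year. The percentages below are invented for illustration and are not the California results from the slides.

```python
from collections import defaultdict


def pivot(rows):
    """Regroup (year, series, value) rows into {series: {year: value}}."""
    table = defaultdict(dict)
    for year, series, value in rows:
        table[series][year] = value
    return dict(table)


def slope(points):
    """Least-squares slope of value against year: change in value per year."""
    years = list(points)
    values = [points[year] for year in years]
    n = len(years)
    mean_year = sum(years) / n
    mean_value = sum(values) / n
    spread = sum((year - mean_year) ** 2 for year in years)
    if spread == 0:
        raise ValueError("need at least two different years to fit a trend line")
    return sum(
        (year - mean_year) * (value - mean_value)
        for year, value in zip(years, values)
    ) / spread


# Invented percent-proficient figures, for illustration only.
rows = [
    (2004, "Grade 4 Math", 40), (2006, "Grade 4 Math", 44), (2008, "Grade 4 Math", 48),
    (2004, "Grade 8 Math", 30), (2006, "Grade 8 Math", 31), (2008, "Grade 8 Math", 32),
]

for series, points in pivot(rows).items():
    print(series, "changes by", slope(points), "points per year")
```

A positive slope means scores are improving; comparing slopes between series shows which one is improving faster.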
LEARN TO BE A DATA SCIENTIST FOR $1
• Everything is being measured
• The latest data science tools are available to anyone for pennies
• There is lots of freely available data
• Pay attention in math and science class, play around with EMR and BigQuery, and get an interesting and well-paid job as a data scientist!