Date posted: 27-Jan-2017
Category: Education
Uploaded by: shawn-handran
BEGINNER’S GUIDE TO GETTING PUBLIC DATA INTO THE CLASSROOM
PRESENTED OCTOBER 17, 2015
SOCIETY FOR SCIENCE AND THE PUBLIC TEACHER CONFERENCE
WASHINGTON DC
Shawn Handran, Ph.D.
About me
  10 years in academic research
    Montana State BS, Washington Univ. in St. Louis PhD, Harvard Medical School post-doc
  7 years in biotechnology
    Genomics, bioinformatics, HT screening & imaging
  4 years in non-profit sector
    Foundation/fundraising database research
  4th year of teaching at FCS
    AP Biology, AP Statistics, Biotechnology
Getting public data into the classroom
  Stimulate intrinsic interest
  Keep barriers to entry low
  Get to a good comfort level
  Gradual release
(Chart: % of content for each step)
Getting Public Data into the Classroom
STEP 1: STIMULATE INTRINSIC INTEREST
Stimulate intrinsic interest
  Teachers often mandate the parameters
    Too much control stifles creativity
  Give students ownership
    Ownership drives interest level and engagement
  Provide guidance
    Yes, some parameters are still required!
  Some recurring themes
    Music
    Sports
    Whatever example you just showed in class
STEP 2: KEEP BARRIERS TO ENTRY LOW
Keep barriers to entry low
  Datasets
    Ease of access
    Dataset format
  Data analysis
    Cost
    Ease of use
Dataset barriers to entry
  Ease of access (difficulty increases down the list)
    HTML tables
    Downloadable files
    Copy and paste
    Query database/forms
    PDF
Dataset barriers to entry
  Dataset format (difficulty increases down the list)
    HTML: use the Import HTML Table function
    Text format (csv, tab) or Excel (xls, xlsx)
    Query forms
    Simple database files (e.g., Access)
    Complex database files
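Text formats (csv, tab) sit near the easy end of this list because almost any tool can read them. As a rough illustration only, the Python sketch below reads either format with the standard csv module. The helper name read_delimited and the sample team data are hypothetical and were not part of the talk.

```python
import csv
import io


def read_delimited(text):
    """Parse comma- or tab-delimited text into a list of dict rows."""
    # Simple heuristic: if the header line contains a tab, treat the
    # text as tab-delimited; otherwise assume commas.
    first_line = text.splitlines()[0]
    delimiter = "\t" if "\t" in first_line else ","
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# Hypothetical sample data, not taken from any real dataset.
csv_text = "team,wins,losses\nA,90,72\nB,81,81\n"
tab_text = "team\twins\tlosses\nA\t90\t72\nB\t81\t81\n"

csv_rows = read_delimited(csv_text)
tab_rows = read_delimited(tab_text)

for row in csv_rows:
    print(row["team"], row["wins"], row["losses"])
```

Every value arrives as a string, so numeric columns need an explicit conversion (for example int(row["wins"])) before any arithmetic.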
Keep Barriers to Entry Low
TUTORIAL 1: IMPORT HTML TABLE INTO GOOGLE SHEETS
  MLB 2014 AL team summary stats
    Baseball-Reference.com http://goo.gl/5RU0Gt
  Import HTML Table template
    Google Sheets https://goo.gl/PTv7vl
Import HTML Table
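In Google Sheets, this kind of import is done with the built-in IMPORTHTML function, whose general form is =IMPORTHTML(url, "table", index), where index counts the tables on the page starting at 1. To show what such an import does under the hood, here is a minimal Python sketch that pulls a table out of HTML using only the standard library. The names TableParser and import_html_table and the sample page with made-up numbers are assumptions for illustration; this is not how Google Sheets itself is implemented.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <table> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def import_html_table(html, index):
    """Return table number `index` (1-based, like IMPORTHTML) as rows."""
    parser = TableParser()
    parser.feed(html)
    return parser.tables[index - 1]


# Hypothetical page content with made-up numbers.
sample_html = """
<table>
  <tr><th>Team</th><th>W</th><th>L</th></tr>
  <tr><td>Team A</td><td>90</td><td>72</td></tr>
  <tr><td>Team B</td><td>81</td><td>81</td></tr>
</table>
"""

rows = import_html_table(sample_html, 1)
for row in rows:
    print(row)
```

This sketch ignores nested tables and merged cells, which real pages sometimes use; for classroom purposes the spreadsheet function is the lower barrier to entry.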
Two functions of data analysis
  Data handling
  Data visualization
  Most programs do both, but some not well
  You’ll often use multiple programs
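Data analysis: cost vs. ease of use
(Chart: programs plotted by cost, from Free to $$$, against ease of use, from Easy to Hard)
  Programs shown: R, Stata, SAS, OpenOffice, StatCrunch, Google Sheets, Numbers, Gapminder, JMP, Excel, Publisher, Minitab, Fathom*, Tableau Public, SPSS, Illustrator, Tableau
  *discontinued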
Spreadsheet/graphing programs
  Advantages
    Free or close to free (except Excel)
    Good selection of canned graphs
  Disadvantages
    Challenging for students to learn
    Requires a lot of wizard-level hacking/tweaking
  Winners: Google Sheets, MS Excel
Statistical programs
  Advantages
    Handles large datasets faster and better than Excel
    Designed for statistical analysis
    Handles variables seamlessly
    More graph options and better graph editing tools than Excel
    About the same learning curve as Excel for simple functions
  Disadvantages
    Moderate to high cost, even with academic pricing
    More sophisticated graphs or analyses require mad skills
    Poor graphic export options
  Winners: JMP, Minitab
Minitab
StatCrunch
Dataset size: 286K
Graphic design programs
  Advantages
    Perfect control over every graphic element
    Final output looks stunning and is scalable
  Disadvantages
    Zero data handling and analysis capability
    Huge learning curve
    Expensive
  Winner: Adobe Illustrator
  Runner-up: Microsoft Publisher (poor man’s Illustrator)
Tableau Public
  Advantages
    Free, including 10GB online storage
    Handles humongous datasets
    Interactive with mouse-over information
    Easy to use for simple datasets and graphs
  Disadvantages
    Everything you create is public
    Data handling is limited and removing variables can be tedious (but not always)
TUTORIAL 2: TABLEAU PUBLIC
Tableau Public
  Sign up and download the desktop/mobile app
    https://public.tableau.com/s/
  Upload a data file
  Start tinkering!
Dataset size: 5.5K
STEP 3: GET TO A GOOD COMFORT LEVEL
Get to a good comfort level
  Getting started: Survey of public datasets
  Getting help: Learn from data experts
  Getting acquainted: Make new friends
  Disclaimer: these lists are by no means exhaustive!
Getting started: public datasets
  Data.gov (186,000+ data sets)
    http://www.data.gov/
  Big Machine Learning (BigML) blog post
    http://blog.bigml.com/list-of-public-data-sources-fit-for-machine-learning/
Getting started: public datasets
  Gapminder offline software (free)
    http://www.gapminder.org/downloads/
    Pre-loaded data!
    Cakewalk easy to use!
    Dynamic and awesome looking!
Getting started: public datasets
  HTML tables
    http://www.baseball-reference.com/
    http://www.billboard.com/archive/charts
    http://apps.who.int/gho/data/?theme=home
Getting started: public datasets
  Download files (text, Excel)
    http://www.seanlahman.com/baseball-archive/statistics
    http://www.gapminder.org/data/
    https://data.cdc.gov/browse
Getting started: public datasets
  Copy and paste
    https://gssdataexplorer.norc.org/ (easy)
    http://espn.go.com/mlb/statistics (tedious)
Getting help: Learn from data experts
  David McCandless
    http://www.informationisbeautiful.net/
  Andy Kirk
    http://www.visualisingdata.com/blog/
  Hans Rosling
    http://www.gapminder.org/videos/
  Edward Tufte
    http://www.edwardtufte.com/tufte/
Getting acquainted: make friends
  Here at this conference
  On social media networks
    You’ll have better luck on LinkedIn and G+
  Don’t be afraid to reach out
STEP 4: GRADUAL RELEASE
Gradual release
  Model
    Don’t just show it; demo it live
  Encourage
    Preferably in-class computer time/activities
  Release and nudge
    More nudging leads to a higher-quality final product
Student Project: Billboard Top 100
  Student level: 12 (AP Statistics)
    International student
  Public sources:
    Billboard Top 100, Radio, Digital 2014
    http://www.billboard.com/archive/charts/2014
  Moderate amount of nudging
    Mostly for language and cultural help
Student Project: MLB Hitting Stats
  Student level: 12 (AP Statistics)
    Local student
  Public source:
    http://espn.go.com/mlb/statistics
  Low amount of nudging
    Student had an excellent grasp of statistical analysis
  Less nudging, less complexity
MLB 2013 Home vs. Away Statistics
(Figures: Home vs. Away Batting Average and On Base Percentage; Home vs. Away On Base Percentage)
Dataset size: 808
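For readers who want to see how the two statistics in this project are defined, the sketch below uses the standard baseball formulas: batting average is hits divided by at-bats, and on-base percentage is (H + BB + HBP) / (AB + BB + HBP + SF). The home and away totals are invented for illustration and are not the student's data; the function names are likewise hypothetical.

```python
def batting_average(hits, at_bats):
    """Batting average: hits divided by at-bats."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """On-base percentage: (H + BB + HBP) / (AB + BB + HBP + SF)."""
    times_on_base = hits + walks + hit_by_pitch
    plate_appearances = at_bats + walks + hit_by_pitch + sac_flies
    return times_on_base / plate_appearances


# Hypothetical home and away totals for one player (made-up numbers).
splits = {
    "Home": {"hits": 80, "walks": 30, "hit_by_pitch": 2,
             "at_bats": 280, "sac_flies": 3},
    "Away": {"hits": 70, "walks": 25, "hit_by_pitch": 1,
             "at_bats": 290, "sac_flies": 4},
}

summary = {}
for label, s in splits.items():
    avg = batting_average(s["hits"], s["at_bats"])
    obp = on_base_percentage(**s)
    summary[label] = (avg, obp)
    print(f"{label}: AVG {avg:.3f}, OBP {obp:.3f}")
```

A comparison like the student's would repeat this calculation for every player and then compare the home and away distributions rather than a single pair of values.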
Tableau possibilities…
  I recreated the same dataset in ~2 hours of work on Tableau Public and visualized 11K data points
    https://goo.gl/ffreyb
Dataset size: 11K
Student Project: Nearest Stars
  Student level: 8 (Energy Science)
  Public sources:
    Nearby Stars Observatory http://nbso.org
    List of Nearest Stars (Wikipedia)
    Stellar Database http://stellar-database.com
    Hubble Space Telescope http://hubblesite.org
  Extensive nudging (full disclosure: my daughter’s project)
Dataset size: 528
Sirius star system image from Hubble Space Telescope
Re-envisioned in Tableau
  Same dataset recreated on Tableau in ~1 hour
    https://goo.gl/tR0Wvl
WRAPPING UP
Getting public data into the classroom
  Get the students interested in something important to them (not to you)
  Keep the barriers to entry low
    GapMinder, Google Sheets, Excel, Tableau
  Get yourself trained and prepared
    You don’t need to be an expert!
  Model it for them, then let them do it
Contact and links
  LinkedIn: www.linkedin.com/in/shawnhandran