Date post: | 30-Dec-2015 |
Upload: | sylvia-dennis |
Lecy ∙ Data Driven Management
LECTURE 00
Course Overview
When President Dwight Eisenhower established NASA in 1958, he called on the country's top scientists to bring their talents to the government.
Half a century later, when President Barack Obama was elected into office, he issued a similar call to America's scientists, but this time, there is a different mission at stake. Today's government scientists are tasked with deploying the latest technology to bring the government into the digital era, allowing it to more effectively deliver services to the American people.
A team of engineers, coders, and developers have answered his call, leaving startups and top technology companies across the country for new posts in Washington, D.C.
When we asked members of the tech corps why they chose to make the switch from the private sector to the public sector, they explained that they saw an opportunity to use their specialized skills to improve people's lives, from making Healthcare.gov as user-friendly as possible to ensuring that veterans receive support as soon as they need it.
http://www.fastcompany.com/3046985/innovation-agents/meet-the-geeks-the-dc-tech-corps-leading-edge
Data-Driven MANAGEMENT
In Public Organizations
What is data-driven management?
Can government play moneyball?
WHAT IS R?
Two guys in New Zealand who do not know how to program invent a language and give it away for free. It develops a cult following and takes on billion-dollar industry giants like SAS and Stata.
R IS MANY THINGS
• R is a hybrid of a programming language and a stats package
• R is a platform
– Operating system (environment) for programs (packages) written by users
– Data engine
– Graphing engine
• R is an ecosystem
– Packages can build on each other, code can be adapted
• R is a community
• R is a response to the commercialization of scientific knowledge at the expense of science
R IS GOOD AT SOME THINGS
• Rapid development and deployment of programs
• Customized professional graphics
• Open-source paradigm allows you to build on others' work
– For example, the “fix” command
• Breaking through cost barriers for small companies and students
• There is an amazing variety of packages and datasets (over 7000)
– http://cran.r-project.org/web/views/
• Documentation is fairly good
R IS NOT GOOD AT OTHERS
• R is not built for large datasets (although there are now many ways to adapt it to these purposes)
• R is not as fast as compiled programming languages
• Distributed development means that uniform conventions are often not followed concerning function names, arguments, and documentation
• Output is not automatically pretty, so it takes some extra time to format (though there are good packages for this purpose)
R EMBRACES OBJECT-ORIENTED PROGRAMMING
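# example of plot O-O behavior
x <- 1:100
y <- 2*x + rnorm(100,0,10)
plot( x, y )           # numeric x: scatterplot
x2 <- cut( x, 5 )
plot( x2, y )          # factor x: box plots
m.01 <- lm( y ~ x )
plot(m.01)             # lm object: regression diagnostic plots
# example with variance O-O behavior:
dat <- data.frame( x, y )
var( x )               # vector: a single variance
var( dat )             # data frame: variance-covariance matrix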
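The plot() examples above work because plot() looks at the class of its first argument and picks a matching method. A minimal sketch of the same mechanism follows; the "kpi" class, its fields, and the values are hypothetical and only serve to illustrate.

```r
# Same function name, different behavior depending on the class of the input.
set.seed( 42 )                      # make the random noise reproducible
x    <- 1:100
y    <- 2*x + rnorm( 100, 0, 10 )
x2   <- cut( x, 5 )                 # numeric vector becomes a factor with 5 levels
m.01 <- lm( y ~ x )

class( x )      # "integer"
class( x2 )     # "factor"
class( m.01 )   # "lm"

# A hypothetical class of our own, "kpi", with its own summary() method.
kpi.01 <- structure( list( name="response.time", values=c( 4, 6, 5 ) ),
                     class="kpi" )

summary.kpi <- function( object, ... ) {
  paste( object$name, "mean:", mean( object$values ) )
}

summary( kpi.01 )   # summary() dispatches to summary.kpi()
```

Calling methods( plot ) at the console lists the plot methods that are currently available.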
WHY R?
Statistics
Network Analysis
Machine Learning
Text Analysis
GIS
Dynamic Reports
http://r4stats.com/articles/popularity/
R IS GROWING
API
Shiny
COURSE OVERVIEW
COURSE OBJECTIVES
• Expose you to new and interesting developments in the data programming world.
• Ability to use R Studio, read R documentation, and write R scripts.
• Ability to write technical notes and report results using R Markdown docs.
• Familiarity with R conventions and the Object Oriented framework.
• Understanding of core data structures of R.
• Understanding of core data programming operations.
• Comfort with the R graphics engine.
• Work with raw data using text functions.
• Understanding of programming fundamentals.
• Create a data dashboard using R Shiny.
• Collaborate in teams using GitHub.
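As a preview of the "core data structures" objective, here is a small sketch of a vector, a factor, a data frame, and a list. The values are made up for illustration and are not course data.

```r
# Illustrative values only; not course data.
hr   <- c( 12, 30, 7 )                             # numeric vector
team <- factor( c( "NYA", "BOS", "NYA" ) )         # factor (categorical variable)
dat  <- data.frame( team, hr )                     # data frame: columns of equal length
info <- list( source="made up", n=nrow( dat ) )    # list: elements of mixed types

big.hitters <- dat[ dat$hr > 10 , ]                # rows where hr exceeds 10
hr.by.team  <- tapply( dat$hr, dat$team, sum )     # total hr for each team

big.hitters
hr.by.team
```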
COURSE OBJECTIVES
• How much can I learn in a semester?
• What does this course prepare me for?
• What to do after taking this course?
https://www.coursera.org/course/rprog
COURSE SCHEDULE:
Weeks 1-5: Core Data Operations
• 1 – Intro
• 2 – Data Structures
• 3 – Merge Data
• 4 – Descriptive Statistics
• 5 – Data Input
Weeks 6-9: Visualization
• 6 – Principles of Visualization
• 7 – Core Graphics
• 8 – Advanced Graphics
• 9 – Maps and GIS
Weeks 10-12: Programming and Text
• 10 – Basic Programming
• 11 – Text Analysis
• 12 – Text Analysis
• 13 – Thanksgiving Break
Weeks 14-15: Building a Dashboard in Shiny
• 14 – Intro to Shiny & GitHub
• 15 – More Shiny
REQUIRED TEXTS
• R Cookbook
• The Art of R Programming
BLACKBOARD
• Please contact me at [email protected]
(not through Blackboard’s messaging)
• All assignments submitted via Blackboard
ASSIGNMENTS AND GRADES
COURSE ORGANIZATION
Labs (10 total): 50%
Quizzes (3 total): 15%
Case Studies (13 total): 15%
Final Project: 20%
LABS
• Meant to be practice
• Graded pass / fail
• Due each Tuesday before class
• Office hours Mondays 2-3pm
• Team work allowed / encouraged
• Turn in your own code!
• Only submit PDF or webpage complete files (no HTML or RMD)
QUIZZES
• Opportunities to consolidate knowledge
• In-class, written
CASE STUDY SUMMARIES:
• Each week there will be a case study of performance measurement, or performance management.
• Submit a 1-2 page summary of important lessons from the case study.
FINAL PROJECTS:
Create a Data Dashboard
• Teams of 3-5 students
• Create a realistic scenario for an organization
• Develop 1-3 key performance indicators
• Implement a data collection / input process
• Write a program to analyze and visualize the data
• Create a Shiny app to share the reports
• All of your code will be managed in GitHub
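To make the project steps concrete, here is a rough sketch of one key performance indicator computed from a small log and then wrapped in a Shiny app. The agency, the data, the column names, and the helper draw.kpi() are all hypothetical, and the Shiny part only runs if the shiny package is installed.

```r
# Hypothetical service-request log for a made-up agency.
requests <- data.frame(
  month     = c( "Jan", "Jan", "Feb", "Feb", "Feb" ),
  days.open = c( 3, 7, 2, 4, 6 )
)

# KPI: average number of days to close a request, by month.
kpi <- aggregate( days.open ~ month, data=requests, FUN=mean )

# Hypothetical helper that draws the report.
draw.kpi <- function( kpi.dat ) {
  barplot( kpi.dat$days.open, names.arg=kpi.dat$month,
           ylab="Average days to close", main="Response time by month" )
}

# Wrap the report in a Shiny app only if the shiny package is available.
if( requireNamespace( "shiny", quietly=TRUE ) ) {
  ui <- shiny::fluidPage(
    shiny::titlePanel( "Response time dashboard" ),
    shiny::plotOutput( "kpi.plot" )
  )
  server <- function( input, output ) {
    output$kpi.plot <- shiny::renderPlot( draw.kpi( kpi ) )
  }
  app <- shiny::shinyApp( ui=ui, server=server )
  # launch interactively with: shiny::runApp( app )
}
```

Keeping the KPI calculation and the plotting function outside the app means they can be tested in a plain R script before any Shiny code is written.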
FOR THURSDAY
• Install R and R Studio
• Create an R Markdown document with the following information:
– Your name
– Your department and degree
– What you hope to take from the class
• To start the document: File > New File > R Markdown
– http://www.rstudio.com/ide/docs/authoring/using_markdown
Knit to HTML, save to PDF:
• First save the file as a .Rmd file.
• Press the “knit to HTML” command.
• You have now created an HTML file. Open in a browser and print to PDF or save as a webpage complete file.
• You will turn in the PDF or webpage complete files for homework assignments. I do NOT want the .Rmd or raw .html files.
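For reference, the same workflow can be sketched from the R console. The file name and contents below are placeholders, and the render step is left as a comment because it assumes the rmarkdown package and pandoc are installed (R Studio ships with both behind the knit button).

```r
# Assemble the three-backtick chunk fence without typing it literally.
fence <- paste( rep( "`", 3 ), collapse="" )

# A minimal R Markdown document; title and text are placeholders.
rmd.lines <- c(
  "---",
  "title: \"Lab 00\"",
  "output: html_document",
  "---",
  "",
  "Name: (your name here)",
  "",
  paste0( fence, "{r}" ),
  "summary( cars )",
  fence
)

# Save it as a .Rmd file in a temporary folder.
rmd.file <- file.path( tempdir(), "lab-00.Rmd" )
writeLines( rmd.lines, rmd.file )

# Console equivalent of the knit button (assumes rmarkdown is installed):
# rmarkdown::render( rmd.file )
```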
REQUIRED SOFTWARE
WE WILL BE USING
• The latest version of R (3.2.2 or higher)
• R Studio development environment
• GitHub (as much as we can)
• R Shiny web toolkit
• Various packages throughout the semester
– The Lahman Package for the first few weeks
• The textbooks are required and will be used extensively
– The R Cookbook
– The Art of R Programming
github
“Software engineers will pay monthly fees for the rest of their lives in order to create free software out of other free software!”
Some examples:
A short tutorial for using the ‘twitteR’ package:
https://sites.google.com/site/miningtwitter/questions/talking-about
https://github.com/gastonstat/Mining_Twitter
Hadley Wickham (chief scientist at RStudio and author of ggplot2):
https://github.com/hadley
VERSION CONTROL 101
This code was added
This code was deleted
SUPPORTS CONCURRENT DEVELOPMENT
GRAPHICS
Two population density measures compared. Migration patterns of birds.
OBJECTIVES
• Reflect on good visualization practices
• Understand ground, figure, and narrative on charts
• Learn the core functions of the graphics suite
• Learn how to customize graphs and create high quality images
• Touch on some nice mapping packages
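One way to read "ground, figure, and narrative" in base graphics is sketched below: muted points form the ground, a highlighted subset is the figure, and the title and annotation carry the narrative. The data are simulated and the styling choices are only suggestions.

```r
set.seed( 1 )
x <- 1:50
y <- 2*x + rnorm( 50, 0, 10 )
highlight <- x > 40                       # the points the story is about

# Write the chart to a file so it can be shared as a high quality image.
pdf.file <- file.path( tempdir(), "figure-ground.pdf" )
pdf( pdf.file, width=8, height=6 )

plot( x, y, pch=19, col="gray70", bty="n",          # ground: muted context
      xlab="x", ylab="y",
      main="Highlighted: observations with x greater than 40" )   # narrative
points( x[ highlight ], y[ highlight ],
        pch=19, col="firebrick" )                   # figure: the focus
text( 40, min( y ), "highlighted group starts here",
      pos=2, col="firebrick" )                      # narrative annotation
abline( v=40.5, lty=3, col="gray40" )

dev.off()
```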
WRITING CLEAR CODE
Donaudampfschiffahrtsgesellschaftskapitän
“Danube steamship company captain”
summary(lm(dat$crime[20:50]~bin(dat[20:50,"pop"],10)))
VS.
y.sub <- dat[ 20:50 , "crime" ]
x.sub <- dat[ 20:50 , "pop" ]
x.bin <- bin( x.sub, 10 )
lm.01 <- lm( y.sub ~ x.bin )
summary( lm.01 )
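The comparison can be run end to end with a few assumptions. bin() is not a base R function, so the stand-in below (a thin wrapper around cut()) is a guess at its intent, and the data frame is simulated.

```r
# Assumed stand-in for bin(): split x into n equal-width intervals.
bin <- function( x, n ) {
  cut( x, breaks=n )
}

# Made-up data so the example runs on its own.
set.seed( 10 )
dat <- data.frame( pop=runif( 60, 1000, 50000 ) )
dat$crime <- 0.01 * dat$pop + rnorm( 60, 0, 50 )

# One step per line, each with a name that can be inspected.
y.sub <- dat[ 20:50 , "crime" ]
x.sub <- dat[ 20:50 , "pop" ]
x.bin <- bin( x.sub, 10 )
lm.01 <- lm( y.sub ~ x.bin )
summary( lm.01 )

# The dense one-liner fits the same model.
lm.02 <- lm( dat$crime[20:50] ~ bin( dat[20:50,"pop"], 10 ) )
same.fit <- all.equal( unname( coef( lm.01 ) ), unname( coef( lm.02 ) ) )
same.fit
```

Both versions give the same coefficients; the difference is that the stepwise version leaves y.sub, x.sub, and x.bin available for checking when something looks wrong.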
THE R STYLE GUIDE
THE ‘LAHMAN’ PACKAGE
THE ART OF CREATING GRAPHICS:
http://chartsnthings.tumblr.com/post/22471358872/sketches-how-mariano-rivera-compares-to-baseballs
FROM THE NYT BLOG, CHARTSNTHINGS
http://chartsnthings.tumblr.com/post/47670081904/climate-change-crowbars-and-strikeouts
MISCELLANEOUS ANALYSIS
WHAT IS object-oriented?
R EMBRACES OBJECT-ORIENTED PROGRAMMING
# A function to make cookies:
make.cookies <- function( flour, eggs, sugar ) {
  # these steps give the operations
  batter <- mix( flour, eggs, sugar )
  baked.goods <- bake( batter, temp=450 )
  return( baked.goods )
}
# Each step of the recipe is a separate
# function. Here "mix" and "bake" are
# defined elsewhere as "mix.R" and "bake.R".
# When you want to call the function you give
# specific instances of the inputs
cookies.01 <- make.cookies( flour.01, eggs.01, sugar.01 )
# Because R is object-oriented, you not only need
# to call the function but you need to give a name
# to the final product. A new data object is created
# after each function is performed.
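To run the recipe, mix() and bake() need definitions. The toy versions below are stand-ins written for this sketch, not the contents of "mix.R" or "bake.R", and the ingredient values are placeholders.

```r
# Toy stand-ins for the steps the recipe relies on.
mix <- function( flour, eggs, sugar ) {
  list( ingredients=c( flour, eggs, sugar ), state="batter" )
}

bake <- function( batter, temp ) {
  batter$state <- paste( "baked at", temp )
  return( batter )
}

make.cookies <- function( flour, eggs, sugar ) {
  batter      <- mix( flour, eggs, sugar )
  baked.goods <- bake( batter, temp=450 )
  return( baked.goods )
}

# Assign the result to a name so the new object is kept.
cookies.01 <- make.cookies( flour="2 cups", eggs="2", sugar="1 cup" )
cookies.01$state      # "baked at 450"

# Without the assignment the result is printed once and then lost:
make.cookies( flour="2 cups", eggs="2", sugar="1 cup" )
```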