Date posted: 01-Nov-2014
Category: Technology
Uploaded by: analyticsweek
The History and Use of R
Joseph Kambourakis
Ground Rules
• Interrupt me
• These are all my own opinions and not those of EMC or the Big Data Analytics, Discovery & Visualization Meetup
• Slides will be available
Joseph Kambourakis @mouthorjoe
Taught Around the World
WPI
Bentley University
Big Data School
Source: Data Analytics Master's Degrees: 20 Top Programs
Sam Woolford & Dominique Haughton
First Got Exposed to R
What is R?
R is a free software environment for statistical computing and graphics
A language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files
What is R’s Hashtag?
Robert Gentleman & Ross Ihaka
• R: A Language for Data Analysis and Graphics
Starts with S
1976 1988 1991
Scheme
• Lexical scoping
Lexical scoping
• Searches through environments
– First global
• Global is your workspace
– Second namespace of packages
• More on packages later
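A minimal sketch of the lookup order above, assuming a fresh R session. The names total, make_counter and counter_a are hypothetical and exist only for this illustration. Inside a function, R first checks the function's own environment and the environment where the function was defined, and only then falls through to the global workspace and the attached packages.

```r
# `total` in the global environment (your workspace)
total <- 100

# Hypothetical helper: returns a function that remembers its own `total`
make_counter <- function() {
  total <- 0               # local to this call of make_counter()
  function() {
    total <<- total + 1    # updates the enclosing `total`, not the global one
    total
  }
}

counter_a <- make_counter()
first  <- counter_a()      # uses the `total` captured when counter_a was created
second <- counter_a()

# The global `total` was never touched by the counter
print(c(first = first, second = second, global_total = total))

# The search path: the global environment first, then attached packages
search_path <- search()
print(search_path)
```

The counter keeps its own state because the inner function carries the environment it was defined in, which is the behaviour R inherited from Scheme.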
Fortran
• source: Wikipedia
Under the Hood
Open Source
• GNU General Public License
• Freedom 0: The freedom to run the program for any purpose.
• Freedom 1: The freedom to study how the program works, and change it to make it do what you wish.
• Freedom 2: The freedom to redistribute copies so you can help your neighbor.
• Freedom 3: The freedom to improve the program, and release your improvements (and modified versions in general) to the public, so that the whole community benefits.
• source: GNU.org
R Project
• The R Foundation is a not for profit organization working in the public interest. It has been founded by the members of the R Development Core Team in order to:
– Provide support for the R project and other innovations in statistical computing. We believe that R has become a mature and valuable tool and we would like to ensure its continued development and the development of future innovations in software for statistical and computational research.
– Provide a reference point for individuals, institutions or commercial enterprises that want to support or interact with the R development community.
– Hold and administer the copyright of R software and documentation.
• source: R Project
Contributors
How it Works: Design
• Functional
– mean()
– plot()
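A short sketch of what "functional" means in practice: work is done by calling functions such as mean(), and functions can themselves be passed around as values. The vector scores and the helper summarise_with are made-up names for this example.

```r
# Made-up data for illustration
scores <- c(72, 85, 91, 64)

# Work is done by calling functions on data
average <- mean(scores)

# Functions are ordinary values: they can be passed to other functions
summarise_with <- function(f, x) f(x)   # hypothetical helper
highest <- summarise_with(max, scores)

# Apply several functions to the same data in one go
results <- sapply(list(mean = mean, max = max, min = min),
                  function(f) f(scores))
print(results)

# plot() follows the same pattern; only draw when a person is watching
if (interactive()) plot(scores)
```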
How it Works: Design
• Interpreted language
How it Works: Install
• Hosted on Comprehensive R Archive Network (CRAN)
• 54 megabytes
http://cran.rstudio.com/
• Download and Install R
• Precompiled binary distributions of the base system and contributed packages, Windows and Mac users most likely want one of these versions of R:
• Download R for Linux
• Download R for (Mac) OS X
• Download R for Windows
• R is part of many Linux distributions, you should check with your Linux package management system in addition to the link above.
How it works: Command Line
How it Works: Packages
• Base
– mean()
• Utils
– read.csv()
• Stats
– lm()
– sd()
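A small sketch showing where the functions listed above live. The package::function form makes the owning package explicit; in everyday use the prefix is optional because base, utils and stats are attached by default. The CSV text and the names csv_text, measurements and fit are invented for this example.

```r
# base: basic building blocks
avg <- base::mean(c(2, 4, 6))

# stats: statistical functions
spread <- stats::sd(c(2, 4, 6))

# utils: reading data; here from an inline string instead of a file
csv_text <- "x,y\n1,2.1\n2,3.9\n3,6.2"
measurements <- utils::read.csv(text = csv_text)

# stats again: fit a linear model to the data that was just read
fit <- stats::lm(y ~ x, data = measurements)
print(coef(fit))
```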
Packages
• Mostly hosted on CRAN
• Many others hosted elsewhere
– Github
– RStudio
– Bioconductor
– RevolutionR
Top 10 Most Popular Packages
• source: Revolution Analytics Blog
Data Frame
Capabilities
• ANALYTICS
– Basic Mathematics
– Basic Statistics
– Probability Distributions
– Machine Learning
– Optimization and Mathematical Programming
– Signal Processing
– Simulation and Random Number Generation
– Statistical Modeling
– Statistical Tests
• GRAPHICS AND VISUALIZATION
– Static Graphics
– Dynamic Graphics
– Devices and Formats
Model & Plot
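One possible model-and-plot workflow, sketched with the cars data frame that ships with R (stopping distance against speed). This dataset is not necessarily the one used on the original slide; it is chosen here only because it is always available.

```r
# `cars` is a built-in data frame with columns `speed` and `dist`
str(cars)

# Fit a simple linear regression: stopping distance as a function of speed
fit <- lm(dist ~ speed, data = cars)
print(summary(fit))

# Scatter plot with the fitted line; only draw in an interactive session
if (interactive()) {
  plot(cars$speed, cars$dist,
       xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
  abline(fit)
}
```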
GUI: RStudio
How Does it Compare?
Feature | R | SAS | SPSS Professional | MATLAB
Cost | Free! | Very VERY High | High - $9,975 | High
Documentation | Yes | Very comprehensive | OK | Some examples
Training Course | NA | Yes | Yes | Yes
User interface | Low | Medium | Best | Medium
Output | Separate commands | Automatically produce diagnosis graph and forecast | Totally automated | Some automated via GUI, some specific command
Models*: Does not STL moving average; Does not have ARCH/GARCH + and other moving average models; Does not have MA & decomposition models
Certification Program: Yes; Yes; Yes
Commercial Support
• Version 3.1.1, released 7/10/2014
• source
Where It Is Now
Where it’s Going
Source: Revolution Analytics Blog
Where it’s Going: Extensions and Interactions
• Rcpp
– Transfer from R to C++, and from C++ to R
• RLLVM
– Creates code
• H2O
– Big data package
The best thing about R is that it was developed by statisticians. The worst thing about R is that...it was developed by statisticians.
Bo Cowgill
Good: Open Source
• So many contributors
• Free!
• Community
Bad: Open Source
• No customer support
• Features
Good: Frequent Updates
• Always new packages
• New updates and bug fixes
Bad: Frequent Updates
• Package updates
• R updates
Bad: Documentation
Bad: Speed
• 40 year old code
• Interpreted
• Single threaded
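A rough sketch of what the "interpreted" point can look like in practice: the same sum computed with an explicit R loop and with the built-in vectorised sum(). The helper loop_sum is hypothetical, and the timings depend on the machine and the R version, so no numbers are claimed here.

```r
n <- 1e5
x <- as.numeric(seq_len(n))   # doubles, to avoid integer overflow in the sum

# Hypothetical helper: add the values one at a time in interpreted R code
loop_sum <- function(v) {
  total <- 0
  for (value in v) total <- total + value
  total
}

loop_time <- system.time(loop_result <- loop_sum(x))
vec_time  <- system.time(vec_result  <- sum(x))   # loop runs in compiled code

# Same answer; compare the elapsed times on your own machine
print(loop_result == vec_result)
print(rbind(loop = loop_time, vectorised = vec_time))
```

Preferring vectorised functions where they exist is the usual way to work around this limitation.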
Bad: Memory
• All stored in memory
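A quick way to see the memory cost of an object, as a sketch. A numeric vector uses about 8 bytes per element, so a million values take roughly 8 MB of RAM before any copies are made. The variable names are illustrative.

```r
# One million double-precision numbers, all held in RAM
x <- numeric(1e6)

# Size of the object in bytes
size_bytes <- as.numeric(object.size(x))
print(size_bytes)

# Human-readable size
print(object.size(x), units = "Mb")
```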
Soccer Example
@11tegen11
Source: fun with R
Use Cases
How to Learn: RStudio
How to Learn: Data Camp
How to Learn: Springer Series
How to Learn: Art of R Programming
How to Learn: Boot Camp Boston Predictive Analytics Meetup
How to Learn: Online Videos
Web Resources:
UseR Groups & Conferences
Closing Thoughts
Thank You
Questions?