Date posted: 01-Jul-2015 | Category: Data & Analytics | Uploaded by: ajay-ohri
Basics of Analysis, Analytics and R
Ajay Ohri
Why analysis?
● Humans can only count so far
● We understand summarized information
● We understand graphs faster
● We need to make decisions
● Wrong decisions lead to huge costs
Central Tendency
● What is the difference between the mean and the median?
● When should each be used?
● What is expected value?
● When can the mean be misleading?
Exercise: What is the average height of this class?
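A minimal sketch of the central tendency questions above. The heights are hypothetical, illustrative values, not data from the deck, and the "56 instead of 5.6" data-entry error is an assumed example of an outlier. The die example shows expected value as a probability-weighted mean.

```r
# Hypothetical class heights in feet (illustrative values, not real data)
heights <- c(5.2, 5.5, 5.8, 5.4, 6.1, 5.6)
mean(heights)     # 5.6
median(heights)   # 5.55

# One assumed data-entry error (56 typed instead of 5.6) drags the mean far
# away while the median barely moves: the mean is misleading with outliers
with_error <- c(heights, 56)
mean(with_error)    # 12.8
median(with_error)  # 5.6

# Expected value: each outcome weighted by its probability (a fair die)
outcomes <- 1:6
probs <- rep(1 / 6, 6)
sum(outcomes * probs)          # 3.5
weighted.mean(outcomes, probs) # same idea with a built-in function
```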
Grouped Means
Exercise:
What is the average height of the class?
What is the average height of the class by gender?
What is the average height of the class by team?
What is the average height of the class by dark- or light-colored clothing?
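The grouped-means exercise can be sketched with `tapply()` and `aggregate()`. The data frame `class_df` and its values are hypothetical; a clothing-color column would be grouped the same way.

```r
# Hypothetical class data (illustrative only): height in feet, gender, team
class_df <- data.frame(
  height = c(5.2, 5.9, 5.5, 6.1, 5.4, 5.8),
  gender = c("F", "M", "F", "M", "F", "M"),
  team   = c("A", "A", "B", "B", "A", "B")
)

mean(class_df$height)  # whole class

# Mean height by gender, returned as a named vector
by_gender <- tapply(class_df$height, class_df$gender, mean)
by_gender

# Mean height by team, returned as a data frame (formula interface)
by_team <- aggregate(height ~ team, data = class_df, FUN = mean)
by_team
```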
Cross Tabs
Exercise: cross tabs using mtcars
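One way to approach the mtcars cross-tab exercise, assuming cylinders against forward gears is the pairing of interest (the deck does not say which columns to use):

```r
# Cross tabulation of cylinders (rows) by number of forward gears (columns)
cyl_by_gear <- table(cyl = mtcars$cyl, gear = mtcars$gear)
cyl_by_gear

# xtabs() gives the same counts with a formula interface
xtabs(~ cyl + gear, data = mtcars)

# Add row and column totals
addmargins(cyl_by_gear)
```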
Variance
What is the range (max - min)?
What is a quartile (the values that split sorted data into 4 equal parts)?
What is a decile (the values that split sorted data into 10 equal parts)?
Standard deviation is rarely used directly in the business world.
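A short sketch of the spread measures listed above, using the built-in `mtcars$mpg` column as an assumed example variable:

```r
mpg_values <- mtcars$mpg

range(mpg_values)        # smallest and largest value
diff(range(mpg_values))  # the range as one number: max - min

quantile(mpg_values)                             # quartiles: 0%, 25%, 50%, 75%, 100%
quantile(mpg_values, probs = seq(0, 1, by = 0.1)) # deciles
IQR(mpg_values)          # spread of the middle 50%

var(mpg_values)          # variance
sd(mpg_values)           # standard deviation = square root of the variance
```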
Frequency Analysis
Frequency table
Height range         Number of students   Cumulative number
Less than 5.0 feet   25                   25
5.0–5.5 feet         35                   60
5.5–6.0 feet         20                   80
6.0–6.5 feet         20                   100

Contingency table
         Dance   Sports   TV   Total
Men      2       10       8    20
Women    16      6        8    30
Total    18      16       16   50
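Both tables above can be rebuilt in R. This is a sketch: the counts are taken from the slide, and the variable names are hypothetical. `cumsum()` reproduces the cumulative column. `addmargins()` adds the totals, labelled "Sum".

```r
# Frequency table from the slide: students per height band
students <- c("< 5.0 ft" = 25, "5.0-5.5 ft" = 35,
              "5.5-6.0 ft" = 20, "6.0-6.5 ft" = 20)
cumsum(students)  # cumulative number: 25 60 80 100

# Contingency table from the slide, entered row by row
activity_by_gender <- matrix(c(2, 10, 8,
                               16, 6, 8),
                             nrow = 2, byrow = TRUE,
                             dimnames = list(c("Men", "Women"),
                                             c("Dance", "Sports", "TV")))
addmargins(as.table(activity_by_gender))  # adds the Total row and column

# Row proportions: share of each activity within each gender
prop.table(activity_by_gender, margin = 1)
```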
Histogram
What is a distribution?
EDA (Exploratory Data Analysis)
Box Plot
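A minimal EDA sketch for the histogram and box plot slides, assuming `mtcars$mpg` as the example variable. When run non-interactively, base graphics are written to a default file such as `Rplots.pdf`.

```r
mpg_values <- mtcars$mpg
summary(mpg_values)  # min, quartiles, median, mean, max

# Histogram: the shape of the distribution (counts per bin)
h <- hist(mpg_values, main = "Distribution of mpg", xlab = "Miles per gallon")
h$counts

# Box plot: median, quartiles, whiskers and any outliers
boxplot(mpg_values, horizontal = TRUE, main = "mpg", xlab = "Miles per gallon")
boxplot.stats(mpg_values)$stats  # the five values the box plot draws
```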
Analytics
• What is analytics? The study of data, using software, to help with decision making
• Where is it used? In industries such as pharma, BFSI (banking, financial services and insurance), telecom and retail
• How is it used? By applying statistics with software
• What are some good practices?
– Learn one new thing more than your competition every day. This is a fast-moving field.
– Etc.
What is Data Science?
Other Analytics Software
• SAS (Base SAS and related products)
• JMP
• SPSS
• Python
• Octave
• Clojure
• Julia(?)
Social Media Analytics
Some examples
http://decisionstats.com/2013/12/04/top-fourteen-interfaces-in-social-media-and-web-analytics-on-the-internet/
Some use cases
http://decisionstats.com/2014/05/10/analyzing-facebook-networks-using-rstats/
http://decisionstats.com/2013/09/11/using-twitter-data-with-r/
What is R? http://www.r-project.org/
• A language that is:
– Object oriented
– Open source
– Free
– Widely used
Object-oriented programming is based on the concept of "objects" that have data fields (attributes that describe the object) and associated procedures known as methods. Objects, which are usually instances of classes, interact with one another to build applications and computer programs.
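To illustrate the object idea in R, here is a sketch using R's simplest object system, S3 (R also has S4 and Reference Classes). The `student` class, `new_student()` constructor and `print.student()` method are hypothetical names chosen for illustration.

```r
# Constructor: an object is a list of data fields tagged with a class
new_student <- function(name, height) {
  structure(list(name = name, height = height), class = "student")
}

# Method: print() dispatches to print.student() for objects of class "student"
print.student <- function(x, ...) {
  cat("Student:", x$name, "| height:", x$height, "ft\n")
  invisible(x)
}

s <- new_student("Asha", 5.6)
class(s)  # "student"
print(s)  # Student: Asha | height: 5.6 ft
s         # auto-printing at the console also uses print.student()
```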
Prerequisites
• Installation of R: http://cran.rstudio.com/bin/windows/base/
– RTools (Windows only; needed to build packages from source)
• RStudio: http://www.rstudio.com/products/rstudio/download/
• R packages: install.packages(), update.packages(), library()
Packages are installed once, updated periodically, but loaded in every R session that uses them.
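The install-once, load-every-session pattern can be wrapped in a small helper. This is a sketch: `use_package()` is a hypothetical name, not a base R function. The demo loads "tools", which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install a package only if it is missing, then load it
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)              # once per machine (downloads from CRAN)
  }
  library(pkg, character.only = TRUE)  # in every session that needs it
}

use_package("tools")  # "tools" ships with R, so no download happens here

# Periodically (requires internet access):
# update.packages(ask = FALSE)
```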
Interfaces to R
• Console
– Default
– Customization
• IDE
• GUI
Demo-Basic Objects on R Console
• +
• -
• log()
• exp()
• *
• /
• ()
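Hint: the Up arrow key recalls the last typed command.
Functions: ls() lists the objects in the workspace; rm("foo") removes the object named foo.
Assignment: <- or = assigns a value to an object name; -> also assigns, with the value on the left and the name on the right.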
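A console session covering the operators, functions and assignment forms listed above. The object names `x`, `y` and `z` are arbitrary.

```r
x <- 10      # assignment with <-
y = 3        # = also assigns at the top level
2 -> z       # -> assigns rightwards: value on the left, name on the right

x + y        # 13
x - y        # 7
x * y        # 30
x / y        # 3.333333
(x + y) * z  # parentheses control the order of evaluation: 26

log(100)             # natural log: 4.60517
log(100, base = 10)  # 2
exp(1)               # 2.718282

ls()                 # lists the objects in the workspace, including x, y and z
rm("z")              # removes the object named z
exists("z", inherits = FALSE)  # FALSE
```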
Functions and Loops
• Loops
for (number in 1:5) {
  print(number)
}
• Function
functionajay <- function(a) {
  a^2 + 2 * a + 1
}
Hint: Always match brackets
Each ( deserves a )
Each { deserves a }
Each [ deserves a ]
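Here is the loop and function from the slide run together. The `sapply()` line is an added comparison, not from the deck. Because R arithmetic is vectorized, the function also works on a whole vector without a loop.

```r
# Loop: prints 1 to 5, one number per line
for (number in 1:5) {
  print(number)
}

# The slide's function: a^2 + 2a + 1, which equals (a + 1)^2
functionajay <- function(a) {
  a^2 + 2 * a + 1
}

functionajay(3)    # 16
functionajay(1:5)  # vectorized: 4 9 16 25 36

# sapply() applies the function element by element, an alternative to a loop
sapply(1:5, functionajay)
```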
Demo-Basic Objects on R Console
Functions:
• class() gives the class
• dim() gives the dimensions
• nrow() gives the number of rows
• ncol() gives the number of columns
• length() gives the length
• str() gives the structure
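Applied to the built-in `mtcars` data frame, the inspection functions above give the following. Note that `length()` counts columns for a data frame but elements for a vector.

```r
class(mtcars)       # "data.frame"
dim(mtcars)         # 32 11
nrow(mtcars)        # 32
ncol(mtcars)        # 11
length(mtcars)      # 11: for a data frame, the number of columns
length(mtcars$mpg)  # 32: for a vector, the number of elements
str(mtcars)         # type and first values of every column
```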
Demo-Datasets on R Console
Hint: use data() to list the datasets available in attached packages.
Hint: library(FOO) loads the package "FOO".
Packages in R
• CRAN
• CRAN Task Views
• R Documentation
Documentation in R
• Help: ? and ??
• CRAN Task Views
• Package Help
• Tips for Googling
– Stack Overflow
– Email Lists
– R Bloggers
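The help operators above also have function forms that work inside scripts. Assigning the results, as this sketch does, stores them without opening a help page.

```r
# help("mean") is the function form of ?mean
h <- help("mean")
length(h) > 0  # TRUE when a matching help topic is found

# ??regression is shorthand for help.search("regression"):
# a fuzzy search over the help pages of installed packages
hs <- help.search("regression")

# apropos() lists objects on the search path whose names match a pattern
apropos("^mean")
```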
Graphical Interfaces to R
• R Commander
• Rattle
• Deducer
Overview of R Commander
Demo: R Commander, 3D Graphs
Overview of Rattle
Demo: Rattle
Overview of Deducer (with JGR)
Demo: Deducer
• data()
• data(mtcars)
• read.table()
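A self-contained sketch of `data()` and `read.table()`. Because the deck's own input files are not available, it first writes a small temporary CSV built from `mtcars`; that file is an assumption for illustration only.

```r
# data(mtcars) loads the built-in mtcars dataset into the workspace
data(mtcars)
head(mtcars, 3)

# read.table() reads delimited text; write a temporary CSV first so the
# example runs anywhere (the file and its contents are illustrative)
tmp <- tempfile(fileext = ".csv")
write.csv(head(mtcars[, c("mpg", "cyl")], 3), tmp, row.names = FALSE)

df <- read.table(tmp, header = TRUE, sep = ",")
str(df)
```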
From Databases
The RODBC package provides access to databases through an ODBC interface.
The primary functions are
• odbcConnect(dsn, uid="", pwd="") Open a connection to an ODBC database
• sqlFetch(channel, sqltable) Read a table from an ODBC database into a data frame
Hint: a good site to learn R is http://www.statmethods.net
A Detour to SQL
From Web (aka Web Scraping)
• readLines()
Hint: R is case sensitive; readlines is not the same as readLines.
Hint: use head() and tail() to inspect objects.
Other packages: XML and Curl
Case study: http://decisionstats.com/2013/04/14/using-r-for-cricket-analysis-rstats/
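`readLines()` also accepts a URL such as `"http://www.r-project.org/"`. So that this sketch runs offline, it reads a small local HTML file instead. The page content and the regular expression used to extract the title are illustrative assumptions, not the deck's method.

```r
# Write a tiny HTML page to a temporary file (stands in for a web page)
tmp <- tempfile(fileext = ".html")
writeLines(c("<html>",
             "<head><title>Demo page</title></head>",
             "<body><p>Hello</p></body>",
             "</html>"), tmp)

page <- readLines(tmp)  # one element per line
head(page, 2)
tail(page, 1)

# Extract the page title with a regular expression
title_line <- grep("<title>", page, value = TRUE)
page_title <- sub(".*<title>(.*)</title>.*", "\\1", title_line)
page_title  # "Demo page"
```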
Inspecting Data Quality: Demo
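A possible data quality check, using the built-in `airquality` dataset because it contains missing values. The Wind limits in the range check are an arbitrary, illustrative threshold.

```r
str(airquality)      # column types and first values
summary(airquality)  # min/max/quartiles plus a count of NA's per column

missing_per_column <- colSums(is.na(airquality))
missing_per_column   # missing values per column

sum(duplicated(airquality))  # number of fully duplicated rows

# Range check: flag implausible values (the limits here are an assumption)
subset(airquality, Wind < 0 | Wind > 50)
```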
Data Selection: Demo
Questions:
• How do I use multiple conditions (AND, OR)?
• Can I do away with the subset() function?
• How do I select a random sample?
Useful link: http://decisionstats.com/2013/11/24/50-functions-to-clear-a-basic-interview-for-business-analytics-rstats/
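Here is one way to answer the three selection questions on `mtcars`. The thresholds and the seed value are arbitrary choices for illustration.

```r
# AND (&) and OR (|) with bracket indexing: no subset() needed
fuel_efficient_4cyl <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, ]
six_or_eight <- mtcars[mtcars$cyl == 6 | mtcars$cyl == 8, c("mpg", "cyl")]

# The same AND condition written with subset()
same_rows <- subset(mtcars, cyl == 4 & mpg > 30)
all.equal(fuel_efficient_4cyl, same_rows)

# Random sample of 5 rows; set.seed() makes the draw reproducible
set.seed(42)
sample_rows <- mtcars[sample(nrow(mtcars), 5), ]
sample_rows
```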
Data Exploration
• Missing values are represented by NA in R
• Demo
– is.na()
– na.omit()
– na.rm = TRUE (an argument to functions such as mean() and sum())
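A small demo of the three NA tools listed above. The vector `x` is made up for illustration, and `airquality` is a built-in dataset with missing values.

```r
x <- c(4, NA, 7, 1)
is.na(x)               # FALSE TRUE FALSE FALSE
mean(x)                # NA: one missing value propagates
mean(x, na.rm = TRUE)  # 4, computed from the non-missing values
na.omit(x)             # drops the NA and records which element was removed

# na.omit() on a data frame drops every row containing an NA
clean <- na.omit(airquality)
nrow(airquality)  # 153
nrow(clean)       # 111
```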
Data Visualization
Notes:
• Explaining basic types of graphs
• Customizing graphs
• Graph output
• Advanced graphs: facets, grammar of graphics
• Data visualization rules
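This base-graphics sketch covers the first three notes: a basic graph type, customization, and writing output to a file. It writes a PDF to a temporary directory because the pdf() device is always available. Facets and the grammar of graphics usually rely on add-on packages and are not shown.

```r
out_file <- file.path(tempdir(), "mtcars_plots.pdf")
pdf(out_file, width = 8, height = 6)  # graph output: open a file device

# Basic graph type: bar chart of counts
barplot(table(mtcars$cyl), main = "Cars by cylinders", xlab = "Cylinders")

# Customizing: title, axis labels, color and point symbol
plot(mtcars$wt, mtcars$mpg,
     main = "Fuel economy vs weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     col = "steelblue", pch = 19)
abline(lm(mpg ~ wt, data = mtcars), col = "red")  # add a fitted line

invisible(dev.off())  # close the device so the file is written
file.exists(out_file)
```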
Data Manipulation Demo
Notes:
1. gsub()
2. gsub() with escape characters
3. as.* conversion functions (for example as.numeric())
4. is.* test functions (for example is.numeric())
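The four manipulation notes can be shown on a hypothetical vector of price strings. The `$` sign needs escaping because it is a regular-expression metacharacter; `fixed = TRUE` is the alternative that matches it literally.

```r
prices <- c("$1,200", "$850", "$2,050")  # hypothetical input

# 1. gsub(): replace every match of a pattern
gsub(",", "", prices)                    # "$1200" "$850" "$2050"

# 2. gsub() with escapes: \\$ matches a literal dollar sign
no_symbols <- gsub("\\$|,", "", prices)  # "1200" "850" "2050"
gsub("$", "", prices, fixed = TRUE)      # literal match, no escaping needed

# 3. as.*: convert types
amounts <- as.numeric(no_symbols)        # 1200 850 2050

# 4. is.*: test types
is.character(prices)  # TRUE
is.numeric(amounts)   # TRUE
```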