R: A Statistics Program For Teaching & Research Josué Guzmán 11 Nov. 2007



Some Useful R Links

• R Home Page www.r-project.org

• CRAN http://cran.r-project.org

• Precompiled Binary Distributions

• Windows (95 and later)

• R Manuals

R Installation

• R: Statistical Analysis & Graphics

• Freely Available Under GPL

• Binary Distributions

• Installation – Standard Steps

Running R

Statistical Programming with R

• Learn Language Basics

• Learn Documentation / Help System

• Learn Data Manipulation & Graphics

• Perform Basic Statistical Analysis

First Steps: Interacting with R

• Type a Command & Press Enter

• R Executes (printing the result if relevant)

• R waits for more input

Some Examples

2 * 2

[1] 4


exp(-2)

[1] 0.1353353

rdmnorm = rnorm(1000)

R Functions

• exp, log and rnorm are functions

• Function calls are indicated by the presence of parentheses

Example: hist(rdmnorm, col = "magenta")

Variables and Assignments

The = operator; the <- operator also works

x = 2.2
y = x + 3.5
sqrt(x)
y

x ^ y

Variables and Assignments [cont.]

• Variable names cannot start with a digit

• Names are Case-Sensitive

• Some common names are already used by R

• Examples: c, q, t, C, D, F, I, T

• Should be avoided
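The naming rules above can be tried directly at the prompt. This is a minimal sketch; the variable X and the reuse of T are illustrative choices, not part of the original slides.

```r
x = 2.2
X = 10        # a different variable: names are case-sensitive
x == X        # FALSE

T = 0         # legal, but T no longer means TRUE in this workspace
isTRUE(T)     # FALSE
rm(T)         # remove our copy; the built-in T is visible again
isTRUE(T)     # TRUE
```

This is why reusing names such as T or F for data is best avoided.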

Vectorized Arithmetic

• Elementary data types in R are all vectors

• The c(...) construct used to create vectors:

• Bolstad, 2004, exercise 13.2, page 253

fertilizer = c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5)


Vectorized Arithmetic [cont.]

• Arithmetic operations (+, -, *, /, ^) and mathematical functions (sin, cos, log, …) work element-wise on vectors

yield = c(25, 31, 27, 28, 36, 35, 32, 34)


Vectorized Arithmetic [cont.]

sum.yield = sum(yield)
sum.yield

n = length(yield)
n

avg.yield = sum.yield/n
avg.yield

• plot(x, y) function – simple way to produce R graphics:

plot(fertilizer, log(yield), main = "Fertilizer vs. Yield")
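The element-wise behaviour can be checked in a short session. This sketch reuses the fertilizer and yield vectors from the slides; the names yield.per.unit and log.yield are illustrative additions.

```r
fertilizer = c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5)
yield = c(25, 31, 27, 28, 36, 35, 32, 34)

# Arithmetic and mathematical functions act on every element at once
yield.per.unit = yield / fertilizer
log.yield = log(yield)
length(yield.per.unit)    # 8, one result per element

# The manual average agrees with the built-in mean()
avg.yield = sum(yield) / length(yield)
avg.yield                 # 31
mean(yield)               # 31
```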

Getting Help

• help.start( ) Starts a browser window with an HTML help interface. Links to the manual An Introduction to R, as well as topic-wise listings.

• help(topic) Help page for a particular topic or function. Every R function has a help page.

• help.search("search string") Subject/keyword search

Getting Help [cont.]

• Short-cut: question mark (?)

help(plot)
? plot

• To know about a specific subject, use help.search function. Example:


apropos( )

• apropos function - list of topics that partially match its argument:

apropos("plot")[1:10]
 [1] ".__C__recordedplot" "biplot"
 [3] "interaction.plot" "lag.plot"
 [5] "monthplot" "plot.TukeyHSD"
 [7] "plot.density" "plot.ecdf"
 [9] "plot.lm" "plot.mlm"

R Packages

• R makes use of a system of packages

• Each package is a collection of routines with a common theme

• The core of R itself is a package called base

• A collection of packages is called a library

• Some packages are already loaded when R starts up

• Other packages need to be loaded using the library function

R Packages [cont.]

Several packages come pre-installed with R:

installed.packages( )[, 1]
 [1] "ISwR" "KernSmooth" "MASS" "base"
 [5] "boot" "class" "cluster" "foreign"
 [9] "graphics" "grid" "lattice" "methods"
[13] "mgcv" "nlme" "nnet" "rpart"
[17] "spatial" "splines" "stats" "stats4"
[21] "survival" "tcltk" "tools" "utils"

Contributed Packages

• Many packages are available from CRAN

• Some packages are already loaded when R starts up. List of currently loaded packages - use search:

search( )
[1] ".GlobalEnv" "package:tools" "package:methods"
[4] "package:stats" "package:graphics" "package:utils"
[7] "Autoloads" "package:base"

R Packages

• Can be loaded by the user. Example: UsingR package


• New packages downloaded using the install.packages function:

install.packages("UsingR")
library(help = UsingR)

Data Types

• vector – Set of elements in a specified order

• matrix – Two-dimensional array of elements of the same mode

• factor – Vector of categorical data

• data frame – Two-dimensional array whose columns may represent data of different modes

• list – Set of components that can be any other object type
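Each of these data types can be built in one line. The object names below (v, m, f, dat, lst) and the values are made up for illustration.

```r
v = c(2.5, 3.1, 4.7)                           # vector
m = matrix(1:6, nrow = 2)                      # matrix: 2 rows, 3 columns
f = factor(c("low", "high", "low"))            # factor with two levels
dat = data.frame(id = 1:3, score = v, group = f)     # columns of different modes
lst = list(numbers = v, table = dat, label = "demo") # components of any type

dim(m)          # 2 3
levels(f)       # "high" "low"
str(dat)        # compact summary of the data frame
names(lst)      # "numbers" "table" "label"
```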

Editing Data Sets

• Can create and modify data sets on the command line

xx = seq(from = 1, to = 5)
xx

x2 = 1 : 5
x2

yy = scan( )
5 8 10 4 2 6 2011 21 32 43 55

yy

• Can edit a data set once it is created

edit(mydata)
data.entry(mydata)

© J. Guzmán, 2007 R: Stat. Prog. for Teach. & Res. 26

Built-in Data

Data from a library:

library(UsingR)
attach(cfb) # Consumer-Finances Survey
cfb$INCOME
cfb$EDUC
educ.fac = factor(EDUC)
plot(INCOME ~ educ.fac, xlab = "EDUCATION", ylab = "INCOME")


Data Modes

• logical – Binary mode, values represented as TRUE or FALSE

• numeric – Numeric mode [integer, single, & double precision]

• complex – Complex numeric values

• character – Character values represented as strings

Data Frames

• read.table( ) – Reads in data from an external file

read.table("data.txt" , header = T)

read.table(file = file.choose( ), header = T)

• data.frame – Binds R objects of various kinds

read.table Function

• Reads an ASCII file and creates a data frame

• Expects data in tables of rows and columns

• If the first line contains column labels, use argument header = T

• Field separator is white space

• Also read.csv and read.csv2 – assume , and ; separators, respectively

• Treats character columns as factors (the default before R 4.0.0; later versions need stringsAsFactors = TRUE)
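A self-contained way to try read.table is to write a small file first. This is a sketch only: the temporary file and its three data rows are invented for the example.

```r
# Write a small whitespace-separated table to a temporary file
tmp = tempfile(fileext = ".txt")
writeLines(c("fertilizer yield",
             "1.0 25",
             "1.5 31",
             "2.0 27"), tmp)

dat = read.table(tmp, header = TRUE)   # first line holds the column labels
dat
names(dat)     # "fertilizer" "yield"
nrow(dat)      # 3

unlink(tmp)    # remove the temporary file
```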

save( ) and load( )

• Used for saving and restoring R functions and objects

• Saved files are in a format that only load( ) can read

x = 23

y = 44

save(x, y, file = "xy.Rdata")
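A round trip shows how load( ) restores what save( ) wrote. The sketch below writes to the session's temporary directory, which is an assumption made to keep the example tidy; the slide saves to the working directory.

```r
x = 23
y = 44
f = file.path(tempdir(), "xy.Rdata")
save(x, y, file = f)

rm(x, y)       # remove the objects from the workspace
load(f)        # restores x and y under their original names
x + y          # 67
```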


Comparison Operators

!= Not Equal To

< Less Than

<= Less Than or Equal To

== Exactly Equal To

> Greater Than

>= Greater Than or Equal To

Some Logical Operators

! Not

| Or (Element-wise, for Vectors and Arrays of Logicals)

& And (Element-wise, for Vectors and Arrays of Logicals)
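Comparison and logical operators also work element-wise, which makes them useful for selecting and counting values. A short sketch using the yield vector from the earlier slides:

```r
yield = c(25, 31, 27, 28, 36, 35, 32, 34)

yield > 30                       # one TRUE/FALSE per element
yield[yield > 30]                # keep only the values above 30
sum(yield >= 28 & yield <= 34)   # count values between 28 and 34: 4
any(yield == 36 | yield == 40)   # is either value present? TRUE
!(yield > 30)                    # negation of the first comparison
```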

Some Mathematical Functions

abs Absolute Value
ceiling Next Larger Integer
floor Next Smaller Integer
cos, sin, tan Trigonometric Functions
exp(x) e^x [e = 2.71828 …]
log Natural Logarithm
log10 Logarithm Base 10
sqrt Square Root

Statistical Summary Functions

length Length of Object
max Maximum Value
mean Arithmetic Mean
median Median
min Minimum Value
prod Product of Values
quantile Empirical Quantiles
sum Sum
var Variance - Covariance
sd Standard Deviation
cor Correlation Between Vectors
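Applied to the fertilizer and yield vectors from the earlier slides, the summary functions give the following. This is a sketch for illustration; the values in the comments follow from those eight observations.

```r
fertilizer = c(1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5)
yield = c(25, 31, 27, 28, 36, 35, 32, 34)

length(yield)                    # 8
min(yield); max(yield)           # 25 and 36
mean(yield)                      # 31
median(yield)                    # 31.5
var(yield)                       # 16
sd(yield)                        # 4
quantile(yield, c(0.25, 0.75))   # empirical quartiles
cor(fertilizer, yield)           # positive: yield tends to rise with fertilizer
```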


Sorting and Other Functions

rev Put Values of Vectors in Reverse Order
sort Sort Values of Vector
order Permutation of Elements to Produce Sorted Order
rank Ranks of Values in Vector
match Detect Occurrences in a Vector
cumsum Cumulative Sums of Values in Vector
cumprod Cumulative Products
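The difference between sort, order and rank is easiest to see on a small vector. A sketch using the yield data from the earlier slides:

```r
yield = c(25, 31, 27, 28, 36, 35, 32, 34)

sort(yield)           # 25 27 28 31 32 34 35 36
rev(sort(yield))      # largest first
order(yield)          # 1 3 4 2 7 8 6 5  (positions that sort the vector)
yield[order(yield)]   # same result as sort(yield)
rank(yield)           # 1 4 2 3 8 7 5 6  (rank of each value in place)
match(36, yield)      # 5  (position of the first 36)
cumsum(yield)         # 25 56 83 111 147 182 214 248
```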

Plotting Functions Useful for One-Dimensional Data

barplot Bar plot

boxplot Box & Whisker plot

hist Histogram

dotchart Dot plot

pie Pie chart

Plotting Functions Useful for Two-Dimensional Data

plot Creates a scatter plot: plot(x, y)

qqnorm Quantile-quantile plot, sample vs. N(0, 1): qqnorm(x)

qqplot Quantile-quantile plot for two samples: qqplot(x, y)

pairs Creates a pairs or scatter plot matrix:

attach(babies)
pairs(babies[ , c("gestation", "wt", "age", "inc")])

Three-Dimensional Plotting Functions

contour Contour plot

persp Perspective plot

image Image plot

Probability Distributions Using R

• Pseudo-random sampling

sample(0:20, 5) # select 5 without replacement

sample(0:20, 5, replace = T) # select 5 with replacement

• Coin toss simulation [0 = tail; 1 = head] 20 tosses:

sample(c(0, 1), 20, replace=T)

For Any Probability Distribution

ddist density or probability

pdist cumulative probability

qdist quantiles [percentiles]

rdist pseudo-random selection
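As an illustration of the d/p/q/r naming scheme, here "dist" is replaced by exp, the exponential distribution. The choice of distribution, the rate of 2 and the seed are arbitrary assumptions for this sketch.

```r
dexp(0, rate = 2)          # density at 0, which equals the rate: 2
p = pexp(1, rate = 2)      # cumulative probability P(X <= 1) = 1 - exp(-2)
p
qexp(p, rate = 2)          # the quantile function undoes pexp: 1

set.seed(1)                # make the pseudo-random draws repeatable
rexp(3, rate = 2)          # three pseudo-random values
```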

Binomial Distribution

X ~ Binomial(n , p) ; x = 0, 1, …, n

dbinom(x , n , p ) Density or point probability

pbinom(x , n , p ) Cumulative distribution

qbinom(q , n , p ) Quantiles [ 0 < q < 1 ]

rbinom(m , n , p ) Pseudo-random numbers
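The four binomial functions can be checked against each other. This sketch assumes 20 tosses of a fair coin, matching the coin-toss example used elsewhere in the slides; the names size and prob.head are illustrative.

```r
size = 20
prob.head = 0.5
x = 0:size

px = dbinom(x, size, prob.head)    # point probabilities P(X = x)
sum(px)                            # the probabilities sum to 1

pbinom(10, size, prob.head)        # cumulative P(X <= 10)
sum(px[x <= 10])                   # same value from the point probabilities

qbinom(0.5, size, prob.head)       # median number of heads: 10

set.seed(2007)                     # arbitrary seed, for repeatable draws
rbinom(5, size, prob.head)         # heads in five simulated runs of 20 tosses
```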

Binomial Distribution

Coin toss simulation:

x = 0:20 # num. of heads in 20 tosses

px = dbinom(x , size = 20, prob = 0.5)

plot(x , px, type = "h") # graph display

curve(dnorm(x, 10, sqrt(20*.5*.5)), col=2, add=T)

Normal Distribution

X ~ Normal(µ, σ)

dnorm(x , µ, σ) Density

pnorm(x , µ, σ) Cumulative probability

qnorm(q , µ, σ) Quantiles

rnorm(m , µ, σ) Random numbers

Standard Normal

x = seq(-3.5,3.5,0.1) # x ~ N(0,1)

prx = dnorm(x) # M = 0 , SD = 1

plot(x , prx , type = "l" )

Or using: curve(dnorm(x), from = -3.5 , to = 3.5)

Cumulative Normal & Quantiles

curve(pnorm(x), from=-3.5,to=3.5)

qnorm(.25) # 25th percentile, X ~ N(0, 1)

qnorm(.75, mean = 50, sd = 2) # M = 50, SD = 2

qnorm(c(.1, .3, .7, .9), mean = 65, sd = 3)

Poisson Distribution

X ~ Poisson( λ ) ; X = 0, 1, 2, 3, …

x = 0:20 # Suppose λ = 3.5

prx = dpois(x, lambda = 3.5)

plot(x , prx, type = "h", main = "Poisson Distribution")

text(10, .10, "Lambda = 3.5")

Sampling Distributions

n = 25; curve(dnorm(x , 0, 1/sqrt(n)), -3, 3,

xlab = "Mean", ylab = "Densities of Sample Mean", bty = "l" )

n=5 ; curve(dnorm(x, 0, 1/sqrt(n)), add=T)

n=1 ; curve(dnorm(x, 0, 1/sqrt(n)), add=T)
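The curves above use a standard deviation of 1/sqrt(n) for the sample mean. A small simulation can confirm this; the seed and the 2000 replications are arbitrary choices for this sketch.

```r
set.seed(2007)                 # arbitrary seed, for repeatable draws
n = 25

# 2000 sample means, each computed from n standard normal observations
means = replicate(2000, mean(rnorm(n)))

mean(means)                    # close to 0
sd(means)                      # close to 1/sqrt(n) = 0.2
hist(means, main = "Simulated Sample Means, n = 25")
```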

t-Distribution as df Increases

curve(dnorm(x), -4, 4, main = "Normal & t Distributions", ylab = "Densities")

k=3; curve(dt(x , df = k ), lty = k, add = T)

k=5; curve(dt(x , df = k ), lty = k, add = T)

k=15; curve(dt(x , df = k ), lty = k, add = T)

k=100; curve(dt(x , df = k ), lty = k, add = T)

Binomial-Normal Approximation

• Coin toss example: n = 100, p = .5

• P(X ≤ 40)?

Using Larget’s prob.R file:

source(file.choose( ))
gbinom(100, .5, b = 40)

Normal approximation: µ = 50, σ = 5

gnorm(50, 5, b = 40.5)

[Figure: Binomial distribution (n = 100, p = 0.5) with the Normal(µ = 50, σ = 5) approximation; x-axis from 30 to 70. P(X < 40.5) = 0.0287, P(X > 40.5) = 0.9713]
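The same comparison can be made without the prob.R file, using only base R. This sketch swaps gbinom and gnorm for pbinom and pnorm, so it gives the probabilities but not the graphs; the names p.exact and p.approx are illustrative.

```r
n = 100
p = 0.5
mu = n * p                        # 50
sigma = sqrt(n * p * (1 - p))     # 5

p.exact = pbinom(40, size = n, prob = p)        # exact P(X <= 40)
p.approx = pnorm(40.5, mean = mu, sd = sigma)   # normal, continuity-corrected

p.exact
p.approx                          # about 0.0287
```

The 40.5 is the continuity correction: the discrete event X ≤ 40 is matched to the continuous interval up to 40.5.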

One-Sample t-test

Ho: µ = µ0 Null Hypothesis

Ha: µ ≠ µ0 Two-sided

Ha: µ > µ0 One-sided

Ha: µ < µ0 One-sided

R One-Sample t.test

x = c(x1, x2, …, xn) # data set

t.test(x, mu = Mo) # two-sided

t.test(x, mu = Mo, alt = "g") # one-sided

t.test(x, mu = Mo, alt = "l") # one-sided
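As a runnable illustration of these calls, the sketch below tests the yield data from the earlier slides against a hypothesised mean of 30. The value 30 is an assumption made for the example, not taken from the text.

```r
yield = c(25, 31, 27, 28, 36, 35, 32, 34)

res = t.test(yield, mu = 30)    # two-sided test of Ho: mu = 30
res$statistic                   # t = (31 - 30) / (4 / sqrt(8))
res$parameter                   # degrees of freedom: 7
res$p.value                     # large, so Ho is not rejected
res$conf.int                    # 95% confidence interval for the mean

# One-sided alternative Ha: mu > 30
p.greater = t.test(yield, mu = 30, alternative = "greater")$p.value
p.greater                       # half the two-sided p-value, since t > 0
```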

R One-Sample t.test [cont.]

Example: Text, Problem 8.11, page 226

library(UsingR)
attach(stud.recs)
x = sat.m # Math SAT Scores
hist(x) # Visual display
qqnorm(x) # Normal quantile plot
qqline(x, col=2) # Add reference line through the quartiles
t.test(x, mu = 500)
detach(stud.recs)


Normality Test

Shapiro-Wilk test:

Ho: X ~ Normal
Ha: X is not Normal

Command: shapiro.test(x)

# Examine p-value
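To see the test in action without an external data set, one can compare a simulated normal sample with a simulated skewed one. The seed, sample sizes and distributions below are assumptions made for this sketch.

```r
set.seed(42)                         # arbitrary seed, for repeatable draws
x.norm = rnorm(50, mean = 10, sd = 2)   # sample from a normal distribution
x.skew = rexp(50)                       # sample from a skewed distribution

p.norm = shapiro.test(x.norm)$p.value
p.skew = shapiro.test(x.skew)$p.value

p.norm     # a small p-value would be evidence against normality
p.skew
```

A small p-value leads to rejecting Ho; a large one only means the sample is compatible with normality.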


Normality Test [cont.]


qqline(OBP, col=2)


wilcox.test(OBP, mu=.330)

One-Sample Proportion Test

x = total successes; n = sample size

prop.test(x, n, p = Po) # two-sided

prop.test(x, n, p = Po, alt= "g")

prop.test(x, n, p = Po, alt= "l")

Or Using Binomial “Exact” Test

binom.test(x, n, p = Po)

binom.test(x, n, p = Po, alt = "g")

binom.test(x, n, p = Po, alt = "l")

Proportion Test

Text, Example 8.3: Survey US Poverty Rate

Ho: P = 0.113 # Year 2000 Rate
Ha: P > 0.113 # Year 2001 Rate Increased

x = 5850 # Sample people under the poverty line (UPL)
n = 50000 # Sample size
prop.test(x, n, p = 0.113, alt = "g")
binom.test(x, n, p = 0.113, alt = "g")
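The proportion test can be reproduced by hand as a z-test. This sketch turns off the continuity correction (correct = FALSE) so that the hand calculation and prop.test agree exactly; the slide's call uses the default correction, so its output differs slightly. The names p0, phat, z and pval are illustrative.

```r
x = 5850
n = 50000
p0 = 0.113

phat = x / n                                  # sample proportion: 0.117
z = (phat - p0) / sqrt(p0 * (1 - p0) / n)     # z statistic under Ho
pval = pnorm(z, lower.tail = FALSE)           # one-sided P(Z > z)

res = prop.test(x, n, p = p0, alternative = "greater", correct = FALSE)

z^2                # equals the X-squared statistic from prop.test
res$statistic
pval               # equals the prop.test p-value
res$p.value
```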

Some Modeling Functions/Packages

Linear Models: anova, car, lm, glm
Graphics: graphics, grid, lattice
Multivariate: mva, cluster
Survey: survey
SQC: qcc
Time Series: tseries
Bayesian: BRugs, MCMCpack, …
Simulation: boot, bootstrap, Zelig

You Perform An Experiment

In Order To Learn, Not To Prove.

W. Edwards Deming