Date post: 01-Apr-2015
Upload: asher-bank
Why R? Free Powerful (add-on packages) Online help from statistical community Code-based (can build programs) Publication-quality graphics

Why R?

• Free

• Powerful (add-on packages)

• Online help from statistical community

• Code-based (can build programs)

• Publication-quality graphics


Why not?

• Time to learn code

• Very simple statistics may be faster with “point-and-click” software (e.g. Statistica, JMP)


Why generalized linear models (GLMs)?

Most ecological data FAIL these two assumptions of parametric statistics:

• Variance is independent of the mean (“homoscedasticity”)

• Data are normally distributed
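Both assumptions can be checked in R before choosing a model (Session 2 covers this properly). A minimal sketch, assuming simulated Poisson counts in two hypothetical groups (`low`, `high`) rather than real data: `tapply()` compares the variance within each group, and `shapiro.test()` tests one group for normality.

```r
# Hypothetical data: counts from two groups with different means
set.seed(42)
group  <- factor(rep(c("low", "high"), each = 100))
counts <- c(rpois(100, lambda = 2), rpois(100, lambda = 20))

# Homoscedasticity: compare the variance within each group
group_var <- tapply(counts, group, var)
group_var

# Normality: Shapiro-Wilk test on one group (a small p-value is evidence of non-normality)
shapiro.test(counts[group == "low"])
```

With count data the higher-mean group also has the larger variance, which is exactly the pattern that breaks the first assumption.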


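Taylor's power law: $\mathrm{Variance} = a \cdot \mathrm{Mean}^{b}$, and most ecological data have $1 < b < 2$, so the variance rises steeply as the mean increases (b = 1 with a = 1 is the Poisson case, where the variance equals the mean).

[Figure: variance plotted against mean]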
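Taking logs turns the power law into a straight line, $\log(\mathrm{Variance}) = \log(a) + b \log(\mathrm{Mean})$, so $a$ and $b$ can be estimated with an ordinary linear regression on the log scale. A sketch, assuming made-up Poisson counts at seven hypothetical sites; for Poisson data the fitted $b$ should come out close to 1, while real ecological data would typically give a value between 1 and 2.

```r
# Hypothetical data: Poisson counts at 7 sites with different mean abundances
set.seed(1)
site_means <- c(1, 2, 5, 10, 20, 50, 100)
samples <- lapply(site_means, function(m) rpois(200, lambda = m))

site_mean <- sapply(samples, mean)
site_var  <- sapply(samples, var)

# Variance = a * Mean^b  becomes  log(Variance) = log(a) + b * log(Mean)
fit <- lm(log(site_var) ~ log(site_mean))
a <- exp(coef(fit)[[1]])   # intercept back-transformed
b <- coef(fit)[[2]]        # slope is the exponent b
c(a = a, b = b)

# To draw the figure: plot(site_mean, site_var, log = "xy", xlab = "Mean", ylab = "Variance")
```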


Many types of ecological data are expected to be non-normal

• Count data are expected to be Poisson

Examples: population size, species richness

• Binary (0,1) data are expected to be binomial

Examples: survivorship, species presence
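In a GLM the expected distribution is chosen with the `family` argument of `glm()`. A minimal sketch, assuming simulated data for a hypothetical two-level `treatment` (control, herbicide): species richness as Poisson counts and presence/absence as binomial 0/1 data. How these models work is the subject of Session 3.

```r
# Hypothetical data: 30 plots per treatment
set.seed(2)
treatment <- factor(rep(c("control", "herbicide"), each = 30))
richness  <- rpois(60, lambda = ifelse(treatment == "control", 8, 5))        # count data
present   <- rbinom(60, size = 1,
                    prob = ifelse(treatment == "control", 0.7, 0.4))         # binary data

m_count  <- glm(richness ~ treatment, family = poisson)    # counts: Poisson
m_binary <- glm(present ~ treatment, family = binomial)    # 0/1: binomial
summary(m_count)
summary(m_binary)
```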


Workshop in R & GLMs

Session 1: Basic commands + linear models

Session 2: Testing parametric assumptions

Session 3: How generalized linear models work

Session 4: Model simplification and overdispersion


Exercise

1. Open R

“>” is the command prompt

2. Write:

x <- "hello"

x

3. What do the arrow keys do? And the “End” key?

Ready!


Exercise

x <- 5

y<- 1

x+y; x*y; x/y ; x^y

sqrt(x); log (x); exp (x)

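Careful!

• Capitalization matters: Y and y are different objects.

• Spaces around operators do not matter (x<-5 is the same as x <- 5), but never put a space inside the arrow: x < - 5 means “is x less than -5?”, not an assignment.

“;” separates commands written on the same line.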
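The exercise with its results written as comments, plus the space-inside-the-arrow pitfall. `log()` is the natural logarithm; `log10()` and the `base` argument give other bases.

```r
x <- 5
y <- 1
x + y; x * y; x / y; x^y    # 6, 5, 5, 5
sqrt(x); log(x); exp(x)     # log() is the natural logarithm (base e)
log10(x); log(x, base = 2)  # other bases

# A space inside the assignment arrow changes the meaning:
x < - 5    # comparison "is x less than -5?": FALSE, and x is unchanged
x <- 10    # assignment: x is now 10
```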


Vectors

X <- c(8,2,5,9)

creates the vector 8, 2, 5, 9 (“c” means combine)


Vectors

x <- rep(0,4)          gives 0,0,0,0

x <- 1:4               gives 1,2,3,4

x <- seq(1,7, by=2)    gives 1,3,5,7

Create a vector called “test” containing 0,0,0,0,2,4,6,8,10 using all of the commands c, rep and seq:

test <- c(rep(0,4), seq(2,10,by=2))
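`rep()` and `seq()` accept more arguments than the slide uses. A few variations, with results as comments, followed by a check of the exercise answer:

```r
rep(c(1, 2), times = 3)      # 1 2 1 2 1 2
rep(c(1, 2), each = 3)       # 1 1 1 2 2 2
seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00
seq(10, 2, by = -2)          # 10 8 6 4 2

# Check the "test" exercise answer
test <- c(rep(0, 4), seq(2, 10, by = 2))
all(test == c(0, 0, 0, 0, 2, 4, 6, 8, 10))   # TRUE
```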


Vectors

Select elements of your vector (x = 1,3,5,7):

x[2]         gives 3

x[c(1,3)]    gives 1,5

x[2:4]       gives 3,5,7

Change an element of your vector (x = 1,3,5,7):

x[1] <- 9 ; x    gives 9,3,5,7
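Negative indices do the opposite of positive ones: they drop the listed elements instead of selecting them. `length()` is handy for reaching the last element.

```r
x <- c(1, 3, 5, 7)
x[-1]           # drop the first element: 3 5 7
x[-c(1, 3)]     # drop elements 1 and 3: 3 7
length(x)       # 4
x[length(x)]    # last element: 7
```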


Matrices

Dog <- c(1,4,6,8)    (a vector)

Cat <- c(2,3,5,7)    (a vector)

Animals <- cbind(Dog, Cat)    (a matrix)

     Dog Cat
[1,]   1   2
[2,]   4   3
[3,]   6   5
[4,]   8   7
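Matrices are indexed with two positions, `[row, column]`; leaving one blank selects a whole row or column, and columns built with `cbind()` can also be selected by name. `rbind()` stacks the vectors as rows instead.

```r
Dog <- c(1, 4, 6, 8)
Cat <- c(2, 3, 5, 7)
Animals <- cbind(Dog, Cat)   # 4 rows x 2 columns

dim(Animals)         # 4 2
Animals[2, ]         # second row: Dog 4, Cat 3
Animals[, "Cat"]     # Cat column: 2 3 5 7
Animals[2, "Cat"]    # single cell: 3

rbind(Dog, Cat)      # 2 rows x 4 columns
```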


Logical operators

x <- 5; y <- 6

x > y     gives FALSE
x < y     gives TRUE
x == y    gives FALSE
x != y    gives TRUE

TRUE is the same as 1, FALSE is the same as 0:

2 + (x >= y)    gives 2
2 + (x <= y)    gives 3
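Because TRUE counts as 1 and FALSE as 0, `sum()` of a comparison counts how many values meet a condition and `mean()` gives the proportion. A sketch with a hypothetical `heights` vector:

```r
x <- 5; y <- 6
as.numeric(x < y)     # 1

heights <- c(3, 8, 12, 5, 9)   # hypothetical measurements
heights > 6           # FALSE TRUE TRUE FALSE TRUE
sum(heights > 6)      # 3 values exceed 6
mean(heights > 6)     # 0.6, the proportion exceeding 6
```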


Logical operators

x <- c(1,2,3,4); y <- c(5,6,7,8)

z <- x[y >= 7]; z    gives 3,4

Useful for quickly making subsets of your data!

x <- c(1,0.01,3,0.02)

In this vector, change all values < 1 to 0:

x[x < 1] <- 0
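Two related base R functions: `which()` returns the positions where a condition is TRUE, and `ifelse()` builds a new vector with the replacement applied, leaving the original untouched.

```r
x <- c(1, 0.01, 3, 0.02)
which(x < 1)                # positions of values below 1: 2 4
y <- ifelse(x < 1, 0, x)    # same values as x[x < 1] <- 0, stored in a new vector
y                           # 1 0 3 0
x                           # unchanged: 1.00 0.01 3.00 0.02
```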


Conditional operators

x <- 5; z <- 0

if (x > 4) {z <- 2}; z    gives 2

A large program could run inside the { }.
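An `else` branch runs when the condition is FALSE. At the command prompt, `else` must sit on the same line as the closing brace of the `if` block, otherwise R treats the `if` as finished.

```r
x <- 3; z <- 0
if (x > 4) {
  z <- 2
} else {
  z <- -1
}
z    # -1, because x > 4 is FALSE
```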


Loops

y <- 0; x <- 0

for (y in 1:20) {x <- x + 0.5; print(x)}

Useful for programming randomization procedures. Bootstrap example:

y <- 0; x <- 1:50; output <- rep(0,1000)

for (y in 1:1000) {output[y] <- var(sample(x, replace=TRUE))}

mean(output)    gives 207.3996 (your value will differ slightly, because sample() draws at random)
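The same bootstrap can be written with `replicate()`, which repeats an expression and collects the results, so no output vector needs to be created first. `set.seed()` makes the random draws repeatable; the seed value here is arbitrary.

```r
set.seed(123)    # arbitrary seed, so reruns give the same numbers
x <- 1:50
output <- replicate(1000, var(sample(x, replace = TRUE)))
mean(output)     # mean of the 1000 bootstrap variances
var(x)           # variance of the original data (212.5), for comparison
```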


Writing programs

I encourage you to use the script editor!

1. File > New script

2. Write your code

3. Select the code you want to run (CTRL-A selects all the code)

4. Run the code (CTRL-R)

5. File > Save as: R script files use the extension *.R
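A saved script can also be run in full with `source()`. A sketch, assuming a hypothetical script file `hello.R` that is written to R's temporary folder just for the demonstration:

```r
# Write a two-line script to a temporary file (hypothetical example script)
script <- file.path(tempdir(), "hello.R")
writeLines(c('x <- "hello"', 'print(x)'), script)

source(script)    # runs every line of the script
```

For your own work, pass the path of your saved *.R file to `source()` instead.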


Entering data

1. In Excel, give your data columns/rows and text data simple one-word labels (e.g. "treatment").

2. Format cells so there are fewer than 8 digits per cell.

3. Save as a "csv" file.

4. Use the following command to find and load your file:

diane <- read.table(file.choose(), sep=",", header=TRUE)

5. Check that it is there: diane

(diane is an invented dataframe name; choose your own.)
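`read.csv()` is a shortcut for `read.table()` with `sep = ","` and `header = TRUE` already set. A sketch that uses the `text` argument to stand in for a csv file, with made-up `treatment` and `height` columns; `str()` and `head()` are quick ways to check what was loaded.

```r
# Stand-in for a csv file saved from Excel (hypothetical data)
csv_text <- "treatment,height
control,12.1
herbicide,8.4
control,10.7"

diane <- read.csv(text = csv_text)
str(diane)     # 3 obs. of 2 variables, with their types
head(diane)    # first rows of the dataframe
```

With a real file, replace `text = csv_text` with `file.choose()` or the file's path.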


Dataframes

• Dataframes are analogous to spreadsheets

• Best if all columns in your dataframe have the same length

• Missing values are coded as "NA" in R

• If you coded your missing values with a different label in your spreadsheet (e.g. "none") then:

read.table(….., na.strings="none")
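A sketch of `na.strings` in action, again using a made-up csv text in which one missing height was typed as "none". Most summary functions return NA when data are missing, unless told to drop them with `na.rm = TRUE`.

```r
csv_text <- "treatment,height
control,12.1
herbicide,none
control,10.7"

diane <- read.csv(text = csv_text, na.strings = "none")
is.na(diane$height)                 # FALSE TRUE FALSE
mean(diane$height)                  # NA, because one value is missing
mean(diane$height, na.rm = TRUE)    # 11.4, ignoring the missing value
```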


Dataframes

Two ways to identify a column (called "treatment") in your dataframe (called "diane"):

diane$treatment

OR

attach(diane); treatment

At the end of the session, remember to run: detach(diane)
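A third option is `with()`, which makes the columns available only for a single command, so there is nothing to detach afterwards. A sketch with a small hypothetical version of `diane`:

```r
diane <- data.frame(treatment = c("control", "herbicide", "control"),
                    height    = c(12.1, 8.4, 10.7))

mean(diane$height)            # using $
with(diane, mean(height))     # same result, no attach()/detach() needed
```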


Summary statistics

length(x)     number of values

mean(x)       mean

var(x)        variance

cor(x,y)      correlation between x and y

sum(x)        sum

summary(x)    minimum, maximum, mean, median, quartiles

What is the correlation between two variables in your dataset?
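The commands applied to two short hypothetical vectors. Passing a whole dataframe of numeric columns to `cor()` returns a matrix with the correlation of every pair of columns, which answers the question above for all variables at once.

```r
x <- c(2.1, 3.4, 1.9, 5.0, 4.2)   # hypothetical measurements
y <- c(1.0, 1.8, 0.9, 2.7, 2.2)

length(x); mean(x); var(x); sum(x)
summary(x)
cor(x, y)

d <- data.frame(x, y)
cor(d)    # correlation matrix of all numeric columns
```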


Factors

• A factor has several discrete levels (e.g. control, herbicide)

• When read.table() imports a column containing text, versions of R before 4.0.0 automatically convert it to a factor; since R 4.0.0 such columns stay as text (character) unless you set stringsAsFactors=TRUE.

• To manually convert a numeric vector to a factor:

x <- as.factor(x)

• To check if your vector is a factor, and what the levels are:

is.factor(x); levels(x)
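A sketch of both conversions: a numeric vector turned into a factor with `as.factor()`, and a text vector turned into a factor with `factor()`, where the `levels` argument sets the order of the levels. `table()` then counts observations per level. The `treatment` values are hypothetical.

```r
x <- c(1, 2, 1, 3, 2)
is.factor(x)          # FALSE: x is numeric
x <- as.factor(x)
is.factor(x)          # TRUE
levels(x)             # "1" "2" "3"

treatment <- c("control", "herbicide", "control")
is.factor(treatment)  # FALSE: a text vector is character, not a factor
treatment <- factor(treatment, levels = c("control", "herbicide"))
table(treatment)      # control 2, herbicide 1
```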


Homework

1. Download R to your computer. Either go to http://www.r-project.org/ and follow the CRAN download links, or go directly to http://mirror.cricyt.edu.ar/r/

2. Instruction manuals for R are found on the main webpage, http://www.r-project.org/ (follow the links to Documentation > Manuals). I recommend "An Introduction to R".


3. Write a short program that:

• Imports the data from Lakedata_06.csv (posted on www.zoology.ubc.ca/~srivast/zool502)

• Makes lake area into a factor called AreaFactor:

Area 0 to 5 ha: small

Area 5.1 to 10 ha: medium

Area > 10 ha: large


Hints

You will need to:

1. Tell R how long AreaFactor will be.

2. Assign cells in AreaFactor to each of the 3 levels

3. Make AreaFactor into a factor, then check that it is a factor
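One way to follow the three hints, sketched under assumptions: the column name in Lakedata_06.csv is not given here, so `Area` and the lake areas below are hypothetical stand-ins, and areas above 5 ha up to 10 ha are treated as medium.

```r
# lakes <- read.csv(file.choose())   # would load Lakedata_06.csv
# Area <- lakes$Area                 # hypothetical column name
Area <- c(2.5, 7.0, 15.3, 5.0, 10.0, 22.1)   # made-up lake areas (ha)

# 1. Tell R how long AreaFactor will be
AreaFactor <- rep(NA, length(Area))

# 2. Assign cells to each of the 3 levels
AreaFactor[Area <= 5] <- "small"
AreaFactor[Area > 5 & Area <= 10] <- "medium"
AreaFactor[Area > 10] <- "large"

# 3. Make AreaFactor into a factor, then check it
AreaFactor <- factor(AreaFactor, levels = c("small", "medium", "large"))
is.factor(AreaFactor)
levels(AreaFactor)
table(AreaFactor)
```

Base R's `cut()` can do steps 1 and 2 in one call, e.g. `cut(Area, breaks = c(0, 5, 10, Inf), labels = c("small", "medium", "large"), include.lowest = TRUE)`, which uses the same interval boundaries.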

