
Jack Chen

Jack Chen Crash Course in R · October 16, 2009
Transcript
Page 1: Jack Chen

Jack Chen, Crash Course in R · October 16, 2009

Page 2: Jack Chen

Presentation Flow

Major Topics
o Background/Environment
o Object-oriented Concept
o Common Data Structures and Operations
o Control Blocks
o Read/Write Data
o Graphics Samples
o R Session: function writing, plot customization, simulation tips

Page 3: Jack Chen

Background/Environment

Page 4: Jack Chen

History Background

Mid 1970s, Bell Laboratories: John Chambers, Rick Becker

[Diagram: an interactive environment on top of an engine of statistical computing subroutines written in Fortran. Name labels shown: “Interactive Statistical Computing System”, “Statistical Analysis System”, “S”.]

Page 5: Jack Chen

History Background

Early 1990s, University of Auckland: Ross Ihaka, Robert Gentleman

“R”

[Diagram: an interactive environment on top of an engine of statistical computing subroutines. Language labels shown: C, Pascal, Java, C++, Fortran, Perl, …, R functions.]

Page 6: Jack Chen

History Background

Major differences between S and R
o Syntax
o Memory management
o Variable scoping
o S has developed into S-PLUS, commercial software available from TIBCO
o R is open-source free software, with packages contributed by researchers worldwide
o Recently, XLSolutions has been developing R-plus, a commercial version of R

Page 7: Jack Chen

Environment: Windows

Starting R in Windows

[Screenshot of the R window, with callouts: command line to interact with R; mouse-click menus; mouse-click shortcuts.]

Page 8: Jack Chen

Environment: Windows

Some keyboard shortcuts for the Windows platform:
o Esc: cancels the current line of execution (useful when running into trouble)
o Ctrl-p or arrow up: previous command
o Ctrl-n or arrow down: next command
o Ctrl-u: erase line
o Ctrl-a or ‘home’: beginning of line
o Ctrl-e or ‘end’: end of line
o Ctrl-c: copy highlighted text
o Ctrl-v: paste
o Ctrl-x: copy and paste highlighted text
o Ctrl-l: clear command line window
o Ctrl-z or q(): quit

Page 9: Jack Chen

Environment: Unix

Starting R in Unix

[Screenshot of an R session, with callout: command line to interact with R.]

Page 10: Jack Chen

Environment: Unix

Some keyboard shortcuts for the Unix platform:
o Esc or Ctrl-c: cancels the current line of execution (useful when running into trouble)
o Ctrl-p or arrow up: previous command
o Ctrl-n or arrow down: next command
o Ctrl-u: erase line
o Ctrl-a: beginning of line
o Ctrl-e: end of line
o Ctrl-z: send to background (type fg to bring R back)
o Ctrl-l: clear command line window
o Ctrl-r: reverse search command history
o q(): quit session

Page 11: Jack Chen

Environment: R interpreter

R has an interpreted environment

Everything you type on the command line followed by ‘enter’ is sent to R’s internal engine. R performs the following steps:
o Interprets what you have typed
o Evaluates it
o Returns a result (possibly an error message)

The only exception is a comment: R does not interpret anything after the pound sign # on a line.

Page 12: Jack Chen

Object-oriented Concept

Page 13: Jack Chen

Object-oriented Concept: Intuition

Object-oriented programming is a natural way to classify and modularize “things” of interest in order to interact with them during program execution.

For example, suppose in our program there are 3 shapes:
o Circle
o Square
o Triangle

Initialization
o We want to be able to create different shapes of different sizes

Interaction
o We want each shape to be able to report its area to us
o We want each shape to be able to display itself

Page 14: Jack Chen

Object-oriented Concept: Intuition

Internally in a program:

Class: Shape, Type: Circle
  Functions: Report area, Draw
  Attributes: Name ID, Radius r

Class: Shape, Type: Square
  Functions: Report area, Draw
  Attributes: Name ID, Width w

Class: Shape, Type: Triangle (Isosceles)
  Functions: Report area, Draw
  Attributes: Name ID, Base b, Height h

Page 15: Jack Chen

Object-oriented Concept: Intuition

Class: Shape, Type: Circle
  Functions: Report area, Draw
  Attributes: Name ID, Radius r

Typical programming steps:
o Initialize: Name: ID = circle1, Radius: r = 1
o Interact: “Tell me the area” gives 1²π = 3.14159…
o Interact: Draw

Page 16: Jack Chen

Object-oriented Concept: Intuition

Class: Shape, Type: Circle
  Functions: Report area, Draw
  Attributes: Name ID, Radius r

Typical programming steps:
o Initialize: Name: ID = circle2, Radius: r = 2
o Interact: “Tell me the area” gives 2²π = 12.566…
o Interact: Draw

Page 17: Jack Chen

Object-oriented Concept: Intuition

Class: Shape, Type: Square
  Functions: Report area, Draw
  Attributes: Name ID, Width w

Typical programming steps:
o Initialize: Name: ID = square1, Width: w = 1
o Interact: “Tell me the area” gives 1² = 1
o Interact: Draw

Page 18: Jack Chen

Object-oriented Concept: Intuition

Class: Shape, Type: Triangle (Isosceles)
  Functions: Report area, Draw
  Attributes: Name ID, Base b, Height h

Typical programming steps:
o Initialize: Name: ID = tri1, Base: b = 1, Height: h = 0.866
o Interact: “Tell me the area” gives 1(0.866)/2 = 0.433
o Interact: Draw

Page 19: Jack Chen

Object-oriented Concept: Intuition

Class: Shape, Type: Circle
  Functions: Report area, Draw
  Attributes: Name ID, Radius r

Translating to sensible commands:
o Initialize (Name: ID = circle1, Radius: r = 1): circle1 = Circle(r=1)
o Interact (“Tell me the area”, 1²π = 3.14159…): area(circle1)
o Interact (Draw): draw(circle1)

Page 20: Jack Chen

Object-oriented Concept: Intuition

Programming commands
o circle2 = Circle(r=2)
o area(circle2)
o draw(circle2)

o square1 = Square(w=1)
o area(square1)
o draw(square1)

o tri1 = Triangle(b=1, h=0.866)
o area(tri1)
o draw(tri1)
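The shape commands above are pseudocode, but they can be made to run in R. The following is a minimal sketch using R's S3 classes; the constructor names, the `id` argument, and the text-only `draw` are illustrative assumptions and not part of the original slides.

```r
# Hypothetical constructors: each returns a list tagged with a class.
Circle <- function(r, id = "circle") {
  structure(list(id = id, r = r), class = c("Circle", "Shape"))
}
Square <- function(w, id = "square") {
  structure(list(id = id, w = w), class = c("Square", "Shape"))
}
Triangle <- function(b, h, id = "triangle") {
  structure(list(id = id, b = b, h = h), class = c("Triangle", "Shape"))
}

# A generic function dispatches on the class of its first argument.
area <- function(shape) UseMethod("area")
area.Circle   <- function(shape) pi * shape$r^2
area.Square   <- function(shape) shape$w^2
area.Triangle <- function(shape) shape$b * shape$h / 2

# "draw" only describes the shape in text here; real plotting is omitted.
draw <- function(shape) UseMethod("draw")
draw.Shape <- function(shape) {
  cat("Shape", shape$id, "of type", class(shape)[1], "\n")
  invisible(shape)
}

circle1 <- Circle(r = 1, id = "circle1")
area(circle1)                      # 3.141593
area(Square(w = 1))                # 1
area(Triangle(b = 1, h = 0.866))   # 0.433
draw(circle1)
```

Each shape carries its own attributes, while `area` and `draw` pick the right method from the object's class, which mirrors the "initialize, then interact" steps on the slides.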

Page 21: Jack Chen

Object-oriented Concept: In relation to R

What does this have to do with R?

o R is inherently object-oriented.
o R has a set of pre-defined objects that we can interact with
o There are tons of objects inside the various packages in R's online repository that help us perform various tasks
o We can also write our own R objects that perform analyses suited to our needs
o The way we interact with R is very similar to the way we interacted with the program with 3 shapes

Page 22: Jack Chen

Common Data Structures and Operations in R

Page 23: Jack Chen

Common Data Structures: Primitive data objects

Primitive data objects
o Come with all R installations

o Integers: -3L, -2L, 1L, 2L, 3L, … (the L suffix marks an integer; a bare 1 or 1e+10 is stored as a double)
o Doubles: 0.789, 3.14, 1.68, 2.9e-6, …
o Complex numbers: 3i+7, 2i+3, …
o Characters: "a", "zZ", "I hope you are still awake", …
o Constants: pi
o Logical symbols: TRUE, FALSE
o The empty object: NULL
o Missing value: NA
o Infinity: Inf
o Some others

Page 24: Jack Chen

Common Data Structures: Primitive operators

Primitive operators
o arithmetic: +, -, *, /
o modulo: %%
o matrix multiply: %*%
o power: ^
o logical and/or: &, |
o relation: <, <=, >, >=, ==, !=
o assignment: =, <-

Page 25: Jack Chen

Common Data Structures: Primitive functions

R function calls have the form: functionName(arg1, arg2, …)

Primitive functions
o square-root: sqrt(arg)
o exponential: exp(arg)
o natural log: log(arg)
o length of object: length(arg)
o sum of elements in object: sum(obj)
o concatenate objects: c(arg1, arg2, …)
o round down to nearest integer: floor(arg)
o round up to nearest integer: ceiling(arg)
o many, many others
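To make the operators and functions listed above concrete, here is a short sketch that can be pasted at the R prompt. The variable name `x` and the chosen numbers are arbitrary.

```r
7 %% 3               # modulo: 1
2^10                 # power: 1024
(3 > 2) & (1 > 2)    # logical and: FALSE
(3 > 2) | (1 > 2)    # logical or: TRUE

x <- c(4, 9, 16)     # concatenate three numbers into a vector
sqrt(x)              # 2 3 4
sum(x)               # 29
length(x)            # 3
floor(2.7)           # 2
ceiling(2.1)         # 3
exp(log(5))          # 5, up to floating-point rounding
```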

Page 26: Jack Chen

Common Data Structures: Simple valid expressions

Examples of valid expressions
    1
    "a"
    'a'
    1 & TRUE
    TRUE == FALSE
    TRUE != FALSE
    2 > 3
    1 + 2 + 3 + 4
    2^3
    a = 4; b = 2^a
    log(37)

Page 27: Jack Chen

Common Data Structures: Simple invalid expressions

Examples of invalid expressions
    lala          # variable not assigned
    sqrt(25, 4)   # too many arguments
    log(1 2)      # syntax error: missing comma between arguments
    1 = "a"       # cannot assign value to primitive numeric
    TRUE = 3      # cannot assign value to primitive logical

Page 28: Jack Chen

Common Data Structures and Constructs: vectors

Vectors

o R vectors are treated as column vectors in matrix operations, even though they are displayed horizontally in R

o c(object1, object2, …, objectN)

o c stands for: concatenate object1, object2, …, objectN

Page 29: Jack Chen

Common Data Structures and Constructs: vectors

Examples of vectors:

o c(1, 2, 3, 4)        # numeric vector, (1, 2, 3, 4)
o c(1:4)               # same values as above
o c(1, "a")            # mixed types are coerced to character: ("1", "a")
o c(c(1:3), c(7:10))   # (1, 2, 3, 7, 8, 9, 10)
o c(TRUE, FALSE)       # logical vector

Page 30: Jack Chen

Common Data Structures and Constructs: vectors

Other ways to form vectors:

o seq(start, end, by increment)
    seq(1, 10, 1)      # equivalent to c(1:10)
    seq(10, 1, -1)     # equivalent to c(10:1)

o rep(object, repeat)
    rep(1, 10)         # a vector of 10 1’s
    rep(c(1, 2), 10)   # a vector of 1 2 1 2 …

Page 31: Jack Chen

Common Data Structures and Constructs: vectors

Accessing vector elements

o vector[start index:end index]
    v = c(1, 2, 3, 4)          # assigns v
    c(1, 2, 3, 4)[1]           # returns 1
    c(1, 2, 3, 4)[2:4]         # returns (2, 3, 4)
    c(1, 2, 3, 4)[-1]          # removes 1st element, returns (2, 3, 4)
    c(1, 2, 3, 4)[c(1, 3)]     # returns (1, 3)
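The vector constructors and indexing forms above can be tried together in one short session. This is only a sketch; the names `v` and `mixed` are arbitrary.

```r
v <- c(1, 2, 3, 4)

v[1]                      # 1
v[2:4]                    # 2 3 4
v[-1]                     # drops the 1st element: 2 3 4
v[c(1, 3)]                # 1 3

seq(1, 10, by = 3)        # 1 4 7 10
rep(c(1, 2), times = 3)   # 1 2 1 2 1 2

# Mixing types in c() coerces everything to one common type.
mixed <- c(1, "a")
class(mixed)              # "character"
```

Indexing never modifies `v`; each expression returns a new vector.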

Page 32: Jack Chen

Common Data Structures and Constructs: matrices

Matrices

o R matrices are objects internally represented as vectors, with 2 additional attributes:
    - number of rows
    - number of columns

o matrix(c(object1, object2, …, objectN), nrow = I, ncol = J)

Page 33: Jack Chen

Common Data Structures and Constructs: matrices

Examples of matrices:
o matrix(c(1:12), nrow=4, ncol=3)
o matrix(c(1:12), 4, 3)      # same as above
o matrix(c(1:12), nrow=4)    # same as above
o matrix(c(1:12), ncol=3)    # same as above
o matrix(c(1:12), 4, 2)      # invalid: 12 values do not fit a 4x2 matrix (R keeps only the first 8 and may warn)

Other ways to form matrices:
o diag(1, 10)         # 10x10 identity matrix
o diag("a", 10)       # 10x10 matrix with diagonal of "a"
o diag(c(1:10), 10)   # 10x10 matrix with diagonal entries 1, 2, …, 10

Page 34: Jack Chen

Common Data Structures and Constructs: matrices

Accessing matrix elements

o matrix[(accessing row vectors), (accessing column vectors)]
    A = matrix(c(1:9), 3, 3)   # assign matrix to variable name A
    A[1, 1]                    # returns 1st row 1st element
    A[1, ]                     # returns row 1
    A[, 1]                     # returns column 1
    A[, 1:2]                   # returns columns 1 and 2
    A[1:5]                     # returns (1, 2, 3, 4, 5)

Page 35: Jack Chen

Common Data Structures and Constructs: matrices

Matrix manipulation
o Adding a row
    rbind(matrix object, vector object)
o Adding a column
    cbind(matrix object, vector object)

o Examples:
    A = matrix(c(1:9), 3, 3)
    cbind(A, c(10:12))                  # add (10, 11, 12) as last column
    cbind(A[,1], c(10:12), A[,2:3])     # add (10, 11, 12) as 2nd column

Page 36: Jack Chen

Common Data Structures and Constructs: matrices

Matrix operations
o Matrix operations on matrices A, B of conforming dimensions
    - Addition: A + B
    - Subtraction: A - B
    - Multiplication: A %*% B
    - Inverse: solve(A)
    - Transpose: t(A)
    - Determinant: det(A)
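A short session tying the matrix slides together is sketched below. One caveat worth knowing: the example matrix `matrix(1:9, 3, 3)` has determinant 0, so `solve()` would fail on it; the sketch therefore uses a separate invertible matrix `M`, which is an illustrative choice and not from the slides.

```r
A <- matrix(1:9, nrow = 3, ncol = 3)   # filled column by column

A[1, ]                 # row 1: 1 4 7
A[, 1]                 # column 1: 1 2 3
A[2, 3]                # 8
t(A)[1, ]              # first row of the transpose: 1 2 3

B <- cbind(A, 10:12)   # append (10, 11, 12) as a 4th column
dim(B)                 # 3 4

# An invertible 2x2 matrix for solve() and det().
M <- matrix(c(2, 0, 0, 4), nrow = 2, ncol = 2)
det(M)                 # 8
solve(M)               # inverse, with diagonal 0.5 and 0.25
M %*% solve(M)         # 2x2 identity matrix
```

Note that `*` multiplies element by element, while `%*%` is the matrix product.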

Page 37: Jack Chen

Common Data Structures and Constructs: lists

Lists
o Traditionally vectors and matrices contain simple data objects, mostly primitive data objects. More complex data structures are stored in lists.

o Lists contain objects and their assigned names:

o list(name1=object1, name2=object2, …)

Example of a list:
o list(foo="hello", bar="world")

Page 38: Jack Chen

Common Data Structures and Constructs: lists

Accessing elements in a list:
o We can reference objects in lists by their names with the dollar “$” operator:
    alist = list(Friday="happy", Monday="urrr")
    alist$Friday    # returns "happy"
    alist$Monday    # returns "urrr"

o If no object in the list has the name following $, then NULL is returned:
    alist$Tuesday   # returns NULL

o We can also access objects in lists by their index with double brackets [[index]]:
    alist[[1]]      # returns "happy"
    alist[[2]]      # returns "urrr"
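The list access rules above, plus adding a new element by name, can be checked with a few lines. The value assigned to `Tuesday` is made up for illustration.

```r
alist <- list(Friday = "happy", Monday = "urrr")

alist$Friday             # "happy"
alist[["Monday"]]        # "urrr", access by name with double brackets
alist[[1]]               # "happy", access by position
is.null(alist$Tuesday)   # TRUE: no element with that name yet

# Assigning to a new name appends an element to the list.
alist$Tuesday <- "so-so"
names(alist)             # "Friday" "Monday" "Tuesday"
length(alist)            # 3
```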

Page 39: Jack Chen

Operations: operating on R objects

Operating on R objects

o R operations are vector-based
o When the left hand side (LHS) and right hand side (RHS) of an operator conform, elements on the LHS interact with the corresponding elements on the RHS

o Examples
    c(1, 2) + c(3, 4)          # returns (4, 6)
    c(1, 2) + c(3, 4, 5, 6)    # returns (4, 6, 6, 8)
                               # (1, 2) is added to (3, 4) and (5, 6)
    2^c(1, 2, 3, 4)            # returns (2, 4, 8, 16)
    c(1, 2)^c(1, 2, 3, 4)      # returns (1, 4, 1, 16)
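The examples above rely on what R calls recycling: the shorter operand is repeated until it matches the length of the longer one. A small sketch, including the case where the lengths do not divide evenly (R still computes a result but issues a warning, which is suppressed here only to keep the output clean):

```r
c(1, 2) + c(3, 4, 5, 6)   # 4 6 6 8: (1, 2) is reused for (5, 6)
c(1, 2)^c(1, 2, 3, 4)     # 1 4 1 16
c(1, 2, 3) * 2            # 2 4 6: a single number is recycled too

# Lengths 2 and 3 do not divide evenly: R recycles anyway and warns.
r <- suppressWarnings(c(1, 2) + c(1, 2, 3))
r                         # 2 4 4
```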

Page 40: Jack Chen

Operations: operating on R objects

Operating on R objects

o Most of the built-in R objects can report their dimensions.

o Examples:
    length(c(1:4))                  # returns 4
    length(list(a=1, b=2))          # returns 2
    length(matrix(c(1:12),4,3))     # returns 12
    nrow(matrix(c(1:12),4,3))       # returns 4
    ncol(matrix(c(1:12),4,3))       # returns 3

Page 41: Jack Chen

Control Blocks

Page 42: Jack Chen

Control Blocks: Logical expressions

Logical Expressions
o A logical expression is an expression which evaluates to TRUE or FALSE
o Logical expressions can be formed with the relation operators
    - equal: ==
    - not equal: !=
    - less than: <
    - greater than: >
    - less than or equal to: <=
    - greater than or equal to: >=

o Examples:
    0 < 1        # evaluates to TRUE
    0 > 1        # evaluates to FALSE
    "A" == "a"   # evaluates to FALSE

Page 43: Jack Chen

Control Blocks: if-else statement

if-else statement
o if (logical expression) { … } else { … }

{ … } can be a single expression, or a group of expressions and statements, including another if-else statement.

The else part of the statement is optional.

o Examples:
    if (0 < 1) "true"
    if (0 > 1) "should not see anything"
    if ("a" == "A") { "equal" } else { "not equal" }
    if (FALSE) { "nothing" } else if (TRUE) { "something" }

Page 44: Jack Chen

Control Blocks: while loop

While loop
o while (logical expression) { … }

{ … } (the “body” of the statement) can be a single expression, or a group of expressions.

The while statement repeats the body { … } until the logical expression evaluates to FALSE.

o Example:
    while (TRUE) { "never ends!!" }
    while (FALSE) { "never executed!!" }
    x=1; while (x==1) { print(x); x=2 }   # prints 1, then assigns 2 to x

Page 45: Jack Chen

Control Blocks: for loop

For loop
o for (index in start:end) { … }

{ … } (the “body” of the statement) can be a single expression, or a group of expressions or statements.

The for statement runs the body { … } once for each value of index, from start to end.

o Example:
    for (i in 1:10) { print(i); }
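The three control blocks can be combined into slightly more useful snippets. This sketch uses a hypothetical helper `classify` and arbitrary numbers. One practical detail: when typing at the top-level prompt, keep `else` on the same line as the closing brace of the `if` block, otherwise R evaluates the `if` before it sees the `else`.

```r
# if / else if / else inside a function (classify is a made-up name)
classify <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}
classify(-2)   # "negative"
classify(0)    # "zero"

# while: halve n until it drops below 1, counting the steps
n <- 10
steps <- 0
while (n >= 1) {
  n <- n / 2
  steps <- steps + 1
}
steps          # 4

# for: add up the integers 1 to 10
total <- 0
for (i in 1:10) {
  total <- total + i
}
total          # 55
```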

Page 46: Jack Chen

Read/Write Data

Page 47: Jack Chen

Read/Write Data

Read/Write Data
o Importing and exporting data in R is relatively painless.
o We can easily import/export files where:
    - data points are separated by commas
    - data points are separated by tabs or spaces
    - data points are separated by some other delimiter.

Read SAS/SPSS/Stata data
o Package “foreign” contains functions that allow you to read, among others, SAS/SPSS/Stata data.
    - type: install.packages("foreign"), select a location to download the package from, the rest is automatic
    - type: library(foreign) to load the package
    - type: help(package = foreign) to see a list of functions

Page 48: Jack Chen

Read/Write Data: Reading from a file

Example of reading a file

# reads a file, data points separated by spaces or tabs
# assign first column to y, second column to x1, third column to x2
file = "http://www-personal.umich.edu/~jktc/R/samples/simple.dat"
read.table(file, col.names=c("y", "x1", "x2"))

# specify missing data in file
read.table(file, na.strings=".")

# if first row of data file has header (names for each column)
file2 = "http://www-personal.umich.edu/~jktc/R/samples/simple.header.dat"
read.table(file2, header=TRUE)

# to see more details of read.table function
help(read.table)

Page 49: Jack Chen

Read/Write Data: Writing to a file

Example of writing to a file

data = matrix(c(1:9), 3, 3)

# write a space separated file.
# assign first column to y, second column to x1
# third column to x2
write.table(data, file="c:/temp/simple.dat", row.names=FALSE,
            col.names=c("y", "x1", "x2"), sep=" ")

# to see more details on write.table function
help(write.table)
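To see reading and writing work together without depending on a particular directory or URL, the sketch below writes the same 3x3 matrix to a temporary file and reads it back. The use of `tempfile()` and the name `back` are illustrative choices, not from the slides.

```r
data <- matrix(1:9, 3, 3)

# Write a space-separated file with a header row into a temporary location.
path <- tempfile(fileext = ".dat")
write.table(data, file = path, row.names = FALSE,
            col.names = c("y", "x1", "x2"), sep = " ")

# Read it back; header = TRUE takes column names from the first row.
back <- read.table(path, header = TRUE)
names(back)    # "y" "x1" "x2"
back$y         # 1 2 3
back$x2        # 7 8 9

unlink(path)   # remove the temporary file
```

`read.table` returns a data frame, so columns can be accessed by name with `$`, just like list elements.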

Page 50: Jack Chen

Graphics Samples

Page 51: Jack Chen

Graphics Samples: Basic graphics

R has a sophisticated and powerful graphics engine.

We can think of the graphics engine as one large object with many attributes representing the different pieces to be displayed. The par function allows you to change different attributes of a graph.

Take a look at the different graphics parameters that are available in R:
o help(par)

Page 52: Jack Chen

Graphics Samples: Sample plot

[Figure: sample plot]

Page 53: Jack Chen

Graphics Samples: Sample plot

[Figure: sample plot]

Page 54: Jack Chen

Graphics Samples: Sample graphics

Image plot

[Figure: image plot titled “Maunga Whau Volcano”, with axes x and y.]

Page 55: Jack Chen

Graphics Samples: Sample graphics

[Figure: 3-D figure]

Page 56: Jack Chen

R Session:

Function writing, Plots customization, Simulation tips

Page 57: Jack Chen

Functions Writing/Debugging: Function syntax

Writing and Debugging Functions

o One of the advantages of R is the ease of creating our own functions. Here’s a very simple function:
    foo = function() { print("hello world"); }

o Functions are objects themselves.

o We are assigning to the variable “foo” a function with no arguments.

o When executed with foo(), the message “hello world” is printed to the screen.

Page 58: Jack Chen

Functions Writing: Function writing session

Sample function writing session:
1. Generate a population of size 1000 based on the model: [model equation not recovered from the slide]
2. Take a random sample of size 100 from the population
3. Perform two simple linear regressions of y on x:
    - fit one with intercept
    - fit one without intercept
4. Repeat steps 2 and 3 500 times, store the regression coefficients each time, and plot a histogram of their distribution over the 500 values (i.e., distributions of estimated coefficients based on samples).
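The four steps above could be sketched as follows. The model on the slide is not available in this transcript, so the sketch assumes a hypothetical linear model y = 1 + 2x + e with standard normal noise; the function name `one_draw`, the seed, and the range of x are likewise assumptions for illustration only.

```r
set.seed(2009)

# Step 1: population of size 1000 from an ASSUMED model y = 1 + 2x + e.
N <- 1000
x <- runif(N, min = 0, max = 10)
y <- 1 + 2 * x + rnorm(N)
population <- data.frame(x = x, y = y)

# Steps 2 and 3: sample 100 rows, fit with and without an intercept.
one_draw <- function(pop, n = 100) {
  s <- pop[sample(nrow(pop), n), ]
  fit1 <- lm(y ~ x, data = s)       # with intercept
  fit0 <- lm(y ~ x - 1, data = s)   # without intercept
  c(intercept   = unname(coef(fit1)[1]),
    slope       = unname(coef(fit1)[2]),
    slope_noint = unname(coef(fit0)[1]))
}

# Step 4: repeat 500 times; each row of results holds one set of estimates.
results <- t(replicate(500, one_draw(population)))

# Histograms of the estimates, written to a temporary PDF file.
out <- tempfile(fileext = ".pdf")
pdf(file = out)
par(mfrow = c(1, 3))
for (nm in colnames(results)) {
  hist(results[, nm], main = nm, xlab = "estimate")
}
dev.off()
```

Wrapping steps 2 and 3 in a function keeps the repetition in step 4 to a single `replicate` call, and makes it easy to change the sample size or the fitted models later.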

Page 59: Jack Chen

Functions Writing: Function writing session

More on graphics:
o To output a plot/graph to a file
    pdf(file=filename)    # generates pdf file
    jpeg(file=filename)   # generates jpeg file
    png(file=filename)    # generates png file
    and some others

o When you are done graphing/plotting, run dev.off() to have the image saved in the file.

o Without calling the above functions, R generates graphics in a separate window.

o The package “xtable” allows you to output tables in various formats, including HTML, LaTeX, etc.
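A minimal sketch of the open-device, plot, close-device pattern described above, combined with a `par` call. The file location (a temporary file), the margin values, and the plotted curve are arbitrary choices for illustration.

```r
out <- tempfile(fileext = ".pdf")

pdf(file = out, width = 6, height = 4)      # open a PDF graphics device
old <- par(mar = c(4, 4, 2, 1), cex = 0.9)  # change margins and text size

x <- seq(0, 2 * pi, length.out = 100)
plot(x, sin(x), type = "l", main = "sin(x)", xlab = "x", ylab = "sin(x)")

par(old)     # restore the previous graphics parameters
dev.off()    # close the device so the file is written

file.exists(out)   # TRUE
```

`par` returns the previous values of the parameters it changes, which is why saving its result in `old` makes it easy to restore them.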

Page 60: Jack Chen

Functions Writing/Debugging: Common functions

Help and administrative functions
o help.search(any key word)            # e.g. help.search("random forest")
o help(functionName)                   # e.g. help(glm)
o install.packages("packageName")      # note the quotes
o require(packageName)
o save(file= , list= )
o save.image(file= )

Other common functions
o Model fitting
    lm, glm, lsfit, anova
    summary, coef, residuals

o Model adequacy checking
    av.plot (in car package), influence.measures, colldiag

Page 61: Jack Chen

Functions Writing/Debugging: Common functions

Distribution functions
o For the normal distribution, R has 4 associated functions:
    - dnorm: probability density function
    - pnorm: cumulative distribution function
    - qnorm: quantile function (inverse of the cumulative distribution function)
    - rnorm: random number generator

o Others
    - dpois, ppois, qpois, rpois (Poisson)
    - dgeom, pgeom, qgeom, rgeom (geometric)
    - dbinom, pbinom, qbinom, rbinom (binomial)
    - dnbinom, pnbinom, qnbinom, rnbinom (negative binomial)
    - dunif, punif, qunif, runif (uniform)
    - dexp, pexp, qexp, rexp (exponential)
    - dgamma, pgamma, qgamma, rgamma (gamma)
    - dbeta, pbeta, qbeta, rbeta (beta)
    - dchisq, pchisq, qchisq, rchisq (chi-square)
    - df, pf, qf, rf (F distribution)
    - dt, pt, qt, rt (t distribution)
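The d/p/q/r naming pattern is easiest to remember after trying each prefix once. A short sketch with the normal and binomial distributions follows; the seed and parameter values are arbitrary.

```r
dnorm(0)                  # density at 0: 0.3989423
pnorm(1.96)               # P(Z <= 1.96): 0.9750021
qnorm(0.975)              # 97.5% quantile: 1.959964
pnorm(qnorm(0.3))         # q and p are inverses: 0.3

set.seed(1)
draws <- rnorm(1000, mean = 5, sd = 2)   # 1000 random draws
length(draws)             # 1000

dbinom(2, size = 4, prob = 0.5)          # P(X = 2) for Binomial(4, 0.5): 0.375
```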

Page 62: Jack Chen

Functions Writing/Debugging: Common functions

Running R commands in batch mode under a Unix environment

o Suppose the R commands are in the file: cmds.R
o At a command line prompt, type:

    R --no-save < cmds.R > output.log 2>&1

o To see the details of the command line options: man R

Page 63: Jack Chen

Wrapping up: References

References

Official R-project website:
o http://www.r-project.org
o On the left hand side, there’s a link “Manuals” under Documentation. There is quite a bit of good documentation there.
o The link “packages” gives a listing of available R packages and their documentation.

An excellent link with R examples (including linking R with C/C++ programs):
o http://www.math.ncu.edu.tw/~chenwc/R_note/

R for Windows FAQ:
o http://lib.stat.cmu.edu/R/CRAN/bin/windows/base/rw-FAQ.html

Google:
o Since R is a single letter, searching “R” might give you unrelated results. I’ve used: R+project, R+cran, R+stat, etc.
o CRAN stands for “Comprehensive R Archive Network”

Page 64: Jack Chen

This is it! Q&A

Thank you!

The slides are posted at:
http://www-personal.umich.edu/~jktc/R/presentation2009.pptx

The sample R commands in the slides are posted at:
http://www-personal.umich.edu/~jktc/R/samples/sample.cmds.2009.R

