Page 1: An Introduction to R for Epidemiologists using RStudio ...sjm2186/SER2014/indexing.pdf · Indexing Overview Why Indexing Indexing is how you refer to data within a data structure

An Introduction to R for Epidemiologists using RStudio: indexing

Steve Mooney, stealing heavily from C. DiMaggio

Department of Epidemiology, Columbia University, New York, NY 10032

[email protected]

An Introduction to R for Epidemiologists using RStudio: Indexing in R

SER Summer 2014

Indexing Overview

Outline

1 Indexing Overview

2 Indexing Vectors

3 Indexing Matrices & Arrays

4 Indexing Lists

5 Indexing Dataframes

S. Mooney (Columbia University) R intro 2014 2 / 24

Why Indexing

Indexing is how you refer to data within a data structure

1 To read out values (e.g. to plot)

2 To clean data

3 To format output

Indexing Vectors

indexing vectors: myVector[n]

people <- c("Alice", "Bob", "Charlie", "Danielle", "Eunice")

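people[1]         # "Alice"

people[4]         # "Danielle"

people[6]         # NA: there is no 6th element

people[-1]        # every element except the first

people[c(2,4)]    # "Bob" "Danielle"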

people[c(4,2)]
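The indexing rules on this slide can be checked in a few lines. A minimal sketch reusing the same `people` vector; the zero index and the mixed-sign case are extra illustrations not shown on the slide.

```r
people <- c("Alice", "Bob", "Charlie", "Danielle", "Eunice")

# Positive integers select elements, in the order you ask for them
people[c(4, 2)]            # "Danielle" "Bob"

# Out-of-range positions return NA instead of an error
people[6]                  # NA

# Negative integers drop elements; several can be dropped at once
people[-c(1, 2)]           # "Charlie" "Danielle" "Eunice"

# Index 0 is ignored, so it selects nothing
people[0]                  # character(0)

# Positive and negative indices cannot be mixed in one call
result <- tryCatch(people[c(-1, 2)], error = function(e) "error")
result                     # "error"
```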

sorting vectors

sort() rearranges the values of the vector itself (ascending by default)

x <- c(12, 3, 14, 3, 5, 1)

sort(x)

rev(sort(x))

sort() does not change the vector

sort(x)

x               # x itself is unchanged

x <- sort(x)    # assign the result to keep the sorted version

x
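Two arguments of sort() are worth knowing here: decreasing = TRUE, an alternative to rev(sort(x)), and na.last, because sort() quietly drops missing values by default. A small sketch using the slide's x plus a made-up vector y that contains an NA:

```r
x <- c(12, 3, 14, 3, 5, 1)

# Descending order directly; gives the same result as rev(sort(x)) here
desc <- sort(x, decreasing = TRUE)
desc                        # 14 12  5  3  3  1

# sort() removes NA values unless told where to put them
y <- c(3, NA, 1)
sort(y)                     # 1 3
sort(y, na.last = TRUE)     # 1 3 NA
```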

ordering and ranking vectors

You often want to sort one vector by values in another

Use order() to rearrange another vector

ages<- c(8, 6, 7, 4, 4)

order(ages)

people <- c("Alice", "Bob", "Charlie", "Danielle", "Eunice")

people[order(ages)]

order() creates an index of positional integers that rearranges the elements of another vector, e.g. people[c(4,5,2,3,1)]: the 4th element (Danielle) goes to 1st position, the 5th element (Eunice) to 2nd position, the 2nd element (Bob) to 3rd position, etc. Tied values (Danielle and Eunice are both 4) keep their original order.

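rank() doesn't sort: it returns the rank of each element in its original position, and tied values share the average rank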

x <- c(12, 3, 14, 3, 5, 1)

rank(x)
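sort(), order() and rank() are closely related. A sketch with the slide's vectors: indexing by order() reproduces sort(), and applying order() twice gives ranks, which match rank() when ties are broken by position (ties.method = "first").

```r
x <- c(12, 3, 14, 3, 5, 1)

# Indexing x by order(x) gives the same result as sort(x)
x[order(x)]                      # 1  3  3  5 12 14

# rank() keeps original positions; the two 3s share rank 2.5
rank(x)                          # 5.0 2.5 6.0 2.5 4.0 1.0

# Ordering the ordering gives ranks, with ties broken by position
order(order(x))                  # 5 2 6 3 4 1
rank(x, ties.method = "first")   # 5 2 6 3 4 1

# Sorting one vector by another, as with people and ages
ages   <- c(8, 6, 7, 4, 4)
people <- c("Alice", "Bob", "Charlie", "Danielle", "Eunice")
people[order(ages)]              # "Danielle" "Eunice" "Bob" "Charlie" "Alice"
```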

Using indexing for data cleaning: myVector[n] <- new value

people <- c("Alice", "Bob", "Charlie", "Danielle", "Eunice")

people

people[2] <- "Robert"

people
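Replacement by position has one side effect worth knowing during data cleaning: assigning to a position past the end silently lengthens the vector and fills any gap with NA. A minimal sketch; the name "Frank" is purely illustrative.

```r
people <- c("Alice", "Bob", "Charlie", "Danielle", "Eunice")

# Position 7 does not exist yet, so the vector grows to length 7
people[7] <- "Frank"
length(people)   # 7

# The skipped position 6 is filled with NA
people[6]        # NA
```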

Modification using complex indices

people <- c("Alice", "Bob", "Charlie", "Danielle", "Eunice")

people

people[2:3] <- c("Robert", "Charles")

people

people[-2] <- c("Alison", "Charles", "David", "Eleanor")

people
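Replacement values are recycled when they are shorter than the index. A short sketch (the "Vacant", "X" and "Y" values are placeholders): one value fills several positions, while a length that does not divide evenly triggers a warning, which is easy to miss in a long cleaning script.

```r
people <- c("Alice", "Bob", "Charlie", "Danielle", "Eunice")

# A single replacement value is recycled across every selected position
people[2:3] <- "Vacant"
people   # "Alice" "Vacant" "Vacant" "Danielle" "Eunice"

# 3 positions but 2 values: R warns; the warning is caught here
# (on a copy) so the script keeps running
got_warning <- tryCatch({
  test_copy <- people
  test_copy[1:3] <- c("X", "Y")
  FALSE
}, warning = function(w) TRUE)
got_warning   # TRUE
```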

Indexing by logical

people <- c("Alice", "Bob", "Charlie", "Danielle", "Eunice")

which.people <- c(TRUE, FALSE, FALSE, TRUE, TRUE)

people[which.people]

which.people <- people == "Bob"

which.people

people[which.people]

which.people <- people %in% c("Bob", "Charlie")

which.people

people[which.people] <- "Not to be named"

people
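Logical indices can be built from several conditions. A sketch that borrows the `ages` vector from the ordering slide: & and | combine conditions, ! negates one, and which() converts a logical vector into positions.

```r
people <- c("Alice", "Bob", "Charlie", "Danielle", "Eunice")
ages   <- c(8, 6, 7, 4, 4)

# Combine conditions with & (and) and | (or)
people[ages > 5 & ages < 8]               # "Bob" "Charlie"
people[ages == 4 | people == "Alice"]     # "Alice" "Danielle" "Eunice"

# which() turns a logical vector into positions
which(ages == 4)                          # 4 5

# Negate a condition with !
people[!(people %in% c("Bob", "Charlie"))]   # "Alice" "Danielle" "Eunice"
```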

missing values

is.na() - returns a logical vector of NA positions; useful for replacing missing values

!is.na() - positions that do not contain NA

x <- c(10, NA, 30)

is.na(x)

x[is.na(x)] <- 999

x
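Why is.na() is needed at all: any comparison with NA is itself NA, so x == NA cannot find missing values. A sketch with the slide's x, also showing how to drop NAs and how summary functions such as mean() handle them.

```r
x <- c(10, NA, 30)

# == NA does not work: every comparison with NA returns NA
x == NA                 # NA NA NA

# Keep only the observed values
x[!is.na(x)]            # 10 30

# Many summary functions need na.rm = TRUE to skip missing values
mean(x)                 # NA
mean(x, na.rm = TRUE)   # 20
```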

other unusual values

is.nan() - not a number

is.infinite() - infinite

x <- c(10, 0, 30)

y <- c(0, 0, 2)

z <- x/y    # 10/0 is Inf, 0/0 is NaN, 30/2 is 15

z

is.infinite(z)

is.nan(z)
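NaN and NA overlap in a way that can surprise: is.na() is TRUE for NaN, but is.nan() is FALSE for NA. When the goal is simply to keep usable numbers, is.finite() screens out Inf, -Inf, NaN and NA together. A sketch with the slide's vectors:

```r
x <- c(10, 0, 30)
y <- c(0, 0, 2)
z <- x / y           # Inf NaN 15

# NaN counts as missing for is.na(), but NA is not NaN
is.na(z)             # FALSE  TRUE FALSE
is.nan(NA)           # FALSE

# is.finite() is FALSE for Inf, -Inf, NaN and NA alike
z[is.finite(z)]      # 15
```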

Indexing by name

people <- c("Alice", "Bob", "Charlie", "Danielle", "Eunice")

names(people) <- c("President", "Vice-President", "Secretary", "Staff", "Intern")

people

people["President"]

people[c("President", "Intern")]

people[c("President", "Intern")] <- "Vacant"

people
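Two details of name indexing that matter during cleaning: asking for a name that does not exist returns NA rather than an error, and names() can be indexed and assigned like any other vector. A sketch; the labels "Board" and "Treasurer" are hypothetical.

```r
people <- c("Alice", "Bob", "Charlie", "Danielle", "Eunice")
names(people) <- c("President", "Vice-President", "Secretary", "Staff", "Intern")

# A name that does not exist returns NA (with name <NA>), not an error
people["Board"]

# names() can itself be indexed to relabel one element
names(people)[4] <- "Treasurer"
people["Treasurer"]      # Treasurer: "Danielle"

# unname() drops the names when only the values are needed
unname(people[1:2])      # "Alice" "Bob"
```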

Indexing Matrices & Arrays

a matrix is a vector with 2 dimensions... so give one index per dimension

Index a matrix with matrixname[row, column]

myMatrix<-matrix(c("a","b","c","d"),2,2)

myMatrix

myMatrix[1,1]

myMatrix[1,2]

myMatrix[2,1]

myMatrix[c(TRUE, FALSE),c(TRUE, FALSE)]
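The positions on this slide follow from how matrix() fills its values: column by column. A sketch using the same myMatrix, showing that a single index counts down the columns of the underlying vector, and that byrow = TRUE changes the fill order.

```r
myMatrix <- matrix(c("a", "b", "c", "d"), 2, 2)

# matrix() fills column by column, so "c" lands in row 1, column 2
myMatrix[1, 2]       # "c"

# Underneath, a matrix is one vector; a single index counts down columns
myMatrix[3]          # "c"
dim(myMatrix)        # 2 2

# byrow = TRUE fills row by row instead
byRow <- matrix(c("a", "b", "c", "d"), 2, 2, byrow = TRUE)
byRow[1, 2]          # "b"
```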

Indexing a whole row or column: leave out the row or column index

Index a matrix with matrixname[row, column]

myMatrix<-matrix(c("a","b","c","d"),2,2)

myMatrix

myMatrix[1,]

myMatrix[,2]

myMatrix[,2] <- c("e", "f")
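Taking a whole row or column has a side effect: by default R drops the unused dimension, so myMatrix[1,] is a plain vector, not a 1 x 2 matrix. A sketch showing drop = FALSE, which keeps the matrix shape when later code expects one.

```r
myMatrix <- matrix(c("a", "b", "c", "d"), 2, 2)

# A single row comes back as a plain vector with no dimensions
row1 <- myMatrix[1, ]
is.null(dim(row1))                 # TRUE

# drop = FALSE keeps the result as a 1 x 2 matrix
row1_mat <- myMatrix[1, , drop = FALSE]
dim(row1_mat)                      # 1 2

# Replacing a column modifies the matrix
myMatrix[, 2] <- c("e", "f")
myMatrix[, 2]                      # "e" "f"
```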

Indexing an array

Index an array with arrayname[row, column, depth]

ugdp.age <- c(8, 98, 5, 115, 22, 76, 16, 69)

ugdp.age <- array(ugdp.age, c(2, 2, 2))

ugdp.age[1,2,1]
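As with matrices, leaving out indices returns a whole slice, and the first index varies fastest when array() fills its values (down rows, then across columns, then across layers). A sketch with the slide's ugdp.age values; apply() over dimension 3 is an added illustration, not part of the slide.

```r
ugdp.age <- array(c(8, 98, 5, 115, 22, 76, 16, 69), c(2, 2, 2))

# Leaving out row and column returns the whole first 2 x 2 layer
ugdp.age[, , 1]          # column 1: 8, 98; column 2: 5, 115

# Individual cells: row, column, layer
ugdp.age[1, 2, 1]        # 5
ugdp.age[2, 1, 2]        # 76

# apply() over the third dimension gives one total per layer
apply(ugdp.age, 3, sum)  # 226 183
```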

Indexing Lists

a list is a collection of unlike elements (of different types or lengths)

double brackets [[...]] index the list items

object$name if the list is named

x <- 1:5

y <- matrix(c("a","c","b","d"), 2,2)

z <- c("Peter", "Paul", "Mary")

mm <- list(x, y, z)

mm[[2]]

mm[[2]][2,2]

nn <- list(numbers=x, twoxtwo=y, names=z)

nn$names

nn$names[2]
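The difference between single and double brackets is the most common list-indexing stumble: [ ] returns a smaller list, while [[ ]] returns the item itself. A sketch with the slide's nn; storing the name in a variable (item) is an added illustration of where [[ ]] is more flexible than $.

```r
x <- 1:5
y <- matrix(c("a", "c", "b", "d"), 2, 2)
z <- c("Peter", "Paul", "Mary")
nn <- list(numbers = x, twoxtwo = y, names = z)

# Single brackets return a list; double brackets return the item
class(nn[3])                     # "list"
class(nn[[3]])                   # "character"

# [[ ]] accepts a name held in a variable; $ needs the literal name
item <- "names"
nn[[item]][2]                    # "Paul"
identical(nn[[item]], nn$names)  # TRUE
```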

Indexing Dataframes

dataframes: tabular epi data sets

2-dimensional tabular lists with equal-length fields

each row is a record or observation

each column is a field or variable (usually numeric vectors or factors)

”a list that behaves like a matrix”
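"A list that behaves like a matrix" can be checked directly. A sketch on a small made-up data frame (the id, sex and age values are illustrative only, not Titanic data): is.list() and length() treat it as a list of columns, while dim() reports rows and columns like a matrix.

```r
# Small made-up data frame, for illustration only
df <- data.frame(id  = 1:3,
                 sex = c("male", "female", "male"),
                 age = c(34, 29, 51),
                 stringsAsFactors = FALSE)

is.list(df)      # TRUE: a data frame is a list of columns
length(df)       # 3: the number of columns (list items)
dim(df)          # 3 3: rows and columns, as for a matrix
```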

dataframes: tabular epi data sets

Option 1: index rows (observations) or columns (variables) like a matrix

titanic <- read.csv(
  "http://www.columbia.edu/~sjm2186/SER2014/titanic.csv",
  stringsAsFactors = FALSE)  # load titanic data

str(titanic)

titanic[1,2]    # row 1, column 2

head(titanic[,2])
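Matrix-style indexing on a data frame also accepts column names, and, as with matrices, a single column is dropped to a plain vector unless drop = FALSE. A sketch that runs without downloading titanic.csv, using a small made-up data frame whose values are illustrative only:

```r
# Small made-up data frame, for illustration only
df <- data.frame(id  = 1:3,
                 sex = c("male", "female", "male"),
                 age = c(34, 29, 51),
                 stringsAsFactors = FALSE)

df[1, 2]                   # "male": row 1, column 2
df[, "age"]                # 34 29 51: columns can be chosen by name
df[2:3, c("id", "age")]    # a smaller data frame: rows 2-3, two columns

# A single column comes back as a vector unless drop = FALSE
is.data.frame(df[, 2])                 # FALSE
is.data.frame(df[, 2, drop = FALSE])   # TRUE
```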

dataframes: tabular epi data sets

Option 2: index columns like a list

titanic$sex

head(titanic$name)

mode(titanic$sex)

table(titanic$sex)

table(titanic$sex, titanic$survived)
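The list-style tools from the lists section carry over to data frames: $ and [[ ]] return a column as a vector, while single brackets with a name return a one-column data frame. A sketch on a made-up data frame; the derived column male is a hypothetical example of creating a variable with $.

```r
# Small made-up data frame, for illustration only
df <- data.frame(id  = 1:3,
                 sex = c("male", "female", "male"),
                 stringsAsFactors = FALSE)

# $ and [[ ]] both return the column as a vector
identical(df$sex, df[["sex"]])   # TRUE

# Single brackets with a name return a one-column data frame
is.data.frame(df["sex"])          # TRUE

# Assigning to $ creates a new column
df$male <- df$sex == "male"
table(df$male)                    # FALSE: 1, TRUE: 2
```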

dataframes: tabular epi data sets

Often you index like a matrix to subset rows, then like a list to perform analyses

men <- titanic[titanic$sex=="male",]

table(men$survived)
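One caution with this pattern: if the condition column has missing values, the comparison returns NA, and a logical row index containing NA produces a row of NAs. A sketch on a small made-up data frame that mimics the titanic column names (values are illustrative only); wrapping the condition in which() keeps only the TRUE rows.

```r
# Made-up data with one missing value in sex
df <- data.frame(id       = 1:4,
                 sex      = c("male", "female", NA, "male"),
                 survived = c(0, 1, 1, 0),
                 stringsAsFactors = FALSE)

# The NA comparison produces an extra all-NA row
men_bad <- df[df$sex == "male", ]
nrow(men_bad)          # 3: two men plus one all-NA row

# which() keeps only the TRUE positions, so the NA row is dropped
men <- df[which(df$sex == "male"), ]
nrow(men)              # 2
table(men$survived)    # 0: 2
```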
