R Programming Taster Session

Post on 25-May-2015

description

Slides for a research methods class on using R.

transcript

Taster/Skills Set Session

R programming is a lot like magic

Instead of spells, you have functions.

Muggles

Incapable of magic and hardly aware of it.

• Limited ability to change the environment.

• Must rely on algorithms developed for them.

• Problem-solving constrained by SPSS developers.

• Must pay for using the constrained algorithms.

Most people are muggles.

And that’s okay.

Wizards

• Can use functions made by top statistics researchers or create their own.

• Almost unlimited in their ability to change their environment.

• Can do things SPSS users cannot even dream of.

• Get their powers for free.

Warning! Here’s the small print.

Wizards also...

• Love to stretch their brains

• Have strong sitting muscles

• Put in the effort to learn

• Persist with puzzles

• Feel at home with the esoteric and obscure

Do you still want to be a wizard?

Syllabus

History of Magic: Origins of R

Arithmancy: Learning the system

Transfiguration: Working with data

Divination: Models and predictions

History of Magic

What is R?

R is a computer language used for data manipulation, statistics, and graphics.

Learning any new language is tough.

Grammar, vocabulary, idioms, orthography, a new world view...

The payoff is a whole new world of possibility.

Advantages

• Open source

• State of the art

• Publication-quality graphics

• Reproducible research

• Computer-intensive analyses

• Makes you think

• Easy interface with databases

Disadvantages

• Not user friendly at start

• Minimal GUI

• Easy to lose “sense” of data

1976 – Bell Labs develops S, a language for data analysis; released commercially as S-plus.

1990s – R written and released as open source by (R)oss Ihaka and (R)obert Gentleman.

1997 – The Comprehensive R Archive Network (CRAN) launched.

Today – 2781 user-contributed packages for R.

Accio R. To download R, go to

http://cran.r-project.org/bin/

Windows Mac Linux

Software: Pros and Cons

Pros | Cons
Easy(ish), common in psychology | Limited analytic capability
Easy, common in business | Very limited analytic capability
Elegant matrix support | Expensive, lacks statistics support
Extensibility, visualization, programmability | Learning curve

data analysis contests

Why R?

• EVERYTHING in one framework

‣ base: linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering etc.

‣ packages from Medical Image Analysis to Pharmacokinetics

• CUSTOM functionality

‣ Programming ➞ Automation

Practical Benefits

• Multiple datasets open at once

• Automate away “click-click-click” tasks

• Reproducibility

Why not ?

deducer

Taster session

Learning

• Self-study

‣ Past programming experience recommended

‣ Lots of expert advice available

• Oxford

‣ e.g., Ruth Ripley, Department of Statistics

‣ We’ll scratch the surface today

Arithmancy

Working with R

R | SPSS
Multi-dimensional data | Rectangular data (“spreadsheet”)
Functions can be modified | Proprietary functions
Interactive experience | Passive experience
Extensible | Cross/up-selling
Open and free | Commercial

New Mindset

Getting started with R

(Not very consoling) R console

Write a script here and run it.

Output appears here. Did you get what you wanted?

Revise the script and run it again.

Saved scripts can be rerun later.

Interactive data analysis session: write script, run script

Textmate

http://rstudio.org/

Grammar of Spells

object = function(arguments)

Assignment operator

Guess what this does!

z = read.table("MyFile.txt")

Two ways about it

For assigning a value to an object, = is the same as <-
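A minimal sketch of the spell grammar, using made-up object names (a, b, m) and a built-in function. It only illustrates that both operators store the result of a function call, and that = inside the parentheses names an argument rather than creating an object.

```r
# Both operators assign the result of a function call to an object.
a = sqrt(16)     # object = function(arguments)
b <- sqrt(16)    # the same assignment with the arrow operator

identical(a, b)  # TRUE: both objects hold the value 4

# Inside a function call, = names an argument instead of assigning.
m <- mean(x = c(1, 2, 3))
```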

Data Frames

z

You can also use other functions to read data, e.g., read.csv() and read.spss() (the latter comes from the foreign package).
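Here is a small sketch of reading a file into a data frame. The file contents and column names (item, cost) are invented for illustration, and the file is written to a temporary location first so the example runs without any data on disk.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The file contents and column names are hypothetical.
path <- tempfile(fileext = ".csv")
writeLines(c("item,cost", "apple,1.5", "melon,3.0"), path)

z <- read.csv(path)   # returns a data frame
str(z)                # inspect column names and types
```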

Accessing data

z[1,]

Read 1st row, all columns.

z[1,3]

Read cell at 1st row, 3rd column.

z[,3]

Read 3rd column.

z[,3:6]

Read columns from 3rd to 6th.

z$avbity

Read 3rd column by name.

z["avbity"]

Read 3rd column by name.
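To try the indexing forms yourself, here is a sketch on a hypothetical stand-in for z. Only the column name avbity comes from the slides; the other columns and all values are invented. One detail worth noticing: z$avbity gives back a plain vector, while z["avbity"] keeps the column wrapped in a data frame.

```r
# Hypothetical stand-in for z; only the column name avbity comes from the slides.
z <- data.frame(
  item   = c("apple", "melon", "grape"),
  cost   = c(1.5, 3.0, 2.5),
  avbity = c(10, 0, 4)
)

z[1, ]        # 1st row, all columns (a one-row data frame)
z[1, 3]       # single cell: 1st row, 3rd column
z[, 3]        # 3rd column as a plain vector
z$avbity      # the same column by name, also a vector
z["avbity"]   # the same column, kept as a one-column data frame
```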

How about?

z[1:6,1:3]

Subsets

Task: Make a data set of items that cost less than 2.

Subset function

z.cheap <- subset(z, cost < 2)

Can you make sense of this?
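A runnable sketch of the subset spell, assuming a small invented data frame with a cost column. The bracket version underneath is an equivalent way to write the same filter and is not from the slides.

```r
# Hypothetical data; only the column name cost is implied by the slides.
z <- data.frame(
  item = c("apple", "melon", "grape"),
  cost = c(1.5, 3.0, 2.5)
)

# Keep only the rows where cost is below 2.
z.cheap <- subset(z, cost < 2)

# The same filter written with bracket indexing.
z.cheap2 <- z[z$cost < 2, ]
```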

Transfiguration

Practical magic

Task: Transform a data set from individual data to pair-wise data.

(A typical tall-to-wide transformation.)

Create a data set “c” in which each row has data from both the male and female in each pair from data set “p”.

Goal

A pair

How would you do this in SPSS?

• Create an id variable for each pair.

• Click Data > Restructure.

‣ You want the second option, to "Restructure selected cases into variables".

• Move id variable into the “Identifier Variable/s” and click “Next.”

• Click “Yes” when asked whether you want to sort the data.

• For “Order of New Variables,” click “Group by Original Variable” and click “Next.”

The Plan

• Step 1

‣ Make a variable to identify each pair

• Step 2

‣ Split the tall data into two parts: one chunk for men and one chunk for women

• Step 3

‣ Merge the two chunks side by side using the pair identifier

Participant ids

10/10 = 1

11/10 = 1.1

When rounded, both equal 1.

Create pair ID

p$pair_id <- round(p$code/10)

Now each member of a pair has a common ID.
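The arithmetic can be checked on a few made-up codes. This sketch assumes, as in the 10 and 11 example, that the two members of a pair differ only in a small last digit. If codes could end in 5 or more, round() would push them into the next pair, so floor() or integer division would be the safer choice.

```r
# Hypothetical participant codes: 10 and 11 form pair 1, 20 and 21 form pair 2.
p <- data.frame(code = c(10, 11, 20, 21))
p$pair_id <- round(p$code / 10)
p$pair_id          # 1 1 2 2

# round() only works while the last digit stays small;
# floor() or integer division groups by tens whatever the last digit is.
round(16 / 10)     # 2
floor(16 / 10)     # 1
16 %/% 10          # 1
```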

Separate genders

men <- subset(p, gender == "Male")

women <- subset(p, gender == "Female")

Merge sets

c <- merge(men, women, by.x = "pair_id", by.y = "pair_id")

“x” “y”

Ugly variable names

Rename variables

names(c) <- gsub(
  "\\.x$",   # find the “.x” suffix (and nothing else containing an x)
  ".m",      # replace with “.m”
  names(c))

names(c) <- gsub(
  "\\.y$",   # find the “.y” suffix
  ".f",      # replace with “.f”
  names(c))
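Putting the three steps together, here is a sketch of the whole tall-to-wide transformation on invented data. The values are hypothetical, and the result is called wide rather than c so that it does not share a name with the c() function. As an alternative to renaming afterwards, merge() accepts a suffixes argument that labels the clashing columns in one go.

```r
# Hypothetical tall data: one row per person, two people per pair.
p <- data.frame(
  code      = c(10, 11, 20, 21),
  gender    = c("Male", "Female", "Male", "Female"),
  BirthYear = c(1985, 1987, 1990, 1989)
)

# Step 1: a shared id for both members of a pair.
p$pair_id <- round(p$code / 10)

# Step 2: one chunk for men, one chunk for women.
men   <- subset(p, gender == "Male")
women <- subset(p, gender == "Female")

# Step 3: merge side by side; suffixes replaces the default .x and .y.
wide <- merge(men, women, by = "pair_id", suffixes = c(".m", ".f"))
names(wide)
```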

But...

Wouldn’t it be useful to have participant age instead of their birth year?

Do it all over again. Click click click click click click.

Just add a line of code to the top:

p$Age = (2011 - p$BirthYear)

Now re-run the script.

Practical magic

Task: Extract participants’ written responses for statistical analysis in LIWC.

(For analysis, LIWC requires each text response in a separate file.)

Extract each cell to a text file.

62 participants × 8 variables = 496 files

Manual labour

• Boring

• Prone to human errors

• Risk of repetitive strain injury

• You have better things to do

The R way

• Quick

• Efficient

• Repeatable

The Plan

• Step 1

‣ Load SPSS data into R

• Step 2

‣ Create a function that extracts the cell contents and writes them to a file based on participant id and variable name

• Step 3

‣ Run the function on the data

Load data to R

library(foreign)

d <- read.spss("RESEARCH_DATA_FILE.sav", to.data.frame = TRUE)

Function ingredients

• Information to identify the right cell

‣ Participant id (the right row)

‣ Variable name (the right column)

• A unique file name

‣ We’ll just use the above information + “.txt”

The Function

saveText <- function(id, variable) {
  data = subset(d, d$Ppno == id)
  value = as.character(data[variable][1,1])
  filename = paste(id, variable, ".txt", sep = "")
  writeLines(value, con = filename)
}

• saveText: The name of our function. I could have used “Waddiwasi” instead, but I didn’t.

• function: Function to make functions.

• (id, variable): The function requires two things to work: the participant id and the name of the variable to extract.

• data: Create a new object “data” that contains only the rows from “d” where the Ppno is the same as the id fed into the function.

• value: Create a new object “value” that contains the specified variable from the participant data in text format.

• filename: Create a new object “filename” by squishing together the participant id, the variable name, and “.txt”.

• writeLines: Save the value to a file (name specified by filename).
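To see the function at work without the real data file, here is a sketch on an invented two-row data set. It is a variant, not the function from the slides: saveTextTo is a hypothetical name, and the data and output folder are passed in as arguments (defaulting to a temporary folder) so the example runs anywhere and can be checked afterwards.

```r
# Hypothetical data; only Ppno and Comments are names taken from the slides.
d <- data.frame(
  Ppno     = c(1, 2),
  Comments = c("Nice photo.", "Too blurry."),
  stringsAsFactors = FALSE
)

# Same steps as saveText, but the data and output folder are passed in
# explicitly (an assumption made here so the sketch runs anywhere).
saveTextTo <- function(data, id, variable, dir = tempdir()) {
  row      <- data[data$Ppno == id, ]               # the right row
  value    <- as.character(row[[variable]][1])      # the right column, as text
  filename <- file.path(dir, paste(id, variable, ".txt", sep = ""))
  writeLines(value, con = filename)
  invisible(filename)   # return the path so the caller can check it
}

out <- saveTextTo(d, 1, "Comments")
readLines(out)   # "Nice photo."
```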

Ok, what’s next?

• Since the function writes out the data one cell at a time (based on two bits of information), we need two lists to automate our work:

‣ A list of participants

‣ A list of all the variables we need

Get ready to run the function

participants = unique(d$Ppno)

variables = c(
  "phys_attra", "pers_attra",
  "Descr__app", "Comments",
  "Signal_conveyed", "portrayyou",
  "their_signals", "their_portrayal")

“For each participant, go through the variables and save the results for each.”

Run, function, run!

List of participants

List of variables

Our function

Loopty loop

for (participant in participants) {      # do this once per participant (62 times total)
  for (variable in variables) {          # do this once per variable (8 times total)
    saveText(participant, variable)
  }
}
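The nested loop can be tried end to end on a toy data set. Everything below is a sketch under assumptions: the data values and the output folder name (liwc_texts) are invented, and the body of the loop repeats the steps of the function inline so the block runs on its own. With 3 participants and 2 variables it should leave 6 files behind, just as 62 and 8 give 496.

```r
# Hypothetical data; the column names are borrowed from the slides.
d <- data.frame(
  Ppno            = c(1, 2, 3),
  Comments        = c("a", "b", "c"),
  Signal_conveyed = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Hypothetical output folder inside the session's temporary directory.
out_dir <- file.path(tempdir(), "liwc_texts")
dir.create(out_dir, showWarnings = FALSE)

participants <- unique(d$Ppno)
variables    <- c("Comments", "Signal_conveyed")

for (participant in participants) {      # once per participant
  for (variable in variables) {          # once per variable
    value    <- as.character(d[d$Ppno == participant, variable][1])
    filename <- file.path(out_dir, paste(participant, variable, ".txt", sep = ""))
    writeLines(value, con = filename)
  }
}

length(list.files(out_dir))   # 3 participants x 2 variables = 6 files
```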

Result

496 FILES

...In a flick of a wand!

Divination

What can R do for you?

More or less everything.

Basic magic

• Out of the box, R can do

‣ Linear and nonlinear modeling

‣ Classical statistical tests

‣ Time-series analysis

‣ Classification

‣ Clustering

‣ and many other statistical techniques...
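A short taste of three items from the list, using only functions and a data set (mtcars) that ship with a standard R installation. The choice of variables and the number of clusters are arbitrary picks for illustration, not a recommended analysis.

```r
# mtcars ships with R, so nothing needs to be downloaded.
fit <- lm(mpg ~ wt, data = mtcars)   # linear model: fuel economy vs. weight
coef(fit)                            # intercept and slope

# Prediction for a hypothetical car weighing 3000 lbs (wt is in 1000 lbs).
predict(fit, newdata = data.frame(wt = 3))

# Classical test: do automatic and manual cars differ in mpg?
tt <- t.test(mpg ~ am, data = mtcars)
tt$p.value

# Clustering: group cars into 3 clusters on two scaled variables.
set.seed(1)
groups <- kmeans(scale(mtcars[, c("mpg", "wt")]), centers = 3)
table(groups$cluster)
```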

More help

• An Introduction to R

‣ http://cran.r-project.org/doc/manuals/R-intro.html

• R Starter Kit

‣ http://www.ats.ucla.edu/stat/r/sk/

• R mailing list

• Dumbledore’s Ruth Ripley’s class, Department of Statistics, University of Oxford

‣ http://www.stats.ox.ac.uk/~ruth/

Remember...

Without R, it’s only esearch.

Thanks for listening!