Date post: 29-Dec-2015
Introduction to R

J. Charles Victor – Intro to R

Workshop Plan

The R interface
    The Console
    The Script Editor
    The 'Workspace'
    R programming rules…
How does R 'think'
    R Objects
    The data frame
Importing Data
Data Manipulation
Simple Analyses

What is R?

A programming environment
    Useful for statistics, with powerful graphing capabilities
    But you will be programming, not pointing and clicking
Free, 'open' software
    Users create programs which are made available to other users via the web and the installation interface
Based on S, S-Plus programming

First Step

Open R…

The R Console

The main window
    Commands are written and submitted
    A log of progress is recorded
    Output (except graphs) is produced
    Similar to the STATA interface and function
The prompt '>' indicates R is waiting for a command
Try the following:

    > x <- c(1,2,3,4,5) [ENTER]
    > mean(x) [ENTER]

You should find the following result:

    [1] 3

R is telling you the mean of (1,2,3,4,5) is 3

The Script Editor

Accessible from the File menu item
Used to create a series of commands (i.e. a program) that can be saved and run at a later date
    Similar to the DO editor in STATA
    Will make SAS and SPSS syntax users more comfortable
    Write commands, highlight them and click on the submit button
Try opening the Script editor ('New Script') and repeating the same commands as before

    x <- c(1,2,3,4,5)
    mean(x)

Now highlight this code and click on the submit button

Script Editor

Nothing fancy, but VERY useful
Saves programs

The Workspace

The 'Workspace' is the R data and objects
When exiting R, saving the workspace saves your data and work
Let's see our work thus far. Type:

    ls()

What do you see?
Try saving your work thus far:
    File -> Save Workspace

R Programming

General Programming
    R is case-sensitive (x and X are different objects)
    Character strings must be in quotes (double quotes are the convention; single quotes also work)
    Hitting ENTER submits a command
        If a command is incomplete when you hit ENTER, R shows a '+' prompt and waits for the rest; you do not type the '+' yourself
        Try the following:

        > newy <- c(0,0,1,
        + 1,0)

    Use 'comments' to identify what you have done
        Comments begin with "#"
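A minimal sketch of these rules in a script, assuming nothing beyond base R. The object name newy comes from the slide; everything else is illustrative.

```r
# Comments start with "#"; R ignores the rest of the line
newy <- c(0, 0, 1,    # the bracket is still open, so R waits for more input
          1, 0)       # the command is complete once the bracket closes

# R is case-sensitive: newy exists, NEWY is a different (undefined) name
exists("newy")
exists("NEWY")

# Character strings need quotes; single and double quotes are equivalent
identical("abc", 'abc')
```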

How does R think?

R thinks of data elements as 'objects'
Objects can be:
    Single variables
    Arrays of variables
    Entire datasets
    Results from analyses (if saved as an object)
When you save the 'Workspace' you save all of these objects
So in a small sense, R works like Excel.

OK, I don't understand this Object thing…

For data analysts it is usually easiest to start by treating the term 'object' as meaning 'variable'
We have already created one variable called 'x'
We can create another variable (object) called 'y' that has the values (20, 27, 18, 50, 99)

    > y <- c(20,27,18,50,99)

To see all of the variables (objects) in memory we can use the 'list' command

    ls()

Or click on MISC -> LIST OBJECTS
What do you see?

DATA INPUT

Creating Data

How? No spreadsheet??
Create your own
Class of 5 students, need average test score

    John Smith          58  M
    Jaysharee Singh     82  F
    Emily Xu            90  F
    Ute VanDroglen      65  F
    Charles Victor      90  M

First Attempt to Enter Data

Many ways to create and edit data in R
    First create variables (objects)
    Then compile the data set from the variables
Creating variables: 2 main ways
    1) Relatively few values

        VARIABLENAME <- c(VALUE1,VALUE2,VALUE3….)

       Character values go in quotes: z <- c("ABC","DEF")
    2) Many values (a blank line, i.e. ENTER on its own, ends the input)

        VARIABLENAME <- scan() ENTER
        VALUE1 VALUE2 VALUE3 VALUE4 …. VALUE8 ENTER
        VALUE9 VALUE10 ….. ENTER
        ENTER
        >

Try entering this data in

Use method 1 for first name, last name and sex
Use method 2 for exam mark

    John Smith          58
    Jaysharee Singh     82
    Emily Xu            90
    Ute VanDroglen      65
    Charles Victor      90

After you create each variable, check that it is correct by typing the variable name at the command prompt

    > firstname
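One possible way to work through this exercise, as a sketch. The variable names are the ones the later slides use. scan() normally reads from the keyboard, so the text= argument is used here only to make the example run without typing.

```r
# Method 1: a few values typed inside c(); text values go in quotes
firstname <- c("John", "Jaysharee", "Emily", "Ute", "Charles")
lastname  <- c("Smith", "Singh", "Xu", "VanDroglen", "Victor")
sex       <- c("M", "F", "F", "F", "M")

# Method 2: scan(); text= supplies the values that would otherwise be typed in
mark <- scan(text = "58 82 90 65 90", quiet = TRUE)

# Check each variable by typing its name
firstname
mark
mean(mark)   # the class average
```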

A few notes on entering values

1) Variable names can contain letters, digits, '.' and '_' (they must start with a letter, or with a '.' that is not followed by a digit)
2) Missing values should be coded as NA
3) To create a variable whose values are a sequential list of numbers, use a colon (:)

    studentid <- c(1:5)
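A short sketch of notes 2 and 3. The missing mark below is made up for illustration and is not part of the class data.

```r
studentid <- 1:5                  # the colon builds the sequence 1, 2, 3, 4, 5

mark <- c(58, 82, NA, 65, 90)     # hypothetical: the third mark is missing

mean(mark)                  # NA, because one value is unknown
mean(mark, na.rm = TRUE)    # drop the missing value before averaging
sum(is.na(mark))            # how many values are missing
```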

Creating the Dataset

Currently we just have 5 variables (objects)
These objects are independent of each other (i.e. the first name John is not linked with the last name Smith)
To 'link' these objects we need to compile these variables together in a dataset, which R calls a 'data frame'
In R a data frame is an object just like a variable, and thus it is created in a similar fashion

    DATA_NAME <- data.frame(VARIABLE1,VARIABLE2,VARIABLE3)

Note: All variables must have the same number of observations
Now take a look at the data by typing the dataset name
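As a sketch, the five class variables can be compiled like this. The dataset name class is the one used on later slides; the values come from the student table.

```r
studentid <- 1:5
firstname <- c("John", "Jaysharee", "Emily", "Ute", "Charles")
lastname  <- c("Smith", "Singh", "Xu", "VanDroglen", "Victor")
mark      <- c(58, 82, 90, 65, 90)
sex       <- c("M", "F", "F", "F", "M")

# Each variable becomes a column; row 1 now links John, Smith, 58 and M
class <- data.frame(studentid, firstname, lastname, mark, sex)

class        # print the dataset
dim(class)   # number of observations, number of variables
```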

Back to 'Objects'

Look at the objects now in memory
    ls() or click MISC -> List objects
You should see all of the variables plus the dataset
You can now use the dataset much as we have used variables
    To see a variable, type the variable name
    To see the dataset, type the dataset name

BUT…

Once compiled onto a dataset, the variables (studentid, firstname, lastname, mark, sex) are different from the 'objects' in R's memory
So we have
    The object: mark
    The variable mark on the class dataset
You may want to get rid of the 'objects' now that you have compiled them onto the dataset (any changes made to the objects will not be reflected on the dataset)

    rm(studentid, firstname, lastname, mark, sex)
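A small sketch of why the object and the dataset variable are separate things. Only two columns are used to keep it short.

```r
mark  <- c(58, 82, 90, 65, 90)
class <- data.frame(studentid = 1:5, mark)

mark[1] <- 0      # change the free-standing object...
class$mark[1]     # ...the copy on the dataset is unchanged

rm(mark)          # remove the object; the dataset keeps its own copy
class$mark
```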

Importing Existing Data into R

R has not been very friendly to foreign data formats
    But this is changing rapidly
Optimally datasets need to be in the form of:
    ASCII text
    Tab delimited
    Comma delimited
Best to convert Excel data into one of these formats

Importing: ASCII text

Use command: read.table

    OBJECT <- read.table("C:\\My Document\\FILE.TXT", header=T)

Note: Windows paths must use a double backslash (\\); a single forward slash (/) also works
If variable names are on the first row
    Use the header=T option
    Otherwise variables will be named V1 V2 V3…
Try to import the heart_rx dataset
If you are unsure of the path you can use the command file.choose() nested in read.table
    This will cause R to bring up a GUI to choose your file

    OBJECT <- read.table(file.choose(), header=F)

Try to import the heart_rx_noheader dataset this way
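The workshop files are not reproduced here, so the sketch below writes a tiny made-up text file to a temporary location and reads it back both ways. The column names id, dose and hr are invented for illustration.

```r
# A stand-in for FILE.TXT: three columns separated by spaces, names on row 1
path <- tempfile(fileext = ".txt")
writeLines(c("id dose hr", "1 10 72", "2 20 68"), path)

with_header <- read.table(path, header = TRUE)    # row 1 supplies the variable names
no_header   <- read.table(path, header = FALSE)   # row 1 is treated as data

names(with_header)
names(no_header)     # V1 V2 V3
```

Reading a file that has a header row with header=F is a common slip: the names end up as the first observation, which is why the second result has one extra row.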

Importing: Tab Delimited or Comma Separated or Database File

Tab Delimited
    Use command: read.delim

    OBJECT <- read.delim("C:\\My Document\\FILE.TXT", header=T, sep="\t")

Comma Separated Value (CSV)
    Use command: read.csv

    OBJECT <- read.csv("C:\\My Document\\FILE.CSV", header=T, sep=",")
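A self-contained sketch of both commands, using a small made-up dataset written to temporary files rather than a real path on your machine.

```r
demo <- data.frame(id = 1:3, sex = c("M", "F", "F"))   # hypothetical data

tab_path <- tempfile(fileext = ".txt")
csv_path <- tempfile(fileext = ".csv")
write.table(demo, tab_path, sep = "\t", row.names = FALSE, quote = FALSE)
write.csv(demo, csv_path, row.names = FALSE)

# header=TRUE and the matching sep are already the defaults for these two commands
from_tab <- read.delim(tab_path)
from_csv <- read.csv(csv_path)

from_tab
from_csv
```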

Importing: Access, SPSS, Stata etc

Best method: 3rd party software to convert data to a delimited or CSV file
    DBMS Copy is very popular
    Stat Transfer is very good
Some users have created
    read.spss
    read.xport (for SAS files)
    read.dta (for STATA files)
    But these commands live in an add-on package ('foreign') that must be loaded, and installed if it is missing, before use (more on that later)

Importing: R Dataset

If a workspace has been saved from a previous session, simply load the workspace by 'pointing and clicking'
Or use the load command

    load("PATHWAY\\FILENAME.Rdata")
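A sketch of the round trip, assuming a temporary file in place of PATHWAY\\FILENAME.Rdata. save() stores the named objects; save.image() is the command equivalent of File -> Save Workspace and stores everything.

```r
x <- c(1, 2, 3, 4, 5)
path <- tempfile(fileext = ".RData")

save(x, file = path)   # write x to disk
rm(x)                  # x is gone from memory
load(path)             # ...and restored under its original name
mean(x)
```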

Creating a Dataset from a Dataset

If you want to create a copy of a current dataset, this is simple in R.
Simply create a new object (i.e. with a different name) from the existing dataset

    NEWDATA <- OLDDATA

To create a new dataset from an edited version of an old dataset

    NEWDATA <- edit(OLDDATA)

This will bring up the data editor (more on this later), and any changes will be applied to NEWDATA, but not to OLDDATA

DATA MANIPULATION

99% of the work
(don't underestimate)

Data Manipulation: General

Most of your time should be spent in this phase
R is probably not the 'best' package for this
Data manipulation includes (among other things)
    Renaming variables
    Getting rid of variables
    Creating variables
    Changing variables (e.g. categorising age)
    Changing values of specific observations (e.g. someone reports age of 180)
    Getting rid of observations
    Merging datasets

A couple of things first….

R has MANY ways of accomplishing similar tasks due to its open software construction
When referring to variables on a dataset you must either:
    Use: d_name$v_name
    OR
    'Attach' the dataset: attach(d_name)
But attaching the dataset does not allow for manipulation of the dataset's variables, only the use of these variables

What is he talking about??

Let's create a new dataset with two variables x and y
    x will be the numbers 1 to 20
    y will be 20 random values from a normal distribution

    x <- c(1:20)
    y <- rnorm(20)
    testdata <- data.frame(x,y)

Remove the x and y objects

    rm(x,y)

Print the dataset, and then x and y

    testdata
    x
    y

Notice we could not access x and y this way. Try:

    testdata$x
    testdata$y

That worked, but is a lot of typing. So we could also:

    attach(testdata)
    x
    y

That worked too! So attaching a dataset allows us to access the variables on the dataset without using the $ format, but only for visualizing and analysing, not editing (so I don't like to do it)
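The 'not editing' caveat can be seen in a short sketch. It assumes no other object called x is lying around in the workspace.

```r
testdata <- data.frame(x = 1:20, y = rnorm(20))

attach(testdata)
mean(x)             # reading works: x is found on the attached dataset
x <- x * 10         # but this creates a NEW object x in the workspace...
testdata$x[1:3]     # ...the dataset itself is unchanged
detach(testdata)    # undo the attach when finished
rm(x)               # remove the stray copy

testdata$x <- testdata$x * 10   # to edit the dataset, use the $ form
testdata$x[1:3]
```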

Renaming Variables

Occasionally we need to rename a variable
Many ways
We can edit the data like a spreadsheet

    fix(d_name)

Create a copy of the class dataset, and 'fix' it

    NEWDATA <- edit(d_name)

OR we can create a new variable as a copy of the old one

    d_name$new_v_name <- d_name$old_v_name

Deleting and Creating Variables

To delete a variable, set it to NULL

    d_name$v_name <- NULL

To create a variable, just set the new variable equal to some value; we use a similar construct as before

    d_name$v_name <- SOME_VALUE OR EXPRESSION
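A sketch of both constructs on a small made-up dataset; the names d, temp_f, temp_c and site are invented for illustration.

```r
d <- data.frame(id = 1:3, temp_f = c(98.6, 100.4, 97.9))   # hypothetical data

d$temp_c <- (d$temp_f - 32) * 5 / 9   # create a variable from an expression
d$site   <- "Clinic A"                # a single value is repeated on every row
d$temp_f <- NULL                      # delete a variable

names(d)
d
```

Copying a variable to a new name and then setting the old one to NULL is one way to do the rename described on the previous slide.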

Creating Variables

Suppose we want a variable identifying the day the exam was written and a variable identifying the maximum value for the exam

    class$test_day <- c("Monday")
    class$test_max <- c(100)

Creating Variables

We can also create variables based on other variables
Imagine that we now want to calculate the students' percentage on the exam

    d_name$newv_name <- expression

For example:

    class$prct <- (class$mark / class$test_max)*100

Remember rules of BEDMAS
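A sketch of the percentage calculation on a cut-down class dataset. The column name mark is an assumption taken from the data-entry slides; use whatever name you gave the exam mark.

```r
class <- data.frame(
  firstname = c("John", "Jaysharee", "Emily", "Ute", "Charles"),
  mark      = c(58, 82, 90, 65, 90)
)
class$test_max <- 100

# Brackets first, then the multiplication
class$prct <- (class$mark / class$test_max) * 100

# BEDMAS: moving the brackets changes the answer
wrong <- class$mark / (class$test_max * 100)

class
wrong
```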

A Note on Mathematical Functions

    +               = addition
    -               = subtraction
    *               = multiplication
    /               = division
    ( )             = brackets
    ^ (or **)       = to the exponent
    abs(x)          = absolute value of x
    trunc(x)        = integer part of x (R has no int function)
    log(x)          = natural log of x (i.e. Ln to non-math types)
    log10(x)        = log base 10 of x (i.e. Log to non-math types)
    sqrt(x)         = square root of x
    round(x, value) = round x to value decimals
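A few of these tried at the console, as a quick sketch with an arbitrary value of x.

```r
x <- -7.568

abs(x)          # absolute value
trunc(x)        # integer part: the decimals are dropped
round(x, 1)     # one decimal place
2^3             # exponent
sqrt(16)        # square root
log(exp(1))     # natural log
log10(1000)     # log base 10
(2 + 3) * 4     # brackets first; compare with 2 + 3 * 4
```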

Changing Variables

Let's change the existing prct variable into letter grades
Map out which letter grades apply to which percents

    Below 50    = F
    50 – 59     = D
    60 – 69     = C
    70 – 79     = B
    80 – 100    = A

Changing Variables - Recoding

Two ways
1) Only for numeric variables
    Using Base R
    cut function

    d_name$new_v_name <- cut(d_name$old_v_name,
        breaks = c(breakpoints) OR breaks = #breaks,
        labels = c("LABEL1", "LABEL2",….) )

E.g.

    class$lettergrd <- cut(class$prct, breaks = c(-Inf,49,59,69,79,100),
        labels = c("F","D","C","B","A") )
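A worked sketch of cut on the five class percentages plus one made-up failing score, using break points that follow the grade map (49, 59, 69, 79, 100).

```r
prct <- c(58, 82, 90, 65, 90, 45)   # the last value is invented to show an F

lettergrd <- cut(prct,
                 breaks = c(-Inf, 49, 59, 69, 79, 100),
                 labels = c("F", "D", "C", "B", "A"))

data.frame(prct, lettergrd)
table(lettergrd)   # how many students got each grade
```

By default cut treats each interval as closed on the right, so these break points suit whole-number percentages. For fractional values (e.g. 59.5) one option is breaks at 50, 60, 70 and 80 together with right = FALSE.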

Recoding variables – Second Method

There is a 'recode' function, but it has been developed outside of the original Base R
We can incorporate programs that have been written by other people
Often these programs are compiled into a group of programs that are used for a similar purpose
These groups of programs are called 'Packages'

Installing a Package (to get a function that you do not have)

First, note that you do not have 'recode'

    help(recode)

Now (after searching Google) you find out that a function called 'recode' is available in the package called 'car'
Click PACKAGES -> INSTALL PACKAGE(S)
    R will ask you to set a CRAN Mirror (site from which to download packages)
        Choose CANADA (ON)
    R will now ask which package you want to download
        Choose 'car'
    R will now download the 'car' package
BUT the car package has only been installed; it has not yet been loaded
Click PACKAGES -> LOAD PACKAGE(S)
    R will ask which package to load from all that you have installed
        Choose 'car'
You can now use the recode function
    Type help(recode)
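The same two steps can be done by typing commands instead of using the menus. This is a sketch: the install line is commented out because it needs an internet connection and only has to be run once.

```r
# install.packages("car")    # step 1: download and install (run once)

# Check whether 'car' is installed before trying to load it
has_car <- requireNamespace("car", quietly = TRUE)

if (has_car) {
  library(car)               # step 2: load it for this session
} else {
  message("Package 'car' is not installed yet")
}
```

Installing is needed once per computer; loading is needed in every new R session.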

Recoding – Second Method

Now that the 'car' package is installed and loaded, we can use 'recode'

    d_name$new_v_name <- recode(d_name$old_v_name, recodes)

Where recodes can be in the form of:
    specific values: "c(99,999) = NA; c(1)='Y' "
    range of values: "lo:49='F'; 50:59='D' "

    class$lettergrd2 <- recode(class$prct, "lo:49='F'; 50:59='D';…..")
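A sketch of a complete set of recodes, assuming the letter-grade map from the earlier slide (below 50 = F up to 80 and above = A). The rule string is my own completion for illustration, and the example only runs if the car package is installed.

```r
prct <- c(58, 82, 90, 65, 90)

# lo and hi stand for the lowest and highest values in the data
rules <- "lo:49='F'; 50:59='D'; 60:69='C'; 70:79='B'; 80:hi='A'"

if (requireNamespace("car", quietly = TRUE)) {
  lettergrd2 <- car::recode(prct, rules)
  print(data.frame(prct, lettergrd2))
} else {
  message("Install the 'car' package to run this example")
}
```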

Combining Conditional Statements to Change Values within Observations

Your TA informs you that Jim Smith was sick for the Monday exam; instead he was given a makeup exam, out of 98
To identify observations using conditional statements, we use the R function ifelse

    ifelse(condition/expression, value if true, value if false)

    class$test_max <- ifelse(class$firstname == 'Jim' & class$lastname == 'Smith', 98, class$test_max)

More complex…

You are then informed that the twins (Joan and John Smith) cheated; you have to give them zeros:

    class$mark <- ifelse((class$firstname == 'Joan' | class$firstname == 'John') & class$lastname == 'Smith', 0, class$mark)
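A runnable sketch of both conditional edits on a three-row stand-in dataset. The rows and the column names mark and test_max are assumptions made for illustration.

```r
class <- data.frame(
  firstname = c("John", "Joan", "Emily"),    # hypothetical rows
  lastname  = c("Smith", "Smith", "Xu"),
  mark      = c(58, 71, 90),
  test_max  = 100
)

# One person: both conditions must hold (&)
class$test_max <- ifelse(class$firstname == "John" & class$lastname == "Smith",
                         98, class$test_max)

# The twins: (first name is Joan OR John) AND last name is Smith
twins <- (class$firstname == "Joan" | class$firstname == "John") &
  class$lastname == "Smith"
class$mark <- ifelse(twins, 0, class$mark)

class
```

Rows that do not meet the condition keep their old value because the 'value if false' argument is the variable itself.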

Logical Statements

    <       = Less than
    <=      = Less than or equal to
    >       = Greater than
    >=      = Greater than or equal to
    !=      = Not equal to
    ==      = Equal to
    & or && = Intersection (AND) boolean operator
    | or || = Union (OR) boolean operator

& and | compare whole variables element by element (use these inside ifelse); && and || work on single values only
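A sketch of these operators applied to whole variables; the ages and sexes are made up.

```r
age <- c(15, 42, 180)      # hypothetical ages, including an impossible one
sex <- c("F", "M", "F")

age != 42                          # not equal, one answer per observation
plausible <- age >= 18 & age <= 110   # AND, element by element
flagged   <- sex == "M" | age > 110   # OR, element by element

plausible
flagged
sum(plausible)   # TRUE counts as 1, so this counts the plausible adult ages
```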

Deleting Observations (or Subsetting)

Suppose we want to look at only the female students
We need to either delete the males or keep the females
It is better to create a new dataset with only females than to delete observations from our original dataset
Many ways; here we use the subset command

    New_d_name <- subset(old_d_name, condition, select=variables wanted)

    Females <- subset(class, class$sex == 'F')

Note, we can also select only certain variables

    Males <- subset(class, class$sex == 'M', select=c(firstname,lastname,lettergrd) )
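A sketch on the class data (without the letter-grade column, to keep it self-contained). Inside subset the dataset name can be dropped from the condition.

```r
class <- data.frame(
  firstname = c("John", "Jaysharee", "Emily", "Ute", "Charles"),
  lastname  = c("Smith", "Singh", "Xu", "VanDroglen", "Victor"),
  mark      = c(58, 82, 90, 65, 90),
  sex       = c("M", "F", "F", "F", "M")
)

Females <- subset(class, sex == "F")                    # keep rows, all variables
Males   <- subset(class, sex == "M",
                  select = c(firstname, lastname))      # keep rows and two variables

Females
Males
nrow(class)   # the original dataset is untouched
```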

Data Merge

Two important types of merge
Concatenation
    Adding new observations to a set of old observations
Matched merge
    Adding new variables (values) to an existing dataset with the same observations (e.g. we need to add mid-term marks to our exam database)

Concatenation

Easy
Use the rbind function, and add all datasets

    new_d_name <- rbind(d_name1, d_name2,…)

But all datasets must have the same number (and names) of variables!
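A sketch with two made-up class sections that share the same variables.

```r
section1 <- data.frame(firstname = c("John", "Emily"), mark = c(58, 90))
section2 <- data.frame(firstname = "Ute", mark = 65)

# Stack the observations of section2 underneath section1
both <- rbind(section1, section2)

both
nrow(both)
```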

Matched Merge

A little more complex
Use the merge function
If there is a common variable on which to merge:

    New_d_name <- merge(d_name1, d_name2, by = "ID", all=TRUE)

If the matching variables have different names

    New_d_name <- merge(d_name1, d_name2, by.x="IDX", by.y="IDY", all=TRUE)
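A sketch of the mid-term example with invented marks. Student 1 has no mid-term mark and student 4 has no exam mark, which shows what all=TRUE does.

```r
exam    <- data.frame(ID = c(1, 2, 3), exam = c(58, 82, 90))
midterm <- data.frame(ID = c(2, 3, 4), midterm = c(75, 88, 61))

# all=TRUE keeps every student; gaps are filled with NA
merged <- merge(exam, midterm, by = "ID", all = TRUE)

# Without all=TRUE only students present in both datasets are kept
matched <- merge(exam, midterm, by = "ID")

# Same merge when the ID variable has a different name in the second dataset
midterm2 <- data.frame(StudentNo = c(2, 3, 4), midterm = c(75, 88, 61))
merged2  <- merge(exam, midterm2, by.x = "ID", by.y = "StudentNo", all = TRUE)

merged
matched
```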

