8/10/2019 R Course 2014: Lecture 7
Lecture 7: Merges and functions
Ben Fanson
Simeon Lisovski
Lecture Outline
1) concatenating data.frames
2) merges/joins
3) functions
Quick review of last week
1) if-then...
if (trt == 'a') {
  print('yes')
} else {
  print('no')
}
Quick review of last week
2) for loops...
for (trt in c('a', 'b', 'c')) {
  print(trt)
}
Appends
Bird_id  treatment  growth_rate
1        t1         12.3
2        t2         10.3
3        t3         14.5

Bird_id  treatment  growth_rate
4        t1         14.3
5        t2         9.3
6        t3         15.6

appended (rows stacked):
Bird_id  treatment  growth_rate
1        t1         12.3
2        t2         10.3
3        t3         14.5
4        t1         14.3
5        t2         9.3
6        t3         15.6
Merges (aka joins)
Bird_id  lifespan
1        45
2        34
3        40

Bird_id  growth_rate
1        14.3
2        9.3
3        15.6

merged on the unique identifier (Bird_id):
Bird_id  lifespan  growth_rate
1        45        14.3
2        34        9.3
3        40        15.6
append: rbind()
ds1 =
treatment  growth_rate
t1         12.3
t1         14.3
ds2 =
treatment  growth_rate
t2         10.3
t2         9.3
ds3 =
treatment  growth_rate
t3         14.5
t3         15.6
rbind(ds1, ds2, ds3)
treatment  growth_rate
t1         12.3
t1         14.3
t2         10.3
t2         9.3
t3         14.5
t3         15.6
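A minimal sketch of the append above, assuming ds1, ds2 and ds3 are small data frames typed in by hand to mirror the slide (in practice they would come from your own files):

```r
# Hand-built stand-ins for ds1, ds2, ds3 from the slide
ds1 <- data.frame(treatment = c("t1", "t1"), growth_rate = c(12.3, 14.3))
ds2 <- data.frame(treatment = c("t2", "t2"), growth_rate = c(10.3, 9.3))
ds3 <- data.frame(treatment = c("t3", "t3"), growth_rate = c(14.5, 15.6))

# rbind() stacks the rows; the column names must match across the inputs
combined <- rbind(ds1, ds2, ds3)
print(combined)
```

The result has six rows and the same two columns as each input.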
rbind.fill()
ds1 =
treatment  growth_rate
t1         12.3
t1         14.3
ds2 =
treatment  growth_rate  comments
t2         10.3         good
t2         9.3          delete
rbind.fill(ds1, ds2)
treatment  growth_rate  comments
t1         12.3         NA
t1         14.3         NA
t2         10.3         good
t2         9.3          delete
rbind.fill() is in the plyr package [not dplyr; the dplyr equivalent is bind_rows()]
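With plyr installed, the call is simply plyr::rbind.fill(ds1, ds2). To show what "fill" means without needing the package, here is a base R sketch of the same idea. The helper addMissingCols is hypothetical (it is not part of plyr), and the sketch assumes the comments column belongs to ds2 only:

```r
ds1 <- data.frame(treatment = c("t1", "t1"), growth_rate = c(12.3, 14.3),
                  stringsAsFactors = FALSE)
ds2 <- data.frame(treatment = c("t2", "t2"), growth_rate = c(10.3, 9.3),
                  comments = c("good", "delete"), stringsAsFactors = FALSE)

# Hypothetical helper: add any column the data frame lacks, filled with NA
addMissingCols <- function(ds, all_cols) {
  for (col in setdiff(all_cols, names(ds))) {
    ds[[col]] <- NA
  }
  ds[, all_cols, drop = FALSE]
}

all_cols <- union(names(ds1), names(ds2))
combined <- rbind(addMissingCols(ds1, all_cols), addMissingCols(ds2, all_cols))
print(combined)
```

Rows from ds1 get NA in the comments column, which is the behaviour the slide describes for rbind.fill().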
combine columns: cbind()
ds1 =
treatment  growth_rate
t1         12.3
t2         10.3
t3         14.5
ds2 =
treatment  growth_rate
t1         14.3
t2         9.3
t3         15.6
cbind(ds1, ds2)
treatment  growth_rate  treatment  growth_rate
t1         12.3         t1         14.3
t2         10.3         t2         9.3
t3         14.5         t3         15.6
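A small sketch of cbind(), assuming ds1 and ds2 each hold three rows as read from the slide. Note that cbind() matches rows purely by position and does not check any identifier, which is one reason merges are usually the safer way to combine columns:

```r
ds1 <- data.frame(treatment = c("t1", "t2", "t3"), growth_rate = c(12.3, 10.3, 14.5))
ds2 <- data.frame(treatment = c("t1", "t2", "t3"), growth_rate = c(14.3, 9.3, 15.6))

# Columns are pasted side by side; row 1 of ds1 sits next to row 1 of ds2
combined <- cbind(ds1, ds2)
print(dim(combined))
print(names(combined))  # column names are repeated, not made unique
```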
Types of Common Merges (joins)
Inner Join, Left Outer Join, Full Outer Join
Method:
One-to-One, One-to-Many, or Many-to-Many
jargon: left and right datasets
Left [left is called 'x' in R]
id  var2  var3
1   a     b
2   a     b
3   a     b
Right [right is called 'y' in R]
id  var4  var5
1   c     d
2   c     d
3   c     d
Inner Joins
left
Bird_id  lifespan
1        45
2        34
3        40
4        50
right
Bird_id  growth_rate
1        14.3
2        9.3
5        12.3
merge(left, right, by='Bird_id')
Bird_id  lifespan  growth_rate
1        45        14.3
2        34        9.3
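A runnable sketch of the inner join, assuming left and right are typed in to match the slide's tables:

```r
left  <- data.frame(Bird_id = c(1, 2, 3, 4), lifespan = c(45, 34, 40, 50))
right <- data.frame(Bird_id = c(1, 2, 5), growth_rate = c(14.3, 9.3, 12.3))

# merge() defaults to an inner join: only Bird_id values found in BOTH are kept
inner <- merge(left, right, by = "Bird_id")
print(inner)
```

Birds 3 and 4 (only in left) and bird 5 (only in right) are dropped.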
Inner Joins
left
Bird_id  trt  lifespan
1        A    45
2        A    34
3        B    40
4        B    50
right
Bird_id  trt  growth_rate
1        A    14.3
2        A    9.3
5        B    12.3
merge(left, right, by=c('Bird_id','trt'))
Bird_id  trt  lifespan  growth_rate
1        A    45        14.3
2        A    34        9.3
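A sketch of merging on two key columns, again with hand-typed stand-ins for left and right. The second call is an illustration of what can happen when a shared column is left out of by=:

```r
left  <- data.frame(Bird_id = c(1, 2, 3, 4), trt = c("A", "A", "B", "B"),
                    lifespan = c(45, 34, 40, 50))
right <- data.frame(Bird_id = c(1, 2, 5), trt = c("A", "A", "B"),
                    growth_rate = c(14.3, 9.3, 12.3))

# Both columns together identify a row, so both go in by=
inner <- merge(left, right, by = c("Bird_id", "trt"))
print(inner)

# Leaving trt out of by= keeps both copies, renamed trt.x and trt.y
by_id_only <- merge(left, right, by = "Bird_id")
print(names(by_id_only))
```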
left outer join
left
Bird_id  lifespan
1        45
2        34
3        40
4        50
right
Bird_id  growth_rate
1        14.3
2        9.3
5        12.3
merge(left, right, by='Bird_id', all.x=TRUE)
Bird_id  lifespan  growth_rate
1        45        14.3
2        34        9.3
3        40        NA
4        50        NA
full outer join
left
Bird_id  lifespan
1        45
2        34
3        40
4        50
right
Bird_id  growth_rate
1        14.3
2        9.3
5        12.3
merge(left, right, by='Bird_id', all=TRUE)
Bird_id  lifespan  growth_rate
1        45        14.3
2        34        9.3
3        40        NA
4        50        NA
5        NA        12.3
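The two outer joins can be sketched with the all.x and all arguments of merge(), assuming the same hand-typed left and right tables as on the slides:

```r
left  <- data.frame(Bird_id = c(1, 2, 3, 4), lifespan = c(45, 34, 40, 50))
right <- data.frame(Bird_id = c(1, 2, 5), growth_rate = c(14.3, 9.3, 12.3))

# Left outer join: keep every row of left, fill unmatched growth_rate with NA
left_outer <- merge(left, right, by = "Bird_id", all.x = TRUE)
print(left_outer)

# Full outer join: keep every row of both, fill gaps on either side with NA
full_outer <- merge(left, right, by = "Bird_id", all = TRUE)
print(full_outer)
```

There is also all.y = TRUE for a right outer join, which keeps every row of right instead.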
One-to-One Merge
left
id  var2  var3
1   a     b
2   a     b
3   a     b
right
id  var4  var5
1   c     d
2   c     d
3   c     d
One-to-Many Merge
left
id  trt  value
1   t1   123
1   t2   32
2   t1   35
3   t1   34
3   t2   12
3   t3   10
right
id  age
1   11
2   9
3   4
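A sketch of the one-to-many case, assuming the left table holds several rows per id and the right table holds exactly one:

```r
left  <- data.frame(id = c(1, 1, 2, 3, 3, 3),
                    trt = c("t1", "t2", "t1", "t1", "t2", "t3"),
                    value = c(123, 32, 35, 34, 12, 10))
right <- data.frame(id = c(1, 2, 3), age = c(11, 9, 4))

# Each id's single age is repeated on every matching row of left
merged <- merge(left, right, by = "id")
print(merged)
```

The output has the same number of rows as left, which is a quick sanity check for this kind of merge.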
Many-to-Many Merge
left
id  trt  value
1   t1   123
1   t2   32
2   t1   35
2   t2   23
right
id  age
1   9
1   11
2   4
2   5
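A sketch of why many-to-many merges deserve caution, using hand-typed copies of the two tables. Every left row pairs with every right row that shares its id, so the row count multiplies:

```r
left  <- data.frame(id = c(1, 1, 2, 2), trt = c("t1", "t2", "t1", "t2"),
                    value = c(123, 32, 35, 23))
right <- data.frame(id = c(1, 1, 2, 2), age = c(9, 11, 4, 5))

# 2 left rows x 2 right rows for id 1, and the same again for id 2
merged <- merge(left, right, by = "id")
print(merged)
print(nrow(merged))
```

Eight rows come out of two four-row inputs, which is rarely what was intended.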
Why does it matter to think about one-to-one, one-to-many, ....?
1) Merges can indicate that something is not quite right in your datasets
2) For instance,...
e.g. duplicates in a dataset
ds1 [nrow(ds1)=3]
Bird_id  lifespan
1        45
2        34
3        40
ds2 [nrow(ds2)=4]
Bird_id  growth_rate
1        14.3
1        14.3
2        9.3
3        15.6
merge(ds1,ds2, by='Bird_id')
ds3 [nrow(ds3)=4]
Bird_id  lifespan  growth_rate
1        45        14.3
1        45        14.3
2        34        9.3
3        40        15.6
rule for inner join one-to-one: nrow(ds3) <= min( nrow(ds1), nrow(ds2) )
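The row-count rule can be checked directly in code. This sketch assumes ds1 and ds2 as shown on the slide; dropping duplicates with unique() is just one possible fix and only suits exact duplicate rows:

```r
ds1 <- data.frame(Bird_id = c(1, 2, 3), lifespan = c(45, 34, 40))
ds2 <- data.frame(Bird_id = c(1, 1, 2, 3), growth_rate = c(14.3, 14.3, 9.3, 15.6))

ds3 <- merge(ds1, ds2, by = "Bird_id")

# For a one-to-one inner join the output cannot be longer than the shorter input
rule_holds <- nrow(ds3) <= min(nrow(ds1), nrow(ds2))
print(rule_holds)  # FALSE here: 4 rows out, but the shorter input has 3

# Remove exact duplicate rows from ds2 and merge again
ds2_clean <- unique(ds2)
ds3_clean <- merge(ds1, ds2_clean, by = "Bird_id")
print(nrow(ds3_clean))
```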
Common Merge Mistakes
1) not using a 'by=' option [best practice is to always use it, or R guesses]
2) duplicates in the 'unique' identifier, leading to a many-to-many merge when expecting a one-to-many
   e.g. which(duplicated(ds$id))
3) unique identifiers are not exactly the same
   e.g. 'Burt' 'burt' [make sure your dataset is clean]
4) failing to check your nrow(output_ds) to see if it is doing what you think
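A sketch of pre-merge checks for the mistakes listed above. The data frames ds and lookup and the column id_clean are made up for illustration, and dropping duplicates blindly is only done here to keep the example short:

```r
ds <- data.frame(id = c("Burt", "burt", "Ann", "Ann"), value = c(1, 2, 3, 4),
                 stringsAsFactors = FALSE)

# Mistake 2: find repeated identifiers before merging
dup_rows <- which(duplicated(ds$id))
print(dup_rows)

# Mistake 3: 'Burt' and 'burt' only show up as duplicates once case is normalised
ds$id_clean <- tolower(ds$id)
dup_rows_clean <- which(duplicated(ds$id_clean))
print(dup_rows_clean)

# Mistakes 1 and 4: always name the key, then confirm the row count
lookup <- data.frame(id_clean = c("burt", "ann"), age = c(9, 11),
                     stringsAsFactors = FALSE)
clean  <- ds[!duplicated(ds$id_clean), ]
merged <- merge(clean, lookup, by = "id_clean")
stopifnot(nrow(merged) == nrow(clean))
```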
Writing functions
making user-defined functions is an R strength
so far, we have seen lots of pre-defined functions
e.g. mean(), sum(), select(), summarise(), rnorm()
writing your own
ownFunction
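The slide's own definition of ownFunction did not survive, so the body below is purely an assumption chosen to show the general shape: a name, the function keyword, arguments in parentheses, and a body in braces.

```r
# Hypothetical body: doubles each value, then averages
ownFunction <- function(x) {
  doubled <- x * 2
  mean(doubled)  # the last evaluated expression is the function's value
}

result <- ownFunction(c(1, 2, 3))
print(result)  # 4
```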
multiple arguments [ function(argument1, argument2) ]
printResult
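A sketch of a two-argument function. The name printResult comes from the slide, but the arguments label and value and the body are assumptions for illustration:

```r
# Hypothetical two-argument version of printResult
printResult <- function(label, value) {
  msg <- paste(label, "=", value)
  print(msg)
  invisible(msg)  # hand the text back without printing it a second time
}

out <- printResult("mean growth rate", 12.3)
```

Arguments can be matched by position, as here, or by name, e.g. printResult(value = 12.3, label = "mean growth rate").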
default arguments
printResult
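A sketch of default arguments, again with an assumed body for printResult. An argument written as name = value in the definition can be left out of the call:

```r
# Hypothetical version with defaults for label and digits
printResult <- function(value, label = "result", digits = 2) {
  msg <- paste(label, "=", round(value, digits))
  print(msg)
  invisible(msg)
}

default_out <- printResult(3.14159)                             # uses both defaults
custom_out  <- printResult(3.14159, label = "pi", digits = 4)   # overrides both
```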
'...' argument [generic argument]
printResult
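A sketch of the '...' argument, assuming a version of printResult that wraps mean(). Anything passed in '...' is handed on unchanged to the inner call:

```r
# Hypothetical wrapper: extra arguments travel through ... to mean()
printResult <- function(x, ...) {
  result <- mean(x, ...)
  print(result)
  invisible(result)
}

with_na    <- printResult(c(1, 2, NA))                # NA, because of the missing value
without_na <- printResult(c(1, 2, NA), na.rm = TRUE)  # na.rm reaches mean() via ...
```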
Functions
Global vs. Local variables
- any object created outside a function is global
- any object created within a function is local and will be deleted after the function is run
Local variables
addAmounts
Global vs. Local variables
total_amount
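A sketch of the global versus local distinction. The names total_amount and addAmounts come from the slides, but the function body is an assumption:

```r
total_amount <- 100  # global: created outside any function

# Hypothetical body: assigns to a LOCAL object that happens to share the name
addAmounts <- function(a, b) {
  total_amount <- a + b  # local copy; discarded when the function finishes
  invisible(NULL)
}

addAmounts(1, 2)
print(total_amount)  # still 100: the global object was never touched
```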
use return() to get a local variable
addAmounts
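A sketch of using return() to hand a local value back to the caller. As before, the body of addAmounts is assumed:

```r
# Hypothetical body: compute locally, then return the value
addAmounts <- function(a, b) {
  local_total <- a + b
  return(local_total)
}

# The local object is gone after the call, but its value lives on in result
result <- addAmounts(1, 2)
print(result)  # 3
```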
Modularization
script 1
script 2
script 3
funcPlotting
funcStats
funcGeneric
.Rprofile
source('funcPlotting')
source('funcStats')
source('funcGeneric')
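A self-contained sketch of the source() pattern. So that it runs anywhere, it writes a small stand-in for funcStats to a temporary folder; the file name funcStats.R and the function seFun are assumptions, not the course's actual files:

```r
# Create a tiny function file (stand-in for a real funcStats script)
func_file <- file.path(tempdir(), "funcStats.R")
writeLines("seFun <- function(x) sd(x) / sqrt(length(x))", func_file)

# source() runs the file, so the functions it defines become available here
source(func_file)

se <- seFun(c(1, 2, 3, 4))
print(se)
```

Keeping functions in separate files like this lets several analysis scripts share one definition.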
Next Week
R plotting
1) overview of plotting in R
2) introduction to ggplot [aka grammar of graphics]
3) Week 9 and 10 will be introduction to base plot (by Simeon)
Lecture 7: Hands on Section
Lecture 7 files
1) get Lecture7.R from github
2) get all data files in data/lecture7/
3) open up Lecture7.R in Rcourse_proj.Rproj
4) start working through the example and then try the exercises