R Course 2014: Lecture 7

Date post: 02-Jun-2018
Transcript
  • 8/10/2019 R Course 2014: Lecture 7

    Lecture 7: Merges and functions

    Ben Fanson

    Simeon Lisovski

  • Lecture Outline

    1) concatenating data.frames

    2) merges/joins

    3) functions

  • Quick review of last week

    1) if-then...

    if (trt == 'a') {
      print('yes')
    } else {
      print('no')
    }

    2) for loops...

    for (trt in c('a', 'b', 'c')) {
      print(trt)
    }

  • Appends

    Bird_id  treatment  growth_rate
    1        t1         12.3
    2        t2         10.3
    3        t3         14.5

    Bird_id  treatment  growth_rate
    4        t1         14.3
    5        t2          9.3
    6        t3         15.6

    Appending the second table to the first gives:

    Bird_id  treatment  growth_rate
    1        t1         12.3
    2        t2         10.3
    3        t3         14.5
    4        t1         14.3
    5        t2          9.3
    6        t3         15.6

  • Merges (aka joins)

    unique identifier: Bird_id

    Bird_id  lifespan
    1        45
    2        34
    3        40

    Bird_id  growth_rate
    1        14.3
    2         9.3
    3        15.6

    Merging on Bird_id gives:

    Bird_id  lifespan  growth_rate
    1        45        14.3
    2        34         9.3
    3        40        15.6

  • append: rbind()

    ds1 =
    treatment  growth_rate
    t1         12.3
    t1         14.3

    ds2 =
    treatment  growth_rate
    t2         10.3
    t2          9.3

    ds3 =
    treatment  growth_rate
    t3         14.5
    t3         15.6

    rbind(ds1, ds2, ds3)

    treatment  growth_rate
    t1         12.3
    t1         14.3
    t2         10.3
    t2          9.3
    t3         14.5
    t3         15.6
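As a minimal sketch of the append shown on this slide, the three data frames can be rebuilt by hand and stacked. The values come from the slide; the object name `ds_all` is an assumption made for illustration.

```r
# Three small data frames with identical column names (values from the slide).
ds1 <- data.frame(treatment = c("t1", "t1"), growth_rate = c(12.3, 14.3))
ds2 <- data.frame(treatment = c("t2", "t2"), growth_rate = c(10.3, 9.3))
ds3 <- data.frame(treatment = c("t3", "t3"), growth_rate = c(14.5, 15.6))

# rbind() stacks rows; every input must have the same column names.
ds_all <- rbind(ds1, ds2, ds3)
print(ds_all)
```

If the column names differ between inputs, base `rbind()` stops with an error rather than guessing.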

  • rbind.fill()

    ds1 =
    treatment  growth_rate
    t1         12.3
    t1         14.3

    ds2 =
    treatment  growth_rate  comments
    t2         10.3         good
    t2          9.3         delete

    rbind.fill(ds1, ds2)

    treatment  growth_rate  comments
    t1         12.3         NA
    t1         14.3         NA
    t2         10.3         good
    t2          9.3         delete

    rbind.fill() comes from the plyr package, not dplyr (dplyr's counterpart is bind_rows())
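To avoid depending on an extra package, the sketch below swaps in a base R emulation of the same idea rather than calling `rbind.fill()` itself: add the missing column as `NA`, then use `rbind()`. The `comments` column in `ds2` and the names `missing_cols` and `ds_filled` are assumptions for illustration.

```r
ds1 <- data.frame(treatment = c("t1", "t1"), growth_rate = c(12.3, 14.3),
                  stringsAsFactors = FALSE)
ds2 <- data.frame(treatment = c("t2", "t2"), growth_rate = c(10.3, 9.3),
                  comments = c("good", "delete"), stringsAsFactors = FALSE)

# Add every column that ds1 lacks, filled with NA, then stack as usual.
missing_cols <- setdiff(names(ds2), names(ds1))
ds1[missing_cols] <- NA
ds_filled <- rbind(ds1, ds2)
print(ds_filled)
```

This only handles columns missing from `ds1`; a general version would fill in both directions.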

  • combine columns: cbind()

    ds1 =
    treatment
    t1
    t2
    t3
    t1
    t2
    t3

    ds2 =
    growth_rate
    12.3
    10.3
    14.5
    14.3
     9.3
    15.6

    cbind(ds1, ds2)

    treatment  growth_rate
    t1         12.3
    t2         10.3
    t3         14.5
    t1         14.3
    t2          9.3
    t3         15.6
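A short sketch of the column bind, assuming the two single-column data frames shown on the slide (the name `ds_wide` is hypothetical):

```r
ds1 <- data.frame(treatment = c("t1", "t2", "t3", "t1", "t2", "t3"))
ds2 <- data.frame(growth_rate = c(12.3, 10.3, 14.5, 14.3, 9.3, 15.6))

# cbind() pairs rows purely by position. It does no matching on an
# identifier, so both inputs must already be in the same row order.
ds_wide <- cbind(ds1, ds2)
print(ds_wide)
```

Because nothing checks that row 1 of `ds1` belongs with row 1 of `ds2`, a merge on an identifier is usually the safer choice when one is available.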

  • Types of Common Merges (joins)

    Inner Join, Left Outer Join, Full Outer Join

    [Figure: one ds1/ds2 diagram per join type]

    Method:

    One-to-One, One-to-Many, or Many-to-Many

  • jargon: left and right datasets

    Left
    id  var2  var3
    1   a     b
    2   a     b
    3   a     b

    Right
    id  var4  var5
    1   c     d
    2   c     d
    3   c     d

    left is called 'x' in R; right is called 'y' in R

  • Inner Joins

    left
    Bird_id  lifespan
    1        45
    2        34
    3        40
    4        50

    right
    Bird_id  growth_rate
    1        14.3
    2         9.3
    5        12.3

    merge(left, right, by='Bird_id')

    Bird_id  lifespan  growth_rate
    1        45        14.3
    2        34         9.3

  • Inner Joins

    left
    Bird_id  trt  lifespan
    1        A    45
    2        A    34
    3        B    40
    4        B    50

    right
    Bird_id  trt  growth_rate
    1        A    14.3
    2        A     9.3
    5        B    12.3

    merge(left, right, by=c('Bird_id', 'trt'))

    Bird_id  trt  lifespan  growth_rate
    1        A    45        14.3
    2        A    34         9.3
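The inner join on two key columns can be reproduced with base `merge()`. This is a sketch using the slide's values; `inner` is a hypothetical result name.

```r
left <- data.frame(Bird_id = c(1, 2, 3, 4),
                   trt = c("A", "A", "B", "B"),
                   lifespan = c(45, 34, 40, 50))
right <- data.frame(Bird_id = c(1, 2, 5),
                    trt = c("A", "A", "B"),
                    growth_rate = c(14.3, 9.3, 12.3))

# With no all/all.x/all.y argument, merge() performs an inner join:
# only key combinations present in BOTH inputs are kept.
inner <- merge(left, right, by = c("Bird_id", "trt"))
print(inner)
```

Birds 3 and 4 (left only) and bird 5 (right only) drop out of the result.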

  • left outer join

    left
    Bird_id  lifespan
    1        45
    2        34
    3        40
    4        50

    right
    Bird_id  growth_rate
    1        14.3
    2         9.3
    5        12.3

    merge(left, right, by='Bird_id', all.x=TRUE)

    Bird_id  lifespan  growth_rate
    1        45        14.3
    2        34         9.3
    3        40        NA
    4        50        NA

  • full outer join

    left
    Bird_id  lifespan
    1        45
    2        34
    3        40
    4        50

    right
    Bird_id  growth_rate
    1        14.3
    2         9.3
    5        12.3

    merge(left, right, by='Bird_id', all=TRUE)

    Bird_id  lifespan  growth_rate
    1        45        14.3
    2        34         9.3
    3        40        NA
    4        50        NA
    5        NA        12.3
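In base R the outer joins differ from the inner join only in the `all.x` / `all` arguments of `merge()`. A sketch with the slide's data (`left_join` and `full_join` are hypothetical object names, not the dplyr functions):

```r
left  <- data.frame(Bird_id = c(1, 2, 3, 4), lifespan = c(45, 34, 40, 50))
right <- data.frame(Bird_id = c(1, 2, 5), growth_rate = c(14.3, 9.3, 12.3))

# Left outer join: keep every row of 'left'; unmatched rows get NA.
left_join <- merge(left, right, by = "Bird_id", all.x = TRUE)

# Full outer join: keep every row of both inputs.
full_join <- merge(left, right, by = "Bird_id", all = TRUE)

print(left_join)
print(full_join)
```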

  • One-to-One Merge

    left
    id  var2  var3
    1   a     b
    2   a     b
    3   a     b

    right
    id  var4  var5
    1   c     d
    2   c     d
    3   c     d

  • One-to-Many Merge

    left
    id  trt  value
    1   t1   123
    1   t2    32
    2   t1    35
    3   t1    34
    3   t2    12
    3   t3    10

    right
    id  age
    1   11
    2    9
    3    4
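A sketch of this merge, assuming the first table is `left` and the second is `right`: each `id` appears once in `right`, so its `age` is copied onto every matching row of `left`.

```r
left <- data.frame(id = c(1, 1, 2, 3, 3, 3),
                   trt = c("t1", "t2", "t1", "t1", "t2", "t3"),
                   value = c(123, 32, 35, 34, 12, 10))
right <- data.frame(id = c(1, 2, 3), age = c(11, 9, 4))

merged <- merge(left, right, by = "id")
print(merged)

# The row count should equal that of the "many" side.
nrow(merged) == nrow(left)
```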

  • Many-to-Many Merge

    left
    id  trt  value
    1   t1   123
    1   t2    32
    2   t1    35
    2   t2    23

    right
    id  age
    1    9
    1   11
    2    4
    2    5
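In a many-to-many merge every left row is paired with every right row that shares its key, so rows multiply. The sketch below uses the slide's data; `expected_rows` is a hypothetical helper value and assumes every `id` occurs on both sides.

```r
left <- data.frame(id = c(1, 1, 2, 2),
                   trt = c("t1", "t2", "t1", "t2"),
                   value = c(123, 32, 35, 23))
right <- data.frame(id = c(1, 1, 2, 2), age = c(9, 11, 4, 5))

merged <- merge(left, right, by = "id")
print(merged)

# Per id: (rows in left) x (rows in right), summed over ids.
n_left  <- table(left$id)
n_right <- table(right$id)
expected_rows <- sum(n_left * n_right[names(n_left)])
```

Here each `id` occurs twice on both sides, so the result has 2 x 2 + 2 x 2 = 8 rows from two 4-row inputs.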

  • Why does it matter to think about one-to-one, one-to-many, ...?

    1) Merges can indicate that something is not quite right in your datasets

    2) For instance, ...

  • e.g. duplicates in a dataset

    ds1 [nrow(ds1)=3]
    Bird_id  lifespan
    1        45
    2        34
    3        40

    ds2 [nrow(ds2)=4]
    Bird_id  growth_rate
    1        14.3
    1        14.3
    2         9.3
    3        15.6

    merge(ds1, ds2, by='Bird_id')

    ds3 [nrow(ds3)=4]
    Bird_id  lifespan  growth_rate
    1        45        14.3
    1        45        14.3
    2        34         9.3
    3        40        15.6

    rule for inner join one-to-one: nrow(ds3) <= min( nrow(ds1), nrow(ds2) )

    Here nrow(ds3)=4 exceeds min(3, 4)=3, which flags the duplicated Bird_id 1 in ds2.
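The row-count rule can be checked in code. This sketch rebuilds the slide's data; `rule_holds`, `dup_rows`, `ds2_clean` and `ds3_clean` are hypothetical names, and dropping exact duplicate rows with `unique()` is just one possible clean-up.

```r
ds1 <- data.frame(Bird_id = c(1, 2, 3), lifespan = c(45, 34, 40))
ds2 <- data.frame(Bird_id = c(1, 1, 2, 3),
                  growth_rate = c(14.3, 14.3, 9.3, 15.6))
ds3 <- merge(ds1, ds2, by = "Bird_id")

# One-to-one inner join: the result cannot have more rows than the
# smaller input. A FALSE here signals duplicated keys.
rule_holds <- nrow(ds3) <= min(nrow(ds1), nrow(ds2))

# Locate the offending rows in the right-hand dataset.
dup_rows <- which(duplicated(ds2$Bird_id))

# Drop rows that are exact duplicates, then merge again.
ds2_clean <- unique(ds2)
ds3_clean <- merge(ds1, ds2_clean, by = "Bird_id")
```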

  • Common Merge Mistakes

    1) not using a 'by=' option [best practice is to always use it; otherwise R merges on every column name the two datasets share]

    2) Duplicates in the 'unique' identifier, leading to a many-to-many merge when expecting a one-to-many

       e.g. which(duplicated(ds$id))

    3) unique identifiers are not exactly the same, e.g. 'Burt' vs. 'burt' [make sure your dataset is clean]

    4) failing to check nrow(output_ds) to see if the merge is doing what you think
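Mistake 3 is easy to reproduce. In this sketch the `name` column, the second bird 'Anna', and the use of `tolower()` plus `trimws()` as the clean-up step are all assumptions for illustration.

```r
left  <- data.frame(name = c("Burt", "Anna"), lifespan = c(45, 34),
                    stringsAsFactors = FALSE)
right <- data.frame(name = c("burt", "Anna"), growth_rate = c(14.3, 9.3),
                    stringsAsFactors = FALSE)

# 'Burt' and 'burt' are different keys, so one bird silently drops out.
raw <- merge(left, right, by = "name")

# Normalise case and stray whitespace on both sides before merging.
left$name  <- tolower(trimws(left$name))
right$name <- tolower(trimws(right$name))
clean <- merge(left, right, by = "name")

# Always compare the row counts against what you expect.
c(raw = nrow(raw), clean = nrow(clean))
```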

  • Writing functions

  • making user-defined functions is an R strength

    so far, we have seen lots of pre-defined functions

    e.g. mean(), sum(), select(), summarise(), rnorm()

    writing your own

    ownFunction
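The body of `ownFunction` did not survive in this transcript, so the following is only a hypothetical illustration of the general shape of a user-defined function (name, `function(arguments)`, body). The standard-error calculation is an assumed example, not the lecture's code.

```r
# Hypothetical body: standard error of the mean, ignoring missing values.
ownFunction <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

# Call it like any built-in function.
se <- ownFunction(c(12.3, 10.3, 14.5))
print(se)
```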

  • multiple arguments [ function(argument1, argument2

    printResult

  • default arguments

    printResult
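The slide's `printResult` code is not preserved here; the sketch below is a hypothetical version showing several arguments, two of which have defaults. The argument names `value`, `label` and `digits` are assumptions.

```r
# 'label' and 'digits' have defaults, so callers may omit them.
printResult <- function(value, label = "Result", digits = 2) {
  msg <- paste0(label, ": ", round(value, digits))
  print(msg)
  invisible(msg)
}

a <- printResult(3.14159)                           # uses both defaults
b <- printResult(3.14159, label = "Pi", digits = 4) # overrides both
```

Naming arguments in the call (`label = "Pi"`) means their order no longer matters.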

  • '...' argument [generic argument]

    printResult
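A hypothetical sketch of the `...` argument: whatever extra arguments the caller supplies are passed through untouched to an inner function, here `mean()`. The function body is an assumption, since the slide's code is missing.

```r
printResult <- function(x, ...) {
  # Everything captured by '...' is forwarded unchanged to mean().
  result <- mean(x, ...)
  print(result)
  invisible(result)
}

vals <- c(1, 2, NA, 4)
r1 <- printResult(vals)                # NA, because of the missing value
r2 <- printResult(vals, na.rm = TRUE)  # na.rm travels through '...'
```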

  • Functions: Global vs. Local variables

    - any object created outside a function is global

    - any object created within a function is local and will be deleted after the function is run

  • Local variables

    addAmounts

  • Global vs. Local variables

    total_amount

  • use return() to get the value of a local variable out of the function

    addAmounts
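The code for `addAmounts` is missing from the transcript, so this is a hypothetical reconstruction of the idea: `total_amount` exists only while the function runs, and `return()` is how its value reaches the caller. The arguments `a` and `b` are assumptions.

```r
addAmounts <- function(a, b) {
  total_amount <- a + b   # local: exists only inside this call
  return(total_amount)    # hand the value back to the caller
}

result <- addAmounts(2, 3)

# The local variable itself is gone once the function has finished.
local_leaked <- exists("total_amount")
print(result)
print(local_leaked)
```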

  • Modularization

    Analysis scripts:
    script 1
    script 2
    script 3

    Function files:
    funcPlotting
    funcStats
    funcGeneric

    .Rprofile:
    source('funcPlotting')
    source('funcStats')
    source('funcGeneric')
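A self-contained sketch of the `source()` pattern: a function file is written to a temporary directory and then loaded, as an analysis script or `.Rprofile` would do. The file name `funcStats.R` and the function `standardError` are assumptions for illustration.

```r
# Create a stand-in function file (normally this lives in your project).
func_file <- file.path(tempdir(), "funcStats.R")
writeLines("standardError <- function(x) sd(x) / sqrt(length(x))", func_file)

# source() runs the file, which defines its functions in the workspace.
source(func_file)

se <- standardError(c(1, 2, 3, 4))
print(se)
```

Keeping functions in separate files means several scripts can share one definition instead of each carrying its own copy.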


  • Next Week: R plotting

    1) overview of plotting in R

    2) introduction to ggplot [aka grammar of graphics]

    3) Weeks 9 and 10 will be an introduction to base plot (by Simeon)

  • Lecture 7: Hands on Section

  • Lecture 7 files

    1) get Lecture7.R from github

    2) get all data files in data/lecture7/

    3) open up Lecture7.R in Rcourse_proj.Rproj

    4) start working through the example and then try the exercises

