dplyr: manipulating your data

Washington University in St. Louis

September 14, 2016

(Washington University in St. Louis) dplyr: manipulating your data September 14, 2016 1 / 44

1 Overview

Data manipulation as a part of the data analysis pipeline

dplyr: why it's awesome

dplyr: how do we use it?

Some resources for dplyr

Hadley Wickham’s online tutorials: http://www.r-bloggers.com/hadley-wickhams-dplyr-tutorial-at-user-2014-part-1/

Vignettes: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html

RStudio Blog: http://blog.rstudio.org/2014/01/17/introducing-dplyr/

NOTE: Today’s presentation on dplyr is heavily based on materials from Hadley Wickham’s 2014 tutorial! If you would like more in-depth resources about it, I highly recommend going there first. (In other words, I take no credit for this presentation; all credit goes to RStudio and Hadley Wickham for creating an awesome tutorial on dplyr.)

the circle of data processing life

what we will cover in dplyr

World Happiness Data (aka Homework 1.csv)

Single table verbs & grouped summaries

Data pipelines

Joins (two table verbs)

Do

some bad news and some good news

Time for Some RStudio!

Now you’ll want to open up RStudio and read in the 'Homework 1.csv' dataset.

Installing and Loading dplyr

If you want to install the package via the command line:

install.packages("dplyr")

Remember that you’ll also want to load the package:

library(dplyr)

World Happiness data

1 Happiness: subjective happiness

2 GDP: Log gross domestic product per capita

3 Support: subjective support from friends

4 Life: healthy life expectancy at birth

5 Freedom: satisfied or dissatisfied with freedom

6 Generosity: donated to charity in past month

7 Corruption: corruption widespread?

Load all of the data by importing the "Homework 1.csv" file.
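As a rough sketch of the import step, base R's read.csv() can load the file. The real "Homework 1.csv" is not reproduced here, so the snippet first writes a tiny stand-in file to a temporary folder. Its columns and values are made up for illustration; only the read.csv() call mirrors what you would run on the real file.

```r
# Stand-in for "Homework 1.csv" so the sketch runs anywhere.
# The rows below are invented placeholders, not the real data.
csv_path <- file.path(tempdir(), "Homework 1.csv")
writeLines(c(
  "Country,World,Happiness,GDP",
  "Japan,1,5.9,10.5",
  "Jordan,3,5.3,9.3"
), csv_path)

# With the real file you would pass its path instead of csv_path
data <- read.csv(csv_path, stringsAsFactors = FALSE)

# Quick look at the column names and types
str(data)
```

The file name contains a space, so the path has to be quoted exactly as written.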

The 5 most important verbs in dplyr

1 filter: keep rows matching criteria

2 select: pick columns by name

3 arrange: reorder rows

4 mutate: add new variables

5 summarise: reduce variables to values

Structure

1 First argument is a data frame

2 Subsequent arguments say what to do with data frame

3 Always return a data frame

4 (Never modify in place; you’ll want to assign the output data frame to an object)
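A minimal sketch of points 3 and 4, using a small illustrative data frame: the verb hands back a new data frame and leaves its input unchanged, so the result has to be assigned if you want to keep it.

```r
library(dplyr)

df <- data.frame(
  color = c("blue", "black", "blue", "blue", "black"),
  value = 1:5
)

# filter() returns a new data frame; df itself is not modified
blue_rows <- filter(df, color == "blue")

nrow(df)         # df still has all 5 rows
nrow(blue_rows)  # the assigned result holds the 3 blue rows
```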

A simple example

df <- data.frame(
  color = c("blue", "black", "blue", "blue", "black"),
  value = 1:5)

Filter the rows that are blue

filter(df, color == "blue")

Filter based on certain values

filter(df, value %in% c(1,4))

Some more boolean operators
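The operator table from this slide did not survive in the transcript. As a hedged sketch, these are the usual R logical operators inside filter() (& for and, | for or, ! for not, %in% for set membership), applied to the small df from the simple example. The particular conditions are illustrative assumptions.

```r
library(dplyr)

df <- data.frame(
  color = c("blue", "black", "blue", "blue", "black"),
  value = 1:5
)

# AND: both conditions must hold
both <- filter(df, color == "blue" & value > 2)

# OR: at least one condition must hold
either <- filter(df, color == "black" | value == 1)

# NOT: negate a condition
not_blue <- filter(df, !(color == "blue"))

# Set membership: value is one of the listed numbers
in_set <- filter(df, value %in% c(1, 4))
```

Separating conditions with a comma inside filter() behaves like &, so filter(df, color == "blue", value > 2) gives the same rows as the first call.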

Data Exercise: Find all countries...

1 That begin with J (Japan and Jordan)

2 Classified as World 1

3 With Life between 60 and 70

4 That are both World 1 and Life is between 60 and 70

5 Where Corruption was less than Generosity

Data Exercise: Find all countries... (answers)

1 J <- filter(data, Country == "Japan" | Country == "Jordan")

2 W1 <- filter(data, World == 1)

3 life <- filter(data, Life > 60 & Life < 70)

4 lifeW1 <- filter(data, Life > 60 & Life < 70 & World == 1)

5 cg <- filter(data, Corruption < Generosity)
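Two of these answers can also be written without listing values by hand. The sketch below is an alternative, not the presenter's solution: it uses base R's startsWith() and dplyr's between() on a made-up stand-in for the happiness data (the rows are invented for illustration).

```r
library(dplyr)

# Hypothetical stand-in for the World Happiness data; values are made up
data <- data.frame(
  Country = c("Japan", "Jordan", "Kenya", "Italy"),
  World   = c(1, 3, 3, 1),
  Life    = c(73, 64, 55, 72),
  stringsAsFactors = FALSE
)

# Every country whose name begins with "J"
J <- filter(data, startsWith(Country, "J"))

# between() is inclusive at both ends, so unlike Life > 60 & Life < 70
# it would also keep rows where Life is exactly 60 or 70
life <- filter(data, between(Life, 60, 70))
```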

Select

With the select() function, you can pick the variables that you are most interested in. For example, you can treat names of variables like positions.

select(df, color)

select(df, -color)

Your Turn

Read the help for select(). What other ways can you select variables?

Write down (in R) three ways that you can select the two delay variables in your flight data.

5 ways to select your data

Many ways, same delays

select(data, c(Happiness, GDP, Support))

select(data, starts_with("G"))

select(data, contains("G"))
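To make the different selection styles concrete, here is a small sketch on a made-up data frame that only borrows the column names from the happiness data. The values, and the assumption that the columns sit in this order, are illustrative.

```r
library(dplyr)

# Hypothetical stand-in; only the column names follow the slides
data <- data.frame(
  Country    = c("Japan", "Jordan"),
  Happiness  = c(5.9, 5.3),
  GDP        = c(10.5, 9.3),
  Support    = c(0.9, 0.8),
  Generosity = c(0.1, 0.2)
)

# By name
by_name <- select(data, Happiness, GDP, Support)

# By range: every column from Happiness through Support
by_range <- select(data, Happiness:Support)

# By prefix: columns whose names start with "G"
by_prefix <- select(data, starts_with("G"))

# By exclusion: everything except the named columns
by_drop <- select(data, -Country, -Generosity)
```

The range form depends on the column order in the data frame, so it is the most fragile of the four if the file layout changes.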

Arrange

The purpose of the arrange() function is to change the order of your rows.

arrange(df, color)

arrange(df, desc(color))

Your Turn

Order the dataset by Happiness and GDP. Which countries were happiest?

If we switch the order to GDP and Happiness, which countries are least happy?

If we order by descending Happiness and GDP, what happens?

Arrange Away

arrange(data, Happiness, GDP)

arrange(data, GDP, Happiness)

arrange(data, desc(Happiness), desc(GDP))
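A short sketch of how the second sort column only breaks ties in the first, and how desc() is applied to one column at a time. The three rows are invented so that two countries tie on Happiness.

```r
library(dplyr)

# Made-up rows; countries A and C tie on Happiness
data <- data.frame(
  Country   = c("A", "B", "C"),
  Happiness = c(5, 7, 5),
  GDP       = c(9, 10, 8),
  stringsAsFactors = FALSE
)

# Ascending Happiness; the tie between A and C is broken by ascending GDP
asc <- arrange(data, Happiness, GDP)

# Descending on both columns: wrap each column in its own desc()
dsc <- arrange(data, desc(Happiness), desc(GDP))

asc$Country  # "C" "A" "B"
dsc$Country  # "B" "A" "C"
```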

Mutate

mutate() will allow you to add new variables as a function of existing variables.

mutate(df, double = 2 * value)

Mutate with compound statements

mutate() will also allow you to perform additional transformations on newly created variables. Neat!

mutate(df, double = 2 * value, quadruple = 2 * double)

Your Turn

Reverse score the Corruption variable (like in the homework) and add it to your data frame.

Standardize your new corruption variable and add that to your data frame.

(Hint: you may need to use select() or View() to see your new variable.)

Mutate

data <- mutate(data, Corruption_r = Corruption * -1)

arrange(data, desc(Corruption_r))

data <- mutate(data, Corruption_z =
  scale(Corruption_r, center = TRUE, scale = TRUE))
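One detail worth knowing: scale() returns a one-column matrix rather than a plain vector. The sketch below is a variation on the answer, not the presenter's code. It wraps the call in as.numeric() so the new column is an ordinary numeric vector, and uses three made-up Corruption values to keep the arithmetic easy to follow.

```r
library(dplyr)

# Made-up values for illustration
data <- data.frame(Corruption = c(0.2, 0.5, 0.8))

data <- mutate(data,
  # Reverse score by flipping the sign
  Corruption_r = Corruption * -1,
  # Standardize; as.numeric() drops the matrix shape that scale() returns
  Corruption_z = as.numeric(scale(Corruption_r))
)

# A standardized variable has mean 0 and standard deviation 1
mean(data$Corruption_z)
sd(data$Corruption_z)
```

With these three values the z-scores come out at approximately 1, 0 and -1.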

Summarise

summarise() will give you a 1-row data frame. On its own, this is not particularly useful.

summarise(df, total = sum(value))

Group, then summarise

It is much more useful to group your data and then summarise it.

by_color <- group_by(df, color)

summarise(by_color, total = sum(value))

Grouping the World Happiness data

by_world <- group_by(data, World)

Summary functions
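The table of summary functions on this slide is not reproduced in the transcript. The sketch below shows a few functions commonly used inside summarise(): n() from dplyr, plus base R's sum(), mean(), sd(), min() and max(). The choice of functions is an assumption, not a copy of the slide.

```r
library(dplyr)

df <- data.frame(
  color = c("blue", "black", "blue", "blue", "black"),
  value = 1:5
)

by_color <- group_by(df, color)

# One output row per group, one output column per summary
stats <- summarise(by_color,
  n        = n(),          # number of rows in the group
  total    = sum(value),
  average  = mean(value),
  spread   = sd(value),
  smallest = min(value),
  largest  = max(value)
)

stats
```

Each summary function takes a vector and returns a single value, which is what lets summarise() collapse every group to one row.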

Your Turn!

Now that you understand the group_by() and summarise() functions, how might we want to summarise the GDP by World? (There are probably many ways to do this.)

What are the average and standard deviation of GDP when you group by World?

Group by World and summarise GDP

by_world <- group_by(data, World)

GDP_by_World <- summarise(by_world,
  mean = mean(GDP, na.rm = TRUE),
  sd = sd(GDP, na.rm = TRUE))

Data pipelines

In real data manipulation, you’re probably not going to use just one verb; you’re going to chain several verbs together. This is where you’ll want to use data pipelines, which link a bunch of functions into readable code. So instead of this...

...you can have this!
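The before and after code from these two slides is missing from the transcript. As a hedged reconstruction of the idea (not the original slide code), the sketch below writes the same three steps first as nested calls and then with the %>% pipe, using the small df from earlier.

```r
library(dplyr)

df <- data.frame(
  color = c("blue", "black", "blue", "blue", "black"),
  value = 1:5
)

# Nested calls: you have to read from the inside out
nested <- summarise(
  group_by(filter(df, value > 1), color),
  total = sum(value)
)

# Pipeline: each step passes its result on as the first argument of the next
piped <- df %>%
  filter(value > 1) %>%
  group_by(color) %>%
  summarise(total = sum(value))
```

Both versions give the same result. The pipe works because every dplyr verb takes a data frame as its first argument and returns a data frame.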

Joining datasets

Sometimes, you will want to join two separate datasets, as in the example below, where you want to join two data frames.

Create two dataframes

x <- data.frame(
  name = c("John", "Paul", "George", "Ringo", "Stuart", "Pete"),
  instrument = c("guitar", "bass", "guitar", "drums", "bass", "drums"))

y <- data.frame(
  name = c("John", "Paul", "George", "Ringo", "Brian"),
  band = c("TRUE", "TRUE", "TRUE", "TRUE", "FALSE"))

inner_join()

Include only rows in both x and y.

left_join()

Include all of x, and matching rows in y.
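The result tables from the slides are not in the transcript, so here is a sketch of both joins on the x and y data frames defined earlier. Passing by = "name" spells out the key column explicitly; that argument is an addition for clarity, since dplyr would otherwise pick the shared column and print a message.

```r
library(dplyr)

x <- data.frame(
  name = c("John", "Paul", "George", "Ringo", "Stuart", "Pete"),
  instrument = c("guitar", "bass", "guitar", "drums", "bass", "drums"),
  stringsAsFactors = FALSE
)

y <- data.frame(
  name = c("John", "Paul", "George", "Ringo", "Brian"),
  band = c("TRUE", "TRUE", "TRUE", "TRUE", "FALSE"),
  stringsAsFactors = FALSE
)

# Only names found in both x and y: John, Paul, George, Ringo
inner <- inner_join(x, y, by = "name")

# Every row of x; band is NA for Stuart and Pete, who are missing from y
left <- left_join(x, y, by = "name")
```

Brian appears in neither result because he is only in y.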

semi_join()

Include only rows of x that match y

anti_join()

Include only rows of x that DON’T match y
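A sketch of the two filtering joins on the same x and y data frames. Unlike inner_join() and left_join(), they never add columns from y; they only decide which rows of x to keep. As before, by = "name" is spelled out for clarity.

```r
library(dplyr)

x <- data.frame(
  name = c("John", "Paul", "George", "Ringo", "Stuart", "Pete"),
  instrument = c("guitar", "bass", "guitar", "drums", "bass", "drums"),
  stringsAsFactors = FALSE
)

y <- data.frame(
  name = c("John", "Paul", "George", "Ringo", "Brian"),
  band = c("TRUE", "TRUE", "TRUE", "TRUE", "FALSE"),
  stringsAsFactors = FALSE
)

# Rows of x with a match in y; the band column is not carried over
semi <- semi_join(x, y, by = "name")

# Rows of x with no match in y: Stuart and Pete
anti <- anti_join(x, y, by = "name")
```

anti_join() is a handy check before a real join, because it shows which keys would fail to match.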

Summary of all the join functions

Do function

In the case where none of these functions can do what you want to do to manipulate the data, you can always use the do() function. It is slower, but more general purpose, and is similar to ddply() and dlply(), if you have used those functions.
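As a minimal sketch of do(), the snippet below applies head() to each group of the small df from earlier and keeps the first row per color. The choice of head() is just an illustration; any expression that returns a data frame would work in its place. Note that more recent dplyr releases describe do() as superseded, so treat this as matching the version used in the presentation.

```r
library(dplyr)

df <- data.frame(
  color = c("blue", "black", "blue", "blue", "black"),
  value = 1:5
)

# Inside do(), the dot stands for the rows of the current group
first_by_color <- df %>%
  group_by(color) %>%
  do(head(., 1))

first_by_color
```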
