Date post: 22-Mar-2020
DataCamp Analyzing US Census Data in R

US Census data: an overview

ANALYZING US CENSUS DATA IN R

Kyle Walker, Instructor

Course overview

What you'll learn:

How to acquire US Census data with the tidycensus R package

How to wrangle US Census data with tidyverse tools

How to use the R tigris package to acquire US Census Bureau boundary data

How to visualize and map US Census Bureau data in R with ggplot2

About your instructor

Fields: spatial demography & spatial data science

R developer: tidycensus, tigris, & idbr packages

US Census Bureau Data

The US Census Bureau API

To get started using US Census data in R, sign up for a Census API key.

Example key: "rw6pozt48ur2ugc8kg69x5phdrtnuhb2cb1subd6"

library(tidycensus)

census_api_key("YOUR KEY GOES HERE", install = TRUE)
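As a hedged aside: with install = TRUE, census_api_key() saves the key to your .Renviron file under the name CENSUS_API_KEY, so a later session can check for it with base R. The sketch below only inspects that environment variable; the has_key name is illustrative and not part of tidycensus.

```r
# Look up the key that census_api_key(install = TRUE) is expected to have saved.
key <- Sys.getenv("CENSUS_API_KEY")

# nzchar() is TRUE only when the variable is set to a non-empty string.
has_key <- nzchar(key)

if (has_key) {
  message("Census API key found.")
} else {
  message("No Census API key found; run census_api_key() first.")
}
```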

Using decennial Census data with tidycensus

state_pop <- get_decennial(geography = "state",
                           variables = "P001001",
                           year = 2010)  # 2010 Census; P001001 is a 2010 variable ID

head(state_pop)

# A tibble: 6 x 4
  GEOID NAME       variable    value
  <chr> <chr>      <chr>       <dbl>
1 01    Alabama    P001001   4779736
2 02    Alaska     P001001    710231
3 04    Arizona    P001001   6392017
4 05    Arkansas   P001001   2915918
5 06    California P001001  37253956
6 08    Colorado   P001001   5029196

Using ACS data with tidycensus

state_income <- get_acs(geography = "state",
                        variables = "B19013_001")

head(state_income)

# A tibble: 6 x 5
  GEOID NAME       variable   estimate   moe
  <chr> <chr>      <chr>         <dbl> <dbl>
1 01    Alabama    B19013_001    44758   314
2 02    Alaska     B19013_001    74444   809
3 04    Arizona    B19013_001    51340   231
4 05    Arkansas   B19013_001    42336   234
5 06    California B19013_001    63783   188
6 08    Colorado   B19013_001    62520   287
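The moe column is the margin of error around each ACS estimate. A minimal sketch of turning it into an interval, assuming the ACS default 90 percent confidence level and re-typing the first rows of the printed output by hand so that no API call is needed (the lower and upper column names are illustrative):

```r
# First three rows of state_income, copied from the printed output.
state_income <- data.frame(
  NAME     = c("Alabama", "Alaska", "Arizona"),
  estimate = c(44758, 74444, 51340),
  moe      = c(314, 809, 231)
)

# Interval bounds: the estimate minus and plus its margin of error.
state_income$lower <- state_income$estimate - state_income$moe
state_income$upper <- state_income$estimate + state_income$moe

state_income
```

Read this way, Alabama's median household income estimate of 44758 sits inside an interval from 44444 to 45072.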

Let's get started!

Basic tidycensus functionality

Geography in tidycensus

Legal entities: geography = "county"

Statistical entities: geography = "tract"

Available geographies

Geography and variables in tidycensus

county_income <- get_acs(geography = "county",
                         variables = "B19013_001")

county_income

# A tibble: 3,220 x 5
   GEOID NAME                     variable   estimate   moe
   <chr> <chr>                    <chr>         <dbl> <dbl>
 1 01001 Autauga County, Alabama  B19013_001    53099  2631
 2 01003 Baldwin County, Alabama  B19013_001    51365   991
 3 01005 Barbour County, Alabama  B19013_001    33956  2655
 4 01007 Bibb County, Alabama     B19013_001    39776  3306
 5 01009 Blount County, Alabama   B19013_001    46212  2443
 6 01011 Bullock County, Alabama  B19013_001    29335  5435
 7 01013 Butler County, Alabama   B19013_001    34315  2904
 8 01015 Calhoun County, Alabama  B19013_001    41954  1381
 9 01017 Chambers County, Alabama B19013_001    36027  1870
10 01019 Cherokee County, Alabama B19013_001    38925  2598
# ... with 3,210 more rows
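The GEOID column is stored as character because the codes carry leading zeros. For counties it is a five-character FIPS code: two characters for the state followed by three for the county. A small base R sketch of splitting it, using GEOIDs copied from the county output (the state_fips and county_fips names are illustrative):

```r
# County GEOIDs copied from the printed output; kept as character
# so the leading zero in "01" (Alabama) is preserved.
geoid <- c("01001", "01003", "01005")

# Characters 1-2 identify the state, characters 3-5 the county.
state_fips  <- substr(geoid, 1, 2)
county_fips <- substr(geoid, 3, 5)

data.frame(geoid, state_fips, county_fips)
```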

Geographic subsets in tidycensus

texas_income <- get_acs(geography = "county",
                        variables = c(hhincome = "B19013_001"),
                        state = "TX")

texas_income

# A tibble: 254 x 5
   GEOID NAME                    variable estimate   moe
   <chr> <chr>                   <chr>       <dbl> <dbl>
 1 48001 Anderson County, Texas  hhincome    42146  2539
 2 48003 Andrews County, Texas   hhincome    70121  7053
 3 48005 Angelina County, Texas  hhincome    44185  2107
 4 48007 Aransas County, Texas   hhincome    44851  4261
 5 48009 Archer County, Texas    hhincome    62407  5368
 6 48011 Armstrong County, Texas hhincome    65000  9415
 7 48013 Atascosa County, Texas  hhincome    53181  4114
 8 48015 Austin County, Texas    hhincome    56681  4903
 9 48017 Bailey County, Texas    hhincome    40589  8438
10 48019 Bandera County, Texas   hhincome    55434  4503
# ... with 244 more rows

Wide data with tidycensus

get_acs(geography = "county",
        variables = c(hhincome = "B19013_001",
                      medage = "B01002_001"),
        state = "TX",
        output = "wide")

# A tibble: 254 x 6
   GEOID NAME                    hhincomeE hhincomeM medageE medageM
   <chr> <chr>                       <dbl>     <dbl>   <dbl>   <dbl>
 1 48001 Anderson County, Texas      42146      2539    38.9     0.5
 2 48003 Andrews County, Texas       70121      7053    31.2     0.8
 3 48005 Angelina County, Texas      44185      2107    36.7     0.3
 4 48007 Aransas County, Texas       44851      4261    50.7     1.1
 5 48009 Archer County, Texas        62407      5368    44.1     0.7
 6 48011 Armstrong County, Texas     65000      9415    45.9     2.8
 7 48013 Atascosa County, Texas      53181      4114    35.4     0.2
 8 48015 Austin County, Texas        56681      4903    40.8     0.4
 9 48017 Bailey County, Texas        40589      8438    34.4     1.1
10 48019 Bandera County, Texas       55434      4503    51.3     0.9
# ... with 244 more rows

Let's practice!

Searching for data with tidycensus

Searching for Census variables

To find Census variable IDs, use:

Online resources like Census Reporter

Built-in variable searching in tidycensus

Choosing a dataset to search

v16 <- load_variables(year = 2016,
                      dataset = "acs5",
                      cache = TRUE)

v16

# A tibble: 22,815 x 3
   name       label                                  concept
   <chr>      <chr>                                  <chr>
 1 B00001_001 Estimate!!Total                        UNWEIGHTED...
 2 B00002_001 Estimate!!Total                        UNWEIGHTED...
 3 B01001_001 Estimate!!Total                        SEX BY AGE
 4 B01001_002 Estimate!!Total!!Male                  SEX BY AGE
 5 B01001_003 Estimate!!Total!!Male!!Under 5 years   SEX BY AGE
 6 B01001_004 Estimate!!Total!!Male!!5 to 9 years    SEX BY AGE
 7 B01001_005 Estimate!!Total!!Male!!10 to 14 years  SEX BY AGE
 8 B01001_006 Estimate!!Total!!Male!!15 to 17 years  SEX BY AGE
 9 B01001_007 Estimate!!Total!!Male!!18 and 19 years SEX BY AGE
10 B01001_008 Estimate!!Total!!Male!!20 years        SEX BY AGE
# ... with 22,805 more rows

Filtering a variables dataset

library(tidyverse)

B19001 <- filter(v16, str_detect(name, "B19001"))

B19001

# A tibble: 170 x 3
   name        label                 concept
   <chr>       <chr>                 <chr>
 1 B19001_001E Estimate!!Total       HOUSEHOLD INCOME…
 2 B19001_002E ...Less than $10,000  HOUSEHOLD INCOME…
 3 B19001_003E ...$10,000 to $14,999 HOUSEHOLD INCOME…
 4 B19001_004E ...$15,000 to $19,999 HOUSEHOLD INCOME…
 5 B19001_005E ...$20,000 to $24,999 HOUSEHOLD INCOME…
 6 B19001_006E ...$25,000 to $29,999 HOUSEHOLD INCOME…
 7 B19001_007E ...$30,000 to $34,999 HOUSEHOLD INCOME…
 8 B19001_008E ...$35,000 to $39,999 HOUSEHOLD INCOME…
 9 B19001_009E ...$40,000 to $44,999 HOUSEHOLD INCOME…
10 B19001_010E ...$45,000 to $49,999 HOUSEHOLD INCOME…
# ... with 160 more rows
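The pattern "B19001" matches any name that contains that string, which is presumably why 170 rows come back: the race-iterated tables B19001A through B19001I match as well. A hedged sketch of narrowing the match with an anchored regular expression, shown with base R grepl() on a hand-typed vector of IDs so it runs without an API call (the ids, loose, and strict names are illustrative):

```r
# A few variable IDs typed by hand for illustration.
ids <- c("B19001_001", "B19001_002", "B19001A_001", "B19013_001")

# Unanchored: matches every ID containing "B19001", including B19001A.
loose <- grepl("B19001", ids)

# Anchored: "^" pins the match to the start and "_" must follow the
# table ID, so only variables from table B19001 itself match.
strict <- grepl("^B19001_", ids)

ids[loose]
ids[strict]
```

With stringr loaded, the same anchored pattern could be passed to str_detect() inside filter().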

ACS variable structure

Anatomy of an ACS variable, B19001_002E:

B: refers to a base table. Other prefixes: C, DP, S.

19001: the table ID

002: the variable code within the table

E: refers to estimate. The suffix is optional in tidycensus functions, which return both E (estimate) and M (margin of error) for each variable.
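The anatomy above can be sketched as a small base R parser. This is an illustration only: parse_acs_variable() is a hypothetical helper, not a tidycensus function, and the pattern assumes detailed-table style IDs such as B19001_002E (data profile and subject table IDs are laid out differently and are not handled).

```r
# Hypothetical helper: split a detailed-table ACS variable ID into its parts.
parse_acs_variable <- function(id) {
  # prefix letters, five-digit table ID (optionally with an iteration letter),
  # underscore, three-digit variable code, optional E or M suffix
  pattern <- "^([A-Z]+)([0-9]{5}[A-Z]?)_([0-9]{3})([EM]?)$"
  parts <- regmatches(id, regexec(pattern, id))[[1]]
  if (length(parts) == 0) {
    stop("Not a detailed-table style ACS variable ID: ", id)
  }
  # parts[1] is the full match; the capture groups follow it.
  list(prefix = parts[2], table = parts[3],
       code = parts[4], suffix = parts[5])
}

parsed <- parse_acs_variable("B19001_002E")
str(parsed)
```

An ID without the suffix, such as "B19013_001", parses the same way with an empty suffix.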

Let's practice!

Visualizing Census data with ggplot2

ggplot2: a layered grammar of graphics in R

Example: plotting income by state

library(tidycensus)
library(tidyverse)

ne_income <- get_acs(geography = "state",
                     variables = "B19013_001",
                     survey = "acs1",
                     state = c("ME", "NH", "VT", "MA",
                               "RI", "CT", "NY"))

ggplot(ne_income, aes(x = estimate, y = NAME)) +
  geom_point()

Customizing ggplot2 graphics of ACS data

ggplot(ne_income,
       aes(x = estimate,
           y = reorder(NAME, estimate))) +
  geom_point(color = "navy", size = 4) +
  scale_x_continuous(labels = scales::dollar) +
  theme_minimal(base_size = 14) +
  labs(x = "2016 ACS estimate",
       y = "",
       title = "Median household income by state")

Let's practice!
