Source: hofroe.net/stat480/15-loops.pdf

Post on 30-Jun-2020

Loops, dplyr, maps

Stat 480

Heike Hofmann

Outline

• Loops

• review of dplyr

• Maps

• Want to run the same block of code multiple times:

• Loop or iteration

for (i in 1:n) {
  season <- subset(baseball, id == players[i])
  mba[i] <- with(season, mean(h/ab))
}

{ ... }: block of commands

Iterations

mba <- rep(NA, n)

output: mba = NA NA NA NA NA NA NA NA NA NA NA

for (i in 1:n) {
  seasons <- subset(baseball, id == players[i])
  mba[i] <- with(seasons, mean(h/ab))
}

i = 1: mba = 0.301 NA NA NA NA NA NA NA NA NA NA

i = 2: mba = 0.301 0.182 NA NA NA NA NA NA NA NA NA

... and so on ...

mba = 0.301 0.182 0.236 0.210 0.238 0.275 0.089 0.152 0.112 0.249 0.158
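To try the allocate-then-fill pattern without the baseball data, here is a minimal sketch on a small made-up data frame. The data frame `toy` and all of its values are invented for illustration; only the column names id, h and ab follow the slides.

```r
# Toy stand-in for the baseball data (values are made up for illustration)
toy <- data.frame(
  id = c("a", "a", "b", "b", "b"),
  h  = c(30, 50, 10, 20, 40),
  ab = c(100, 200, 100, 100, 200)
)

players <- unique(toy$id)
n <- length(players)

# Allocate the output vector first, then fill it one player at a time
mba <- rep(NA, n)
for (i in seq_len(n)) {
  seasons <- subset(toy, id == players[i])
  mba[i] <- with(seasons, mean(h / ab))
}
mba
```

The sketch uses seq_len(n) instead of 1:n because 1:n counts down (1, 0) when n is 0, whereas seq_len(0) is empty and the loop is simply skipped.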

Your Turn

• Run the iteration to get (a) the lifetime batting average (mba) for each player and (b) the lifetime number of at bats (nab) for each player.

• Make a dataset player.stats from mba, nab and players (use data.frame and cbind)

• Plot nab versus mba.

Other loops

• while (condition) {
    block of commands
  }

• repeat {
    block of commands
    if (cond) break
  }
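A small sketch of both forms, using made-up counters (the variable names and stopping rules are chosen only for illustration):

```r
# while: the condition is checked before each pass through the block
i <- 1
total <- 0
while (i <= 5) {
  total <- total + i
  i <- i + 1
}

# repeat: runs until break is reached, so the block executes at least once
j <- 0
repeat {
  j <- j + 1
  if (j^2 > 50) break
}
```

A repeat loop has no condition of its own, so leaving out the break (or writing a condition that never becomes TRUE) produces an endless loop.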

Good Practice

• Use indentation (tabs) to structure blocks of statements

• Build complex blocks of code step by step: get the code working for a single state first, then generalize

• # write comments!

Why should we not use loops?

• Loops often signal a user's inexperience, because most loops can be replaced by R's vectorized operations, which are usually shorter and faster

• The dplyr alternative takes care of all housekeeping chores (like allocating the result vector beforehand, and binding vectors into a data frame afterwards)
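As one illustration of the vectorized route, the per-player mean from the loop can be computed in a single call. This sketch uses base R's tapply on made-up data rather than dplyr; the data frame `toy` and its values are assumptions, and only the column names id, h and ab come from the slides.

```r
# Toy stand-in for the baseball data (values are made up for illustration)
toy <- data.frame(
  id = c("a", "a", "b", "b", "b"),
  h  = c(30, 50, 10, 20, 40),
  ab = c(100, 200, 100, 100, 200)
)

# h / ab is computed for all rows at once; tapply then averages within each id
mba <- with(toy, tapply(h / ab, id, mean))
mba
```

No result vector is allocated by hand and no index variable is needed; the result comes back labelled by player id.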

Some Social Issues

• How many people do you know who admit to driving while intoxicated?

• How many people do not use their seat belts?

• How many people did not work out for a single minute in the last month?

• … the BRFSS (Behavioral Risk Factor Surveillance System) tries to answer these kinds of questions …

Data set: Behavioral Risk Factor Surveillance System (BRFSS)

• largest telephone survey to track health risks: http://www.cdc.gov/brfss/

• For an overview, go to: http://apps.nccd.cdc.gov/brfss/

• Visit the above website and try to answer one of the previous questions.

• Report on this - or another surprise finding.

What did you find?

• … the online tool is good, but we can do much better in R …

Report back

Using the Codebook

• Open the codebook in a text editor (any text editor will do; just double-click the file once you have downloaded it from the website)

• Use the ‘Search’ function to navigate in the document …

• What does variable QLREST2 encode?

Review of data aggregation with dplyr

group_by, summarise

Recognize .variable

• Use dplyr to compute mean QLREST2 values by state.

• Summarize each of the variables GENHLTH, AVEDRNK2, and DRNKDRI2 by gender (SEX)

• What is the average weight in the population by state, gender and educational level? What is the standard deviation?
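The group_by/summarise pattern behind these questions can be sketched on a small made-up table. The data frame `survey` and its columns state, sex and weight are placeholders invented for this example, not actual BRFSS variable names; look up the real names in the codebook.

```r
library(dplyr)

# Made-up survey records; column names are placeholders, not BRFSS variables
survey <- data.frame(
  state  = c("IA", "IA", "IA", "MN", "MN", "MN"),
  sex    = c(1, 2, 2, 1, 1, 2),
  weight = c(180, 150, 160, 200, 190, 140)
)

# One output row per (state, sex) combination
by_group <- survey %>%
  group_by(state, sex) %>%
  summarise(
    count   = n(),
    mean_wt = mean(weight, na.rm = TRUE),
    sd_wt   = sd(weight, na.rm = TRUE),
    .groups = "drop"
  )
by_group
```

Keeping a count next to each summary is a cheap safeguard: the standard deviation of a single observation is NA, and means based on very few respondents deserve caution.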

Maps

What is a map?

[Figure: plot of long (x-axis) against lat (y-axis)]

Set of points specifying latitude and longitude

[Figure: plot of long (x-axis) against lat (y-axis)]

Polygon: connect dots in correct order

[Figure: plot of long (x-axis) against lat (y-axis)]

What is a map?

[Figure: plot of long (x-axis) against lat (y-axis)]

Polygon: connect only the correct dots

Grouping

• Use parameter group to connect the “right” dots (need to create grouping sometimes)

[Figures: plots of long (x-axis) against lat (y-axis)]

qplot(long, lat, geom="point", data=states)

qplot(long, lat, geom="path", data=states, group=group)

qplot(long, lat, geom="polygon", data=states, group=group, fill=region)

qplot(long, lat, geom="polygon", data=states.map, fill=lat, group=group)
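To see what the group argument does in the calls above, it can help to draw two separate outlines from a tiny hand-made data frame. The sketch below is an assumption-laden illustration: the data frame `shapes` is invented, and it uses ggplot() with geom_polygon() instead of qplot(), since recent ggplot2 releases deprecate qplot().

```r
library(ggplot2)

# Two made-up squares; `group` marks which outline each corner belongs to
shapes <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c("left", "right"), each = 4)
)

# Without group, all eight corners would be joined into one shape
p <- ggplot(shapes, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey80", colour = "black")
p
```

Removing group = group from aes() joins the last corner of the first square to the first corner of the second, which is the "connect only the correct dots" problem in miniature.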

Merging Files

• merge(x, y, ...)

• help(merge)

• need to specify along which (key) variable(s) in x and y records are aligned
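A minimal sketch of merge() on two made-up tables whose key column has a different name in each. The data frames `rates` and `outlines` and all of their values are invented for illustration.

```r
# Made-up tables sharing a key; the key column is named differently in each
rates <- data.frame(
  state = c("iowa", "minnesota", "ohio"),
  pct   = c(10, 20, 30)
)
outlines <- data.frame(
  region = c("iowa", "iowa", "minnesota", "texas"),
  long   = c(-93.5, -91.2, -94.6, -99.3),
  lat    = c(41.6, 42.0, 46.3, 31.5)
)

# by.x / by.y name the key in x and y; unmatched rows are dropped by default
both <- merge(outlines, rates, by.x = "region", by.y = "state")

# all.x = TRUE keeps every row of x and fills missing values from y with NA
keep_x <- merge(outlines, rates, by.x = "region", by.y = "state", all.x = TRUE)
```

By default merge() sorts the result on the key columns, so polygon data may need to be put back into drawing order before plotting.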

Your Turn

• Draw a choropleth map of states showing the percentage of households without healthcare coverage (HLTHPLAN == 2)

• Are the elderly more affected? Draw choropleth maps of states showing the percentage of households without healthcare coverage (HLTHPLAN) by age group (AGE10, defined earlier). What is the group size?