Hadley Wickham
Stat405: ddply case study
Wednesday, 7 October 2009
1. Feedback & homework & project
2. Overall goal: dual-sex names vs. errors
3. Selecting smaller subset
4. Classification
5. Individual exploration
Feedback
I'll try to go slower when writing things on the board. Remind me!
Too much homework? I'll try to reduce it from now on. This week's homework is a bit different.
Homework
If you need more practice, all function drills, along with answers, are available online.
Running behind on grading, sorry :(
Common mistakes
even <- function(x) {
  is_even <- x %% 2 == 0
  if (is_even) {
    print("Even!")
  } else {
    print("Odd!")
  }
}
# Problems
# * does it work with vectors?
# * can we easily define odd in terms of even?
even <- function(x) {
  x %% 2 == 0
}
even(1:10)
odd <- function(x) {
  !even(x)
}
# In general, functions should always return something
# useful, rather than printing or plotting
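One way to see why returning a value helps: the logical vector from even() can be fed straight into other vectorised functions. A minimal sketch, where parity() is a hypothetical helper that is not part of the slides:

```r
# Vectorised helper, as defined in the slides
even <- function(x) x %% 2 == 0

# Hypothetical helper (not in the slides): label each element.
# ifelse() works element-wise, whereas if () expects a single TRUE/FALSE.
parity <- function(x) ifelse(even(x), "even", "odd")

labels <- parity(1:4)
labels
```

Because parity() returns a character vector instead of printing, the result can be stored, tabulated or plotted later.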
area <- function(r) {
  a <- pi * r ^ 2
  a
}
# Not necessary!
area <- function(r) {
  pi * r ^ 2
}
# Choose from a, b and c with equal probability
x <- runif(1)
if (x < 1/3) {
  "a"
} else if (x < 2/3) {
  "b"
} else {
  "c"
}
# OR
sample(c("a", "b", "c"), 1)
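A quick empirical check that sample() picks each letter about equally often. This is only a sketch: the seed and the number of draws are arbitrary choices, not from the slides.

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

# Draw many letters at once instead of calling sample() in a loop
draws <- sample(c("a", "b", "c"), 3000, replace = TRUE)

# Observed share of each letter; each should be close to 1/3
probs <- table(draws) / length(draws)
probs
```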
Project
Still working on grading. Will have it back to you by next Wednesday (no class on Monday).
Next project due Oct 30.
Basically the same as last time, but working with baby names, and you need to include an external data source.
Questions
For names that are used for both boys and girls, how has usage changed?
Can we use names that clearly have the incorrect sex to estimate error rates over time?
Getting started
options(stringsAsFactors = FALSE)
library(plyr)
library(ggplot2)
bnames <- read.csv("baby-names.csv")
First task
Identify a smaller subset of names that have been in the top 1000 for both boys and girls. ~7000 names in total, we want to focus on ~100.
In real-life would probably use more, but starting with a subset for easier exploration is still a good idea.
Take two minutes to brainstorm what variables we might need to create to do this.
Your turn
Summarise each name with: the total proportion of boys, the total proportion of girls, the number of years the name was in the top 1000 as a girl's name, the number of years the name was in the top 1000 as a boy's name.
Hint: Start with a single name and figure out how to solve the problem.
Hint: Use summarise.
times <- ddply(bnames, c("name"), summarise,
  boys = sum(prop[sex == "boy"]),
  boys_n = sum(sex == "boy"),
  girls = sum(prop[sex == "girl"]),
  girls_n = sum(sex == "girl"),
  .progress = "text")

nrow(times)
times <- subset(times, boys_n > 1 & girls_n > 1)
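The same four summaries can be checked by hand on a single name, as the hint suggests. A minimal base-R sketch on made-up rows: the toy data frame and its values are invented for illustration, and only the column names (year, name, sex, prop) follow the code in the slides.

```r
# Made-up rows in the same shape as bnames
toy <- data.frame(
  year = c(1880, 1880, 1881, 1881, 1882),
  name = c("Jessie", "Jessie", "Jessie", "Jessie", "Mary"),
  sex  = c("boy", "girl", "boy", "girl", "girl"),
  prop = c(0.001, 0.002, 0.001, 0.003, 0.070),
  stringsAsFactors = FALSE
)

# Solve the problem for one name first
one <- subset(toy, name == "Jessie")
summary_one <- with(one, data.frame(
  boys    = sum(prop[sex == "boy"]),   # total proportion as a boy's name
  boys_n  = sum(sex == "boy"),         # years in the data as a boy's name
  girls   = sum(prop[sex == "girl"]),  # total proportion as a girl's name
  girls_n = sum(sex == "girl")         # years in the data as a girl's name
))
summary_one
```

Once this works for one name, ddply() applies the same expressions to every name.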
[Plot: scatterplot of girls_n against boys_n; both axes run from 20 to 120]
[Plot: histogram of pmin(boys_n, girls_n), from 20 to 120; count axis from 0 to 80]
New functions:
pmin(a, b)
pmax(a, b)
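pmin() and pmax() compare two vectors element by element, which is what makes them useful per name here. A small sketch with made-up vectors:

```r
a <- c(1, 5, 3)
b <- c(4, 2, 3)

lo <- pmin(a, b)   # element-wise minimum: 1 2 3
hi <- pmax(a, b)   # element-wise maximum: 4 5 3

# For contrast, min() collapses everything to a single value
overall <- min(a, b)   # 1
```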
qplot(boys_n, girls_n, data = times)
qplot(pmin(boys_n, girls_n), data = times, binwidth = 1)
times$both <- with(times, boys_n > 10 & girls_n > 10)

# Still a few too many names. Let's focus on names
# that have managed a certain level of popularity.
qplot(pmin(boys, girls), data = subset(times, both), binwidth = 0.01)
qplot(pmax(boys, girls), data = subset(times, both), binwidth = 0.1)
qplot(boys + girls, data = subset(times, both), binwidth = 0.1)
# Now save our selections
both_sexes <- subset(times, both & boys + girls > 0.4)
selected_names <- both_sexes$name

selected <- subset(bnames, name %in% selected_names)
nrow(selected) / nrow(bnames)
Yearly summaries
Next problem is to classify which names are dual-sex, and which are errors.
To do that, we'll need to calculate yearly summaries for each of those names, and use our knowledge of names to come up with a good classification criterion.
Your turn
For each name, in each year, figure out the total number of boys and girls.
Think of ways to summarise the difference between the number of boys and girls, and start visualising the data.
bysex <- ddply(selected, c("name", "year"), summarise,
  boys = sum(prop[sex == "boy"]),
  girls = sum(prop[sex == "girl"]),
  .progress = "text")

# It's useful to have a symmetric means of comparing
# the relative abundance of boys and girls - the log
# ratio is good for this.
bysex$lratio <- log10(bysex$boys / bysex$girls)
bysex$lratio[!is.finite(bysex$lratio)] <- NA
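A small sketch of why the log ratio is symmetric, and why the non-finite values need replacing. The numbers are made up for illustration, not taken from the baby names data.

```r
# Made-up values: 10x more boys, 10x more girls, equal, no boys at all
boys  <- c(10, 1, 2, 0)
girls <- c(1, 10, 2, 5)

lratio <- log10(boys / girls)
# Swapping boys and girls only flips the sign: about 1 and -1.
# Equal usage gives 0, and zero boys gives -Inf.
lratio

# Same clean-up as in the slides: non-finite values become NA
lratio[!is.finite(lratio)] <- NA
lratio
```

A plain ratio would not have this property: 10 times more boys gives 10, while 10 times more girls gives 0.1.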
[Plot: lratio (-2 to 2) against year (1880 to 2000), one line per name]
[Plot: lratio (x axis, -2 to 2) against reorder(name, lratio, na.rm = T); the y axis lists the selected names, Susan through Ronald]
[Plot: abs(lratio) (x axis, 0.5 to 2.5) against reorder(name, lratio, na.rm = T); the y axis lists the selected names, Susan through Ronald]
theme_set(theme_grey(10))
qplot(year, lratio, data = bysex, group = name, geom = "line")

qplot(lratio, reorder(name, lratio, na.rm = T), data = bysex)
qplot(abs(lratio), reorder(name, lratio, na.rm = T), data = bysex)

qplot(abs(lratio), reorder(name, lratio, na.rm = T), data = bysex) +
  geom_point(data = both_sexes, colour = "red")
[Plot: lratio (-2 to 2) against year (1880 to 2000), one line per name]
What characteristics of each name might we want to use to classify them as dual-sex versus sex-errors?
Your turn
Compute the mean and range of lratio for each name.
Plot and come up with cutoffs that you think separate the two groups.
rng <- ddply(bysex, "name", summarise,
  diff = diff(range(lratio, na.rm = T)),
  mean = mean(lratio, na.rm = T))

qplot(diff, abs(mean), data = rng)
qplot(diff, abs(mean), data = rng,
  colour = abs(mean) < 1.75 | diff > 0.9)

shared_names <- subset(rng, abs(mean) < 1.75 | diff > 0.9)$name

qplot(abs(lratio), reorder(name, lratio, na.rm = T),
  data = subset(bysex, name %in% shared_names))
qplot(year, lratio, geom = "line", group = name,
  data = subset(bysex, name %in% shared_names))
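To see how the cutoffs behave, here is a minimal base-R sketch on two invented names. The toy values are assumptions for illustration; only the rule abs(mean) < 1.75 | diff > 0.9 comes from the slides.

```r
# Made-up yearly log ratios: "A" is always strongly one-sided,
# "B" swings from mostly girls to mostly boys
toy <- data.frame(
  name   = rep(c("A", "B"), each = 3),
  lratio = c(2.0, 2.1, NA, -1.0, 0.0, 1.0),
  stringsAsFactors = FALSE
)

# Range and mean of lratio per name (base-R stand-in for the ddply call)
rng_toy <- do.call(rbind, lapply(split(toy, toy$name), function(df) {
  data.frame(
    name = df$name[1],
    diff = diff(range(df$lratio, na.rm = TRUE)),
    mean = mean(df$lratio, na.rm = TRUE)
  )
}))

# Same cutoffs as in the slides
shared_toy <- as.character(
  subset(rng_toy, abs(mean) < 1.75 | diff > 0.9)$name
)
shared_toy
```

"A" has a large mean and a small range, so it falls outside the rule and looks like recording errors. "B" has a mean near zero and a wide range, so it is kept as a shared name.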
Next time
Now that we've separated the two groups, we'll explore each in more detail.