BIO503: Homework 2 Solutions

Post on 06-Feb-2016

44 views 0 download

description

BIO503: Homework 2 Solutions. Harvard School of Public Health Wintersession 2009. Question 1A. Use mfrow to set up the layout for a 3 x 4 array of plots. > par(mfrow=c(3,4)) We now have 3 rows, 4 columns and 12 plots in total. For each individual plot, we need to: - PowerPoint PPT Presentation

transcript

BIO503: Homework 2 Solutions

Harvard School of Public Health Wintersession 2009

Question 1A

Use mfrow to set up the layout for a 3 x 4 array of plots. > par(mfrow=c(3,4))

We now have 3 rows, 4 columns and 12 plots in total.

For each individual plot, we need to: – generate two sets of Normal random variables

rnorm– construct a QQ plot

qqplot– add a title, set the color and plotting character

main, col, pch

By calling successive plot commands, R will fill in each of the 12 panels row-wise.

The code we need to construct the first row:

> qqplot(rnorm(10), rnorm(10), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[1], pch=1, lwd=2, main="10 Normal RVS, Set 1")

> qqplot(rnorm(10), rnorm(10), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[2], pch=2, lwd=2, main="10 Normal RVS, Set 2")

> qqplot(rnorm(10), rnorm(10), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[3], pch=3, lwd=2, main="10 Normal RVS, Set 3")

> qqplot(rnorm(10), rnorm(10), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[4], pch=4, lwd=2, main="10 Normal RVS, Set 4")

For the remaining rows we can start with the same code, changing the sample size, color and plotting character and modifying the title of each plot.

For the second row: > qqplot(rnorm(100), rnorm(100), xlab="Quantiles",

ylab="Quantiles", col=rainbow(12)[5], pch=5, lwd=2, main="100 Normal RVS, Set 1")

> qqplot(rnorm(100), rnorm(100), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[6], pch=6, lwd=2, main="100 Normal RVS, Set 2")

> qqplot(rnorm(100), rnorm(100), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[7], pch=7, lwd=2, main="100 Normal RVS, Set 3")

> qqplot(rnorm(100), rnorm(100), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[8], pch=8, lwd=2, main="100 Normal RVS, Set 4")

And the third row: > qqplot(rnorm(1000), rnorm(1000), xlab="Quantiles",

ylab="Quantiles", col=rainbow(12)[9], pch=9, lwd=2, main="1000 Normal RVS, Set 1")

> qqplot(rnorm(1000), rnorm(1000), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[10], pch=10, lwd=2, main="1000 Normal RVS, Set 2")

> qqplot(rnorm(1000), rnorm(1000), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[11], pch=11, lwd=2, main="1000 Normal RVS, Set 3")

> qqplot(rnorm(1000), rnorm(1000), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[12], pch=12, lwd=2, main="1000 Normal RVS, Set 4")

You'll get a plot that looks like this:

Question 1B

Comment on how the appearance of the plots changes as the sample size changes.

We see that we can assess more confidently the distributional assumption as the sample size grows.

As the sample sizes get better it becomes clearer that the samples are normally distributed.

We know the data file has the following properties: file name = cats.txt tab-delimited column headings

Therefore we use the following code to read in the data:

> cats <- read.table("cats.txt", sep="\t", header=T)

Question 2A

Question 2B

The first column of the data.frame cats, i.e. cats[,1] contains the gender-specific information.

This code will generate a logical vector where T corresponds to the rows for female cats.

> cats[,1] == "F"

We can create subsets of the cats object in the following way:> female.cats <- cats[cats[,1] == "F",]> male.cats <- cats[cats[,1] == "M",]

Question 2B

To fit a linear regression model for heart weight on body weight, we need to let the lm function which data set to use.

> female.mod <- lm(Hwt ~ Bwt, data=female.cats)> male.mod <- lm(Hwt ~ Bwt, data=male.cats)

Question 2CWe can see the coefficients fitted in the model either using the

summary function or more directly, the coefficients function.

> summary(female.mod)> coefficients(female.mod)For the female cats, the slope estimate is 2.64 and the intercept

is 2.98.

> summary(male.mod)> coefficients(male.mod)For the male cats, the slope estimate is 4.31 and the intercept is

-1.18.

Question 2D

For the female cat model:Residual standard error: 1.162 on 45 degrees of freedom

For the male cat model:Residual standard error: 1.557 on 95 degrees of freedom

Good job to those of you who actually calculated the residual standard error!

Question 2EWe can create a new data.frame to store the information for the new cats. > new.female.cat <- data.frame(Bwt = 2.5)> new.male.cat <- data.frame(Bwt = 2.9)

Then we can use the predict function which will apply our linear model object to make a prediction about the new data.

> predict(female.mod, new.female.cat)> predict(male.mod, new.male.cat)

The female's heart weight is 9.57g, the male's heart weight is 11.32g.

Question 2F

Construct a plot to examine the residuals for each model. There are two possible plots we could make to examine the

residuals arising from the model. – QQ-plot – scatter plot

Question 2F – QQ Plots

> par(mfrow=c(1,2))> qqnorm(residuals(female.mod), lwd=3, pch=3,

col="orange", main="QQ Plot for Female Cat Model Residuals")

> qqline(residuals(female.mod), lwd=3, col="blue")

> qqnorm(residuals(male.mod), lwd=2, col="purple", pch=4, main="QQ Plot for Male Cat Model Residuals")

> qqline(residuals(male.mod), lwd=3, col="gold")

Question 2F – QQ PlotsYou should get something that looks like this:

Question 2F – Scatter Plots

> par(mfrow=c(1,2))> plot(residuals(female.mod),

ylim=range(residuals(female.mod), residuals(male.mod)), pch=3, col="orange", lwd=3, main="Residuals for the Female Cat Model", ylab="Residual Values")

> abline(a=0, b=0, lty=3, lwd=3)> plot(residuals(male.mod),

ylim=range(residuals(female.mod), residuals(male.mod)), pch=4, col="purple", lwd=3, main="Residuals for the Male Cat Model", ylab="Residual Values")

> abline(a=0, b=0, lty=3, lwd=3)

Question 2F – Scatter PlotsYou should get something like this:

Question 2F

Interpretation:

There aren't any strong patterns in the scatter plots. The residuals appear randomly distributed about mean 0.

The QQ plots show that most of the residuals line up on the diagonal line meaning the residuals appear to be Normally distributed.

These observations suggest that our model assumptions are valid.

Question 2G

Construct a plot of the raw data for the male cats, add the regression line you've fitted.

> plot(male.cats$Bwt, male.cats$Hwt, pch=4, lwd=3, col="purple", ylab="Heart Weight (g)", xlab="Body Weight (kg)", main="Male Cats")

> abline(male.mod)

Question 2G

For the male cats, we get the following plot:

Question 2GWe use similar code to produce the corresponding plot for the female

cats. Given that the question asks us to put the two plots side by side, we need to use par and set the ylim argument.

> par(mfrow=c(1,2))> plot(female.cats$Bwt, female.cats$Hwt, pch=3,

ylim=range(cats$Hwt), lwd=3, col="orange", ylab="Heart Weight (g)", xlab="Body Weight (kg)", main="Female Cats")

> abline(female.mod)> plot(male.cats$Bwt, male.cats$Hwt, pch=4,

ylim=range(cats$Hwt), lwd=3, col="purple", ylab="Heart Weight (g)", xlab="Body Weight (kg)", main="Male Cats")

> abline(male.mod)

Question 2GWe'll get a plot that looks like this:

Bonus Question> col.palette <- rainbow(12) > lwd.width <- 2> pch.sym <- 1:20> nsize <- rep(c(10,100,1000), each=4)

> par(mfrow=c(3,4))> for( i in 1:12 ){

what.set <- i%%4 if( what.set == 0 ){

title.text <- paste(nsize[i], " Normal RVS, Set 4", sep="")}else{

title.text <- paste(nsize[i], " Normal RVS, Set ", what.set, sep="")

}qqplot(rnorm(nsize[i]), rnorm(nsize[i]), xlab="Quantiles", ylab="Quantiles", col=col.palette[i], pch=pch.sym[i], lwd=lwd.width, main=title.text)

}

Bonus Question

By the way, I saw some terrific and very clever variations from students.

Congratulations and well done!