BIO503: Homework 2 Solutions

Date post: 06-Feb-2016

Harvard School of Public Health Wintersession 2009

Question 1A

Use mfrow to set up the layout for a 3 x 4 array of plots.

> par(mfrow=c(3,4))

We now have 3 rows, 4 columns and 12 plots in total.

For each individual plot, we need to:
– generate two sets of Normal random variables: rnorm
– construct a QQ plot: qqplot
– add a title, set the color and plotting character: main, col, pch

By calling successive plot commands, R will fill in each of the 12 panels row-wise.
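
The row-wise fill order can be checked without looking at the picture. The sketch below is an illustration rather than part of the original solution: it opens a null PDF device (an assumption, made so the code runs without a display), draws two empty panels, and asks par("mfg") which panel was drawn last. It then repeats the exercise with mfcol, which fills column-wise instead.

```r
# Open a device that draws nowhere, so the sketch runs without a display.
pdf(NULL)

par(mfrow = c(3, 4))          # 3 rows, 4 columns, filled row-wise
plot.new()                    # first panel
plot.new()                    # second panel
pos.mfrow <- par("mfg")[1:2]  # (row, column) of the panel just drawn

par(mfcol = c(3, 4))          # same grid, filled column-wise
plot.new()
plot.new()
pos.mfcol <- par("mfg")[1:2]

dev.off()
```

With mfrow the second panel sits at row 1, column 2; with mfcol it sits at row 2, column 1.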

The code we need to construct the first row:

> qqplot(rnorm(10), rnorm(10), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[1], pch=1, lwd=2, main="10 Normal RVS, Set 1")

> qqplot(rnorm(10), rnorm(10), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[2], pch=2, lwd=2, main="10 Normal RVS, Set 2")

> qqplot(rnorm(10), rnorm(10), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[3], pch=3, lwd=2, main="10 Normal RVS, Set 3")

> qqplot(rnorm(10), rnorm(10), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[4], pch=4, lwd=2, main="10 Normal RVS, Set 4")

For the remaining rows we can start with the same code, changing the sample size, color and plotting character and modifying the title of each plot.

For the second row:

> qqplot(rnorm(100), rnorm(100), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[5], pch=5, lwd=2, main="100 Normal RVS, Set 1")

> qqplot(rnorm(100), rnorm(100), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[6], pch=6, lwd=2, main="100 Normal RVS, Set 2")

> qqplot(rnorm(100), rnorm(100), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[7], pch=7, lwd=2, main="100 Normal RVS, Set 3")

> qqplot(rnorm(100), rnorm(100), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[8], pch=8, lwd=2, main="100 Normal RVS, Set 4")

And the third row:

> qqplot(rnorm(1000), rnorm(1000), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[9], pch=9, lwd=2, main="1000 Normal RVS, Set 1")

> qqplot(rnorm(1000), rnorm(1000), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[10], pch=10, lwd=2, main="1000 Normal RVS, Set 2")

> qqplot(rnorm(1000), rnorm(1000), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[11], pch=11, lwd=2, main="1000 Normal RVS, Set 3")

> qqplot(rnorm(1000), rnorm(1000), xlab="Quantiles", ylab="Quantiles", col=rainbow(12)[12], pch=12, lwd=2, main="1000 Normal RVS, Set 4")

You'll get a plot that looks like this:

Question 1B

Comment on how the appearance of the plots changes as the sample size changes.

We see that we can assess the distributional assumption more confidently as the sample size grows.

As the sample sizes get larger, the points fall closer to a straight line and it becomes clearer that the samples are Normally distributed.
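
One way to put a number on this is to measure how far the QQ points fall from the line y = x. The sketch below is an illustration, not part of the original solution; the helper name qq.deviation and the choice of 200 replicates are assumptions. It calls qqplot with plot.it=FALSE to obtain the paired quantiles without drawing anything.

```r
set.seed(503)  # arbitrary seed, only for reproducibility

# Mean absolute gap between paired quantiles of two Normal samples of size n.
# qq.deviation is a hypothetical helper, not a base R function.
qq.deviation <- function(n) {
  qq <- qqplot(rnorm(n), rnorm(n), plot.it = FALSE)
  mean(abs(qq$x - qq$y))
}

# Average over 200 replicates per sample size to smooth out random noise.
sizes <- c(10, 100, 1000)
avg.dev <- sapply(sizes, function(n) mean(replicate(200, qq.deviation(n))))
names(avg.dev) <- sizes
print(round(avg.dev, 3))
```

The average gap should shrink as the sample size goes from 10 to 100 to 1000, which is the numerical counterpart of the points tightening around the diagonal.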

Question 2A

We know the data file has the following properties:
– file name = cats.txt
– tab-delimited
– column headings

Therefore we use the following code to read in the data:

> cats <- read.table("cats.txt", sep="\t", header=TRUE)
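
To see what the sep and header arguments do, the sketch below writes a tiny tab-delimited file and reads it back. It is illustrative only: the three rows are made up, and the column names Sex, Bwt and Hwt are an assumption based on the variables used later in these solutions.

```r
# Write a small tab-delimited file with a header line (made-up values).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Sex\tBwt\tHwt",
             "F\t2.0\t7.0",
             "F\t2.1\t7.2",
             "M\t3.0\t10.0"), tmp)

# sep="\t" splits fields on tabs; header=TRUE takes column names from line 1.
toy.cats <- read.table(tmp, sep = "\t", header = TRUE)
str(toy.cats)

unlink(tmp)  # remove the temporary file
```

Calling str on the result is a quick check that the columns were split correctly and that the weights were read as numbers rather than text.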

Question 2B

The first column of the data.frame cats, i.e. cats[,1], contains the gender-specific information.

This code will generate a logical vector where TRUE corresponds to the rows for female cats.

> cats[,1] == "F"

We can create subsets of the cats object in the following way:

> female.cats <- cats[cats[,1] == "F",]
> male.cats <- cats[cats[,1] == "M",]
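
The same subsets can also be built by column name, which is often easier to read than a column position. The sketch below uses a made-up data frame; the column name Sex is an assumption, since the solution only refers to the first column by position.

```r
# Made-up stand-in for the cats data (values are illustrative only).
toy.cats <- data.frame(Sex = c("F", "F", "M", "M", "M"),
                       Bwt = c(2.0, 2.3, 2.8, 3.0, 3.5),
                       Hwt = c(7.0, 9.5, 10.1, 12.4, 13.0))

# Logical indexing by position, as in the solution.
toy.female <- toy.cats[toy.cats[, 1] == "F", ]

# Equivalent selection by name, and both groups at once with split().
toy.female2 <- subset(toy.cats, Sex == "F")
by.sex <- split(toy.cats, toy.cats$Sex)

nrow(toy.female)  # number of female rows
```

split returns a list with one data frame per group, which is convenient when the same model is fitted to each group in turn.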

Question 2B

To fit a linear regression model for heart weight on body weight, we need to tell the lm function which data set to use.

> female.mod <- lm(Hwt ~ Bwt, data=female.cats)
> male.mod <- lm(Hwt ~ Bwt, data=male.cats)

Question 2C

We can see the coefficients fitted in the model either using the summary function or, more directly, the coefficients function.

> summary(female.mod)
> coefficients(female.mod)

For the female cats, the slope estimate is 2.64 and the intercept is 2.98.

> summary(male.mod)
> coefficients(male.mod)

For the male cats, the slope estimate is 4.31 and the intercept is -1.18.
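
A quick way to confirm which coefficient is which is to fit lm to data with a known line. The sketch below is illustrative and uses made-up points lying exactly on y = 1 + 2x; the variable names are hypothetical.

```r
# Made-up points that lie exactly on the line y = 1 + 2 * x.
toy <- data.frame(x = c(1, 2, 3, 4, 5))
toy$y <- 1 + 2 * toy$x

toy.mod <- lm(y ~ x, data = toy)

# coefficients() returns a named vector: "(Intercept)" first, then the slope,
# which is named after the predictor.
toy.coef <- coefficients(toy.mod)
intercept <- unname(toy.coef["(Intercept)"])
slope <- unname(toy.coef["x"])
```

For the cat models the slope is named Bwt, so coefficients(female.mod)["Bwt"] would pick it out by name.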

Question 2D

For the female cat model:
Residual standard error: 1.162 on 45 degrees of freedom

For the male cat model:
Residual standard error: 1.557 on 95 degrees of freedom

Good job to those of you who actually calculated the residual standard error!
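
For reference, the residual standard error printed by summary is the square root of the residual sum of squares divided by the residual degrees of freedom (the number of observations minus the number of fitted coefficients). The sketch below checks this by hand on made-up data; it is an illustration, not the cats analysis.

```r
set.seed(1)  # arbitrary seed for reproducibility

# Made-up data: a noisy straight line.
toy <- data.frame(x = 1:20)
toy$y <- 3 + 0.5 * toy$x + rnorm(20)
toy.mod <- lm(y ~ x, data = toy)

# By hand: sqrt(RSS / (n - 2)) for a model with an intercept and one slope.
rss <- sum(residuals(toy.mod)^2)
df.res <- df.residual(toy.mod)       # 20 observations - 2 coefficients = 18
rse.by.hand <- sqrt(rss / df.res)

# The value that summary() prints as "Residual standard error".
rse.summary <- summary(toy.mod)$sigma
```

With two fitted coefficients per model, 45 degrees of freedom corresponds to 47 female cats and 95 degrees of freedom to 97 male cats.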

Question 2E

We can create a new data.frame to store the information for the new cats.

> new.female.cat <- data.frame(Bwt = 2.5)
> new.male.cat <- data.frame(Bwt = 2.9)

Then we can use the predict function, which applies our linear model object to make a prediction about the new data.

> predict(female.mod, new.female.cat)
> predict(male.mod, new.male.cat)

The female's predicted heart weight is 9.57 g; the male's predicted heart weight is 11.32 g.
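
These predictions can be checked by hand from the coefficients reported in Question 2C, since for a simple linear model predict evaluates intercept + slope * Bwt. The sketch below uses the rounded coefficients, so a small difference from the predict output in the last digit is expected.

```r
# Rounded coefficients reported in Question 2C.
female.intercept <- 2.98;  female.slope <- 2.64
male.intercept   <- -1.18; male.slope   <- 4.31

# Fitted line evaluated at the new body weights (kg).
female.hwt <- female.intercept + female.slope * 2.5
male.hwt   <- male.intercept   + male.slope   * 2.9

round(c(female = female.hwt, male = male.hwt), 2)
```

The hand calculation gives about 9.58 g and 11.32 g, in line with the values from predict once rounding of the coefficients is taken into account.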

Question 2F

Construct a plot to examine the residuals for each model.

There are two possible plots we could make to examine the residuals arising from the model:
– QQ-plot
– scatter plot

Question 2F – QQ Plots

> par(mfrow=c(1,2))
> qqnorm(residuals(female.mod), lwd=3, pch=3, col="orange", main="QQ Plot for Female Cat Model Residuals")
> qqline(residuals(female.mod), lwd=3, col="blue")
> qqnorm(residuals(male.mod), lwd=2, col="purple", pch=4, main="QQ Plot for Male Cat Model Residuals")
> qqline(residuals(male.mod), lwd=3, col="gold")

Question 2F – QQ Plots

You should get something that looks like this:

Question 2F – Scatter Plots

> par(mfrow=c(1,2))
> plot(residuals(female.mod), ylim=range(residuals(female.mod), residuals(male.mod)), pch=3, col="orange", lwd=3, main="Residuals for the Female Cat Model", ylab="Residual Values")
> abline(a=0, b=0, lty=3, lwd=3)
> plot(residuals(male.mod), ylim=range(residuals(female.mod), residuals(male.mod)), pch=4, col="purple", lwd=3, main="Residuals for the Male Cat Model", ylab="Residual Values")
> abline(a=0, b=0, lty=3, lwd=3)

Question 2F – Scatter Plots

You should get something like this:

Question 2F

Interpretation:

There aren't any strong patterns in the scatter plots. The residuals appear randomly distributed about mean 0.

The QQ plots show that most of the residuals line up on the diagonal line, meaning the residuals appear to be Normally distributed.

These observations suggest that our model assumptions are valid.

Question 2G

Construct a plot of the raw data for the male cats and add the regression line you've fitted.

> plot(male.cats$Bwt, male.cats$Hwt, pch=4, lwd=3, col="purple", ylab="Heart Weight (g)", xlab="Body Weight (kg)", main="Male Cats")

> abline(male.mod)

Question 2G

For the male cats, we get the following plot:

Question 2G

We use similar code to produce the corresponding plot for the female cats. Given that the question asks us to put the two plots side by side, we need to use par and set the ylim argument.

> par(mfrow=c(1,2))
> plot(female.cats$Bwt, female.cats$Hwt, pch=3, ylim=range(cats$Hwt), lwd=3, col="orange", ylab="Heart Weight (g)", xlab="Body Weight (kg)", main="Female Cats")
> abline(female.mod)
> plot(male.cats$Bwt, male.cats$Hwt, pch=4, ylim=range(cats$Hwt), lwd=3, col="purple", ylab="Heart Weight (g)", xlab="Body Weight (kg)", main="Male Cats")
> abline(male.mod)

Question 2G

We'll get a plot that looks like this:

Bonus Question

> col.palette <- rainbow(12)
> lwd.width <- 2
> pch.sym <- 1:20
> nsize <- rep(c(10,100,1000), each=4)

> par(mfrow=c(3,4))
> for( i in 1:12 ){
    what.set <- i%%4
    if( what.set == 0 ){
      title.text <- paste(nsize[i], " Normal RVS, Set 4", sep="")
    }else{
      title.text <- paste(nsize[i], " Normal RVS, Set ", what.set, sep="")
    }
    qqplot(rnorm(nsize[i]), rnorm(nsize[i]), xlab="Quantiles", ylab="Quantiles", col=col.palette[i], pch=pch.sym[i], lwd=lwd.width, main=title.text)
  }
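
The if/else that maps a remainder of 0 to Set 4 can be avoided by shifting the index before taking the remainder. The sketch below is one possible variation, not the instructor's solution; it only builds the twelve titles, so it can be checked without plotting.

```r
nsize <- rep(c(10, 100, 1000), each = 4)
i <- 1:12

# (i - 1) %% 4 + 1 cycles through 1, 2, 3, 4 with no special case for 0.
what.set <- (i - 1) %% 4 + 1

# paste() is vectorised, so all 12 titles are built in one call.
title.text <- paste(nsize, " Normal RVS, Set ", what.set, sep = "")
title.text[c(1, 4, 12)]
```

Inside the loop, title.text[i] could then be passed directly as the main argument of qqplot.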

Bonus Question

By the way, I saw some terrific and very clever variations from students.

Congratulations and well done!

