Section 0 - Brigham Young University

Date post: 14-Jan-2022
Statistics 230 All Homework Assignments

Section 0

1. Computer output for analysis of a random sample of data is shown below. Some of the quantities are missing. Compute the values of the missing quantities:

Variable  N  Mean   SE of mean  Std. Dev.  Variance  Minimum  Maximum
Y         9  19.96  ?           3.12       ?         15.94    27.16
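The two missing entries follow from the standard relationships SE = s/√n and variance = s². A minimal R sketch, using only the values printed in the output above (the variable names are illustrative):

```r
# Values read from the output in problem 1
n <- 9      # sample size
s <- 3.12   # sample standard deviation

se_mean  <- s / sqrt(n)   # standard error of the mean
variance <- s^2           # sample variance

cat("SE of mean:", se_mean, "\n")
cat("Variance:  ", variance, "\n")
```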

2. Consider the computer output shown below.

One-Sample T: Y
Test of mu=91 vs. not=91

Variable  N   Mean     Std. Dev  SE of mean  95% CI        T     P
Y         25  92.5805  ?         0.4673      (91.6160, ?)  3.38  0.002

(a) Fill in the missing values in the output. Can the null hypothesis be rejected at the 0.05 level? Why?

(b) Is this a one-sided or a two-sided test?

(c) If the hypotheses had been H0 : µ = 90 versus H1 : µ ≠ 90, would you reject the null hypothesis at the 0.05 level?

(d) Use the output and the t table (or R) to find a 99 percent two-sided CI on the mean.

(e) What would the P-value be if the alternative hypothesis were H1 : µ > 91?
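One way to work with this output in R is sketched below. It uses only the summary numbers shown above and assumes the usual one-sample t formulas; the variable names are illustrative, and the interpretation of the results is left to the reader.

```r
# Summary values from the One-Sample T output in problem 2
n    <- 25
ybar <- 92.5805
se   <- 0.4673
mu0  <- 91

s      <- se * sqrt(n)        # standard deviation recovered from SE = s / sqrt(n)
t_stat <- (ybar - mu0) / se   # should agree with the printed T of 3.38

# Two-sided confidence intervals based on the t distribution with n - 1 df
ci95 <- ybar + c(-1, 1) * qt(0.975, df = n - 1) * se
ci99 <- ybar + c(-1, 1) * qt(0.995, df = n - 1) * se

# P-values for the two-sided and the upper one-sided alternative
p_two   <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)
p_upper <- pt(t_stat, df = n - 1, lower.tail = FALSE)

print(c(s = s, t = t_stat))
print(ci95)
print(ci99)
print(c(two_sided = p_two, upper = p_upper))
```

The lower limit of `ci95` can be compared with the printed 91.6160 as a check that the formulas are being applied the same way as in the output.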

3. Two machines are used for filling plastic bottles with a net volume of 16.0 ounces. The filling processes can be assumed to be normal, with standard deviations of σ1 = 0.015 and σ2 = 0.018. The quality engineering department suspects that both machines fill to the same net volume, whether or not this volume is 16.0 ounces. An experiment is performed by taking a random sample from the output of each machine. (Don’t forget that for parts (a)-(c) of this problem, we know σ1 and σ2.)

Machine 1: 16.03, 16.04, 16.05, 16.05, 16.02, 16.01, 15.96, 15.98, 16.02, 15.99
Machine 2: 16.02, 15.97, 15.96, 16.01, 15.99, 16.03, 16.04, 16.02, 16.01, 16.00

(a) State the hypotheses that should be tested in this experiment.

(b) Find the P-value for this test and test these hypotheses using α = 0.05. What are your conclusions?

(c) Find a 95 percent confidence interval on the difference in the mean fill volume for the two machines.

(d) Re-do part (b), but this time assume that σ1 = σ2 = σ and that the value of σ is unknown.
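A possible R sketch for this problem follows. Base R has no built-in two-sample z-test, so parts (b) and (c) are computed directly from the known σ values; part (d) uses `t.test` with `var.equal = TRUE`, which is one common way to run the pooled-variance test. Variable names are illustrative.

```r
machine1 <- c(16.03, 16.04, 16.05, 16.05, 16.02, 16.01, 15.96, 15.98, 16.02, 15.99)
machine2 <- c(16.02, 15.97, 15.96, 16.01, 15.99, 16.03, 16.04, 16.02, 16.01, 16.00)
sigma1 <- 0.015   # known standard deviation, machine 1
sigma2 <- 0.018   # known standard deviation, machine 2

# Parts (b) and (c): z-test and z-interval with known sigmas
d_hat <- mean(machine1) - mean(machine2)
se    <- sqrt(sigma1^2 / length(machine1) + sigma2^2 / length(machine2))
z     <- d_hat / se
p_z   <- 2 * pnorm(abs(z), lower.tail = FALSE)
ci95  <- d_hat + c(-1, 1) * qnorm(0.975) * se

# Part (d): pooled two-sample t-test (sigma unknown but assumed equal)
pooled <- t.test(machine1, machine2, var.equal = TRUE)

print(c(z = z, p = p_z))
print(ci95)
print(pooled)
```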

4. Photoresist is a light-sensitive material applied to semiconductor wafers so that the circuit pattern can be imaged onto the wafer. After application, the coated wafers are baked to remove the solvent in the photoresist mixture and to harden the resist. Here are measurements of photoresist thickness (in kA) for eight wafers baked at 95C and eight wafers baked at 100C. Assume that all 16 of the runs were made in random order.

95 C: 11.176, 7.089, 8.097, 11.739, 11.291, 10.759, 6.467, 8.315
100 C: 5.263, 6.748, 7.461, 7.015, 8.133, 7.418, 3.772, 8.963

Is there evidence to support the claim that the higher baking temperature results in wafers with a lower mean photoresist thickness? Use α = 0.05. Use the entire State-Plan-Solve-Conclude process. Write a short (one-paragraph) executive summary of your conclusions.

5. Data on a random variable Y were 12, 8, 14, 20, 26, 26, 20, 21, 18, 24, 30, 21, 18, 16, 10, and 20. Assuming this is a random sample from a normal distribution, test each of the following. Let α = 0.05.

(a) H0 : µ = 12 versus H1 : µ > 12 assuming that σ = 7

(b) H0 : µ = 16 versus H1 : µ ≠ 16 assuming that σ = 7

(c) H0 : µ = 18 versus H1 : µ > 18 assuming that the value of σ is unknown.

6. Pretest data for experimental and control groups on course content in a special vocational-industrial course indicated:

Experimental: ȳ1 = 9.333, s1 = 4.945, n1 = 12
Control: ȳ2 = 8.375, s2 = 1.187, n2 = 8

(a) Test the hypothesis of equal means using α = 0.05. Give a conclusion.

(b) Build a 95% confidence interval for µ1 − µ2. Interpret the interval.
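Because only summary statistics are given, `t.test` cannot be called directly. The sketch below computes a two-sample t statistic by hand. It assumes the unequal-variance (Welch) form with the Satterthwaite degrees of freedom, which is a choice made here because s1 and s2 differ so much; the problem itself does not say which form to use.

```r
# Summary statistics from problem 6
ybar1 <- 9.333; s1 <- 4.945; n1 <- 12   # experimental group
ybar2 <- 8.375; s2 <- 1.187; n2 <- 8    # control group

v1 <- s1^2 / n1
v2 <- s2^2 / n2
se <- sqrt(v1 + v2)

t_stat   <- (ybar1 - ybar2) / se
# Satterthwaite approximation to the degrees of freedom (assumed approach)
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

p_value <- 2 * pt(abs(t_stat), df = df_welch, lower.tail = FALSE)
ci95    <- (ybar1 - ybar2) + c(-1, 1) * qt(0.975, df = df_welch) * se

print(c(t = t_stat, df = df_welch, p = p_value))
print(ci95)
```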

7. Suppose two samples randomly selected from two independent normal populations give

n1 = 9, ȳ1 = 16.0, s1² = 5.0

n2 = 4, ȳ2 = 12.0, s2² = 3.0

(a) Is there enough evidence to claim that the mean of population 1 is greater than the mean of population 2? Give a clear conclusion. (Use α = 0.05)

(b) Build a 90% confidence interval for µ1 − µ2. Interpret the interval.

8. Susan and Olivia both took an introductory statistics class; however, Susan attends University A and Olivia attends University B. The final exam for University A has µ = 50 and σ = 10, and Susan scored 62 points. The final exam for University B has µ = 1500 and σ = 25, and Olivia scored 1540 points. We want to know who understands statistics better by comparing Susan’s and Olivia’s final exam scores. Assuming the student body at each university is comparable, who performed better on the final exam? Explain.
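Scores from exams on different scales are usually compared through standardized scores, z = (score − µ)/σ. A short R sketch of that calculation, with illustrative variable names:

```r
# Standardized (z) scores put both exams on a common scale
z_susan  <- (62 - 50) / 10        # University A: mu = 50, sigma = 10
z_olivia <- (1540 - 1500) / 25    # University B: mu = 1500, sigma = 25

print(c(Susan = z_susan, Olivia = z_olivia))
```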

9. Assume we are looking at University A’s final exam (from the previous problem).

(a) With the information given, are you able to calculate the probability of a randomly selected student scoring higher than 60 points?

(b) What would you have to assume about the distribution of the exam scores in order to answer part (a)? Make your assumption(s) and calculate your answer.

(c) With the information given, are you able to calculate the probability that a randomly-selected group of 10 students will have a mean score above 60 points?

(d) What would you have to assume about the distribution of the exam scores in order to answer part (c)? Make your assumption(s) and calculate your answer.

(e) With the information given, are you able to calculate the probability that a randomly-selected group of 100 students will have a mean score above 60 points?

(f) What would you have to assume about the distribution of the exam scores in order to answer part (e)? Make your assumption(s) and calculate your answer.
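If a normal model is adopted (for individual scores, or for the sample mean), the probabilities can be computed with `pnorm`. The sketch below shows the mechanics only; whether the normal model is justified for each part is exactly what the problem asks you to discuss.

```r
mu    <- 50   # University A exam mean
sigma <- 10   # University A exam standard deviation

# P(X > 60) for one student, assuming scores are normal
p_single <- pnorm(60, mean = mu, sd = sigma, lower.tail = FALSE)

# P(sample mean > 60): the standard error is sigma / sqrt(n)
p_mean10  <- pnorm(60, mean = mu, sd = sigma / sqrt(10),  lower.tail = FALSE)
p_mean100 <- pnorm(60, mean = mu, sd = sigma / sqrt(100), lower.tail = FALSE)

print(c(single = p_single, n10 = p_mean10, n100 = p_mean100))
```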

10. Suppose we are testing patients for cancer. Our null hypothesis is that the patient is healthy (i.e., does NOT have cancer).

(a) Describe a type I error.

(b) Describe a type II error.

(c) Would it be more important to minimize α or β in this case? Explain.

11. We are interested in µ = the mean age of current graduate students at BYU. A previous study (from 2008) estimated the mean to be 25 with a standard deviation of 3.

(a) What sample size do we need so that we can construct a 99% confidence interval estimating µ that has a margin of error equal to 2 years?

(b) Consider a hypothesis test of Ho : µ = 25 years vs. Ha : µ > 25 years using α = 0.01. What sample size do we need in order to detect a difference of 2 years with 80% power?
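A rough R sketch of both sample-size calculations follows. It assumes the normal-approximation formulas n = (z σ / E)² for the interval and n = ((z_α + z_β) σ / δ)² for the one-sided test, treating the earlier standard deviation of 3 as σ. The course may expect a t-based refinement, so treat this as a starting point.

```r
sigma <- 3   # standard deviation from the earlier study, treated as known

# (a) 99% confidence interval with margin of error E = 2
margin <- 2
n_ci <- ceiling((qnorm(0.995) * sigma / margin)^2)

# (b) one-sided test, alpha = 0.01, 80% power to detect a shift of 2 years
alpha <- 0.01
power <- 0.80
delta <- 2
n_power <- ceiling(((qnorm(1 - alpha) + qnorm(power)) * sigma / delta)^2)

print(c(n_ci = n_ci, n_power = n_power))
```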

12. Suppose we are trying to find a faster drying glue. In a study done years ago, Brand A and Brand B were tested 8 times each (on a total of 16 identical surfaces) and the drying times were:

   Brand A  Brand B
1  11.56    16.41
2   9.16    18.52
3  10.81    14.13
4  11.35    12.49
5   6.04    16.29
6   8.60    14.54
7   9.72    14.60
8  12.26    15.79

Now suppose Brand B has come out with a new and improved version and we want to collect a new sample of Brand A and the new Brand B drying times. We believe the drying time will be more similar and want to be able to compare mean drying times. To estimate the variability for our new study, we will use the pooled sample variance from the old study as an estimate for σ².

(a) What sample size do we need so that we can construct a 95% confidence interval estimating µB − µA that has a margin of error equal to 1 minute?

(b) Consider a hypothesis test of Ho : µB = µA vs. Ha : µB > µA, using α = 0.05. What sample size do we need in order to detect a difference of 1 minute with 90% power?
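The pooled variance from the old study and the resulting per-group sample sizes could be computed along these lines. The sketch assumes equal group sizes in the new study and the normal-approximation formulas n = 2σ²(z/E)² and n = 2σ²((z_α + z_β)/δ)²; these formulas are assumptions here and may differ from the version used in lecture.

```r
brand_a <- c(11.56, 9.16, 10.81, 11.35, 6.04, 8.60, 9.72, 12.26)
brand_b <- c(16.41, 18.52, 14.13, 12.49, 16.29, 14.54, 14.60, 15.79)

# Pooled sample variance from the old study, used as the estimate of sigma^2
n_a <- length(brand_a)
n_b <- length(brand_b)
sp2 <- ((n_a - 1) * var(brand_a) + (n_b - 1) * var(brand_b)) / (n_a + n_b - 2)

# (a) per-group n for a 95% CI on the difference with margin of error 1
margin <- 1
n_ci <- ceiling(2 * sp2 * (qnorm(0.975) / margin)^2)

# (b) per-group n for a one-sided test, alpha = 0.05, 90% power, difference 1
delta <- 1
n_power <- ceiling(2 * sp2 * ((qnorm(0.95) + qnorm(0.90)) / delta)^2)

print(c(sp2 = sp2, n_ci = n_ci, n_power = n_power))
```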

13. An article in the Journal of Strain Analysis compares several procedures for predicting the shear strength for steel plate girders. Data for nine girders in the form of the ratio of predicted to observed load for two of these procedures, the Karlsruhe and Lehigh methods, are as follows:

Girder  Karlsruhe Method  Lehigh Method
1       1.186             1.061
2       1.151             0.992
3       1.322             1.063
4       1.339             1.062
5       1.200             1.065
6       1.402             1.178
7       1.365             1.037
8       1.537             1.086
9       1.559             1.052

(a) Using α = 0.05, is there evidence to support a claim that there is a difference in mean performance between the two methods? As part of your answer, report the p-value (using R to compute it) and then interpret the p-value in the context of the problem.

(b) Construct a 95% confidence interval for the difference in mean predicted to observed load.
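Since both methods are applied to the same nine girders, a paired analysis is a natural candidate. A minimal R sketch using `t.test` with `paired = TRUE` is shown below; the interpretation of the p-value is left to the reader.

```r
karlsruhe <- c(1.186, 1.151, 1.322, 1.339, 1.200, 1.402, 1.365, 1.537, 1.559)
lehigh    <- c(1.061, 0.992, 1.063, 1.062, 1.065, 1.178, 1.037, 1.086, 1.052)

# Paired t-test on the girder-by-girder differences (Karlsruhe minus Lehigh)
result <- t.test(karlsruhe, lehigh, paired = TRUE)

print(result$statistic)   # t statistic
print(result$p.value)     # two-sided p-value
print(result$conf.int)    # 95% CI for the mean difference
```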

14. The Center for the Study of Violence wants to determine whether a conflict-resolution program in a particular high school alters aggressive behavior among its students. For 10 students, aggression was measured both before and after they participated in the conflict resolution course. Their scores were the following (higher scores indicate greater aggressiveness):

    Before participating  After participating
1   10                    8
2   3                     4
3   4                     2
4   8                     5
5   8                     7
6   9                     8
7   5                     4
8   7                     5
9   1                     2
10  7                     5

(a) Test the null hypothesis that aggression does not differ as a result of participating in the conflict-resolution program. Show your work.

(b) Reevaluate this problem using the two-sample t-test method (i.e., ignore the pairing in the data). As always, provide the test statistic and exact p-value.

(c) Create a confidence interval for µd using the paired-comparison confidence interval

$$\bar{d} \pm t_{\alpha/2,\,n-1}\,\frac{s_d}{\sqrt{n}}$$

and compare with the two-sample confidence interval for µ1 − µ2

$$\bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,n_1+n_2-2}\,\sqrt{s_{pl}^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}.$$

How do the centers of the two intervals compare? How do the widths of the two intervals compare? Why are the two intervals different?

(d) If you were to rerun this experiment to test the effectiveness of the treatment, would you use a two-sample t-test or a paired comparison test? Explain your reasoning.
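The two intervals in part (c) can be obtained from `t.test` and compared numerically. This is a sketch only: it uses the before and after scores listed in the problem and assumes a 95% confidence level, which the problem does not state explicitly.

```r
before <- c(10, 3, 4, 8, 8, 9, 5, 7, 1, 7)
after  <- c(8, 4, 2, 5, 7, 8, 4, 5, 2, 5)

# Paired-comparison interval for mu_d (before minus after)
paired_ci <- t.test(before, after, paired = TRUE)$conf.int

# Pooled two-sample interval for mu_1 - mu_2 (pairing ignored)
pooled_ci <- t.test(before, after, var.equal = TRUE)$conf.int

centers <- c(paired = mean(paired_ci), pooled = mean(pooled_ci))
widths  <- c(paired = diff(paired_ci), pooled = diff(pooled_ci))

print(centers)
print(widths)
```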

Section 1.1

15. #1.3 on p. 34-36

16. #1.4 on p. 34-36. For part (c), assume that the measurements are on a collection of sampled units. For part (h), let the statement begin: “For a SIMPLE random sample,...”

17. #1.5 on p. 34-36

18. #1.6 parts (a) and (b) on p. 34-36

19. #1.8 on p. 34-36

Chapter 4

20. #A3 on p. 109

21. #A5 on p. 109

22. #A6 on p. 109

23. #A8 on p. 109

24. #A9 on p. 109

25. #B1 on p. 116

26. #B4 on p. 116

27. #B8 on p. 117

28. #B10 on p. 117

29. #B11 on p. 117

30. #B14 on p. 118

31. #C2 on p. 124

32. #C3 on p. 124

33. Read the study described below and answer the questions.

(a) Which of the following best describes the study above? (choose one)

i. designed experiment in which experimental units are randomly sampled from the population of interest

ii. designed experiment using available experimental units

iii. observational study in which samples are randomly selected from preexisting distinct groups

iv. observational study using nonrandom sample

(b) Can you conclude that arguing with parents protects children from (or causes decreased susceptibility to) drugs and alcohol? Explain. If causation cannot be concluded, how could the study be changed to make causation a plausible conclusion?

34. Read the study described below and answer the questions.

(a) Which of the following best describes the study above? (choose one)

i. designed experiment in which experimental units are randomly sampled from the population of interest

ii. designed experiment using available experimental units

iii. observational study in which samples are randomly selected from preexisting distinct groups

iv. observational study using nonrandom sample from preexisting distinct groups

(b) Can you conclude that boosting a woman’s confidence improves her spatial reasoning abilities? Explain. If causation cannot be concluded, how could the study be changed to make causation a plausible conclusion?

Sections 1.2-1.3

35. #1.14 on p. 36-37

36. Consider the spatial reasoning study described in problem number 34, where subjects received feedback (either criticism or compliments) after their performance on an unrelated pre-task. Suppose that the mean spatial rotation scores for the groups of interest were as follows:

• mean score for men that were criticized after the pre-task = 85%

• mean score for men that were complimented after the pre-task = 87%

• mean score for women that were criticized after the pre-task = 70%

• mean score for women that were complimented after the pre-task = X

Given what you know about the study and that the study concluded that there was a significant interaction between gender and pre-task feedback type, which of the following values for X is most reasonable: 65%, 70%, 72%, or 87%? Explain how you chose your answer over the other options.

37. Suppose that an experiment is run comparing the final exam grades of Stat 230 students. Two factors are considered: (i) lecture time (either morning or afternoon) and (ii) major (stat or non-stat). Suppose that the sample size is large enough so that a difference of at least 5% on the final exam would be a significant difference across lecture times or across majors. Further, suppose that the number of students in each of the four treatment groups is equal and that the mean final exam score for the morning section stat majors was 83%. For each problem below, create a table formatted as follows (the numbers in italics will be filled in by you):

                        Major
                    stat  non-stat  overall
Lecture  morning    83    85        84
Time     afternoon  78    72        75
         overall    80.5  78.5

(a) What might the means for the other three groups be IF morning did significantly better than afternoon, stat did significantly better than non-stat, and there was NO evidence of a lecture time × major interaction?

(b) What might the means for the other three groups be IF morning did significantly better than afternoon, stat did significantly better than non-stat, and there was strong evidence of a lecture time × major interaction?

(c) What might the means for the other three groups be IF morning did significantly better than afternoon, stat and non-stat were equivalent, and there was strong evidence of a lecture time × major interaction?

(d) What might the means for the other three groups be IF morning and afternoon were equivalent, stat and non-stat were equivalent, and there was strong evidence of a lecture time × major interaction?
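When experimenting with candidate tables, it can help to compute the marginal means and the interaction contrast (the morning-minus-afternoon difference for stat majors minus the same difference for non-stat majors). The R sketch below does this for the example table shown in the problem statement; the cell values are the ones in that sample layout, not an answer to any of the parts.

```r
# Cell means from the example table layout (rows: lecture time, columns: major)
cell_means <- matrix(c(83, 85,
                       78, 72),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(lecture = c("morning", "afternoon"),
                                     major   = c("stat", "non-stat")))

# Marginal ("overall") means; valid as simple averages because groups are equal-sized
row_means <- rowMeans(cell_means)
col_means <- colMeans(cell_means)

# Interaction contrast: a value near 0 suggests no interaction
interaction_contrast <-
  (cell_means["morning", "stat"]     - cell_means["afternoon", "stat"]) -
  (cell_means["morning", "non-stat"] - cell_means["afternoon", "non-stat"])

print(row_means)
print(col_means)
print(interaction_contrast)
```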

Chapter 3

Note that problems 6-10 on p. 103 are based on the introductory paragraph labeled “The bivariate BF[1] model.”

38. #6 on p. 103

39. Fill in the blanks: The estimated effect for long days tells how far it is from ____ to ____. The residual for the first observation tells how far it is from ____ to ____.

40. #8 on p. 103. Also give a p-value for the day length factor and give a conclusion.

41. #9 on p. 103

42. #10 on p. 103. Note that you are using the long/short day data on page 103 in order to mimic the analysis done in Example 3.3.

43. #17 on p. 104

44. #23 on p. 106

45. #24 on p. 106. For false statements, re-write the statement so that it is true (changing as few words as possible...changing a false statement to “Snow is colder than molten lava” will not be given points...nice try, though).

46. #26 on p. 106

47. Consider the popcorn data on page 3.8 of the lecture notes, with the complete ANOVA table on page 3.29 of the lecture notes on the course webpage. Give a 95% C.I. for each of the following differences in means:

(a) µhigh salt - µlow salt

(b) µbuttery oil - µcanola oil

(c) µhigh salt with buttery - µlow salt with canola

(d) Calculate the width of each of the intervals in (a), (b), and (c). (Calculate the upper confidence limit minus lower confidence limit.) Why is the width in (c) different from the widths in (a) and (b)?

Chapter 5

NOTE: For all HW problems requiring statistical computing (e.g., R or SAS), I expect type-written responses. Make sure that you paste in your code and the appropriate sections of program output in addition to your type-written conclusions. DO NOT simply attach pages of computer output. Cut and paste only parts you refer to in your discussion. Large stacks of computer output will NOT be graded. Also, working in groups is fine, but each student should write his or her own interpretations/conclusions. Identical HW assignments will be treated as plagiarism.

48. If data came from a normal distribution, what fraction of the data will be classified as outliers when using the “Tukey” boxplot in R? Show your work.
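A numerical check of the hand calculation can be sketched in R as follows. It assumes the usual Tukey rule (points more than 1.5 × IQR beyond the quartiles are flagged) and works with the population quartiles of the standard normal, so it describes the large-sample fraction rather than what happens in any particular small sample.

```r
# Population quartiles of the standard normal distribution
q1  <- qnorm(0.25)
q3  <- qnorm(0.75)
iqr <- q3 - q1

# Tukey fences: 1.5 * IQR beyond each quartile
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Probability of falling outside the fences
frac_outliers <- pnorm(lower_fence) + pnorm(upper_fence, lower.tail = FALSE)
print(frac_outliers)
```

Because the fences are defined relative to the quartiles, the same fraction results for any normal mean and standard deviation.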

49. Read the cancer.txt data set into R. (The data set is on the course webpage and there is code that you can cannibalize in section5.R.) The column names are given in the first row of the file. Suppose we are interested in seeing if the mean survival time in days is the same for each cancer type. (We’re NOT doing the ANOVA here yet, just checking conditions with exploratory data analysis.)

(a) Use means, sds, and boxplots to evaluate whether or not these data are appropriate for an ANOVA. Specifically, are there outliers, unequal sds across cancer type, or evidence of non-normality?

(b) Repeat part (a) after taking a log transform of the survival times, e.g.:

logsurv <- log(cancer$days)

(c) Compare your answers in parts (a) and (b) and make a recommendation for analysis.

50. Use the R command below to obtain a randomized ordering for 36 subjects that will be assigned to one of four treatments (A, B, C, and D).

sample(1:36,36,replace=FALSE)

Give your randomized list and explain how you would use this list to assign treatments to the 36 subjects.
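One possible way to turn the randomized ordering into a balanced assignment is sketched below: the first 9 subject numbers in the shuffled list receive A, the next 9 receive B, and so on. The seed is hypothetical and is included only so the sketch is reproducible; other assignment rules are equally valid as long as they are fixed before looking at the list.

```r
set.seed(230)  # hypothetical seed, only for reproducibility of this sketch

# Randomized ordering of the 36 subject numbers (the command from the problem)
ordering <- sample(1:36, 36, replace = FALSE)

# First 9 positions get A, next 9 get B, then C, then D
treatment  <- rep(c("A", "B", "C", "D"), each = 9)
assignment <- data.frame(subject = ordering, treatment = treatment)

# Sort by subject number to make the list easy to use
assignment <- assignment[order(assignment$subject), ]

print(assignment)
print(table(assignment$treatment))
```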

51. Suppose that a veterinary psychologist runs a balanced BF[1] experiment to study the effect of diet on depression in dogs. She uses a collection of 15 labrador retrievers that have been diagnosed with severe depression (e.g., listless, apathetic, no interest when live squirrels are in the room). She places each dog on one of 3 experimental diets (all Cheetos, all steak, all tofu) for 3 months and then records the depression score for each at the end of the study, where high depression scores indicate more extreme depression. The mean depression score for each group was: Cheetos=27, Steak=22, and Tofu=11. Tofu is amazing! (Note: these data were made up by your instructor.) Create the factor diagram (aka “decomposition tables”) for the data, with diets as columns. Fill in the locations in each table with their known values, leaving a “?” at each location where you don’t have enough information to specify the value. Properly label your diagram and write the df under each table/box.

52. #D8 on p. 178

53. #D9 on p. 178

54. #D10 on p. 178 (Note the typo: “Cond. avg.” should say “Cond. eff.”)

55. #D14 on p. 179 (If you want, you can generate random numbers in R and check some of these properties yourself. For example, to generate 100 random numbers from a standard normal distribution, use: x <- rnorm(100).)

56. #D21 on p. 180. Instead of calculating critical values, use your F-statistic for “Conditions” to calculate a p-value using R.

57. #D22 on p. 181

58. #D24 on p. 181. Include an F-statistic and a p-value for the Group factor.

59. USING R, do a complete analysis of variance comparing survival times for the cancer types discussed in problem 49. Remember that you will want to compare the log of survival time (see problem 49(b)). For full credit, you must show all code and the appropriate output.

(a) Calculate and list the mean log-survival time in days for each cancer type.

(b) Test the hypothesis

H0: All cancer types have the same mean log-survival time.

Give the ANOVA table. Interpret the F statistic and p-value and then make a conclusion.

60. USING SAS, do a complete analysis of variance comparing mean log-survival times for the cancer types discussed in problem 49. For full credit, you must show all code. To do the log transform in SAS, adapt the following code (changing MYDIR to your personal Stat 230 directory name):

data cancer;

infile 'MYDIR\cancer.txt' firstobs=2;

input type $ gender $ age days;

logsurv = log(days);

run;

OR, you could use the following:

data cancer;

input type $ gender $ age days;

logsurv = log(days);

datalines;

[PASTE THE CONTENTS OF CANCER.TXT HERE]

;

run;

(a) Calculate and list the mean log-survival time in days for each cancer type.

(b) Test the hypothesis

H0: All cancer types have the same mean log-survival time.

Give the ANOVA table. (Since your interpretation should be the same here as in the previous problem where you used R, there is no need to re-write the same interpretation.)

(c) Use the group means or difference in means from the output, along with the MSE from the ANOVA table, to calculate (by hand) the confidence interval for µkidney − µstomach. (The formula is written on the last page of the section3 lecture notes posted on the webpage.)

61. We are interested in comparing 4 different methods for preparing for the ACT exam:

• Method A: Control–just take the exam

• Method B: Take one practice exam

• Method C: Take a prep course online

• Method D: Be hypnotized the day before

You are interested in assessing the power of the F test (in ANOVA) for detecting differences in preparation method means when the significance level is α = 0.05.

(a) Suppose that ACT scores have a standard deviation of 4.7, and suppose we would like to evaluate the possibility that the group means are µA = 21, µB = 23, µC = 25, and µD = 27. In R, make a plot that shows the power of the F test when n = 2, 3, . . . , 20. (Print and include this plot with your homework.)

(b) What is the smallest value for the group size (n) that gives 85% power?

(c) What would happen to your power curve if your hypothesized means were µA = 27, µB = 25, µC = 23, and µD = 21? Explain your answer at the level of a Stat 121 student.

(d) What would happen to your power curve if your hypothesized means were µA = 21, µB = 21, µC = 21, and µD = 27? Explain your answer at the level of a Stat 121 student.

(e) What would happen to your power curve if your hypothesized means were µA = 21, µB = 21, µC = 27, and µD = 27? Explain your answer at the level of a Stat 121 student.
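A power curve of the kind requested in part (a) can be sketched with `power.anova.test` from base R's stats package. This is one possible approach and not necessarily the code used in lecture; note that the function summarizes the hypothesized means only through `between.var`, the sample variance of the group means.

```r
mu    <- c(A = 21, B = 23, C = 25, D = 27)   # hypothesized group means, part (a)
sigma <- 4.7                                 # assumed ACT standard deviation
n_values <- 2:20                             # group sizes to evaluate

# Power of the one-way ANOVA F test at each group size
power <- sapply(n_values, function(n) {
  power.anova.test(groups = length(mu), n = n,
                   between.var = var(mu), within.var = sigma^2,
                   sig.level = 0.05)$power
})

plot(n_values, power, type = "b",
     xlab = "Group size (n)", ylab = "Power of F test",
     main = "Power curve for ACT preparation methods")
abline(h = 0.85, lty = 2)   # reference line for part (b)

# Smallest group size in the evaluated range reaching 85% power
n_min <- min(n_values[power >= 0.85])
print(n_min)
```

To explore parts (c) through (e), replace `mu` with the other sets of hypothesized means and compare `var(mu)` across the scenarios.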

Chapter 6

62. #A1 on p. 207. Note: There is a typo in this problem. Where it reads “Draw and label a two-way table showing the two TREATMENTS and...” it should read “Draw and label a two-way table showing the two FACTORS and...”

63. #B3 on p. 214

64. #B5 on p. 215

65. #B6 on p. 215

66. Use the file snapbean.txt (on the webpage) to conduct a two-way ANOVA in R. This experiment endeavors to evaluate whether the date of sowing and/or the variety of snapbean plant will affect the total yield of snapbeans. You should test if “Yield” (the response) is affected by “sowdate” (1=early,...,4=late), “variety” (1, 2, or 3), or the interaction of sowdate with variety. Conduct your analysis in R. Give code and appropriate output. For each of the following effects, write a sentence which gives an appropriate conclusion (including references to the p-value and the hypotheses of interest):

(a) sowdate

(b) variety

(c) interaction

67. Use the file programmers.txt (on the webpage) to conduct a two-way ANOVA in SAS. This experiment was run to see how the type of experience of computer programmers and/or the years of experience for programmers impacts their ability to accurately estimate the time needed (in programmer days) to complete a large systems project. The response variable “TimePredictionError” represents the difference between the actual time required to complete a large systems project and the programmer’s estimated time. Note that all values are positive, meaning that every subject underestimated the length of the task, but larger values represent larger time-prediction errors. You should test if “TimePredictionError” (the response) is affected by “LargeSystemExp” (no=experienced only with small systems, yes=experienced with large systems), “YearsOfExp” (less5 = less than 5 years, less10 = between 5 and 10 years, more10=more than 10 years), or the interaction of LargeSystemExp with YearsOfExp. Conduct your analysis in SAS. Give code and appropriate output. For each of the following effects, write a sentence which gives an appropriate conclusion (including references to the p-value and the hypotheses of interest):

(a) LargeSystemExp

(b) YearsOfExp

(c) interaction

68. USING SAS, do a complete analysis of variance on heights of singers in a choir, found in the file singerheights.csv (note that it is comma-delimited). For full credit, you must show all code. Fit an ANOVA model that includes terms for gender (“f” or “m”), singing part (“low” or “high”), and the interaction between gender and part. (Note that the low part for females is generally called alto, high part for females is soprano, low part for males is bass, and high part for males is tenor. However, we are interested in the association between singing the high/low part and height, so we are treating this as a 2 × 2 factorial instead of a one-way anova with 4 levels of “singing part.”)

(a) Find the complete ANOVA table USING TYPE I SS. Carry out the complete analysis considering the decision flow diagram discussed in class for two-way ANOVA. Give a complete interpretation for each of the terms in the model.

(b) Find the complete ANOVA table USING TYPE III SS. Carry out the complete analysis considering the decision flow diagram discussed in class for two-way ANOVA. Give a complete interpretation for each of the terms in the model.

(c) Why is the SS for gender so much smaller with Type III SS? Explain.

69. For this problem, you will conduct an analysis of the BF[2] data you gathered in your catapult experiment.

(a) Give the ANOVA table (using either R or SAS). Give your code and your output.

(b) List the three null hypotheses for your experiment. For each hypothesis, write a conclusion for your test of the hypothesis. (Make sure your statement for each hypothesis references the p-value and clearly states the conclusion in context.)

Section 7.1

70. #A1 on pages 250-1

71. #A2 on pages 250-1

72. #A3 on pages 250-1

73. Use the file marketing.txt on the website. The first column is sales of a product of interest (in dollars), the second column is the shelf height factor (shelf height for the product being sold), and the third column is day of week (the blocking factor). On each day, the researcher in this study randomly assigned a product of interest to a location on a five-level store shelf and then recorded the total sales for each shelf at the end of the day. Our primary interest is to see if the shelf heights have different mean sales.

(a) Write out a well-labeled factor diagram for this study. (By “factor diagram,” we mean the decomposition tables WITHOUT the numbers filled in.) Also, write down the statistical model, carefully defining the parameters in the model.

(b) Why would the researchers choose to treat day of the week as a block?

(c) Use SAS to analyze the data. Does the shelf height for the product affect the sales? Does the blocking factor turn out to be an important source of variability? (As always, include your code and use carefully-selected SAS output to justify your conclusions.)

(d) Now ignore the blocks and re-run the analysis as a BF[1] design. How do your conclusions change? Why are the results different from the CB[1] analysis?

Section 7.3

74. #C1(a,b,c,e) on page 266. On part (e), by “factor diagram,” we mean the decomposition tables WITHOUT the numbers filled in. Beneath the table associated with each term in the model, indicate the associated df.

75. Consider the experiment described in Example 7.11 on page 261, with data given on the bottom of page 281.

(a) The following is known about the analysis: mean of all observations = 21.25, SSplants = 483.75, SSdeblading = 24.5, SSinteraction = 265.75, SSresidual = 42.75, SStotal = 16194. Using what you know about the design of the experiment and the information above, give the complete ANOVA table for the data including appropriate F-statistics and p-values. (You will want to use ‘1-pf(blah,blah,blah)’ in R to find the p-values.)

(b) Using the file auxin.csv on the webpage, run the analysis in SAS to check your work in part (a). Include code and selectively-chosen parts of the SAS output. Discuss the results of the experiment, including the significance or non-significance of each hypothesis test of interest (i.e., discuss the test for each factor). What conclusions can be drawn about the theories about the source of auxin and the role of leaf blades?
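The `1-pf(...)` hint in part (a) can be wrapped in a small helper, sketched below. The function name `f_test` and the numbers in the example call are hypothetical and chosen only to show the mechanics; they are not the sums of squares or degrees of freedom for this problem, and choosing the correct denominator (error term) for each F-ratio depends on the design and is part of the exercise.

```r
# Hypothetical helper: F statistic and p-value from sums of squares and df
f_test <- function(ss_effect, df_effect, ss_error, df_error) {
  ms_effect <- ss_effect / df_effect   # mean square for the effect
  ms_error  <- ss_error / df_error     # mean square for the chosen error term
  f_stat    <- ms_effect / ms_error
  p_value   <- 1 - pf(f_stat, df_effect, df_error)   # upper-tail area
  c(F = f_stat, p = p_value)
}

# Illustrative call with made-up values (NOT the values from problem 75)
example <- f_test(ss_effect = 10, df_effect = 2, ss_error = 20, df_error = 10)
print(example)
```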

Section 7.2

76. #B1 on pages 256-7

77. Now use the data from the previous problem (found in Figure 7.7 on page 254) to conduct the formal analysis in SAS. You can either (1) type the data into a spreadsheet, save the file as .csv format, and read in the data; or (2) type the data directly into a SAS data step, e.g.,

data cows;
input cow period diet $ yield;
cards;
1 1 roughage 608
1 2 partial 716
.
.
.
3 3 partial 832
;
run;

(a) Does diet have a significant effect on yield? Compare the means for the diets.

(b) Did the nuisance variables (cow and time period) have substantial impact on the yield?

Section 7.4

78. #D1 on pages 278-9

79. #D2 on pages 278-9

80. #D3 on pages 278-9

81. #D4 on pages 278-9

82. #D9 on pages 278-9 (this is a CB[1])

83. #D11 on pages 278-9 (this is a SP/RM[1;1]) NOTE: To complete this problem, assume that the first column of the “Blocks” box reads “1, -2, ?, -1, 1, ?”

Chapter 11

84. In this problem, you will use SAS to do a complete analysis of variance on the head injury severity scores associated with 7 types of cars. The data are found in the file headinjury.csv (note that it is comma-delimited). For full credit, you must show all code.

(a) Give the name of the appropriate design for these data and write down the statistical model, carefully defining the parameters in the model.

(b) Our primary interest is to see if the car types have different mean head-injury severity scores. Write down the appropriate null and alternative hypotheses, carefully defining all symbols.

(c) Give the ANOVA table and interpret the proper F-test for the hypotheses of interest.

(d) Assume that our primary interest is in constructing a confidence interval for each possible pairwise comparison. If we want to ensure that the family-wise error rate is no greater than 0.05, which multiple-comparison approach is most appropriate? Use your chosen approach and interpret this set of pairwise comparisons: which means are significantly different from each other? [Hint: for parts (d) and (e), you can use

means cartype / tukey;
means cartype / bon;
means cartype / scheffe;

in SAS and it will give you the Least Significant Difference, which is the confidence interval’s margin of error (half the C.I. width). Alternatively, you can use

means cartype / tukey cldiff;
means cartype / bon cldiff;
means cartype / scheffe cldiff;

which gives all pairwise C.I.’s.]
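As a small aside on the family-wise error rate in part (d): with 7 car types, the number of pairwise comparisons and the per-comparison level under a Bonferroni adjustment can be worked out directly. The sketch below is only that arithmetic, written in Python for convenience; the variable names are illustrative and are not part of the course code.

```python
from math import comb

n_groups = 7          # number of car types, per the problem statement
family_alpha = 0.05   # desired family-wise error rate

# Number of distinct pairwise comparisons among the groups
n_pairs = comb(n_groups, 2)

# Bonferroni: split the family-wise level evenly across the comparisons
per_test_alpha = family_alpha / n_pairs

print(n_pairs, per_test_alpha)
```

The same arithmetic applies to any fixed, pre-specified set of comparisons; only the count in the denominator changes.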

(e) Compare the width of the interval for one of the pairwise comparisons (say, “µ_compact − µ_van”) when using: (i) Tukey’s HSD, (ii) Scheffé, and (iii) Bonferroni. Based on the width of the intervals, which is the best approach?

(f) Re-do the analysis, this time assuming that instead of looking at all the pairwise comparisons, you only want to consider 3 different contrasts: (i) mean of the trucks&vans&minivans minus the mean of the other 4 car types, (ii) mean of the heavy&medium cars minus the mean of the light&compact cars, and (iii) mean of minivans minus mean of compact cars. If we want to ensure that the family-wise error rate is no greater than 0.05, which multiple-comparison approach is most appropriate? Use your chosen approach and interpret the 3 contrasts described: which contrasts are statistically significant? NOTE: When specifying contrasts, if you need to enter −1/3, use “-0.33333333333” not “-0.33”. SAS needs contrasts to sum “exactly” to zero. Alternatively, you can multiply every element of a contrast by a constant and the test of the contrast will not be affected. That is, you can change the contrast from (−1/3 2/3 −1/3) to (−1 2 −1) with no change to the F-statistic and p-value for the contrast.
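The scaling claim in the NOTE can be checked numerically. The sketch below is written in Python rather than SAS and uses the usual one-way formula F = (Σ c_i ȳ_i)² / (MSE · Σ c_i²/n_i) for a single contrast. The group means, sample sizes, and MSE are made-up values for illustration only, not values from headinjury.csv, and the function name is hypothetical.

```python
def contrast_f(means, n_per_group, mse, coeffs):
    """F-statistic (1 numerator df) for a contrast among group means."""
    # Estimated contrast: sum of coefficient * group mean
    estimate = sum(c * m for c, m in zip(coeffs, means))
    # Variance multiplier: sum of coefficient^2 / group size
    scale = sum(c * c / n for c, n in zip(coeffs, n_per_group))
    return estimate ** 2 / (mse * scale)

# Hypothetical summary statistics (illustration only)
means = [10.0, 14.0, 9.0]
n_per_group = [5, 5, 5]
mse = 4.0

f_thirds = contrast_f(means, n_per_group, mse, [-1 / 3, 2 / 3, -1 / 3])
f_integer = contrast_f(means, n_per_group, mse, [-1, 2, -1])
print(f_thirds, f_integer)  # the two F-statistics agree up to rounding error

# Coefficients rounded to two decimals no longer sum to zero
rounded_sum = sum([-0.33, 0.67, -0.33])
print(rounded_sum)
```

Multiplying the coefficients by a constant k multiplies both the numerator and the denominator by k², which is why the F-statistic is unchanged. The integer form also avoids the rounding problem entirely.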

85. Use the analysis of the wear data that we did in class (code is in section11.sas). Instead of the contrasts previously considered in section11.sas, suppose that we are interested in the following two contrasts:

• (mean of fabric wear values in filler level 1 (cotton) and proportion level 2 (50% filled)) minus (mean of fabric wear values in filler level 2 (polyester))

• (mean of fabric wear values in filler level 1 and proportion level 1) minus (mean of fabric wear values in filler level 1 and proportion level 2)

If we want to ensure that the family-wise error rate is no greater than 0.05, which multiple-comparison approach is most appropriate? Use your chosen approach and interpret the 2 contrasts described: which contrasts are statistically significant?
