TPS Chapter 1: Exploring Data - Marquette University...

Date post: 24-Mar-2018
Upload: duonganh

TPS Chapter 1: Exploring Data

Q1.  The science of data is known as ____.

A1. Statistics

Q2.  Most raw data sets can be organized into rows and columns.  Each row represents some object or person that is studied, and each column represents some characteristic about that thing that is measured. Our textbook calls those objects and characteristics what two things respectively?

A2. Individuals and variables

Q3.  What are the two main classes of variable types?

A3. Categorical and quantitative

Q4.  A description, depiction, or equation telling what values a variable takes on and how often it takes on these values is called the ___ of the variable.

A4. Distribution

Q5.  Before studying the relationships among variables, it's usually good to begin by examining what?

A5. Each variable by itself

Q6.  Before getting numerical summaries of the data, your textbook advises exploring the data with what?

A6. Graphs

Q7.  What two types of graphs are usually most appropriate for categorical data?

A7. Bar charts and pie charts

Q8.  If several percentages do not represent portions of the same whole, then what type of graph is inappropriate?

A8. A pie chart

Q9.  When you are asked to describe a distribution after looking at a graph, the general tactic is to look for an overall pattern and also for striking deviations from that pattern.  When describing the overall pattern, what three features should you mention?

A9. Center, shape, and spread

Q10.  When you are asked to describe a distribution, the general tactic is to look for an overall pattern and also for striking deviations from that pattern.  What are the striking deviations called?

A10. Outliers

Q11.  Someone wants to display this center, shape, and spread of a data set with a picture.  But the person also wants to communicate, through the same graph, the individual raw data values that were collected in the study. There are too many different values that the variable takes on to make a dot plot feasible. What type of graph should the person choose?

A11. A stem plot

Q12.  Instead of a dot plot or a stem plot, a ____ is the most common graph of the distribution of one quantitative variable.

A12. Histogram

Q13.  What does your textbook depict as a minimum number for either the number of stems in a stem plot, or the number of classes in a histogram?

A13. Five

Q14.  If the right and left sides of a histogram are approximately mirror images of each other, we call the distribution what?

A14. Symmetric

Q15.  If there's a big hump on the left side of a histogram and a long tail extending far out to the right, do we say that the distribution is skewed right or skewed left?

A15. Skewed right

Q16.  If you look at people's incomes, defining income so that zero is the smallest possible value, and your sample includes mainly middle income people but also a few extremely high income people, will the distribution be skewed right or skewed left?

A16. Skewed right

Q17.  Mary gets a test report saying that 79% of the test takers fell at or below the score that she made.  The name of the type of score she got is what?

A17. Percentile

Q18.  A relative cumulative frequency graph is often called what?

A18. Ogive.

Q19.  In a relative cumulative frequency graph, or ogive, the horizontal axis is for the values of the variable you are looking at.  For any given value on the horizontal axis, what does the value on the vertical axis stand for?

A19. The fraction of observations less than or equal to that value

Q20.  If you are given a relative cumulative frequency graph, and someone asks you to find the center of the distribution, how do you do it?

A20. Find the value on the x-axis that has a 50% or .5 value on the y-axis.

Q21.  On a time plot, what axis does time go on?

A21. The horizontal axis

Q22.  On a time plot, an overall upward or downward slope is called what?

A22. A trend

Q23.  On a time plot, what do you call the shorter-term variations that occur regularly, repeating themselves in a cyclic fashion?

A23. Seasonal variation

Q24. 1/n times the summation of the x(i), where n is the number of cases and x(i) is the value of the ith case, is known as what?

A24. The mean

Q25.  The number in a distribution such that half the observations are smaller and the other half are larger is called what?

A25. The median

Q26.  If there is no middle value in a data set because you have an even number of cases, how do you find the median then?

A26. You find the mean of the two center observations.

Q27.  Between the mean and median, which of these is pulled farther in the direction of extreme values or outliers?

A27. The mean.

Q28.  If a distribution is highly skewed to the right, which value will be lower: the mean, or the median?

A28. The median.
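
The answers above on the mean and the median can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the income figures are hypothetical values chosen only to produce a right-skewed data set.

```python
from statistics import mean, median

# Hypothetical incomes (in thousands of dollars): mostly middle incomes
# plus one extremely high value, so the distribution is skewed right.
incomes = [30, 35, 40, 45, 50, 55, 400]

income_mean = mean(incomes)      # pulled toward the extreme value
income_median = median(incomes)  # middle observation of the ordered list

# With an even number of cases there is no single middle value,
# so the median is the mean of the two center observations.
even_median = median([30, 35, 40, 45])

# The total can be recovered from the mean and the number of cases.
recovered_total = income_mean * len(incomes)

print(income_mean, income_median, even_median, recovered_total)
```

In this made-up data set the single high income pulls the mean well above the median, which is the pattern A27 and A28 describe.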

Q29.  From which statistic, the mean or the median, can you recover the total value of all the cases in your data set, if you know how many cases there are?

A29. The mean

Q30.  What's the definition of the range of a distribution? A30. The difference between the largest and smallest value

Q31.  What's the chief problem with using the range as a measure of the spread of a distribution?

A31. It's too sensitive to outliers, and it depends on only two values in the data set.

Q32.  What do you call the median of the subset of observations whose position in the ordered list is to the left of the overall median?

A32. The first quartile

Q33.  What's the definition of the interquartile range? A33. The third quartile minus the first quartile.

Q34.  What's the rule of thumb for defining outliers in terms of the interquartile range?

A34. An outlier falls more than 1.5 times the interquartile range above the third quartile or below the first quartile.

Q35.  What five numbers are in the so-called five number summary? A35. The minimum, the first quartile, the median, the third quartile, and the maximum.
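
As a rough illustration of the quartile definition, the 1.5 x IQR rule, and the five number summary, here is a small sketch. The function names and the scores are hypothetical, and the quartiles follow the definition given in A32 (the median of the observations on each side of the overall median), which can differ slightly from what some software reports.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) using the halves on either side of the median."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]             # observations to the left of the overall median
    upper = xs[half + (n % 2):]   # observations to the right of the overall median
    return min(xs), median(lower), median(xs), median(upper), max(xs)

def find_outliers(data):
    """Flag values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data set with one suspiciously large value.
scores = [1, 3, 4, 5, 5, 6, 7, 8, 9, 30]

summary = five_number_summary(scores)
flagged = find_outliers(scores)
print(summary, flagged)
```

Here Q1 is 4 and Q3 is 8, so the IQR is 4 and the fences sit at -2 and 14; only the value 30 falls outside them.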

Q36.  What type of graph gives a picture of the five number summary?

A36. The box plot

Q37.  What's the difference between a regular box plot and a modified box plot?

A37. In a regular box plot, the whiskers go out to the maximum and minimum.  In a modified box plot, the whiskers go out to the largest and smallest data points that are not outliers.  The outliers are plotted as isolated points on a modified box plot.

Q38.  If you take the deviation of each observation from the mean of the whole set, square those deviations, add those squares, and divide by one less than the number of observations, what do you call the resulting number?

A38. The variance

Q39.  What is the relationship between the variance and the standard deviation?

A39. The standard deviation is the square root of the variance.

Q40.  How is the standard deviation like the interquartile range? A40. Both of them are measures of spread of the distribution.

Q41.  When you average the squared deviations from the mean to find the variance of a sample, what should you divide by: the n of cases, or the "degrees of freedom"?

A41. The degrees of freedom

Q42.  Under what conditions will a standard deviation equal zero? A42. When all the observations have the same value.
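
The recipe in Q38 through Q42 can be written out step by step. This is a minimal sketch with hypothetical function names and data; it divides by n - 1 (the degrees of freedom), as A41 specifies.

```python
from math import sqrt

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(data)
    xbar = sum(data) / n
    squared_deviations = [(x - xbar) ** 2 for x in data]
    return sum(squared_deviations) / (n - 1)

def sample_sd(data):
    """The standard deviation is the square root of the variance."""
    return sqrt(sample_variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
var = sample_variance(data)
sd = sample_sd(data)

# When every observation has the same value, all deviations are zero.
sd_constant = sample_sd([3, 3, 3, 3])

print(var, sd, sd_constant)
```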

Q43.  Between the interquartile range and the standard deviation, which is more resistant to the effects of the outliers?

A43. The interquartile range

Q44.  How do you choose between the five number summary on the one hand, and the mean and standard deviation, on the other hand, as ways of describing a distribution?

A44. The mean and standard deviation are good for reasonably symmetric distributions that are free of outliers.  Otherwise the five number summary is usually better.

Q45.  If you add the same number to each observation, how does that affect the center and the spread of the distribution?

A45. The number that you add is added to the measures of center, such as the mean and median. But measures of spread, such as the interquartile range and standard deviation, are not affected.

Q46.  If you multiply each observation by the same number, how does that affect measures of center and spread?

A46. Both the measures of center (median and mean) and the measures of spread (standard deviation and interquartile range) are multiplied by the same number. (The variance, which is also a measure of spread, is multiplied by the square of the number each observation is multiplied by.) 
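
A quick numerical check of A45 and A46, using the standard `statistics` module. The data, the shift of 10, and the factor of 3 are arbitrary choices for illustration.

```python
from statistics import mean, median, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical observations
shifted = [x + 10 for x in data]   # add the same number to each observation
scaled = [3 * x for x in data]     # multiply each observation by the same number

# Adding a constant moves the center but leaves the spread alone.
shift_report = (mean(shifted) - mean(data),
                median(shifted) - median(data),
                stdev(shifted) - stdev(data))

# Multiplying by a constant multiplies center and spread by that constant,
# and multiplies the variance by its square.
scale_report = (mean(scaled) / mean(data),
                stdev(scaled) / stdev(data),
                variance(scaled) / variance(data))

print(shift_report)
print(scale_report)
```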

Q47.  What are three graphical methods of comparing distributions? A47. Side-by-side bar graphs, back-to-back stem plots, and side-by-side box plots.

TPS Chapter 2: The Normal Distribution

Q1. The scales of density curves are adjusted so that the total area under each curve is what?

A1. One

Q2. The area under a density curve between two x-axis values represents what?

A2. The proportion of all observations that fall between those values.

Q3. Do measures of center and spread apply to a density curve as well as to sets of observations?

A3. Yes

Q4. How do you define the median of the density curve? A4. The point with half the area under the curve to its left and the remaining half of the area to its right.

Q5. The quartiles of a density curve divide the area into what?

A5. Four equal parts.

Q6. What is the relationship between the mean and the median of a symmetric density curve?

A6. They are equal.

Q7. Which is pulled farther toward the tail of a skewed distribution: the median, or the mean?

A7. The mean

Q8. In conventional notation, what are the meanings of x-bar and s, as contrasted to mu and sigma?

A8. The first two refer to the mean and standard deviation, respectively, of a set of observations, a sample.  The second two refer to the mean and standard deviation, respectively, of a density curve or idealized distribution, or the population distribution.

Q9. What three features describe the overall shape of a normal curve? A9. Normal curves are symmetric, single-peaked, and bell-shaped.

Q10. Is there only one normal curve, or is there an infinite number of normal curves?

A10. An infinite number.

Q11. For any given mean and standard deviation, is there only one normal curve, or an infinite number of normal curves?

A11. Only one.

Q12. How can you visually find the points one standard deviation from the mean of a normal curve?

A12. Those points are the inflection points of the curve.  That is, the curve changes from falling more and more steeply to falling less and less steeply, or vice versa. (Optional answer for calculus lovers: they are points where the second derivative of the curve equals zero.)

Q13. The distributions of test scores, of measures of characteristics of living things, and of summary statistics for chance outcomes repeated many times, often (but not always!) follow what type of distribution?

A13. The normal distribution

Q14. What three percentages do you have to remember when you are stating the “empirical rule”?

A14. 68%, 95%, and 99.7%.

Q15. Are the three percentages for 1, 2, and 3 standard deviations exact, or easier-to-remember rounded approximations?

A15. Approximations.

Q16. What do the three percentages in the empirical rule apply to? In other words, what is the meaning of this rule?

A16. The three numbers tell the percent of observations falling within plus or minus 1, 2, or 3 standard deviations of the mean, respectively, in a normal curve. (Note that the percents refer to the percent of observations encompassed by the interval from that number of standard deviations below the mean to that number above the mean.)

Q17. True or false: If  Mary scores one standard deviation above the mean on a normally distributed test, then approximately 68% of the test takers scored as close to the mean of the test as, or closer to the mean than, Mary did.

A17. True

Q18. True or false: If Mary scores one standard deviation above the mean on a normally distributed test, her score is in the 68th percentile.

A18. False

Q19. True or false: if Mary scores one standard deviation above the mean on a normally distributed test, half of 68% or 34% are above the mean but at or below Mary’s score.  An additional 50% are below the mean.  Thus Mary equals or surpasses 50% plus 34% of the test takers, and is at the 84th percentile.

A19. True
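
The empirical rule percentages and Mary's percentile can be reproduced with the standard library's `statistics.NormalDist` (available in Python 3.8 and later). This is only a sketch; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

# Roughly 68%, 95%, and 99.7%: the rounded values of the empirical rule.
within = {k: share_within(k) for k in (1, 2, 3)}

# One standard deviation above the mean: 50% below the mean plus half of about 68%.
mary_percentile = standard_normal.cdf(1)

print(within)
print(mary_percentile)
```

The computed value for Mary is about 0.84, consistent with the 84th percentile in A19 rather than the 68th.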

Q20. What does the notation N(100,15) mean? A20. It denotes a normal distribution with mean 100 and standard deviation 15.

Q21. True or false: the standard score for any observation tells how many standard deviations that score is from the mean.

A21. True

Q22. What two operations do we do, to standardize a score? A22. Subtract the mean and divide by the standard deviation.

Q23. A standard score is often called by what other term?

A23. The z-score.

Q24. What does the sign of a standard score correspond to?

A24. If the z-score is positive, it’s above the mean, and if negative, below the mean.

Q25. Are there an infinite number of standard normal curves, each with its own equation describing it, or just one standard normal curve, with just one equation describing it?

A25. There is just one standard normal curve, with only one equation describing it.

Q26. In a table of areas under the standard normal curve, what does the table entry for each z score represent?

A26. The area under the curve to the left of z, or in other words, the proportion of cases with values less than z.

Q27. What steps do you follow to use the z table to solve the following “problem prototype”: given N(mu, sigma), please find the proportion of observations less than x?

A27. First we standardize x (by subtracting mu and dividing by sigma). Then we look at the z table to find the proportion of the distribution less than the z score we’ve obtained.

Q28. What steps do you follow with a z table if you want to know what proportion of the scores are between two values? 

A28. Look up the proportion less than the first, and less than the second, and find the difference between the two proportions.
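
The two z-table procedures above (proportion below a value, and proportion between two values) can be sketched in code, with `NormalDist().cdf` standing in for the printed table. The function names are hypothetical, and the cutoffs 85 and 130 are arbitrary values applied to the N(100, 15) distribution from Q20.

```python
from statistics import NormalDist

def proportion_below(x, mu, sigma):
    """Standardize x, then look up the area to the left of z."""
    z = (x - mu) / sigma           # subtract the mean, divide by the sd
    return NormalDist().cdf(z)     # plays the role of the z table

def proportion_between(low, high, mu, sigma):
    """Difference between the two 'less than' proportions."""
    return proportion_below(high, mu, sigma) - proportion_below(low, mu, sigma)

below_130 = proportion_below(130, 100, 15)           # z = 2
between = proportion_between(85, 130, 100, 15)       # z from -1 to 2

print(below_130, between)
```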

Q29. What two pictures do the TPS authors recommend drawing when solving problems in which you are given a normal curve and asked for proportions of the observations?

A29. They recommend drawing unstandardized and standardized normal curves, and shading in the areas that are asked for.

Q30. What do the authors recommend (as a word to the wise for future test-takers) as the last step of problems giving a normal distribution and asking for proportions of observations?

A30. They recommend stating the conclusion in the context of the problem. Thus rather than just saying, the answer is 49%, you would say, “About 49% of boys have cholesterol levels between 170 and 240 mg/dl.”

Q31. What steps do you go through when you want to find a value given a proportion of a normal distribution, using the z-table?

A31. You look for the proportion in the body of the table, and you find at the margin the z-score that corresponds to it. Then you “unstandardize” the z-score.

Q32. What operations do you do, in what order, to “unstandardize” a z-score, or turn the z-score into a raw score?

A32. You multiply the z score by the standard deviation, and then you add it to the mean.
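
Going in the reverse direction (from a proportion back to a raw score) might look like the sketch below, where `NormalDist().inv_cdf` replaces the search through the body of the z table. The function name and the choice of the 90th percentile of N(100, 15) are assumptions made for illustration.

```python
from statistics import NormalDist

def value_at_proportion(p, mu, sigma):
    """Find the raw score with proportion p of the distribution below it."""
    z = NormalDist().inv_cdf(p)    # the z-score whose left-hand area is p
    return mu + z * sigma          # unstandardize: multiply by sd, then add the mean

x90 = value_at_proportion(0.90, 100, 15)
print(x90)
```

Feeding the result back through the forward calculation should return a proportion of about 0.90, which is a convenient way to check the work.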

Q33. Suppose you have a data set, and you want to see if it is approximately normally distributed. What’s the first thing to do, before doing calculations?

A33. Make a frequency histogram or stemplot, and see if the curve looks bell-shaped and symmetric.

Q34. What’s a way of checking a data set for normality, using the empirical rule?

A34. Find the mean and sd of the data set, and count (or get a computer to count) the proportions of observations that are within 1, 2, and 3 standard deviations of the mean. See if these proportions correspond, roughly, to .68, .95, and .997. 
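
Here is one way the counting in A34 could be delegated to a computer. The data are simulated from N(100, 15) with a fixed seed purely for illustration, so the proportions are expected to land near .68, .95, and .997 without matching them exactly; the function name is hypothetical.

```python
import random
from statistics import mean, stdev

def empirical_rule_check(data):
    """Proportions of observations within 1, 2, and 3 sds of the mean."""
    xbar = mean(data)
    s = stdev(data)
    n = len(data)
    return [sum(1 for x in data if abs(x - xbar) <= k * s) / n for k in (1, 2, 3)]

random.seed(0)  # fixed seed so the simulated sample is reproducible
sample = [random.gauss(100, 15) for _ in range(10_000)]

proportions = empirical_rule_check(sample)
print(proportions)
```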

Q35. True or false: The point of making a normal probability plot is to see whether a set of numbers is normally distributed.

A35. True

Q36. If you were to take any set of numbers, and plot the numbers on the x-axis, and their z-scores on the y-axis, you would get what shape for your graph? (Hint 1: the y values are (1/sd)*x - mean/sd, which is of the form y = mx + b.) (Hint 2: Standardizing a score involves a linear transformation.)

A36. A line.

Q37. On a normal probability plot for a set of observations, what goes on the x axis?

A37. The values of the observations themselves.

Q38. True or false: On a normal probability plot, what goes on the y-axis for each x value is the z score that would be associated with the percentile for that value, assuming a normal distribution (and using midpoints of intervals in finding percentiles)?

A38. True.
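
The coordinates of a normal probability plot can be computed as sketched below. The percentile convention (i + 0.5)/n is one common way of "using midpoints"; software packages differ in this detail, so treat it as an assumption. The function name is hypothetical.

```python
from statistics import NormalDist

def normal_probability_points(data):
    """Pair each ordered observation (x) with the z-score of its percentile (y)."""
    xs = sorted(data)
    n = len(xs)
    standard_normal = NormalDist()
    points = []
    for i, x in enumerate(xs):
        percentile = (i + 0.5) / n              # midpoint of the i-th interval
        z = standard_normal.inv_cdf(percentile) # z-score expected under normality
        points.append((x, z))
    return points

points = normal_probability_points([3, 1, 2])
print(points)
```

If the resulting points fall close to a straight line when plotted, the data are plausibly normal; systematic curvature suggests otherwise.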

Q39. What conclusion do you come to if a normal probability plot is not linear?

A39. That the data are not normally distributed.

TPS Chapter 3: Examining Relationships

Q1. Suppose that a researcher wants to study the effect of people’s ever having taken the drug "ecstasy" upon the people’s memory scores when tested.  Which of these is the response variable and which is the explanatory variable?

A1. Ingestion of ecstasy is the explanatory variable and memory test scores are the response variable.

Q2. How do the terms "dependent variable" and "independent variable" correspond to the terms “response variable” and “explanatory variable”?

A2. Dependent corresponds to response, and independent corresponds to explanatory.

Q3. Is it proper to use the terms, response variable and explanatory variable, if the explanatory variable does not actually cause the response variable?

A3. Yes.  No implication of causation is contained in the terms explanatory and response (or independent and dependent).

Q4. What's the order of tasks involved in examining relationships between two variables?

A4. First plot the data, then use numerical summaries. Look for the overall pattern and deviations from that pattern, and when the overall pattern is regular, use a mathematical model to describe it.

Q5. Suppose that someone has math scores for the children in one classroom, and English scores for a second set of children in another classroom.  The person asks you about making a scatterplot for these data.  What would you reply?

A5. It doesn't make sense to use a scatterplot in this situation, because a scatterplot involves graphing two variables measured upon the same individuals.

Q6. True or false: in a scatterplot, each point represents one individual; the x-coordinate of the point represents the value of one variable and the y-coordinate represents the value of another variable measured on that same individual.

A6. True

Q7. If there is an explanatory variable, which axis should it be graphed on?

A7. The x-axis

Q8. When describing a scatterplot, what three aspects of the pattern should you refer to?

A8. The form, the direction, and the strength of the relationship. 

Q9. True or false: in describing the form of a scatterplot, it's important to say whether the graph appears to be linear or not.

A9. True

Q10. In describing the form of a scatterplot, what term do you use if the values tend to fall into two or more groups that are separated from one another by gaps?

A10. You say that there are clusters.

Q11. In describing the direction of a scatterplot, when there is a positive or negative slope, we say that the variables are positively or negatively what?

A11. Positively or negatively associated.

Q12. When any given x value on a scatterplot has very widely varying y values associated with it: the more widely varying the y values, the less _____ is the relationship between the two variables.

A12. strong

Q13. When you are drawing a scatterplot, what symbols should you use in showing the axes if the origin of the graph is not at zero?

A13. You use a symbol that looks like two slashes to indicate a break in the scale.

Q14. What are three other guidelines on how to draw scatterplots properly?

A14. Make the intervals uniform.  Label both axes.  Choose a scale that makes your graph big enough.

Q15. Suppose that you want your scatterplot to reflect the influence of a particular categorical variable, in addition to the relationship of the two quantitative variables that are plotted.  For example, suppose you want to graph the relation between entertainment violence and real-life violence for males and females on the same graph, in such a way that displays the relationship separately for males and females.  What should you do?

A15. Use a different symbol on the scatterplot for the points designating males than for those designating females.

Q16. A common problem in constructing a scatterplot occurs when two or more individuals have exactly the same values for each of the two variables.  What should you do in that case?

A16. Use a different plotting symbol to call attention to those points.

Q17. Which is a better method for judging the strength of a linear relationship: simply to look at a graph, or to use a calculated numerical statistic that summarizes the strength of the linear relationship? Why do you think your chosen method is better?

A17. A calculated statistic works better, because our eyes can be deceived by the different scaling methods used in graphs.

Q18. What is the summary statistic that measures the strength of a linear relationship?

A18. The correlation coefficient.

Q19. We’ve used the Greek letter mu to represent a population mean; x-bar to represent a sample mean; Greek sigma to represent the population standard deviation; and s to represent the sample standard deviation. What letter does our book use to designate what it calls the correlation?

A19. r

Q20. Given that the letter r, for the correlation coefficient, is in our own alphabet and not the Greek alphabet, do you think it refers to a sample statistic or a population parameter?

A20. sample statistic

Q21. Would you guess that there is some other Greek letter that refers to the population value of the correlation coefficient?

A21. Yes.  (It’s the letter rho, which looks pretty much like a p!) 

Q22. When you look at the formula for the sample correlation coefficient that your text gives, you see (xi-xbar)/sx   and (yi-ybar)/sy.  Can you give a simpler name to these expressions?

A22. These are the standard scores, or z-scores, for the ith individual.  The first factor is the z-score for the x variable and the second is the z-score for the y variable.
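
To make the z-score reading of the formula concrete, here is a minimal sketch that computes r as the sum of the products of paired z-scores divided by n - 1 (the usual form of the sample correlation when sx and sy are sample standard deviations). The function name and the data are hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation: average product of paired z-scores, dividing by n - 1."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    products = [((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

xs = [1, 2, 3, 4, 5]   # hypothetical x values
ys = [2, 4, 5, 4, 5]   # hypothetical y values

r_xy = correlation(xs, ys)
r_yx = correlation(ys, xs)   # swapping the roles of the variables

print(r_xy, r_yx)
```

Because each product of z-scores is the same whichever variable is listed first, swapping x and y leaves r unchanged.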

Q23. What is the meaning of a positive and negative sign associated with the correlation coefficient?

A23. A positive sign means there’s a positive association between the variables; in other words, higher values of one are associated with higher values of the other. A negative sign means there’s a negative association; that is, higher values of one variable are associated with lower values of the other.

Q24. Suppose one person calculates the correlation of IQ score of some individuals with number of boxing matches fought, testing the hypothesis that boxing (the explanatory variable) affects IQ (the response variable).  A second person, using the same data set, also calculates the correlation of the number of fights with IQ score, only this person thinks of IQ as the explanatory variable and number of fights as the response variable.  Do they get the same correlation, or different ones?

A24. The same correlation. The correlation coefficient is not affected by which variable is considered explanatory and which is considered response.

Q25. Suppose someone codes race as follows: 0=Caucasian, 1=African American, 2=Asian, 3=Hispanic, 4=American Indian, 5=Other. Then someone calculates a correlation between race and a reading test score for a sample of kids. Do you have a problem with this? If so, what’s your problem?

A25. The problem is that the correlation coefficient is to be used with quantitative variables, not categorical variables like this. The obtained correlation would be meaningless, and an artifact of the arbitrary coding system.

Q26. Melinda computes a correlation between the height of mothers and their daughters. Lunk is looking at the computations, and says, “You blew it! You have the height of mothers measured in centimeters, and the height of the daughters measured in inches!”  Please tell whether Melinda needs to do anything to fix her correlation coefficient, and if so, what? 

A26. Melinda did not blow it, because the correlation coefficient comes out the same no matter what units are used. (This is because a transformation from one unit to another (which involves multiplying each number in the data set by the same number) multiplies both the mean and the standard deviation of the data set by the same number (as was learned in chapter 1).  The z-score, which is (xi-xbar)/sx  comes out the same, because each of the three numbers that make up the z-score is multiplied by the same factor, and that factor cancels out. Since the z-scores are not affected by changes of units, the correlation coefficient is also not affected.)
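
Melinda's situation can be imitated with made-up numbers. In this sketch the heights are hypothetical, and `correlation` is an illustrative helper built from z-scores; converting the daughters' heights from inches to centimeters (multiplying by 2.54) should leave r unchanged.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation computed from paired z-scores."""
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    total = sum(((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

mothers_cm = [155, 160, 165, 170, 175]           # hypothetical heights in centimeters
daughters_in = [62, 63, 66, 65, 69]              # hypothetical heights in inches
daughters_cm = [2.54 * h for h in daughters_in]  # same heights, converted to centimeters

r_mixed_units = correlation(mothers_cm, daughters_in)
r_same_units = correlation(mothers_cm, daughters_cm)

print(r_mixed_units, r_same_units)
```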

Q27. What range of values is possible for the correlation coefficient?

A27. -1 to +1.

Q28. What sort of correlation coefficient do you find when two variables have a very strong linear relationship, and when as the first gets greater, the second gets smaller?

A28. A correlation close to –1.

Q29. Suppose the data points are two variables collected for all the days of 2005.  For each of those days, imagine that we know (variable 1) the number of words your instructor for this course spoke in that day, and (variable 2) the peak barometric pressure for that day in Caracas, Venezuela.  About what would you guess the correlation between these two variables to be? Why?

A29. You’d guess a correlation of about 0, since there is no reason to expect that these two variables would rise and fall in association with each other.

Q30. Suppose there are two variables which, when graphed in a scatterplot, form an almost perfect u-shaped parabola. Would the strong relationship between these variables imply a high correlation coefficient (meaning one close to 1)? Why or why not?

A30. No, because the correlation coefficient measures the strength of linear relationships only, not curvilinear relationships. A u-shaped curve  isn’t a straight line!
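
A small check of this point, using hypothetical data that lie exactly on the parabola y = x^2 with x values symmetric about zero. The helper `correlation` is an illustrative implementation, not a library function.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation computed from paired z-scores."""
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    total = sum(((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]     # a perfect u-shaped relationship

r_parabola = correlation(xs, ys)
print(r_parabola)
```

The y values are completely determined by x, yet r comes out at essentially zero because the falling left arm and the rising right arm cancel each other.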

Q31. Does the correlation coefficient resemble the median and interquartile range in being fairly resistant to outliers, or resemble the mean and standard deviation in being heavily influenced by outliers? 

A31. Like the mean and sd, the correlation coefficient can be greatly influenced by outliers. 

Q32. Someone practices guessing correlation coefficients from scatterplots using an “applet” on the internet. Why should the person not get too confident of his or her guessing power given scatterplots of real-life data?

A32. Because the scales of the variables are not necessarily the same as they were on the applet, and scales can throw off  “eyeball” estimates.

Q33. In attempting to give a more complete description of a set of data involving two variables, someone wants to give a measure of center and spread as well as the correlation coefficient. Assuming the person has made a good decision to use the correlation coefficient, what measures of center and spread would be most consistent with the correlation coefficient: the mean and sd or the median and IQR?

A33. The mean and sd, because the formula for the correlation uses the mean and sd.

Q34. The women in a corporation think that they are being discriminated against in their salaries. A management spokesman says to them, “Look at this plot. The first data point is the average salary for men who have worked here one year, put into an ordered pair with the average salary for women who have worked here one year. The second ordered pair is the average salary for men and women with two years’ experience, and so forth. The correlation between men’s salaries and women’s salaries is .95! That’s almost a perfect correlation! You women have nothing to complain about!” Is this argument valid? Why or why not?

A34. It’s not valid. The correlation coefficient measures the predictability of one score from another, not the equality of the two scores. Adding the same value to all values of either x or y does not change the correlation, and neither does subtracting a constant or multiplying or dividing by a positive constant. So the salaries of women could be half those of the comparable salaries of men, or $10,000 less than the comparable salaries of men, and you could still get a high correlation.

Q35. Suppose that you have a data set with a correlation fairly close to 0. All the numbers for both variables are between 0 and 10.  There are about 10 individuals in the data set. Then suppose that one more individual gets added, an outlier at the point (100, 100). What do you think the correlation coefficient will become? (Try it out with your calculator or Minitab if you want, or mimic this situation on an “applet.”)

A35. The correlation will become close to 1, because this one outlier has such a strong effect.
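
The experiment suggested in Q35 can be mimicked in a few lines. The ten (x, y) pairs below are hypothetical values between 0 and 10, arranged so that their correlation is near 0, and `correlation` is an illustrative helper built from z-scores.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation computed from paired z-scores."""
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    total = sum(((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

# Ten hypothetical individuals with almost no linear association.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [5, 9, 2, 7, 1, 10, 4, 8, 3, 6]
r_before = correlation(xs, ys)

# Add a single outlier at (100, 100).
r_after = correlation(xs + [100], ys + [100])

print(r_before, r_after)
```

With these numbers r moves from roughly -0.02 to roughly 0.99, driven entirely by the one added point.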

Q36. True or false: In a regression line, like a correlation coefficient, you get the same numbers (slopes and intercepts) no matter which variable is considered the explanatory variable and which is considered the response variable.

A36. False. The change in y per unit change in x, for example, is not the same as the change in x per unit change in y.

Q37. Please explain, for a least squares regression line: the sum of the squares of what are being minimized?

A37. The squares of the errors for each data point, where the errors are the vertical distances from the data points to the regression line. (The word residuals is also correct.)

Q38. Please explain why the distance from the data point to the regression line corresponds to the idea of an “error.”

A38. The regression line gives predicted values of y (called y-hat) for each x.  There is also an actual observed value y for each x, for each data point. The difference between the actual and the predicted value is the “error” in prediction that is made by using the regression line to predict the response variable.

Q39. What’s the formula for the slope of a regression line, in terms of the correlation between the two variables and their standard deviations? (Call the slope b, the correlation r, and the two sd’s sx and sy.)

A39. b=r (sy/sx).

Q40. Every least-squares regression line passes through what point?

A40. It passes through (xbar, ybar), the ordered pair formed by the means of both variables.

Q41. Once you know the slope of a regression line, how would you find the y-intercept, knowing the means of the x values and the y values? (Call the intercept a, and call the means of x and y xbar and ybar.)

A41. a = ybar - b*xbar. You get this by solving the equation ybar = b*xbar + a for a. That equation comes from the fact that ybar is always the predicted value of y for xbar.
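
The two formulas, b = r*(sy/sx) and a = ybar - b*xbar, can be sketched in Python as follows. The five data points are made up for illustration; the variable names mirror the symbols used in the questions.

```python
from statistics import mean, stdev

# Hypothetical data set.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

xbar, ybar = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)   # sample standard deviations (divide by n - 1)

# Correlation: sum of products of z-scores, divided by n - 1.
r = sum(((xi - xbar) / sx) * ((yi - ybar) / sy) for xi, yi in zip(x, y)) / (n - 1)

b = r * sy / sx        # slope
a = ybar - b * xbar    # y-intercept

def predict(x_value):
    """Predicted y (y-hat) for a given x."""
    return a + b * x_value

print(b, a, predict(xbar))
```

For this data set the slope comes out to about 0.6 and the intercept to about 2.2, and predict(xbar) gives back ybar, which is the point of A40.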

Q42. When you have a regression equation delivered by the computer software output, and someone asks you for the predicted value of y given a certain x value, what do you do?

A42. Just substitute the value of x into the equation and solve for the predicted y value.

Q43. Suppose that someone measures height as a function of weight for a bunch of human adults, and gets a regression equation predicting height as a function of weight. Why is the y-intercept of the equation not as meaningful or important as the slope, or as the equation as a whole?

A43. Because the y-intercept corresponds to the height of someone with weight 0. But the weight of 0 is far outside the range of weights measured in the study and thus the height predicted will be an extrapolation. Secondly, the weight of 0 is one seldom found in human beings, (at least those who have already been born and aren’t dead yet)!  

Q44.  Suppose you have a regression equation output from a computer and you are asked to plot the line by hand. How would you do it?

A44. Just pick two values of x, and calculate the yhat values for each, and connect those two dots. It helps if you pick points that are close to the bottom and top ends of the range. (One easy point is the y-intercept.)

Q45. When, in the context of regression, people speak about the SST (sum of squares total), what do they mean by that?

A45. The sum of the squared deviations of each y value from the mean of the y values.

Q46. When in the context of regression people speak of the SSE or sum of squares for error, what do they mean by that?

A46. They mean the sum of the squares of the deviations of the actual y values from the predicted y values. (These deviations are also called residuals.)

Q47. Your book doesn’t define very explicitly in this chapter what the sum of squares for regression is. Do you think it would be reasonable to think of that as the sum of all the squared deviations of the predicted y values (the y-hats) from the mean of the y values? Particularly if a trustworthy source hinted that it was? (P.S. you can calculate the SSReg in your head, easily, for the 3-point data set of example 3.11 on page 160: it comes out to 32. The SST comes out to 38, and the SSE to 6.)

A47. Yes.

Q48. The book speaks of the sum of squares for the regression as the SST-SSE, or the sum of squares total minus the sum of squares for error. Can we infer from this that the total sum of squares, SST, can be partitioned into the SSReg (sum of squares for regression) and the SSE, (sum of squares for error), and that SST=SSReg+SSE? (P.S. I use the notation SSReg so as not to confuse sum of squares for regression with sum of squares for residuals.)

A48. Yes.

Q49. The square of the correlation coefficient, or r-squared, a.k.a. the coefficient of determination, means what in terms of the fraction of the total sum of squares? Please answer in symbols and words.

A49. r-squared = the SSReg/SST  or (SST-SSE)/SST. The r-square is the fraction of the total sum of squares that is accounted for by the regression of y on x.
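
Here is a small Python sketch of the three sums of squares, using a made-up five-point data set (not the example from the book). It fits the least-squares line directly and then checks that SST = SSReg + SSE.

```python
# Hypothetical data set.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Least-squares slope and intercept.
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar
y_hat = [a + b * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)                   # total
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))     # error (residuals)
ss_reg = sum((yh - ybar) ** 2 for yh in y_hat)            # regression

r_squared = ss_reg / sst
print(sst, ss_reg, sse, r_squared)
```

With these numbers SST is 6, SSReg is about 3.6, and SSE is about 2.4, so r-squared is about 0.6: the regression accounts for 60% of the total sum of squares.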

Q50. One person studies IQ as a function of number of boxing matches participated in, and another uses the same data set to study boxing matches participated in as a function of IQ. (That is, matches is the explanatory variable in the first study and IQ is the explanatory variable in the second.) Do they both get the same value for the slope of the regression line? Can you explain in words the reason for this answer? (You may assume a negative relationship between the two variables in constructing language for your answer.)

A50. No. The first slope tells how many points IQ changes per additional boxing match, and the second slope tells how many fewer boxing matches someone has for each additional IQ point.

Q51. The slope of the regression line b is equal to r*(sy/sx). Along the regression line, a change of 1 standard deviation in x results in a change of how many standard deviations of y? (Hint: the slope is the change in y over the change in x. So the change in y equals the slope times the change in x. So if the change in x is sx, we get r*(sy/sx)*sx, which equals...)

A51. r standard deviations of y, or r*sy. 

Q52. True or false: the slope of the regression line tells you how many unstandardized units the predicted value of y changes for each unstandardized unit change in x.

A52. True.

Q53. True or false: the correlation coefficient tells you how many standard deviations the predicted y changes for each standard deviation change in x.

A53. True.

Q54. True or false: If both of two variables y and x are standardized, (so that the standard deviation of both is 1) then the slope of the regression line and the correlation coefficient are equal.

A54. True.
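
The last few answers can be checked numerically. The sketch below uses invented height and weight values; slope and standardize are helper names chosen here, not anything from the book.

```python
from math import sqrt
from statistics import mean, stdev

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def standardize(values):
    """Convert values to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical heights (inches) and weights (pounds).
x = [60, 62, 65, 68, 70, 73]
y = [110, 125, 140, 150, 172, 190]

# Correlation computed directly from the sums of squares and products.
mx, my = mean(x), mean(y)
r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sqrt(
    sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
)

b_raw = slope(x, y)                              # pounds per inch
b_std = slope(standardize(x), standardize(y))    # sd's of y per sd of x

print(b_raw, b_std, r)
```

The slope in raw units depends on the units of measurement, but the slope for the standardized variables matches r.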

Q55. What is another name for y – yhat, or the deviation of the observed y value from the predicted value, or the error in prediction for a given value, or the vertical distance between any data point and the regression line?

A55. The residual.

Q56. True or false: when we speak of a “least squares” regression line, we mean that we choose the line so as to minimize the sum of the squares of the residuals.

A56. True.

Q57. Someone draws a graph of residuals (on the y axis) versus the values of the explanatory variable.  This graph is called what?

A57. A residual plot.

Q58. Someone draws a residual plot, and all the values are positive. Someone says to that person, “There must be some mistake.” Why did the person say that?

A58. Because the mean of the least-squares residuals is always zero; thus if you have positive values you have to have at least one negative value.
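
A quick way to convince yourself of this is to compute the residuals for any data set. The sketch below uses made-up numbers and fits the least-squares line by hand.

```python
# Hypothetical data set.
x = [2, 4, 5, 7, 9]
y = [3, 6, 4, 9, 8]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Least-squares slope and intercept.
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar

# Residual = observed y minus predicted y.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(residuals)
print(sum(residuals))  # zero, up to floating-point rounding
```

Some residuals are positive and some are negative, and they cancel out. A residual plot with only positive values could not have come from a least-squares line.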

Q59. If the linear regression equation fits the data well, what do you see on the residual plot?

A59. A uniform scatter of points, without a clear pattern, and with no unusual individual observations.

Q60. What do you call a data point that has a big effect on the slope or intercept of the regression line?

A60. An influential point.

Q61. Does an influential point necessarily have a large residual? (Hint: the influential point can pull the line close to it.)

A61. No.

TPS Chapter 4: More on Two-Variable Data

Q1. In the example at the beginning of this chapter, a plot of the log of brain weight as a function of the log of body weight provides a “better fit” for the observed data than a simple plot of brain weight as a function of body weight. What is meant by better fit? (This is also the answer to the question, what are we trying to do (at least in this course) when we transform data?)

A1. A better fit means that the data are more linear – a linear model is more successful in describing the relationship, the correlation coefficient is higher, the sum of squares of residuals for a regression line is lower.

Q2. Can you figure out why we would want to transform data so as to get a more linear relationship?

A2. Because then we can use the methods of correlation and regression that were studied in the previous chapter and will be studied more in future chapters, which are very powerful methods of statistical analysis.

Q3. True or false: if we have a curvilinear function, and we want to straighten it out to make a linear function, we can’t do that by multiplying or dividing by constants or adding or subtracting constants (i.e. by using linear transformations). 

A3. True. Linear transformations don’t straighten curves. 

Q4. What are the transformations that are most commonly used, other than linear transformation?

A4. Positive and negative powers, and logarithms.

Q5. What is the definition of a monotonic function?

A5. It’s one where as x increases, y always increases (a monotonic increasing function) or as x increases, y always decreases (a monotonic decreasing function). It produces a graph that doesn’t go up and dip down, but consistently has either a positive or negative slope.

Q6. Is it kosher to speak of a function as being, for example, monotonic increasing over part of the domain of x, and monotonic decreasing over another part? If so, can you give an example?

A6. Yes. An example is y=x^2, which is monotonic decreasing for negative values of x, and monotonic increasing for positive values of x.

Q7. True or false: There are often two steps in transformation. The second is to apply a power or logarithmic function that simplifies the data. The first is to use a linear transformation, such as adding a constant, that makes the values all positive, so that the function applied in the second step will be defined and monotonic increasing.

A7. True.

Q8. How is the ladder of power functions useful?

A8. When we are trying to straighten out curved data sets, we can go in one direction or the other along the ladder, seeing how straight the line becomes, rather than just randomly picking different functions.

Q9. Linear growth is to adding a fixed amount per unit time as exponential growth is to ______  by a fixed amount per unit time.

A9. Multiplying.

Q10.  If the number of a certain type of bacteria doubles every two hours, is that linear growth or exponential growth?

A10. Exponential.

Q11. Increasing everyone’s salary by a certain percentage is to ______ growth as increasing everyone’s salary by the same dollar amount is to _______  growth.

A11. Exponential, linear.

Q12. Suppose we have a function y=ab^x, where a and b are constants and x is the explanatory or independent variable, and y is the response or dependent variable. Is this an example of an exponential function, or a power function?

A12. An exponential function. (This is a function like y=2^x.)

Q13. Suppose we have a function y=ax^b, where a and b are constants and x is the explanatory variable and y is the response variable. Is this an example of an exponential function, or a power function?

A13. A power function.  (This is a function like y=x^2.)

Q14. If y is an exponential function of x, plotting what function of y versus x should result in a linear graph?

A14. The log of y versus x.

Q15. Suppose you do a regression of the log (base 10) of y versus x, and you get a nice linear scatterplot and a high coefficient of determination (r^2) when you do a regression. Now you can use this linear relationship for prediction. Suppose someone (like a test-maker) asks you what the predicted value is of y (not log y) for a given value of x. How would you find it?

A15. You’d just use your equation to find the predicted value of log y. Then you take the antilog (or 10 to that number) to get the predicted value of y. In other words, you “untransform” the value back to the original scale.
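
Here is a sketch of that “untransforming” step in Python. The data are invented so that y follows y = 3 * 2^x exactly, and least_squares is a helper name chosen here.

```python
from math import log10

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

# Hypothetical counts that double every hour: y = 3 * 2**x.
hours = [0, 1, 2, 3, 4, 5]
counts = [3, 6, 12, 24, 48, 96]

# Regress log10(y) on x.
a, b = least_squares(hours, [log10(c) for c in counts])

log_y_hat = a + b * 6       # predicted value of log y at x = 6
y_hat = 10 ** log_y_hat     # antilog: back to the original scale

print(y_hat)  # close to 192, since 3 * 2**6 = 192
```

The regression equation gives a prediction for log y; raising 10 to that power gives the prediction for y itself.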

Q16. If a variable grows exponentially, its logarithm grows how?

A16. Linearly.

Q17. To make an exponential function linear, we use the log transformation just with the response variable y. To make a power function linear, we use the log transformation with what?

A17. Both the explanatory and the response variable.

Q18. If you start with the power function y=ax^p, and take the log of both sides, what result do you end up with?

A18. log y=log a +p log x.

Q19. Suppose you have a data set, and its scatterplot is curved. Then you take the log of both explanatory and response variables, and plot them, and you get a line. What do you infer from this?

A19. That the original variables were related according to a power function (or power law). 

Q20. When you plot the log of y vs. the log of x, do you give any meaningful interpretation to the slope of the line that you get? If so, what is it?

A20. According to the equation log y =log a + p log x,  the slope of the line is the power to which x is raised in the original power function.

Q21. Jane gets a regression coefficient (i.e. a slope) of 3.617 when regressing log y vs. log x. She says, “Now I know that x and y are related in a power function, and y= a constant times x to the 3.617 power.” Do you think this conclusion should be tempered or qualified? If so, how?

A21. Yes. The slope you obtain in any given experiment is an estimate of the population value of the slope, and not an exact rendering of it. Another sample is very likely to give a different slope, and it could possibly even lead to a different conclusion about the form of the functional relationship!

Q22. Suppose you plot the log y vs. the log x and you get a good line, with intercept 2 and slope 3. So log y=2+3log x.  Now you are asked to find the equation for y in terms of x, without logs in it.  How do you do this?

A22. You just take the antilog of both sides. You get y= 10^(2+3 log x), or y=10^2*(10^log x)^3, or y=100*x^3. That is, y=100 times x cubed.
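
The same steps can be sketched in Python. The data below are generated from y = 100*x^3 (an assumption made so the answer is known in advance), and least_squares and predict_y are helper names chosen here. Regressing log y on log x should recover an intercept of 2 and a slope of 3.

```python
from math import log10

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

# Hypothetical data generated from the power function y = 100 * x**3.
x = [1, 2, 4, 5, 10]
y = [100 * xi ** 3 for xi in x]

# Regress log y on log x: log y = a + p * log x.
a, p = least_squares([log10(xi) for xi in x], [log10(yi) for yi in y])

def predict_y(x_value):
    """Antilog of both sides: y = 10**a * x**p."""
    return (10 ** a) * x_value ** p

print(a, p, predict_y(3))  # about 2, about 3, about 2700
```

The slope of the log-log line is the power p, and 10 raised to the intercept is the constant a in y = a*x^p. With real, noisy data the recovered values would only be estimates, as A21 points out.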

Q23. Suppose I find that in the range of 3 to 7 milligrams of Ritalin given to a group of children, their math scores rise in linear fashion with increasing dose. A parent looks at the regression equation and says, “By my calculations, all it would take would be 400 milligrams of Ritalin for my child to get a 5 on the AP Statistics exam, while he’s still in 3rd grade.” What do we call this type of reasoning (which often leads to incorrect conclusions)?

A23. Extrapolation, which is using the regression equation to make predictions for values of the explanatory variable that we have no experience with. 

Q24. I notice a linear relationship between shoe size and basketball prowess, and propose to help my daughter’s fifth grade basketball team win by dressing them all in size 15 Converse All-Stars.  Someone says, “But in your study, there’s another variable, namely height, that was not among your explanatory or response variables and yet may influence the interpretation of the relationship between shoe size and basketball skill.” What type of variable is height, in this situation?

A24. A lurking variable.

Q25. In the example you just read, a lurking variable enhanced the apparent association between two variables. Can lurking variables also mask or attenuate the apparent association between two variables? If so, can you think of an example?

A25. Yes. The relation of overcrowding and lack of indoor toilets on page 227 of TPS is an example. Here’s another: suppose a researcher finds that the correlation between violent entertainment viewed and aggressive behavior in real life is low. Then someone reanalyzes the data and finds that when you make separate scatterplots for males and females, you get a strong relationship. Males are much more aggressive than females, and when you throw the data together on one scatterplot you get more of a horizontal line than a sloped one. Thus the lurking variable of gender obscured the real association that was there.

Q26. What’s one way of discovering lurking variables that someone may not have thought about?

A26. Plot the data as a function of time. You may be able to see relationships that can be linked to events that happened at a certain time, or conditions that changed at a certain time.

Q27. One researcher studies the murder rate of every state as a function of the average literacy level of the teenagers in that state. The correlation is reported to be very strong (and close to –1).   Another researcher studies aggression in individuals in a high school, as a function of their reading scores. The correlation, though in the same direction, shows much less strong a relationship. Can you explain why?

A27. Correlations with averaged data are generally considerably higher than correlations obtained with individuals. This is because some of the random variation in each variable gets reduced when you sum or average over many individuals. (Later on in the course, you’ll find out that the variance of the mean of a set of observations for a population is equal to 1/n times the variance of the individual observations, where n is the number of observations.) The less “noisy” both variables are, the more highly they tend to correlate with each other.

Q28.  When two variables X and Y are found to correlate with each other, of course two possible explanations for this association are 1) that X causes Y, and 2) (one not diagrammed on page 232) that Y causes X. Please name the other two possible explanations that are good to keep in mind when interpreting findings of associations.

A28. Common response (z causes both x and y) and confounding (z, which is associated with x, may cause y).

Q29. Suppose a researcher studies the effects of a way of teaching children not to be violent. The researcher gives the instruction to all the children in Mrs. Harmony’s classroom, and uses the kids in Mr. Gutsly’s classroom as a comparison group. But then the researcher realizes that Mrs. Harmony has a very different personality and interpersonal style than Mr. Gutsly: she tries to promote kindness and good will, whereas Mr. Gutsly is mainly interested in promoting competitiveness and not being wimps. What would we say about the variables of teacher personality and interpersonal style in this study?

A29. That they are CONFOUNDED with the intervention. Thus the effects of these teacher variables can’t be distinguished from the effects of the intervention the study is meant to test.

Q30. Someone finds that the degree of physical fitness in youth (as measured by heart rate recovery from exercise) is correlated with the number of ankle injuries the person has had. But before concluding that we should hurt the ankles of youth in order to make them more fit, a COMMON RESPONSE explanation for the association comes to mind. Can you posit this common response explanation?

A30. That both fitness and ankle injuries are associated with more running or more athletic activity – both are responses to this basic causal variable.

Q31. Even when causation is present, is there usually one and only one contributing cause for a given effect, at least in the types of phenomena people study with statistics?

A31. No.

Q32. Someone says, “Lots of kids play “shooter” video games for hundreds of hours, and never do anything violent. Therefore these games can’t cause violence.” What does the principle, as stated in your text, that “Even when direct causation is present, it is rarely a complete explanation of an association between two variables” have to do with this reasoning?

A32. Another way of stating this principle is that one phenomenon does not have to be a necessary and sufficient condition for a second, in order to be causally related. Therefore one or several instances of non-association do not disprove a causal relationship.

Q33. What is the strongest type of evidence for causal relations?

A33. Well-designed experiments that are meant to control for all lurking variables. (These usually entail randomly assigning individuals to different conditions.)

Q34. What’s the problem with doing a well-designed experiment, for example, to see what the effects of child abuse are?

A34. We will never find it ethical to randomly assign children to conditions of child abuse versus nonabuse.

Q35. Is it possible to come to valid causal inferences without doing experiments that randomly assign people to various conditions? Can you give an example of such?

A35. Although your text says that “the only fully compelling method” of establishing causality is an experiment, we can and do come to valid causal inferences without randomly assigning people to conditions. The example of smoking and lung cancer is one where the evidence for causation is “overwhelming” despite no study in which people were randomly assigned to smoke or not smoke over many years.

Q36. A two-way table describes the relation between two of what kind of variables?

A36. Categorical.

Q37. When you look at a two-way table that looks like this:

Party affiliation     Approval of president’s performance
                      Yes       No        Total
Democrat              25        100       125
Republican            125       5         130
Total                 150       105       255

What is the row variable, and what is the column variable?

A37. The party affiliation is the row variable and the approval of the president’s performance is the column variable.

Q38.  If we look, at the table above, at the totals for the rows, we get how many Democrats and how many Republicans are in the sample. Similarly, the column totals tell us how many approvers and disapprovers are in our sample. These give us the distribution for each variable separately, in our sample. These distributions are called what?

A38. Marginal distributions. (Because they’re in the right and bottom margins of the table.)

Q39. The above table gives the results in counts. Especially when the marginal distributions are not equal (for example, if the sample should contain twice as many Republicans as Democrats) we should convert the count data to what kind of data?

A39. To per cents (or fractions).

Q40. True or false: When describing the relationship between two quantitative variables, the scatterplot and the correlation coefficient are usually the graph and numeric measure of choice; but in describing the relation between two categorical variables, no single graph or numeric measure summarizes the strength of the association. We usually pick and choose among bar charts and pie charts and the reporting of various per cents.

A40. True.

Q41. Someone looks at a sample of 500 men and 100 women. 250 men oppose the war, whereas 80 women oppose the war. The researcher says, “Lots more men than women oppose the war. Therefore the idea that women in this area are more pacifist is incorrect.” What’s the problem with this reasoning, and what should the researcher do?

A41. He should not just use the counts, but find the per cents. 50% of men, but 80% of women, in this sample opposed the war.  So in this region it looks like the women are more anti-war than the men.

Q42. Suppose you have three age groups, and you have data on how many individuals got educated to each of 4 different levels. Suppose you calculate, just for one of the age groups, the per cent of people in that age group who attained each level. This distribution of per cents for one age group is called what?

A42. A conditional distribution.

Q43. Do the per cents for a conditional distribution add to 100 for each of the different groups for which you calculate them?

A43. Yes.

Q44. Do the per cents for conditional distributions equal the per cents for marginal distributions?

A44. No, not necessarily.
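
To see the difference between marginal and conditional distributions, here is a sketch in Python that works from the counts in the party affiliation table of Q37. The dictionary layout and variable names are choices made here for illustration.

```python
# Counts from the two-way table in Q37.
counts = {
    "Democrat":   {"Yes": 25,  "No": 100},
    "Republican": {"Yes": 125, "No": 5},
}

row_totals = {party: sum(row.values()) for party, row in counts.items()}
col_totals = {answer: sum(row[answer] for row in counts.values()) for answer in ("Yes", "No")}
grand_total = sum(row_totals.values())

# Marginal distributions, in per cents of the whole sample.
marginal_party = {party: 100 * t / grand_total for party, t in row_totals.items()}
marginal_approval = {answer: 100 * t / grand_total for answer, t in col_totals.items()}

# Conditional distributions of approval, one for each party (per cents of that row).
conditional = {
    party: {answer: 100 * c / row_totals[party] for answer, c in row.items()}
    for party, row in counts.items()
}

print(marginal_approval)
print(conditional)
```

Each conditional distribution adds to 100 per cent within its own party. About 59% of the whole sample approves, but only 20% of the Democrats and about 96% of the Republicans do, so the conditional per cents do not match the marginal ones.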

Q45. There were two AP Statistics teachers. 40% of the 40 students in the first teacher’s classes got 5’s, and 25% of the 40 students in the second teacher’s classes got 5’s. People assumed that the first teacher is better. However, someone then studied the results based on whether or not the students scored above or below a certain cutoff on the SAT, before going into AP Statistics. The first teacher had 80% of students above this cutoff and 20% below. The second teacher had 20% above and 80% below. The first teacher had 50% of the “aboves” get 5’s, and none of the “belows.” The second teacher had  75% of the “aboves” get 5’s, and 12.5% of the “belows.” Now which teacher appears to be better, and why?

A45. The second teacher, because a higher fraction of that teacher’s students got 5’s both among those above the cutoff (75% versus 50%) and among those below the cutoff (12.5% versus 0%).
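
The arithmetic behind this can be laid out in a few lines of Python. The counts below are worked out from the per cents given in Q45 (for example, 80% of 40 students is 32, and 50% of those 32 is 16); the dictionary layout is a choice made here.

```python
# (number of students, number who got a 5), worked out from the per cents in Q45.
results = {
    "first teacher":  {"above": (32, 16), "below": (8, 0)},
    "second teacher": {"above": (8, 6),   "below": (32, 4)},
}

overall = {}
by_group = {}
for teacher, groups in results.items():
    total_students = sum(n for n, _ in groups.values())
    total_fives = sum(f for _, f in groups.values())
    overall[teacher] = total_fives / total_students
    by_group[teacher] = {group: f / n for group, (n, f) in groups.items()}

print(overall)    # the first teacher looks better overall
print(by_group)   # the second teacher does better within each group
```

Overall the first teacher has 40% and the second has 25%, matching the question. Within each SAT group, though, the second teacher comes out ahead: 75% versus 50% for the “aboves,” and 12.5% versus 0% for the “belows.”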

Q46. The situation above is whose paradox?

A46. Simpson’s.

Q47. True or false: In Simpson’s paradox, there is a lurking variable, which predisposes the results against one of the two groups; controlling for the effects of that lurking variable by looking separately at the subsets formed by the categories of it reveals results in the opposite direction from those obtained when ignoring the lurking variable.

A47. True.

Q48. If a lurking variable can actually reverse the direction of results, do you think it is also possible that a lurking variable could result in lack of an observed association when in fact there is a causal influence?

A48. Yes.

Q49. Does the fact that lurking variables can obscure influences that are actually present imply that:  not only does correlation not imply causation, but lack of correlation does not rule out causation? 

A49. Yes.

TPS Chapter 5: Producing Data

Q1. The difference between an observational study and an experiment is that in the first, the explanatory variable is observed and measured, whereas in an experiment, the explanatory variable is ____.

A1. Deliberately imposed, manipulated. 

Q2. When there is a jobs program for welfare recipients, and you simply observe that those who voluntarily take part in the program do better than those who don’t, what’s the problem with inferring that the program causes better results?

A2. The effects of the program are confounded with the characteristics that lead people to seek the program, for example motivation and values.

Q3. The entire group of individuals we want information about is called the _____.

A3. Population

Q4. The subset of the population we actually examine in order to gather information is called the ______. 

A4. Sample.

Q5. Studying the whole population by attempting to contact every individual is called conducting a ______.

A5. Census.

Q6. Studying a population by taking a subset of it in order to generalize to the whole population is called _____.

A6. Sampling.

Q7. The method used for selecting the sample from the population is called the ____ of sampling.

A7. Design.

Q8. If a radio station invites anyone who wants to call and give an opinion on a question, the set of people thus obtained is called a _____ response sample.

A8. Voluntary.

Q9. If the researcher enrolls a group of people in the study on the basis of how easy it is to contact them and get them to enroll, that method of sampling is called ______ sampling.

A9. Convenience.

Q10. The systematic error introduced when the sample is very different from the population is called ____.

A10. Bias.

Q11. If a conservative radio commentator polls his listeners, and a liberal commentator polls her listeners, both polls are likely to be biased as methods of ascertaining the sentiment of the country, because _______. 

A11. It’s very likely that these samples differ greatly from the country as a whole.

Q12. An SRS, or simple random sample, is a subset of n individuals from a population, chosen in such a way that ____.

A12. Every subset of n individuals has an equal chance of being chosen for the sample.

Q13. True or false: if every individual in the population has an equal chance of being included in the sample, the sample is a simple random sample.

A13. False. You need not only this condition, but also that every subset of the population of size n is equally likely to be chosen.

Q14. Suppose I take the numbers 1, 2, 3, and 4, and write them on identical pieces of paper, put them into a hat and mix them thoroughly, and draw out two numbers. Is this a simple random sample of the 4 numbers?

A14. Yes.

Q15. Suppose I take the numbers 1, 2, 3, and 4. First I take the numbers 1 and 2 and put them into a hat, and choose one of them. Then I take the numbers 3 and 4 and put them into a hat and choose one of them. For each of the numbers 1, 2, 3, and 4, what is the probability that this number will end up in the sample?

A15. 0.5

Q16. Is it possible that the subset {1,2} would be chosen for our sample using the sampling method just mentioned (that is, pick randomly from 1 and 2, then pick randomly from 3 and 4)?

A16. No.

Q17. So the sampling method just mentioned is one where each individual has equal probability of being chosen, but each subset is not equally likely to be chosen; thus the sample obtained is, or is not, a simple random sample?

A17. Is not. 
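
The difference between the two methods in Q14 through Q17 can be seen by listing every possible sample. Here is a minimal sketch in Python; inclusion_probability is a helper name chosen here, and it assumes each listed sample is equally likely under its method.

```python
from itertools import combinations, product

population = [1, 2, 3, 4]

# Hat method of Q14: every 2-element subset is a possible sample.
srs_samples = [set(c) for c in combinations(population, 2)]

# Two-hat method of Q15: one number from {1, 2} and one from {3, 4}.
two_hat_samples = [set(p) for p in product([1, 2], [3, 4])]

def inclusion_probability(samples, individual):
    """Fraction of the equally likely samples that contain the individual."""
    return sum(individual in s for s in samples) / len(samples)

for k in population:
    print(k, inclusion_probability(srs_samples, k), inclusion_probability(two_hat_samples, k))

print({1, 2} in srs_samples)      # True
print({1, 2} in two_hat_samples)  # False
```

Under both methods every individual has probability 0.5 of being chosen. But the first method has 6 equally likely subsets, while the second can only produce 4 of them and never produces {1, 2} or {3, 4}.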

Q18. In a table of random digits, each triple of digits is equally likely to be any of the ____ possibilities from _____ to ______. 

A18. 1000,  000,  999

Q19. The two rhyming words (with different ways of spelling the second syllable) that summarize the process of using a table of random digits to select a simple random sample are ___ and ____. 

A19. Label and table.

Q20. There are 7 members in a class. Please describe how you would use a table of random digits to select a simple random sample of 3 of them.

A20. Assign each of them a single digit label. Enter the random number table at any point, and look at the digits in order. If the digit isn’t one of the labels you assigned, or if it repeats a label you have already picked, ignore it and go to the next. If it is one that you assigned, put that individual in the sample. Keep going until you have put 3 individuals in the sample.
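
The “label and table” procedure can be sketched in Python as below. The run of digits is invented for illustration (it is not a line from the book’s table), the labels 1 through 7 are one possible labeling, and select_sample is a helper name chosen here. Skipping a repeated label is an added assumption so that no one is chosen twice.

```python
def select_sample(digits, labels, sample_size):
    """Read digits in order, keeping each new digit that is a valid label."""
    chosen = []
    for ch in digits:
        label = int(ch)
        # Ignore digits that are not labels, and labels already chosen.
        if label in labels and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

labels = range(1, 8)        # the 7 class members are labeled 1 through 7
digits = "80449271635"      # hypothetical run of random digits

sample = select_sample(digits, labels, 3)
print(sample)  # [4, 2, 7]
```

Reading along the digits: 8 and 0 are not labels, 4 is chosen, the second 4 is a repeat, 9 is not a label, then 2 and 7 are chosen, which completes the sample of 3.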

Q21. A sample chosen by chance is called a ____ sample.

A21. Probability

Q22. Suppose there is a class, and someone wants to choose a random sample of it. But the researcher wants to make sure that both males and females are adequately sampled. So the researcher takes the names of the girls, and draws a simple random sample of them, and then does the same with the boys’ names. The total sample thus obtained is not a simple random sample, but a _____ random sample.

A22. Stratified

Q23. Suppose a researcher wants to collect a random sample of high school students in the U.S. The researcher first takes a simple random sample of counties in the country, then takes a simple random sample of high schools within each county, and then a simple random sample of students within each high school. This sampling method is called ____.

A23. Multistage sampling.

Q24. The above method of sampling high school students leaves out homeschoolers. The general term for such a problem in sampling is ____.

A24. Undercoverage.

Q25. When you get a survey in the mail and immediately toss it in the trash, the source of bias this introduces into the survey is called _____.

A25. Nonresponse.

Q26. If you were asked what is the “essential principle of statistical sampling,” would you say that it’s to have a simple random sample, a probability sample, a stratified sample, or a multistage sample?

A26.  A probability sample, because the most essential factor is that the sample be chosen by chance.

Q27. During recent decades, society has become less and less tolerant of any sexual activity between therapists and their clients. Surveys of the incidence of such behavior are now almost impossible to obtain, because therapists would avoid trusting a researcher with a confession of behavior that would lead to severe penalties. This introduces bias into any survey that is called ______ bias.

A27. Response

Q28. One survey question asks, “Do you believe that children should be legally protected from exposure to violent models on TV that can lead them to commit acts of violence?” And a separate question asks, “Do you believe that government should limit the free expression of ideas by censoring television?” The major difference in results these questions would yield would be referred to as ______ effects.

A28. Wording.

Q29. Which would give more accurate results in a poll: a probability sample of 1000 people, or a voluntary response sample of 100,000 people?

A29. The probability sample of 1000.

Q30. A study in which we actually do something to people, animals, or objects in order to learn about the response is called an _____.

A30. Experiment.

Q31. The individuals on which an experiment is done are called the experimental whats?

A31. Units.

Q32. When the experimental units are human beings, according to our book they are called ____ (although the preferred term among psychological researchers these days is “participants”).

A32. Subjects.

Q33. The thing that is done to the subjects (or participants) (for example giving them a drug or teaching them to read) is called a ____.

A33. Treatment.

Q34. Suppose that in an experiment, learning of math facts is your response variable. You are studying two explanatory variables, and varying them systematically in your study: amount of practice, and the frequency of recurrence of any one math fact in a practice session. These two explanatory variables are called the two _____ in the experiment.

A34. Factors.

Q35. In an experiment on math facts, one is studying the frequency of recurrence of any one math fact in practice sessions: does, for example, 7+8 occur every 3 problems, every 20 problems, or every 90 problems? If the experiment is set up like this, within the factor called “frequency of recurrence” there are three different degrees of that factor, three different specific values of that factor, which in the jargon is called three ____ of that factor.

A35. Levels.

Q36.  A pill that is made of inactive material, which is used so that subjects can have  information withheld about which treatment group they are in, is an example of a _____.

A36. Placebo.

Q37. If you want evidence for causation, and if you want to study the interactions of factors, and you are able to do either an observational study or an experiment, you should, all other things equal, choose the ______. 

A37. Experiment. 

Q38. When people get better from an inactive treatment, that is called the ____ effect.

A38. Placebo.

Q39. A group of individuals who receive an inactive treatment, so that the effects of a possibly active treatment can be contrasted with those of inactive treatment, is called a ____ group. 

A39. Control. 

Q40. A researcher tries to make two treatment groups equal on every variable other than the treatment of interest. The researcher does this by fashioning two groups that are very similar on several variables relevant to outcome. What is the problem with this method?

A40. There are too many lurking variables – the experimenter may not measure all of them, and some of them may not become apparent until after the experiment. Some of them may not be measurable at all.

Q41. What’s the “gold standard” method of assuring the equivalence of two treatment groups? 

A41. To randomly assign subjects to treatment groups.

Q42. Suppose you first assemble pairs of subjects that are very similar on the preintervention measure of the response variable. Then, within each pair, you randomly assign one member to the experimental group and the other to the control group. This method combines random assignment with ______.

A42. Matching.

Q43. A researcher is studying the effect of two methods of teaching reading. Instead of matching the subjects on their reading level and then randomly assigning one of each pair to the two groups, the researcher ignores the initial reading level for purposes of assignment to groups, and instead picks a simple random sample of the whole set of subjects to be in each group. Is this an acceptable method of assignment to groups?

A43. Yes. Understanding why this is true is central to the logic of experimental design.

Q44. True or false. When subjects are assigned at random to two groups, get two treatments, and they differ on the response variable, it must be true that the treatment accounts for the difference between the groups.

A44. False. It is also possible that the “play of chance in the random assignment” accounts for the difference in the groups.  (However, the likelihood of this alternative explanation can be quantified, and when it is small enough, the other explanation is favored.)

Q45. Suppose that both treatments studied in an experiment in fact have no causal influence upon the response variable. Under what conditions are we more likely to see big differences between the two groups, due to the vagaries of random assignment: with a very small sample size, or a very big sample size?

A45. A very small sample size.
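One way to see A45 in action is a small simulation. What follows is a minimal Python sketch, not something from the textbook: the function name, the standard normal response variable, the group sizes (5 and 500), the number of repetitions, and the fixed seed are all assumptions chosen for illustration. Both groups are drawn from the same distribution, so any difference between the group means is due to chance alone.

```python
import random
import statistics

def typical_chance_difference(group_size, repetitions=500, seed=1):
    """Average absolute difference between two group means when the
    treatments have no effect (both groups share one distribution)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    differences = []
    for _ in range(repetitions):
        group_a = [rng.gauss(0, 1) for _ in range(group_size)]
        group_b = [rng.gauss(0, 1) for _ in range(group_size)]
        differences.append(
            abs(statistics.fmean(group_a) - statistics.fmean(group_b))
        )
    return statistics.fmean(differences)

small_groups = typical_chance_difference(5)
large_groups = typical_chance_difference(500)
print("typical chance difference, 5 per group:  ", round(small_groups, 3))
print("typical chance difference, 500 per group:", round(large_groups, 3))
```

The typical chance difference should come out much larger for the groups of 5 than for the groups of 500, which is the point of A45.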

Q46. A difference between groups that is so large (and with so many subjects) that it would “rarely” (i.e. to whatever criterion of rarity we specify) occur by chance is called a _____  ______  effect.

A46. Statistically significant.

Q47. The three central principles of experimental design are _____ (which is making comparisons between groups), _______ (a method of assigning individuals to groups), and _______ (which has to do with how many individuals you have in your groups). 

A47. Control, random assignment, and replication. (The word replication here refers to repeating the observation on more subjects within a given experiment. The word is also used, in a different sense, to refer to repeating the experiment.)

Q48. Suppose that we want to study the effect of a new curriculum and an old one, on reading skills. We also want to study the effects of whether the curriculum is delivered in person or over the phone. We randomly assign subjects to the curriculum, but we can’t randomly assign them to in person or over the phone, because certain people live too far away to get the training in person. So subjects are allocated to the curriculum at random, but not to the delivery method. We would say that the experimental design here is not _____  ________. 

A48. Completely randomized.

Q49. In a “double-blind” experiment, what two sets of people are “blind” to which group the subject is in?

A49. The subjects themselves, and the research staff who have contact with them.

Q50.  What’s a problem in making inferences from experiments that is often less of a problem in observational studies? 

A50. The problem of “lack of realism,” in other words, the problem that the conditions in the study do not match those to which we wish to generalize. 

Q51. How do you do random assignment in a matched pairs design?

A51. First choose pairs that are as similar as possible, then, within each pair, randomly choose which subject gets which treatment.

Q52. In a certain type of matched pair design where each subject serves as his or her own control, and each “pair” consists of only one individual, what is randomly assigned?

A52. Whether the subject gets treatment 1 first or treatment 2 first.

Q53. Suppose that we want to compare two methods of tutoring in reading that children receive after school. We know that the teacher the student has is also an important influence on the outcome variable, which is reading skill. Please describe how we would use a block design to control the effect of teacher when studying the method of teaching reading.

A53. We randomly assign the students of the first teacher to the two groups, then do the same for the students of the second teacher, and so forth, rather than using a simple random sample of all students.
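As one possible illustration of the block design in A53, here is a minimal Python sketch. The roster, the student labels, the method names, and the function name are all hypothetical; the only idea taken from the answer is that the random assignment is done separately within each teacher's students.

```python
import random

# Hypothetical roster: each teacher (block) maps to that teacher's students.
roster = {
    "Teacher 1": ["s1", "s2", "s3", "s4"],
    "Teacher 2": ["s5", "s6", "s7", "s8", "s9", "s10"],
}

def block_randomize(roster, seed=None):
    """Randomly split each teacher's students between the two methods."""
    rng = random.Random(seed)
    assignment = {}
    for teacher, students in roster.items():
        shuffled = students[:]      # copy so the roster is not modified
        rng.shuffle(shuffled)       # random order within this block only
        half = len(shuffled) // 2
        for student in shuffled[:half]:
            assignment[student] = (teacher, "method A")
        for student in shuffled[half:]:
            assignment[student] = (teacher, "method B")
    return assignment

assignment = block_randomize(roster, seed=42)
for student, (teacher, method) in sorted(assignment.items()):
    print(student, teacher, method)
```

Because the shuffle happens inside each block, every teacher ends up with students in both tutoring methods, so teacher cannot be confounded with method.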

Q54. True or false: If we want to make separate conclusions about males and females in a study, it’s a good idea to block on gender when making our assignment to groups.

A54. True.

Q55. Making a model that accurately reflects the experiment under consideration and imitating chance behavior based on that model is called doing a _____.

A55. Simulation.

Q56. What are the 5 steps of doing simulations?

A56. State problem, state assumptions, assign digits to represent outcomes, simulate repetitions, state conclusions.

Q57. Someone wants to simulate a situation where there’s a 3/10 chance that a child will be involved in bullying. The person assigns the digits 0 to 3 to “involved in bullying,” and the rest of the digits to “not involved in bullying.” Do you have a problem with this? If so, what’s your problem?

A57. Yes, there is a problem. There are 4 digits from 0 to 3 inclusive, and 6 other digits, so the person would be simulating a 40% probability rather than a 30% probability.
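The counting in A57 can be checked directly. This is a small Python sketch with made-up variable names; it counts how many of the ten digits each assignment uses, and it assumes (as an illustration, not from the question) that the repaired assignment would use the digits 0 to 2.

```python
digits = list(range(10))  # the ten equally likely digits 0 through 9

flawed = [d for d in digits if d <= 3]  # 0, 1, 2, 3: four digits
fixed = [d for d in digits if d <= 2]   # 0, 1, 2: three digits

p_flawed = len(flawed) / len(digits)
p_fixed = len(fixed) / len(digits)
print(p_flawed, p_fixed)  # prints 0.4 0.3
```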

Q58. Please use your calculator to generate 4 random integers in the range from 0 to 99. Please tell what you entered on your calculator to get these, and what 4 integers you got.

A58. On the TI 83 or 84, you do math>prb>5:randInt( and complete the command as randInt(0, 99, 4). On the TI 89, you do catalog, F3, and then scroll down to randInt and hit enter. Then you insert 0, 99, 4 in the parentheses. You’ll get different sets of numbers each time, unless something very unlikely happens!
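For readers without a TI calculator, a rough equivalent can be sketched in Python. Using Python here is an assumption and not part of the original exercise. The standard library function random.randint(a, b) includes both endpoints, so it covers the same range, 0 to 99.

```python
import random

# Draw 4 random integers from 0 to 99 inclusive (both endpoints possible).
random_integers = [random.randint(0, 99) for _ in range(4)]
print(random_integers)
```

As with the calculator, each run will usually print a different set of four numbers.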

TPS Chapter 6: Probability

Q1. The branch of mathematics that deals with the pattern of chance outcomes is ____.

A1. Probability

Q2. The big idea of the study of probability is that chance behavior is unpredictable in the _____ but has a regular and predictable pattern in the _____.

A2. short run, long run

Q3. An illustration of the “big idea” mentioned in Q2 is that while it is unpredictable whether a single coin toss will come out heads, the ________ is almost always very close to .5.

A3. Fraction of heads in a very large number of tosses
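The long-run regularity described in A3 can be sketched with a simulated coin. The function name, the seed, and the toss counts below are assumptions for illustration; a computer's pseudorandom numbers stand in for real coin tosses.

```python
import random

def fraction_of_heads(number_of_tosses, seed=0):
    """Simulate fair coin tosses and return the fraction that are heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(number_of_tosses))
    return heads / number_of_tosses

for n in (10, 1000, 100000):
    print(n, fraction_of_heads(n))
```

With only 10 tosses the fraction can stray well away from .5, while with 100,000 tosses it should land very close to .5.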

Q4. What is the difference between a changing, or variable phenomenon that is “random” and one that is not?

A4. A random phenomenon is uncertain with respect to individual outcomes, but nonetheless there is a regular distribution of outcomes in a large number of repetitions.

Q5. The ____ of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions, i.e. long-term relative frequency.

A5. Probability.

Q6. When there are independent trials, that means that the outcome of one trial _______.

A6. Does not influence the outcome of another.

Q7. The set of all possible outcomes of a random phenomenon is called the ______.

A7. Sample space.

Q8. An event is defined as a subset of ____.

A8. The sample space.

Q9. When we make a mathematical description of a random phenomenon by describing a sample space and a way of assigning probabilities to events, we are constructing a ____.

A9. Probability model.

Q10. Jane has 2 shirts and 3 pairs of pants. If we want to picture the 6 ways she can dress in these garments, we can draw a diagram with a bifurcation point at the left of the page, with two lines going out to two points called “red shirt” and “brown shirt.” From each of these, you then draw 3 lines, saying “blue pants,” “green pants,” and “black pants.” This sort of picture is called a _____. 

A10. Tree diagram.

Q11. Jane has 2 shirts and 3 pairs of pants. The “Cartesian Product” of these two sets produces 6 possible combinations. This illustrates what our book calls the _____ principle, which says that if you can do one task in a ways, and another in b ways, you can do both together in _____  ways.

A11. multiplication,  ab
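Jane's wardrobe from Q10 and Q11 can be listed out with a Cartesian product. This is a minimal Python sketch; the list names are made up, and itertools.product is simply one standard way to form every (shirt, pants) pair.

```python
from itertools import product

shirts = ["red shirt", "brown shirt"]
pants = ["blue pants", "green pants", "black pants"]

# Every pairing of one shirt with one pair of pants.
outfits = list(product(shirts, pants))
for outfit in outfits:
    print(outfit)

# The multiplication principle: a ways times b ways.
print(len(outfits) == len(shirts) * len(pants))
```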

Q12. Please give an example of sampling with and without replacement.

A12. As one of many possible examples: in sampling without replacement, you draw first one, then another card from a deck without putting the first card back. In sampling with replacement, you draw one card from the deck, note its identity, replace it, shuffle them, draw again, and note the identity of the second draw.
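The card example in A12 can be sketched in Python. The deck construction and variable names are assumptions for illustration. In the standard library, random.sample draws without replacement and random.choices draws with replacement.

```python
import random

suits = ["hearts", "diamonds", "clubs", "spades"]
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
deck = [f"{rank} of {suit}" for suit in suits for rank in ranks]

# Without replacement: the two cards are always different.
without_replacement = random.sample(deck, 2)

# With replacement: the same card can come up twice.
with_replacement = random.choices(deck, k=2)

print(without_replacement)
print(with_replacement)
```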

Q13. The probability of any event A has to satisfy the inequality x <= P(A) <= y. What are x and y?

A13. 0 and 1.

Q14. If you sum the probabilities for each member of the sample space, you always come to a grand total of ____.

A14. 1.

Q15. If the probability that A will occur is P(A), the probability that A will not occur is ____.

A15. 1-P(A).

Q16. What notation do we use to represent the probability that A will not occur, if P(A) is the probability that A will occur?

A16. P(A^c)

Q17. What does it mean when we call two events disjoint or mutually exclusive?

A17. That if one happens, the other can’t also happen. 

Q18. If two events are mutually exclusive, and P(A) is the probability of A, and P(B) is the probability of B, what is the probability of (A and B) (a.k.a. A intersect B)?

A18. 0

Q19. If two events are mutually exclusive, and P(A) is the probability of A, and P(B) is the probability of B, what is the probability of (A or B) (a.k.a. A union B)?

A19. P(A) + P(B). 

Q20. Suppose a random event has k equally likely outcomes. What’s the probability of any one of these outcomes?

A20.  1/k.

Q21. When there are k equally likely outcomes, and a certain number of outcomes are counted as being in event A, the P(A) = (number of outcomes in A)/k.  Please give an example of this.

A21. As one possible example: when a deck of cards has 52 cards, each of which is equally likely to be drawn, and 13 of these are called hearts, then P(drawing a heart) = number of hearts/number of cards, or 13/52, or ¼.

Q22. In a finite sample space, with outcomes that are not equally likely, the probability of any event is the sum of the probabilities of the outcomes making up the event. Someone’s statistics instructor asks, “Please give an example of this.” The person says, “Suppose the probability that a randomly selected person in a certain community will own a dog is .3, a cat is .2, and another pet is .1. Then the probability that the person will own a pet is (.3 +.2 +.1).” Do you have a problem with this example? If so, what’s your problem?

A22. One big problem is: what about the people who own two or more different types of pets? If the only people who owned cats or other pets were the dog owners, the probability of pet ownership would be .3 instead of .6.  The addition rule only applies to disjoint events, and these categories are not disjoint.

Q23. When two events are independent, what is the probability that both will occur?

A23. The product of their individual probabilities.

Q24. How would you use the multiplication rule for independent events, as well as the complement rule, to calculate the probability that in two coin tosses, you would get at least 1 tail? Please explain your reasoning as you compute an answer.

A24. Your strategy is to realize that (at least one is tails) is the complement of the event (both are heads). So you use the multiplication rule to compute the probability of two heads (1/2 * 1/2 = 1/4) and then compute the complement of that (1 - 1/4) to get 3/4.
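The reasoning in A24 can be checked two ways in a short Python sketch: once with the multiplication and complement rules, and once by listing all four equally likely outcomes. The variable names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Multiplication rule for independent tosses, then the complement rule.
p_heads = Fraction(1, 2)
p_both_heads = p_heads * p_heads
p_at_least_one_tail = 1 - p_both_heads

# Cross-check by listing the outcomes HH, HT, TH, TT.
outcomes = list(product("HT", repeat=2))
with_a_tail = [outcome for outcome in outcomes if "T" in outcome]
check = Fraction(len(with_a_tail), len(outcomes))

print(p_at_least_one_tail, check)  # prints 3/4 3/4
```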

Q25. Suppose you have a lab test that gets a “false positive” 1 in 100 times. (That means that the P is 1/100 that the test will say you have the disease, when you don’t.) How would you use the multiplication rule as well as the complement rule to compute the probability that out of 100 tests where the correct answer is “negative,” you will get at least one false positive? Please explain your reasoning as you compute an answer.

A25. The logic for this is exactly the same as for Q24. You realize that the event (at least one false positive in 100 tests) is the complement of (no false positives in 100 tests). The probability of not getting a false positive on one test is .99, and each test is assumed independent of the others, so .99^100 (or about .37) is the probability that all the tests will be free of false positive results. The complement of this, which is the probability that at least one is a false positive, is about .63.
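The arithmetic in A25 is easy to verify. This is a minimal Python sketch with made-up variable names; like the answer itself, it assumes the 100 tests are independent.

```python
p_false_positive = 0.01
number_of_tests = 100

# Multiplication rule: every one of the independent tests avoids a false positive.
p_no_false_positives = (1 - p_false_positive) ** number_of_tests

# Complement rule: at least one false positive.
p_at_least_one = 1 - p_no_false_positives

print(round(p_no_false_positives, 2), round(p_at_least_one, 2))  # prints 0.37 0.63
```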

Q26.  What is the general addition rule for unions of two events that may or may not be disjoint?

A26. P(A or B) = P(A) +P(B) –P(A and B). 

Q27. The probability that someone plays the guitar, in a certain community, is .4, and the probability that the person plays the harmonica is .2.  The probability that the person plays both guitar and harmonica is .1. What’s the probability that the person plays either guitar or harmonica? 

A27. .4+.2-.1=.5

Q28. In a certain community, .7 of people take physics and .6 of people take statistics. The fraction of people taking one or the other is .85. What’s the probability that a randomly drawn person will take both courses? 

A28. P(A or B) = P(A) + P(B) - P(A and B); therefore .85=.7+.6 –P(A and B), from which P(A and B) =.45. 
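The two calculations in A27 and A28 use the same rule in two directions. Here is a minimal Python sketch; the function names are hypothetical, and exact fractions are used only so that the decimals do not pick up rounding error.

```python
from fractions import Fraction

def p_union(p_a, p_b, p_both):
    """General addition rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_both

def p_intersection(p_a, p_b, p_either):
    """The same rule rearranged: P(A and B) = P(A) + P(B) - P(A or B)."""
    return p_a + p_b - p_either

guitar_or_harmonica = p_union(Fraction("0.4"), Fraction("0.2"), Fraction("0.1"))
physics_and_statistics = p_intersection(
    Fraction("0.7"), Fraction("0.6"), Fraction("0.85")
)
print(float(guitar_or_harmonica), float(physics_and_statistics))  # prints 0.5 0.45
```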

Q29. What does the symbol P(B|A) mean?

A29. The probability of B given A, or the probability that B will happen given that A has happened.

Q30. What is the general multiplication rule for any two events?

A30. P(A and B) = P(A)P(B|A)

Q31. Why does this rule simplify to the multiplication rule for independent events, when the two events are independent?

A31. Because the definition of independence is that P(B|A) =P(B), another way of saying that A doesn’t influence the occurrence of B. Thus when the two events are independent, we can substitute P(B) for P(B|A) in the general multiplication rule, and get P(A and B)= P(A)P(B).

Q32. The general multiplication rule for any two events really follows from the definition of conditional probability, and is a rearrangement of the defining formula. What is that defining formula for conditional probability?

A32. P(B|A) = P(A and B) /P(A)

Q33. True or false: P(A and B and C) = P(A)P(B|A)P(C|A and B)?

A33. True.

Q34. True or false: the fact that P(A and B and C)=P(A)P(B|A)P(C|A and B) is the basis of tree diagrams.

A34. True.

Q35. If there’s a .05 chance that a male high school athlete will play in college, and a .017 chance that a college athlete will play professionally, what’s the chance that a male high school athlete will both play in college and play professionally?

A35. .05 * .017, or .00085. 

Q36. If there’s a .95 chance that a male high school athlete will not play in college, and a .0001 chance that someone not playing in college will play professionally, what’s the chance that someone will both not play in college and play professionally?

A36. .95 * .0001, or .000095.

Q37. If the chance is .00085 that a male high school athlete will play in college and play professionally, and the chance is .000095 that an athlete will not play in college and play professionally, what’s the chance that a male high school athlete will play professionally?

A37. .00085+.000095, or .000945, which is a little less than one in a thousand.
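Q35 through Q37 walk down the two branches of a tree diagram and then add the results. The whole calculation can be sketched in Python as follows; the variable names are made up, and exact fractions are used only to avoid rounding noise in such small numbers.

```python
from fractions import Fraction

# First bifurcation: plays in college or not.
p_college = Fraction("0.05")
p_no_college = 1 - p_college  # .95

# Second bifurcation: conditional probabilities of playing professionally.
p_pro_given_college = Fraction("0.017")
p_pro_given_no_college = Fraction("0.0001")

# Multiply along each branch, then add the two branches that end in "professional".
p_college_and_pro = p_college * p_pro_given_college
p_no_college_and_pro = p_no_college * p_pro_given_no_college
p_pro = p_college_and_pro + p_no_college_and_pro

print(float(p_college_and_pro), float(p_no_college_and_pro), float(p_pro))
```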

Q38. For a tree diagram: the probability of reaching the end of any complete branch is the product of what?

A38. The probabilities written on its segments (provided that those probabilities are written so as to mean the conditional probability of going down this branch given that you have reached the previous bifurcation!)

Q39. When you look at Bayes’s rule, (page 375) does the numerator of the fraction, P(B|A)P(A) equal the probability of A and B? 

A39. Yes.

Q40. When you look at Bayes’s rule, (page 375) does the denominator of the fraction reduce to the probability of B and A plus the probability of B and A^c, which reduces to the probability of B?

A40. Yes.

Q41. Is a simpler form of Bayes’s rule the following: P(A|B) = P(B|A)P(A)/P(B) ? 

A41. Yes.

Q42. Does this simpler form of Bayes’s rule say that if you want to “reverse the direction of a conditional probability,” i.e. to find P(A|B) when you know P(B|A), you multiply the P(B|A) by the ratio of the P(A) to the P(B)?

A42. Yes.

Q43. P(A|B)=P(B|A)P(A)/P(B) implies that if you know the probability of a positive test given having a disease, you can calculate the probability of having the disease given a positive test, provided that you also know what two other quantities?

A43. The probability (or rate) of the disease, and the probability (or rate) of positive tests. So for example, if the probability of a positive test given the disease were .99, the rate of positive tests were 1 in 100, and the rate of the disease were 1 in 1000, the probability of the disease given a positive test would be only .99*.001/.01, or .099. This illustrates that the probability of having the disease given a positive test does not necessarily equal the probability of having a positive test given the disease!
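The numbers in A43 can be run through the simpler form of Bayes’s rule from Q41. This is a minimal Python sketch; the function and parameter names are hypothetical, with A standing for “has the disease” and B for “tests positive.”

```python
def bayes(p_b_given_a, p_a, p_b):
    """Simpler form of Bayes's rule: P(A|B) = P(B|A) P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_disease_given_positive = bayes(p_b_given_a=0.99, p_a=0.001, p_b=0.01)
print(round(p_disease_given_positive, 3))  # prints 0.099
```

Even though the test catches 99% of true cases, fewer than 1 in 10 positive results here would come from someone who actually has the disease.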

Q44. What should someone do to organize the information for decision analysis if one is given a set of complex information about branching alternatives and outcomes given those alternatives?

A44. Use a tree diagram.

