F77SA1 Introduction to Statistical Science A
Lecture notes

Jennie Hansen
George Streftaris


Contents

Introduction

1 Collecting Data
  1.1 Sampling
  1.2 Experimentation
  1.3 Measurement
  1.4 Looking at data intelligently


Introduction

Statistical arguments and the conclusions of statistical analyses are used to guide and inform many decisions in society. Some examples include:

Global warming: The scientific claims both for and against global warming are often based on statistical analyses - so how can we judge the validity of the underlying statistical analyses on both sides of the argument?

Clinical research: NICE (National Institute for Health and Clinical Excellence) recommends whether new drugs should be made available on the NHS. Their recommendations are based on statistical analysis of the effectiveness of the drug. How are the clinical experiments designed and the data collected in order to prove/disprove clinical claims of effectiveness?

Psychological research: You may have heard people say that women are better at multi-tasking than men, whereas men are better at map reading than women. What is the basis (if any!) of such claims? How did researchers design an experiment to investigate these differences between men and women, and how do they know that the differences that were observed were statistically significant and could be attributed to a difference in gender?

Opinion polls (e.g. market research, television ratings, etc.): Many corporate decisions are based on survey results. It is important to understand what the percentages in such surveys mean, how they were obtained, and how accurate the information is - otherwise costly mistakes can be made.

Life insurance and pensions: Life insurance premiums and pension calculations are based on estimates of life expectancy. These estimates are based on statistical analyses of mortality data. Life insurance companies and pension providers would find it difficult to meet their obligations if the underlying statistical calculations were incorrect!

In all of the above examples, the aim of the underlying statistical analysis is to provide insight by means of numbers. This process usually involves three stages:

    1. Collecting data

    2. Organising data

    3. Drawing conclusions from the data (inference)

In this module, we look at the statistical principles and techniques used at each of these three stages in an analysis. At the end of this module students should be able to

- understand the logic of statistical reasoning
- understand how statisticians come to their conclusions
- be able to evaluate the use of statistical methods in a variety of applications (e.g. science, finance, the media, etc.)

Developing these skills is an important step in developing the critical thinking required in every aspect of professional life.

Chapter 1

    Collecting Data

    1.1 Sampling

Samuel Johnson is reported to have said, "You don't have to eat the whole ox to know that the meat is tough." This is one way of describing the idea behind sampling. Sampling is a way to gain information about the whole by examining only a part of the whole. Sampling is widely used by industry, social scientists, political parties, the media, etc.

Example 1.1.1 When a newspaper reports that 34% of the Scottish electorate support independence for Scotland, someone, somewhere, has to have asked some voters (but clearly not every voter) their opinion. The reported percentage is based on the sample of voters that were questioned. This is an example of sampling in order to obtain information about the whole.

Whenever statisticians discuss sampling they use certain precise terms to describe the procedure of sampling:

Population - this is the entire group of objects about which information is sought.

Unit - any individual member of the population.

Sample - a subset of the population which is used to gain information about the population.

Sampling frame - this is the list or group of units from which the sample is chosen.

Variable - a characteristic of a unit which is to be measured for those units in the sample.


Discussion
In the above example, the population is the entire Scottish electorate. The sampling frame is not usually the same as the population. For example, if the sample was chosen by selecting a subset from the electoral roll, then the sampling frame would be the electoral roll - however, the electoral roll does not necessarily contain the name of every eligible Scottish voter. In the example the variable to be measured is opinion on Scottish independence, e.g. support/don't support independence.

Example 1.1.2 Acceptance sampling is used by purchasers to check the quality of the items in a large shipment. In practice, manufacturers often incorporate acceptance-sampling procedures in their contracts with suppliers. The purchaser and supplier agree that a sample of parts from a large shipment will be selected and inspected, and based on the number of parts in the sample which meet the purchaser's specifications, the shipment is either accepted or rejected.

Discussion
In the above example, the population and the sampling frame are the same, i.e. the shipment of items, and the variable measured is whether or not a part is defective.

    Sampling design

We have seen that we start with a question about a population and then take a sample from the population in order to answer our question about the population. This approach will give us a meaningful answer provided we can be confident that the sample is (roughly) representative of the population.

Problem: How should we go about selecting a (representative) sample from the population?

One possibility is to select a sample consisting of the entire population - this is what happens when the government conducts a census. In this case we would obtain the exact answer to our question. It might seem that this would be the ideal approach, but it is usually problematic to sample the entire population. Some problems with this approach include:

- it is expensive,
- it is time-consuming,
- there are problems with acceptance sampling if units are destroyed as part of the sampling procedure (e.g. testing the tolerance of fuses in a sample).


    Alternatively, we could select a smaller sample.

Smaller samples - a potential pitfall - Selecting a (relatively) small subset from the population can lead to errors because our method of selecting the sample may have a tendency to select an unrepresentative sample from the population.

Example 1.1.3 Convenience sampling
Selection of whichever units of the population are easily accessible is called convenience sampling. For example, companies sometimes seek information from consumers by hiring interviewers to stop and interview shoppers on Princes Street in Edinburgh. But a sample of shoppers on Princes Street may not be representative of the entire population of consumers. For example, car owners may prefer shopping at out-of-town shopping centres, so any sample of shoppers on Princes Street would under-represent these consumers and over-represent others such as elderly consumers or inner city residents.

The example above is one illustration of bias in sampling:
A sampling method is biased if the results from the samples are consistently and repeatedly different, in the same way, from the truth about the population (i.e. there is a tendency to always overestimate or always underestimate).

Example 1.1.4 Voluntary response sampling
A voluntary response sample chooses itself by, for example, responding to questions asked by post or during radio or television broadcasts. People who have strong opinions about an issue are more likely to take the trouble to respond and their opinions may not represent the population as a whole.

    Simple random sampling

We saw in the discussion above that sometimes we can detect when a sampling method is biased. In this section we look at how to develop a method of sampling which is unbiased. In order to develop an unbiased method of choosing a sample, we should think a bit about why some of the sampling methods described in the examples in the last section were biased.

Observation: In the two examples above, the problem with the sampling method was that certain subsets from the sampling frame were more likely to be selected than other subsets. So a first step in developing an unbiased sampling method is to select a sample of n units from the sampling frame in such a way that no subset of n units is more likely to be selected than any other.

A simple random sample (SRS) of size n is a sample chosen in such a way that every collection of n units from the sampling frame has the same chance of being selected.

Remark
By taking a simple random sample, we can avoid some of the problems associated with convenience sampling because no part of the population is likely to be over (or under) represented in a simple random sample.

    Question: How can we select a simple random sample?

One method is to use physical mixing. Physical mixing is the method that is used to select the lottery numbers each week. The lottery draw works as follows:

- Start with 49 identical balls and label the balls with the numbers 1 to 49. Then thoroughly mix the balls and select a ball at random from the 49 balls (i.e. choose it mechanically or blindly from the 49 balls). Key point: at this first stage, every ball has the same chance to be selected!

- After selecting the first ball, the remaining balls are thoroughly mixed and a second ball is selected at random from the 48 balls. Note: if the mixing at the second stage is thorough, then any of the remaining 48 balls has the same chance of being selected. So, after two stages in the draw, we have a simple random sample of size 2.

- This procedure is repeated 4 more times to obtain a total of 6 balls and, because at each stage any of the remaining balls is equally likely to be selected, the resulting sample of 6 balls is a simple random sample of size 6 from the original 49 balls, and the numbers on the selected balls correspond to a simple random sample from the numbers 1, 2, ..., 49.
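The following is a minimal Python sketch (not part of the original notes) that imitates the lottery draw with pseudo-random mixing in place of physical mixing; the function name and seed value are illustrative assumptions.

    import random

    def lottery_draw(num_balls=49, draw_size=6, seed=None):
        """Select a simple random sample of ball numbers, mimicking a lottery draw.

        Every possible set of `draw_size` balls has the same chance of being
        chosen, which is exactly the definition of a simple random sample."""
        rng = random.Random(seed)
        balls = list(range(1, num_balls + 1))        # label the balls 1 to 49
        return sorted(rng.sample(balls, draw_size))  # draw 6 without replacement

    print(lottery_draw(seed=1))  # one simple random sample of size 6 from 1, ..., 49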

Remarks
Selecting a random sample using physical mixing is not as easy as it might appear. One key feature of this approach is that at each stage we must be able to thoroughly mix the set of objects from which we are selecting a random object. In the case of the lottery, much effort goes into verifying that the machine that mixes the balls does, in fact, thoroughly mix the balls before each draw and that there is no bias in the way that the machine selects each ball (just imagine the uproar if the mixing stage of the procedure was found to be biased in some way!).

In other situations it can be more difficult to guarantee that the objects have been thoroughly mixed. For example, when playing card games, someone usually shuffles the deck before dealing the cards to the players. The purpose of shuffling the deck is to physically mix the deck in such a way that each player has the same chance of getting any particular card in their hand. However, does shuffling a deck really thoroughly mix the deck?! Also, how many times should a fresh deck be shuffled in order to ensure that it has been thoroughly mixed? Casinos are interested in the answers to these questions and have even commissioned statisticians to work out the answers! It turns out that a fresh deck should be shuffled 7 times in order to thoroughly mix the deck (see http://en.wikipedia.org/wiki/Persi_Diaconis and http://homepage.mac.com/bridgeguys/SGlossary/ShuffleofCards.html).

Another problem with using physical mixing to select a simple random sample is that the sampling frame may be too large for this method to be practical. For example, suppose the Admissions Office at Heriot-Watt would like to select a sample of 50 first-year students to interview regarding their reasons for choosing to go to university. It wouldn't be practical to get several hundred identical balls, put student names on the balls, mix the balls, and select a sample of 50 balls in the same way that the lottery balls are drawn! Even if we tried to do this (and could find a container big enough to hold all the balls), there would still be the problem of making sure that the balls were thoroughly mixed before each draw. It was exactly these sorts of difficulties that were encountered when the US army conducted the first draft lottery in 1970 to determine who would be drafted into the army. The order in which men between 19 and 25 were to be drafted was determined by drawing capsules which contained birthdays out of a box - men with the first birthday to be drawn would be drafted first, then the men with the next birthday, etc. However, because of a flawed mixing process, in the 1970 lottery it turned out that capsules containing birthdays later in the year were more likely to be drawn than capsules containing birthdays in January!

Question: So, if it is difficult and sometimes not practical to use physical mixing to select a simple random sample, what can we do instead?

Answer: One way to select a simple random sample (even from a very large sampling frame) is to use a table of random digits.


A table of random digits is a list of the 10 digits 0,1,2,3,4,5,6,7,8, and 9 having the properties:

1. The digit in any position in the list has the same chance of being any one of the digits 0,1,2,3,4,5,6,7,8, or 9.

2. The digits in different positions are independent of each other in the sense that if I know which digit appears in one position in the table, then it is still the case that the digit in any other position in the table has the same chance of being any one of 0,1,2,3,4,5,6,7,8, or 9.

A great deal of mathematical ingenuity and computing expertise has been devoted to generating tables of random digits (they are also important in applications of cryptology, computer security, etc.). However, for this course we do not need to consider how such lists are compiled. Instead, it is enough to know how to use such a table. To use a table, we need to know the following consequences of properties 1 and 2 above:

- Any pair of digits in the table has the same chance of being any of the 100 possible pairs: 00, 01, 02, 03, ..., 98, 99. Also, if I know the digits in two adjacent positions in the table, then the digits in any other two adjacent positions in the table have the same chance of being any one of the 100 possible pairs, provided neither position in the pair overlaps with the first pair.

- Any triple of digits in the table has the same chance of being any of the 1000 possible triples: 000, 001, 002, 003, ..., 998, 999. Also, if I know the digits in three adjacent positions in the table, then the digits in any other three adjacent positions in the table have the same chance of being any one of the 1000 possible triples, provided none of the positions in the triple overlaps with the first triple.

- The same principles hold for groups of four or more digits from the table.

To illustrate how to use a table of random digits, we will do some examples in class.
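Purely as an illustrative sketch (the worked examples are left for class), the following Python snippet shows one common way of using a stream of random digits: read it off in fixed-width groups and keep the groups that match labels in the sampling frame. The digit string, frame size, and function name below are made-up assumptions, not from the notes.

    # Using a "table" of random digits to select a simple random sample.
    # Suppose the sampling frame has 600 units labelled 001-600, so we read
    # the digit stream in groups of three and keep labels that fall in range.

    random_digits = "19223950340575628713965409125314254482853089"  # made-up digits

    def srs_from_digit_table(digits, frame_size, sample_size, width=3):
        chosen = []
        for i in range(0, (len(digits) // width) * width, width):
            label = int(digits[i:i + width])          # next 3-digit group
            if 1 <= label <= frame_size and label not in chosen:
                chosen.append(label)                  # skip out-of-range or repeated labels
            if len(chosen) == sample_size:
                break
        return chosen

    print(srs_from_digit_table(random_digits, frame_size=600, sample_size=5))
    # prints [192, 239, 503, 405, 287] for the digit string above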

    Getting information about the population from the sample

So far we have looked at how to pick the sample (i.e. subset) from a population in such a way that we haven't shown any favouritism. However, once we have selected a simple random sample from the population, there are some questions that we need to resolve:

- How can we justify using information from the sample to tell us something about the population - especially if the sample is only a small subset of the population?

- How does information from the sample tell us something about the population?

To begin thinking about these questions we need to introduce some more terminology.

A parameter is a numerical characteristic of the population. It is a fixed number, but we (usually) do not know its value.

A statistic is a numerical characteristic of the sample. The value of the statistic is known once we have taken a sample, but its value changes from sample to sample.

Typically, when we want to know something about a population, our question about the population can be expressed in terms of an unknown parameter. After taking a sample from the population, we compute the value of an associated statistic and use the value of this statistic to estimate the value of the unknown population parameter.

Example 1.1.5 In the 1980s Newsday (an American weekly news magazine) surveyed a random sample of 1373 parents. The magazine wanted to determine the proportion, p, of parents in the American population who, if given the choice, would have children again. In the sample, 1249 parents said that they would, if given the choice, have children again. So, the proportion in the sample was p̂ = 1249/1373 ≈ 0.91. Note: The fraction p̂ which was computed from the sample is a statistic and we use it to estimate the population parameter p.

Now suppose Newsday selected another random sample of size 1373. We would not expect the number in the second sample who would have children again to be exactly the same as the number in the first sample, but we might still expect the proportion in the sample to be close to 0.91, the proportion in the first sample. In fact, we intuitively expect that if we were to repeatedly select random samples of size 1373 from the population of parents, the corresponding sample proportions would be clustered together. The way to visualize this clustering is to make a bar graph which reflects the pattern of the values of the sample proportion p̂ when we repeat sampling several times. To illustrate this, we will think about a simple example (i.e. we will do a statistical thought experiment).

Note! Our observations from this simple example will also apply to more complicated practical situations in statistics.

Example 1.1.6 Suppose that I have a (large) box which contains 5000 beads, all of which are identical except for their colour. Of the 5000 beads, 1000 beads are black and 4000 beads are white, and suppose that you can't see the beads in the box. You would like to determine the proportion, p, of black beads in the box. To do this, you follow this procedure:

- You select a simple random sample of size 25 from the beads in the box.

- You count the number of black beads in your sample and compute the proportion, p̂, of black beads in your sample. This proportion is your estimate of the proportion of black beads in the box.

    Discussion

1. Although this example is somewhat artificial, it gives us a simple model for many situations (including Example 1.1.5 above) - e.g. whenever we wish to determine what proportion of a population would answer Yes to a question, we can model the population as beads where black corresponds to Yes and white corresponds to No.

2. The proportion, p̂, of black beads in the sample may not be equal to p = 0.2, the proportion of black beads in the population, but we would expect p̂ to be close to 0.2.

3. If we return the beads in the sample to the box, thoroughly mix the beads in the box, and select another simple random sample, the proportion of black beads in the second sample may not be the same as the proportion of black beads in the first sample (but we would still expect it to be close to the population proportion p = 0.2).


4. The proportion of black beads in the entire population is the population parameter and the proportion of black beads in the sample is a sample statistic.

Now to see how the values of the sample proportions cluster together, we will repeat this sampling procedure several times and look at the pattern of the corresponding sample proportions. This pattern is called the sampling distribution of the sample proportion p̂.

It would be very tedious to actually repeat this experiment several times (not to mention the fact that I would need to start with a box containing 5000 beads!). Fortunately, we can use a computer to do this experiment instead. The results of 200 repetitions of sampling 25 beads from 5000 are summarised in the table and bar graph below.

No. black beads   0     1     2     3     4     5     6     7     8     9     10    11
Proportion        0.00  0.04  0.08  0.12  0.16  0.20  0.24  0.28  0.32  0.36  0.40  0.44
Frequency         2     5     17    25    35    38    31    23    12    8     2     2

[Bar graph: Sampling distribution for sample proportions, sample size 25 (200 repetitions). Horizontal axis: proportion (0.0 to 1.0); vertical axis: frequency.]
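For readers who want to reproduce a table like the one above themselves, here is a small Python simulation sketch (my own illustration, not from the notes); the function name and seed are assumptions.

    import random
    from collections import Counter

    def simulate_sample_proportions(black=1000, white=4000, sample_size=25,
                                    repetitions=200, seed=None):
        """Repeatedly draw a simple random sample of beads from the box and
        record the proportion of black beads in each sample."""
        rng = random.Random(seed)
        box = [1] * black + [0] * white              # 1 = black bead, 0 = white bead
        proportions = []
        for _ in range(repetitions):
            sample = rng.sample(box, sample_size)    # SRS, drawn without replacement
            proportions.append(sum(sample) / sample_size)
        return proportions

    # Tally how often each sample proportion occurred in the 200 repetitions.
    print(Counter(simulate_sample_proportions(seed=1)))

The counts vary from run to run (this is exactly the sampling variability discussed below), but they should cluster around the population proportion p = 0.2.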


    Discussion

1. The bar graph above (which describes the sampling distribution) shows that the values of the sample proportions obtained from the 200 repetitions are (more or less) symmetrically clustered around the population parameter p = 0.2. This symmetry arises because taking a simple random sample is an unbiased sampling procedure. In this example, we know the value of the population parameter p, but even if p were unknown, it would still be true that the sample proportions from repeated sampling would be clustered around the population parameter p.

2. On the other hand, if our sampling procedure is biased, the values of the sample proportions will tend to be clustered on one side or the other of the population parameter p, and the bulge in the sampling distribution will be on one side or the other of the population parameter p. This is because bias in the sampling method means that the sample statistic tends to either overestimate or underestimate the population parameter when we repeatedly sample from the population.

3. The spread of the values of the sample proportions gives an indication of the precision of the sampling method. We will always see some spread in the values taken by a sample statistic when we repeatedly sample because there is sampling variability. Since we cannot eliminate sampling variability, our goal must be to try to reduce the spread in the sampling distribution of our statistic (i.e. to increase the precision of the sample statistic).

Question: How can we improve the precision of a statistic obtained from a simple random sample?

In this (and many other) situations, the precision of a statistic which is based on a simple random sample can be increased by increasing the size of the simple random sample. To illustrate this, I used a computer to perform 200 repetitions of sampling 100 beads from 5000 (sample size = 100) and 200 repetitions of sampling 250 beads from 5000 (sample size = 250). The sampling distributions for the corresponding sample proportions are represented below by bar graphs.


[Bar graph: Sampling distribution for sample proportions, sample size 100 (200 repetitions). Horizontal axis: proportion (0.0 to 1.0); vertical axis: frequency.]

[Bar graph: Sampling distribution for sample proportions, sample size 250 (200 repetitions). Horizontal axis: proportion (0.0 to 1.0); vertical axis: frequency.]


Discussion
Let's look a bit more closely at the bar graphs above, which correspond to the sampling distribution of the sample proportion p̂ when the sample size equals 25, 100, and 250 respectively. We can see that as the sample size increases, the values of p̂ (obtained from repeated samples) become more tightly clustered around the population parameter p = 0.2:

- When the sample size is 25, the values of p̂ range between 0.0 and 0.44. However, not many of the observed values were as far away from the true proportion p = 0.2 as either the value 0.0 or 0.44. In fact, 194 of the values of p̂ (97% of the 200 values) lie somewhere between 0.04 and 0.36. This indicates that if we were to take another simple random sample of size 25 from the 5000 beads then there is a good chance that the difference between the sample proportion p̂ computed from our sample and the true proportion p = 0.2 is no more than 0.16.

- When the sample size is 100, the values of p̂ range between 0.08 and 0.28. Again, not many of the observed values were as far away from the true proportion p = 0.2 as either the value 0.08 or 0.28. In fact, 197 of the values of p̂ (98.5% of the 200 values) lie somewhere between 0.12 and 0.26. In this case, if we were to take another simple random sample of size 100 from the 5000 beads then there is a good chance that the difference between the sample proportion p̂ computed from our sample and the true proportion p = 0.2 is no more than 0.08.

- When the sample size is 250, the values of p̂ range between 0.14 and 0.26. However, not many of the observed values were as far away from the true proportion p = 0.2 as either the value 0.14 or 0.26. In fact, 194 of the values of p̂ (97% of the 200 values) lie somewhere between 0.15 and 0.25. In this case, if we were to take another simple random sample of size 250 from the 5000 beads then there is a good chance that the difference between the sample proportion p̂ computed from our sample and the true proportion p = 0.2 is no more than 0.05.

So, by looking at the sampling distribution, we can work out the likely range of values for p̂, and we can see that this range of values is smaller the larger the sample size. So, precision increases as the sample size increases.

In the example above we knew the value of the population proportion p = 0.2, but our observation that the precision of the sample proportion p̂ increases as the sample size increases holds no matter what the population proportion p equals. In fact, statisticians have studied examples like our model and have worked out the following rule of thumb for determining the precision of p̂ in terms of the sample size:

Rule of thumb: Suppose that you select a simple random sample of size n from a (much larger) population. Then there is a good chance that the magnitude of the difference between the sample proportion, p̂, and the population parameter p is less than 1/√n.

    One consequence of this rule of thumb:

The precision of a sample statistic depends on the size of the sample but not on the size of the population, provided the population is much larger than the sample.

For example, (provided the population is very large) the sample proportion p̂ computed from a simple random sample of size 1000 from the population is likely to differ from the (unknown!) population parameter by at most 1/√1000 ≈ 0.03.

Another consequence of the rule of thumb is that it allows us to determine how big the sample must be in order to achieve a prescribed level of precision. For example, suppose a national radio station wishes to determine the proportion p of the population that listen to their station at least once during a typical week.

Question: Given that the station is able to select a simple random sample from the population of radio listeners, how big a sample should they select in order to have a good chance that the sample proportion p̂ differs from the population proportion by at most 0.02 (i.e. there is at most a 2% error)?

Answer: The statisticians' Rule of Thumb says that we should choose the sample size n so that

    1/√n = 0.02.

We can rearrange this equation to get

    √n = 50, and hence n = 2500.
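As a quick check of this arithmetic (an illustrative sketch, not part of the notes), the rule of thumb can be wrapped in two small Python helper functions; the names are my own.

    import math

    def margin_of_error(n):
        """Rule-of-thumb margin for a sample proportion from an SRS of size n."""
        return 1 / math.sqrt(n)

    def required_sample_size(margin):
        """Smallest n whose rule-of-thumb margin is at most the given margin."""
        return math.ceil(1 / margin ** 2)

    print(round(margin_of_error(1000), 3))   # about 0.032
    print(required_sample_size(0.02))        # 2500, as in the radio station example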

Note: Now that you have a rule of thumb for determining the precision of a sample proportion you can look out for mistaken criticisms of statistics in the media. For example, a journalist criticised the Newsday results about parenting (see Example 1.1.5 above) by saying that a random sample of size 1373 was too small to give any meaningful information about a population of several million. But our rule of thumb says that the sample proportion p̂ calculated from a simple random sample of size 1373 is likely to differ from the true proportion in the population by at most 1/√1373 ≈ 0.027!

    Summary:

- Despite the sampling variability of a statistic computed from a simple random sample (i.e. the value of the statistic varies from sample to sample), the values of the statistic have a sampling distribution which can be observed by looking at a frequency bar graph for the values of the statistic which are obtained by repeated sampling.

- When the sampling frame consists of the entire population (as it did in our example of sampling beads), then the values of the statistic computed from repeated simple random samples from the entire population neither consistently overestimate nor consistently underestimate the value of the population parameter that we wish to estimate. In other words, simple random sampling produces unbiased estimates and the sampling distribution of the statistic bulges around the value of the population parameter.

- The sampling distribution associated with a statistic computed from a sample gives an indication of the precision of the statistic (i.e. we can get a rough idea from the sampling distribution of the magnitude of the typical difference between the value of the statistic computed from the sample and the value of the population parameter). The precision of a statistic computed from a simple random sample depends on the size of the sample and can be made as high as desired by taking a large enough sample.

    Errors in sampling

In the discussion above we saw that we can always expect that there will be a difference between the value of a sample statistic and the (unknown) population parameter that we wish to determine. In Example 1.1.6 above, the discrepancy between the sample proportion and the population proportion p = 0.2 was caused by chance in selecting the random sample (i.e. due to chance, we can't guarantee that the proportion of black beads in the sample will be exactly the same as the proportion of black beads in the population). We also saw that for this very simple example, we could reduce the discrepancy between the value of the sample proportion p̂ and the population proportion p = 0.2 by increasing the sample size. Unfortunately, not all discrepancies between sample statistics and population parameters are the result of chance errors (which can be reduced by increasing sample size). In particular, whenever we are sampling from a human population there are other sources of error that we need to watch out for. Some typical examples of these are described below.

Sampling errors
Sampling errors are errors that arise from the act of taking a sample and cause the sample statistic to differ from the population parameter. Sampling errors arise because the sample is a subset of the entire population. There are two types of sampling error:

- Random sampling errors are the deviations between the sample statistic and the population parameter which are caused by chance when we select a random sample. The deviations between the sample proportions and the population proportion observed in the example above are random sampling errors.

- Nonrandom sampling errors arise from improper sampling methods. These errors can arise because the sampling method is inherently biased (e.g. convenience sampling). Nonrandom sampling errors can also arise when the sampling frame (the list from which the sample is drawn) differs systematically from the population.

Example 1.1.7 Suppose that a polling organisation has been commissioned to determine the proportion of Edinburgh's population who favour the introduction of a congestion charge. The polling organisation decides to use the Edinburgh telephone directory as a sampling frame (i.e. list) from which to select a random sample to survey.

Question: Will a sample chosen in this way be representative of the population (i.e. adults who live in Edinburgh)?

Answer: The problem in this example is that using the telephone directory means people without landline phones (e.g. students and others who rely primarily on mobile phones and people who can't afford a telephone landline) and those who are ex-directory will not be part of any sample chosen using this method. The random sample selected by the polling organisation will be representative of the population of landline phone users but won't necessarily be representative of the population under investigation (i.e. adults who live in Edinburgh). If the views of the excluded members of the population differ significantly from those who are listed in the telephone directory, then the sample statistic will be a biased estimate of the population parameter.

Note: If the sampling frame differs systematically from the population, sample statistics will be biased no matter how the sample is selected from the sampling frame. In other words, simple random sampling cannot give us unbiased statistics if the sampling frame differs systematically from the population.

Nonsampling errors
Nonsampling errors are errors that are not related to the act of selecting a sample from the population. These errors can occur even if we used the entire population as our sample. Here are some typical sources of nonsampling errors:

- Missing data: Sometimes information from a sample is incomplete because it was not possible to contact some members of the sample or because some members of the sample refuse to respond. Even if the entire population was used as the sample, missing data could cause the results of a survey to be biased if the people who can't be contacted or who refuse to respond differ in some specific way from the population as a whole.

- Response errors: Some members of a sample may give wrong answers when surveyed. For example, subjects may lie about their age, weight, income, use of alcohol, cigarettes and drugs, etc. Even subjects that are trying to answer a question truthfully may answer incorrectly because they can't estimate very accurately exactly how many times they go up and down stairs in a day or how much tea they drink in a day, etc. Other subjects may exaggerate their answers.

- Processing errors: These errors usually occur at some stage in the process of entering raw data into a computer. Sometimes big errors can occur simply because a zero has been added or deleted as a number is recorded. These errors are often spotted by asking whether the results make numerical sense.

- Errors due to the method used to collect data once the sample has been selected: Once the sample has been selected, the data has to be collected. In the case of surveys (e.g. market research or opinion polls) a decision must be made whether to contact subjects in the sample by post, telephone, online, or by personal interview. Each of these methods can lead to bias in the results.

  - Postal surveys are relatively inexpensive but response rates can be low and, depending on the nature of the survey, there can also be voluntary response bias.

  - Telephone surveys use computers to randomly dial numbers (so even unlisted numbers can be reached). They are also relatively inexpensive. However, there are still households (mostly poorer ones) that do not have a telephone, so this leads to some nonrandom sampling error. It is also important in telephone sampling to try the same number several times and at different times of the day - otherwise the sample will only contain those who are usually at home at a certain time of the day.

  - Some organisations, such as YouGov, now carry out surveys online. Again, not everyone is online so there is potential for some nonrandom sampling error since those who cannot be reached by an online survey may be different in some important way from those that can be contacted online.

  - Personal (e.g. face-to-face) interviews can result in a higher response rate but can be expensive to conduct. Also, in some cases face-to-face interviews can lead to response errors. For example, a face-to-face interview about one's health or lifestyle might involve some embarrassing questions which some subjects would be reluctant to answer.

- Errors due to the wording of the questions in a survey: The problem is that the wording of a question can be slanted to favour one response over another. One way to slant a question is to pose it in terms of a desirable outcome. For example, consider the following two questions:


    Do you favour a 9pm curfew for children under 14 years of age?

Do you favour a 9pm curfew for children under 14 years of age in order to reduce anti-social behaviour on the streets after dark?

The second question is one example of how to slant a question in order to try to influence the answer. In this case, the question is phrased to try to encourage people to say Yes.

    Other sampling designs

In practice, it is often not practical to take a simple random sample from the population of interest. Some of the practical problems include:

- The population may be so large that it is very difficult or too time-consuming to construct a (complete) sampling frame (e.g. it would be quite time-consuming to construct a complete sampling frame for the population of all Scottish high school students).

- The sampling frame may be so large (e.g. the electoral roll for the UK) that it is technically difficult to select a simple random sample from it.

- A simple random sample taken from a very large population (e.g. a simple random sample from all UK adults) is likely to be geographically dispersed. If the sampling data is to be collected by interview, then tracking down all the members of the simple random sample for interview is both time-consuming and expensive.

To deal with these and other problems, statisticians and opinion pollsters have developed more sophisticated methods for selecting representative samples. Some examples of these more elaborate methods are described below.

    Multistage sampling

Let's consider the problem of interviewing a sample of size 500 from the population of Scottish high school students. A simple random sample of size 500 from this population (supposing that we can obtain such a sample) is very likely to be dispersed throughout Scotland and would be expensive to interview. Instead, we could use the following approach to select a sample:

- Select a random sample of size 20 from a list of all Scottish high schools.

- Get the school roll for each high school in the sample of 20 schools and select a random sample of size 25 from each school roll. This gives us (in total) a sample of size 500.
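As an illustrative sketch of this two-stage selection (mine, not from the notes; the data structure and names are assumptions), the procedure could look like this in Python:

    import random

    def two_stage_sample(school_rolls, n_schools=20, pupils_per_school=25, seed=None):
        """Multistage sample: pick schools at random, then pupils at random
        within each chosen school.  `school_rolls` maps a school name to the
        list of pupils on its roll (an assumed, illustrative structure)."""
        rng = random.Random(seed)
        chosen_schools = rng.sample(list(school_rolls), n_schools)
        sample = []
        for school in chosen_schools:
            sample.extend(rng.sample(school_rolls[school], pupils_per_school))
        return sample    # 20 schools x 25 pupils = 500 pupils in total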

    Discussion

1. The key feature of this multistage sampling example is that at each stage we make selections at random.

2. This procedure does not select a simple random sample since there are some subsets of 500 students that are impossible to select by using the procedure described above. For example, this procedure will never select a subset of students who attend 500 different schools. Nevertheless, since at each stage we select schools and students at random, we avoid some of the problems with bias which arise when we don't make sample selections at random.

3. The other advantage of this multistage sampling design is that the interviewers only have to visit 20 schools rather than travelling to (possibly) hundreds of different schools across Scotland.

    Stratified sampling

To construct a stratified sample, we divide the population into distinct groups which are called strata. Next, we decide how many units from each stratum will be included in the sample (the number selected from each stratum will often depend on what we want to know about the population). Finally, we select a (simple) random sample of the designated size from each stratum and combine these samples to form the stratified sample. To illustrate stratified sampling, we will consider two examples.

Example 1.1.8 Suppose that the University library wants to conduct a survey of Heriot-Watt students studying on campus in order to determine student views on the service provided by the Riccarton library. The population for this survey is the 6191 students studying on the Riccarton campus, of which 4699 (75.9%) are undergraduates and 1492 (24.1%) are postgraduates. A stratified sample of size 200 is selected by selecting a simple random sample of size 152 from the undergraduates and a simple random sample of size 48 from the postgraduate students.


    Discussion

1. By selecting a stratified sample, the library can guarantee that 76% of the students in the sample are undergraduates and 24% of the sample are postgraduates, and this matches the percentages in the whole population of students.

2. By selecting a simple random sample from each group, we can avoid sampling bias and we can use data from each group to obtain unbiased estimates for each group separately and for the whole population. For example, suppose that 87 of the 152 undergraduates surveyed and 34 of the 48 postgraduates surveyed said that they "Strongly favour" longer library opening hours. From these data we can estimate p̂UG = 87/152 = 0.572, the proportion of undergraduates who strongly favour longer opening hours, and p̂PG = 34/48 = 0.708, the proportion of postgraduates that strongly favour longer opening hours. To estimate the proportion of all students who strongly favour longer opening hours, we work backwards and estimate how many students in each group strongly favour longer opening hours as follows:

       0.572 × 4699 ≈ 2688 undergraduates
       0.708 × 1492 ≈ 1056 postgraduates

   So, we estimate that, in total, 2688 + 1056 = 3744 students out of the 6191 students strongly favour longer opening hours. This gives us an estimated proportion of

       p̂Total = 3744/6191 ≈ 0.605.
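A compact Python sketch of this "work backwards" calculation (my own illustration; the function name and tuple layout are assumptions, not notation from the notes):

    def stratified_estimate(strata):
        """Combine per-stratum sample proportions into an overall estimate.

        `strata` is a list of (stratum_size, sample_size, number_in_favour)
        tuples - an assumed layout for illustration only."""
        total = sum(size for size, _, _ in strata)
        estimated_in_favour = sum(size * (yes / n) for size, n, yes in strata)
        return estimated_in_favour / total

    # Library survey (Example 1.1.8): undergraduates then postgraduates.
    print(round(stratified_estimate([(4699, 152, 87), (1492, 48, 34)]), 3))  # about 0.605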

Now let's look at another example.

Example 1.1.9 Suppose that the Admissions Office at Heriot-Watt wants to conduct a survey of undergraduate entrants to find out what the students thought of the service provided by the Admissions Office. The Office plans to select a sample of 160 entrants to survey. Based on the data which is summarised in the table displayed below, the Admissions Office identifies 3 distinct groups of entrants: Home/EU students on the Edinburgh campus, Overseas students on the Edinburgh campus, and students on the Borders Campus.


                    Home/EU   Overseas   Total
Edinburgh Campus      1270       148      1418
Borders Campus         169         1       170
Total                 1439       149      1588

Heriot-Watt Undergraduate Entrants

In addition to finding out about general customer satisfaction, the Admissions Office would also like to investigate any differences between these groups with respect to their satisfaction with the service provided. Now if they select a simple random sample of size 160 from the 1588 entrants there will be only a few Overseas students and Borders students in the sample because there are (relatively) few such students in the population of entrants. To get better (i.e. more precise) information about these two groups it is necessary to have more of these students in the sample. To accomplish this the Admissions Office decides to select a stratified sample, and selects simple random samples of size 80, 40, and 40 from the Home/EU students on the Edinburgh campus, the Overseas students on the Edinburgh campus, and the students on the Borders Campus, respectively, in order to obtain a sample of size 160 from the new entrants.

    Discussion

1. In this example, the numbers selected from each group do not correspond to the relative sizes of these three groups in the population of Heriot-Watt undergraduate entrants. This is because the Admissions Office wants to get more precise information about Overseas students on the Edinburgh campus and students on the Borders Campus, so it selects relatively more students from these two groups. Nevertheless, since we use simple random sampling to select the samples from each stratum, we can use the data to obtain unbiased estimates for each stratum. Here is the data:
   68 out of 80 Home/EU students (Edinburgh campus),
   27 of the 40 Overseas students (Edinburgh campus),
   23 out of the 40 Borders campus students
   reported that they were "Very satisfied" with the service provided by the Admissions Office.


   The sample proportions for Home/EU, Overseas, and Borders students are

       p̂H = 68/80 = 0.85,   p̂O = 27/40 = 0.675,   p̂B = 23/40 = 0.575.

   We can use these proportions to estimate the numbers of Home/EU students (Edinburgh campus), Overseas students (Edinburgh campus), and Borders campus students that were "Very satisfied". These are, respectively:

       p̂H × 1270 = 1079.5,   p̂O × 148 = 99.9,   p̂B × 170 = 97.75.

   So, we estimate the overall proportion of students who were "Very satisfied" to be

       p̂T = (1079.5 + 99.9 + 97.75)/1588 ≈ 0.804.

2. Taken as a whole, a stratified sample constructed as described above would over-represent the opinions of Overseas students on the Edinburgh campus and students on the Borders Campus and would under-represent the opinions of Home/EU students on the Edinburgh campus. So, for example, we would need to be careful about using the sample to estimate the proportion of all entrants who are "Very satisfied" with the service provided by the Admissions Office. In fact, if we just computed a simple proportion of those in the entire sample who were "Very satisfied" we would certainly obtain a biased estimate! Fortunately, much as we did in Example 1.1.8, provided that we know the size of each stratum and the size of the sample from each stratum, we can use the data from this stratified sample to obtain unbiased estimates of population parameters.

    Systematic random sampling

As a final example in this section, let's see how to construct a systematic random sample of size 25 from the students enrolled in this module. As with selecting a simple random sample, we start with an (ordered) class list and we assign to each student on the list one (or more) 3-digit number(s). We then use a table of random digits to select the first person in the sample. Once the first person is selected, we then select every fifth person (say) on the list, starting with the randomly selected first person, until we have a sample of size 25.
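A minimal Python sketch of this idea (mine, not from the notes; it assumes the sample wraps around the end of the class list, a detail the notes do not specify):

    import random

    def systematic_sample(class_list, step=5, sample_size=25, seed=None):
        """Systematic random sample: a random starting position, then every
        `step`-th name on the ordered list, wrapping around if necessary."""
        rng = random.Random(seed)
        start = rng.randrange(len(class_list))   # only the starting point is random
        return [class_list[(start + k * step) % len(class_list)]
                for k in range(sample_size)]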


    Discussion

1. We can select a systematic random sample much more quickly than a simple random sample because we only need to select the first person at random. The rest of the sample is obtained by a systematic selection from the list, starting from a random person on the list.

2. A systematic random sample is not a simple random sample since not all subsets of size 25 have the same chance of being selected (for example, systematic random sampling will never select the first 25 people on the class list). Nevertheless, since we select the starting point for the sample at random, every person on the list has the same chance of being selected by a systematic random sample. This means that there is no favouritism in the selection mechanism (i.e. we do not have sampling bias provided there is no underlying bias in the way the names appear on the class list).

Note: The key features of the sampling designs described above are that each

- is based upon a well-defined procedure for selecting the sample, and
- uses chance to select units from the population.

We also note that it is possible to combine these methods to construct even more elaborate random sampling designs. All the sampling methods described above are examples of probability sampling:

A probability sample is a sample chosen in such a way that we know what samples are possible (not all need be possible) and we know the chance each possible sample has to be chosen.

Key point: Provided we are working with a probability sample, we can still use the data obtained from the sample to obtain (unbiased) estimates of the population parameters we are interested in, and we can work out the sampling distribution for our estimates. By looking at the sampling distribution we can also work out the likely magnitude of our sampling error.

    1.2 Experimentation

Almost everyone has performed some sort of experiment at some time in order to answer a question of the form:


    What happens if we ...... (fill in the blank!)?

We perform experiments because we would like to investigate cause and effect. Establishing a cause and effect relationship is accomplished by deliberately imposing a treatment on the objects in the experiment. In contrast, we use sampling to get a representative profile of the population of interest. It is very important to remember that sampling and other types of observational studies are not good ways to establish cause and effect.

Example 1.2.1 Suppose that I select a random sample of adult men in the 50-60 year old age group and conduct a detailed health survey of the members of the sample. Upon analysing the results of this survey I discover that in the sample those men who are heavy smokers also tend to have high blood pressure.

Question: Can I conclude from the pattern observed in the data that smoking causes high blood pressure in men of age 50-60?
No! The data show that there is an association between smoking and high blood pressure, but they do not prove that smoking causes high blood pressure. The problem is that individuals make their own choices about whether to smoke or not. It may be that men who choose to smoke differ in some other way from non-smokers and that this other difference between the two groups may be the cause of the high blood pressure in the men who smoke. Nevertheless, data from observational studies can prompt us to ask questions regarding cause and effect and can prompt further investigation into possible cause and effect relationships.

In order to explore some of the statistical issues associated with investigating cause and effect we need to introduce some vocabulary:

Units - These are the basic objects on which the experiment is performed. When the units are people, we call them subjects (or participants).

Variable - This is a characteristic of a unit which is measured in the experiment.

    Response variable - This is a variable whose value we wish to study.

Explanatory variable - This is a variable that explains or causes a change in the response variable. Explanatory variables are sometimes called factors.

Treatment - A treatment is any specific experimental condition that is applied to the units in an experiment. A treatment is often a combination of specific values (called levels) of each explanatory variable.


    Discussion

1. In the above example the response variable is blood pressure and the explanatory variable is whether or not the participant in the study smoked. This study is not an experiment because the researcher did not impose a treatment (i.e. smoking or not smoking) on the participants in the study. So, even though we can identify explanatory and response variables, this does not mean that the study is an experiment.

2. We justify the use of experimentation to establish causation as follows: suppose that we change the level of one or more explanatory variables (and all other experimental conditions remain the same); then any resulting change in the response variable must be caused by the changes in the levels of the explanatory variables. For example, we could investigate the effect of water temperature on the colour fastness of a dye by washing dyed fabric at different temperatures. Provided we could keep all the conditions in the experiment the same (except for water temperature), any changes in the colour of the fabric after washing could be attributed to the effect of the water temperature on the dyed fabric. Unfortunately, in many situations it can be difficult to guarantee that nothing affects the response variable except the changes in the explanatory variables!

    The need for an experimental design

In our discussion of sampling we looked at how to sample in order to obtain a representative sample of the population and to minimise the errors in our results. Likewise, in experimentation we have to be concerned with how an experiment is designed.

    The most basic type of experiment follows one of these patterns:

    Treatment → Observation                          (1)

or

    Observation → Treatment → Observation            (2)

In experiment (1), a treatment is applied and its effect is observed. In experiment (2), before-and-after measurements are taken. Now under ideal conditions (e.g. an experiment in a carefully set up laboratory), experiments following one of the designs above can give us good results. Unfortunately, it is not always possible to design an ideal experiment, and just as we need to sample with care, we also need to do experiments carefully in order to draw conclusions from them.

With sampling we needed to look out for sampling procedures which could lead to sampling bias. With experimentation (and observational studies) the problem is usually invalid data from which we are unable to draw any conclusions, i.e. we cannot determine if the treatment had an effect on the response variable. Here are some typical situations which result in invalid data:

Confounding of variables
Sometimes we cannot determine the effect of an explanatory variable on the response variable because the response variable may be influenced by other variables which are not part of the treatment used in the experiment. Any variable which is not an explanatory variable in the experiment but which may influence the response variable is called an extraneous variable.

Example 1.2.2 An educational researcher who has developed a new method to teach reading to primary school children decides to test its effectiveness by asking several head teachers in Edinburgh to introduce the scheme in their schools. At the end of the academic year the children who have been taught using the new scheme are tested and the results are compared with the reading test results from schools that did not participate in the scheme. The results for the pupils that were taught under the new method were higher, on average, than those of the children from non-participating schools.

Question: Do the results from the above experiment show that the new scheme improves reading attainment?
No! The problem is that there may be other factors which have also influenced the performance of the children in the participating schools. For example, the researcher may have favoured contacting head teachers of schools in more prosperous areas of Edinburgh where average performance on reading tests is already higher than the city average before the experiment. Also, head teachers did not have to participate in the experiment. So perhaps the ones that chose to let their schools participate in the new scheme were already more motivated to improve reading attainment in their schools than the ones who didn't participate. The motivation and enthusiasm of the participating head teachers may have helped to improve the reading attainment of the pupils at least as much as the new scheme! So we cannot draw any conclusions from this experiment because factors other than the new reading scheme may have influenced the results of the experiment. This is an example of confounding:

The effects of two or more variables on a response variable are said to be confounded if these effects cannot be distinguished from one another.

In the example above, the motivation of the participating teachers (an extraneous variable) and the method of teaching (the explanatory variable) are confounded.

    Data from observational studies

    As already mentioned above, it is usually difficult to determine cause andeffect based on data from observational studies. In particular, we often haveproblems with confounding of variables in observational studies. Heres an-other example of an (comparative) observational study:

    Example 1.2.3 A large study used health service records to investigate theeffectiveness of two ways to treat prostate cancer. One treatment was tradi-tional surgery and the other was based on chemotherapy. In each case, thepatients consultant determined which treatment would be given. The studyfound that the patients who received chemotherapy were less likely to survivefor more than 5 years.

Discussion
In this example the response variable is post-treatment survival and the explanatory variable is the type of cancer treatment (i.e. surgery or chemotherapy). This study is not an experiment because the researcher did not impose the treatment on the patients (each patient's consultant determined the treatment).
Question: Do the results from the above comparative study show that chemotherapy is less effective as a treatment for prostate cancer?
No! There are other variables that may be confounding the explanatory variable (which is cancer treatment). In particular, the choice of treatment for each patient was determined by the patient's doctor (not by the researcher who was doing the study), and the doctor is likely to consider a variety of factors when deciding which treatment is appropriate. For example, some patients may have been in such poor health or have other complicating health problems that surgery would have been too dangerous for these patients. In these cases, the doctor would be more likely to recommend chemotherapy instead of surgery. If unwell patients tend to be recommended more often than healthier patients for chemotherapy, this could also explain why the patients who received chemotherapy tended to have a lower survival rate. So, in this example, a patient's general health before treatment (an extraneous variable) and the cancer treatment received (the explanatory variable) are confounded.

    Placebo effect

The response by patients to any treatment in which they have confidence is called the placebo effect. There are many surprising examples of the power of the placebo effect in the medical literature. Here are a few:

Many studies have shown that placebos relieve pain in 30-40% of patients, even those recovering from major surgery.

One study found that when a group of balding men was given a placebo, 42% of the men either experienced increased hair growth or did not continue to lose hair.

In an experiment to investigate the effectiveness of vitamin C in preventing colds it was found that those who thought that they were being given vitamin C (but in fact received a placebo) had fewer colds than those who thought that they were being given a placebo (even though they were, in fact, receiving vitamin C)!

Remark: Because of the placebo effect, clinical trials of drugs and other medical treatments have to be carefully designed. For example, suppose that I wish to determine whether a certain medication is effective in reducing blood pressure. A naive design for an experiment to investigate this question might be to measure the blood pressure of 40 patients, give each patient the medication, and then measure their blood pressure after a week on the medication. The problem with this approach is that any improvement in a patient's blood pressure might be due to the fact that they expect the treatment to work (i.e. might be due to the placebo effect) rather than due to the action of the medication. In other words the placebo effect and the effect of the medication are confounded.

    Experimental design


We've seen above that data from experiments and observational studies can be invalid due to the confounding of variables. In order to avoid generating invalid data, we need to develop statistically sound methods for conducting experiments. In other words, we need a good experimental design (i.e. a good plan for the experiment). The key idea behind most good experimental designs is comparative experimentation. The basic features of comparative experimentation are:

    1. Start with two equivalent groups.

2. Give the treatment to one of the groups (this group is called the experimental group). The other group (which is called the control group) is treated in exactly the same way except that this group does not receive the treatment.

Key point: Extraneous variables influence both groups, whereas the treatment only affects one group.
Warning: Although this experimental design addresses the problem of confounding variables, there is still some room for improving this design! In particular, comparison will eliminate the problem of confounding only if we have equivalent groups of subjects. The weakness in the design described above is that it relies on dividing the units into two equivalent groups. But how can we be sure that the groups are equivalent? How can we make sure that one of the groups isn't different from the other in some way that leads to bias in the experimental results? In particular, how can we avoid some hidden bias arising due to the way that units or subjects are assigned to groups?

Just as in sampling we were able to eliminate (some) sources of bias by selecting a random sample, so in experimentation we can use random allocation to improve our experimental design and reduce any bias in the results.

    Implementation of random allocation

The implementation of random allocation of units to experimental groups is similar to the method of selecting a simple random sample.

Example 1.2.4 Suppose that I want to test a new organic fertiliser on tomato plants. I start with 40 plants and assign a number to each plant. Then using a table of random digits, I select 20 of the 40 plants to be fertilised. The other 20 plants receive no fertiliser, but in every other way are treated exactly the same as the 20 plants in the treatment group. At the end of the growing season I record the yield of each plant. This is an example of a randomised comparative experiment because the units are randomly assigned to groups.

Discussion: Because we have allocated the tomato plants randomly to the two groups, there has been no favouritism in the allocation of the plants - i.e. the composition of each group is roughly the same with respect to the other extraneous variables such as the health and vigour of the plants. In other words, neither group is more likely to consist of the healthiest plants (or the weaker plants). This random allocation of plants to groups averages out the effect of extraneous variables and ensures that the groups are roughly equivalent.
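
A minimal sketch of how such an allocation could be carried out in practice is given below. This code is not part of the original notes: the plant numbering, the fixed random seed, and the use of Python's random.sample in place of a table of random digits are illustrative assumptions.

    import random

    plants = list(range(1, 41))          # the 40 numbered tomato plants
    random.seed(1)                       # fixed seed only so the sketch is reproducible

    # Draw 20 plants without replacement to receive the fertiliser;
    # the remaining 20 form the control group.
    fertilised = sorted(random.sample(plants, 20))
    control = sorted(set(plants) - set(fertilised))

    print("Fertilised group:", fertilised)
    print("Control group:   ", control)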

The importance of randomised comparative experiments stems from the fact that we can use the following argument to establish cause-and-effect based on the results of a randomised comparative experiment:

Randomization produces groups of subjects that should be similar in all respects before we apply the treatments.

Comparative design ensures that extraneous variables other than the experimental treatments operate equally on all groups.

Therefore, (significant) differences in the response variable between groups must be due to the differences (and the effects) of the treatments.

Further discussion: The more subjects used in a randomised comparative experiment, the more likely it will be that the treatment groups in the experiment will be roughly equivalent. For example, in the tomato experiment described above, there is still a chance that when I randomly allocate plants to groups I will (by chance) allocate many more healthy plants to the group that gets the fertiliser than to the other group (this would be unlucky but still possible). To reduce the chance that, in spite of random allocation, I end up with unbalanced groups, I should make the treatment groups as large as possible, since this would reduce the chance that one group has many more healthy plants than the other.
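
The point that larger groups are less likely to end up unbalanced can be illustrated with a rough simulation. The code below is my own illustration, not part of the notes: it assumes that half of the plants are "healthy" and (arbitrarily) calls the groups noticeably unbalanced if their proportions of healthy plants differ by more than 15 percentage points.

    import random

    def chance_of_imbalance(n_per_group, trials=10_000):
        # Estimate the probability that the two groups' proportions of
        # healthy plants differ by more than 15 percentage points.
        n_total = 2 * n_per_group
        plants = [1] * (n_total // 2) + [0] * (n_total - n_total // 2)  # 1 = healthy
        count = 0
        for _ in range(trials):
            random.shuffle(plants)
            diff = abs(sum(plants[:n_per_group]) - sum(plants[n_per_group:]))
            if diff / n_per_group > 0.15:
                count += 1
        return count / trials

    random.seed(2)
    for n in (10, 20, 100):
        print(f"{n} plants per group: chance of a noticeable imbalance = {chance_of_imbalance(n):.3f}")

The estimated chance of a noticeable imbalance falls sharply as the group size grows, which is exactly the reason for using as many subjects as possible.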


    Summary - Principles of experimental design

1. Control the effects of extraneous variables on the response variable by comparing two or more treatments.

    2. Randomly allocate subjects to treatments.

3. Use enough subjects in each group in order to reduce the chance variation in the results.

Important: In medical experiments it is also necessary to make sure that the control group gets a placebo!

Example 1.2.5 Suppose that I want to investigate whether Echinacea (a plant extract) boosts immunity against colds. I start as in the fertiliser experiment, and randomly allocate subjects to groups. Now suppose that one group takes Echinacea every day over the winter months and the other group takes nothing, and suppose that I interview the participants on a regular basis to find out whether they have contracted colds over the winter.

Discussion: The problem with this experimental design is that I cannot be sure at the end of the experiment whether any difference between the incidence of colds is due to the effect of the Echinacea or due to the placebo effect. The problem is that one group knows that it is taking medication to prevent colds whereas the other group knows that it is not receiving treatment. So a difference between the groups may be due to the placebo effect, and in any case the explanatory variable (Echinacea treatment) and the placebo effect are confounded.

Question: How can we improve the design of this experiment to eliminate the confounding of the treatment and the placebo effect?

Answer: The solution is to conduct a randomised, double-blind experiment with a placebo control. In a double-blind experiment neither the subjects nor the people who work with them know which treatment each subject is receiving. So, to improve the Echinacea experiment, in addition to randomising the allocation of subjects to groups, one group should receive the Echinacea and the other group should receive a placebo, and the participants shouldn't know whether they are receiving the Echinacea or the placebo. In addition, anyone else involved in the experiment (e.g. anyone who is involved in recording data from the subjects) shouldn't know who is receiving the Echinacea and who is getting the placebo. Once the experiment has ended (e.g. all the data has been collected), the blinds can be removed so that a statistician can analyse the results.

Note: It is generally accepted that, whenever possible, it is desirable that medical experiments with human subjects are randomized, double-blind, with a placebo control.
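
One way to implement the blinding is sketched below; this is my own illustration, not part of the notes. The subject identifiers and group codes are invented, and the idea is simply that staff and participants only ever see a neutral code, while the key linking codes to treatments is held separately until the data have been collected.

    import random

    subjects = [f"subject_{i:02d}" for i in range(1, 21)]   # hypothetical participants

    random.seed(3)
    shuffled = subjects[:]
    random.shuffle(shuffled)
    half = len(shuffled) // 2

    # Participants and staff only ever see a neutral group code on the capsules.
    group_code = {s: ("A" if s in set(shuffled[:half]) else "B") for s in subjects}

    # The key giving the meaning of the codes is held separately and is opened
    # only after all the data have been collected.
    unblinding_key = {"A": "Echinacea", "B": "placebo"}

    print(group_code["subject_01"])   # e.g. 'B' - reveals nothing about the treatment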

    Interpretation of results

How can we know that the differences observed between the treatment group and the control group are significant - i.e. the differences are due to something other than just chance?

Example 1.2.4 again Let's consider the tomato fertiliser experiment again. Suppose at the end of the growing season I harvest the tomatoes from each plant and weigh them. I then use my data to compute the average yield for each group and I obtain:

Average yield for fertilised plants: 3.95 kg

Average yield for unfertilised plants: 3.58 kg

Question: Since the average yield for the fertilised plants is greater than the average yield for the unfertilised plants, do these results prove that the organic fertiliser increases yield?

Discussion: We need to be careful about coming to hasty conclusions! The problem is that even if both groups received no fertiliser there would still be a difference between the yield of the first group of plants and the second group of plants. This is because there will always be some chance differences between the plants in the groups and this will result in chance differences between their yields. In order to decide whether these results prove that the fertiliser increases yield, a statistician must first work out how much chance variation in the yields we would expect to see between the groups if neither group is fertilised. Next, the statistician looks at the observed difference between the unfertilised and fertilised groups. Now if this observed difference is much greater than the difference that we would expect to see between two untreated groups, we can conclude that the difference in the yields is unlikely to be due to just chance variation and we conclude that the best interpretation of the data is that the fertiliser increases yield. The statistician would report that the difference between the yields is statistically significant.

An observed effect or difference of a size that would rarely occur by chance is called a statistically significant effect or difference.

In practice, whether an observed effect or difference is statistically significant will depend on both the magnitude of the observed effect or difference and on the number of subjects in the experiment. You will learn more about exactly how statisticians determine whether an observed effect or difference is statistically significant in subsequent statistics modules. For now the main point is that if you read that a result of an investigation is statistically significant, you can conclude that the investigators found good statistical evidence to support the claim that differences in the levels of the response variable(s) are due to differences in the treatments imposed.
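
To give a flavour of the kind of calculation involved, the sketch below carries out a simple permutation test for the tomato experiment. It is not taken from the notes: the individual per-plant yields are invented (only their averages are close to the figures quoted above), and a permutation test is just one of several ways a statistician might assess how large a difference chance alone could produce.

    import random
    from statistics import mean

    # Invented per-plant yields (kg); purely illustrative.
    random.seed(4)
    fertilised = [round(random.gauss(3.95, 0.6), 2) for _ in range(20)]
    unfertilised = [round(random.gauss(3.58, 0.6), 2) for _ in range(20)]

    observed_diff = mean(fertilised) - mean(unfertilised)

    # If the fertiliser had no effect, the group labels would be arbitrary, so we
    # repeatedly reshuffle the 40 yields and record how often chance alone gives
    # a difference in averages at least as large as the one actually observed.
    all_yields = fertilised + unfertilised
    trials = 10_000
    count = 0
    for _ in range(trials):
        random.shuffle(all_yields)
        if mean(all_yields[:20]) - mean(all_yields[20:]) >= observed_diff:
            count += 1

    print(f"Observed difference in average yield: {observed_diff:.2f} kg")
    print(f"Proportion of reshuffles with a difference at least this large: {count / trials:.3f}")

If that proportion is very small, the observed difference would rarely occur by chance alone, which is exactly what is meant by calling it statistically significant.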

    Difficulties and issues in experimentation

In the section on sampling we saw that even when we use an unbiased sampling method, there can still be problems with sampling that cannot be avoided by using a good sampling method (e.g. non-sampling errors such as nonresponse errors, leading questions, processing errors, etc.). Likewise, randomised comparative experiments go a long way towards avoiding the problem of invalid data, but we still need to be on the lookout for difficulties in experimentation and in the interpretation of experimental results. Here are (just a few) problems and issues that we need to watch out for:

    Applicability of the results (can the results be extended?)

A common problem with the interpretation of experimental results is that the applicability of the results can be overstated. We always need to look carefully at how an experiment was conducted in order to determine to what population the conclusions apply. In many experiments the researcher has to select subjects from an available pool of subjects which may not be representative of the population to which the researcher would like to apply the results. In this case the results will probably be valid for the subject pool but the researcher must justify why the results can be applied to the larger population.

Example 1.2.6 Various well-designed clinical trials have shown that using drugs to reduce blood cholesterol in middle-aged men with high cholesterol also decreases their risk of a heart attack. Can we conclude from these trials that, in general, reducing blood cholesterol decreases the risk of heart disease?

Discussion: The problem with drawing general conclusions about blood cholesterol and the risk of heart attack from the results of these experiments is that there may be important physiological differences between men and women (or between men of different ages) which mean that blood cholesterol level is not as important a risk factor for these other groups as it is for middle-aged men with high cholesterol. Doctors, for example, need to be careful not to assume that the results of clinical trials are applicable to types of patients that were not part of the relevant trials.

    Lack of realism in the experiment

Another (related) problem with the interpretation of experimental results is that the experimental treatments (or some other feature of the experiment) may be unrealistic.

Example 1.2.7 In order to determine whether a food additive is safe, it is standard practice to test high doses of the additive on laboratory rats. The additive is deemed unsafe if the experimental group develops significantly more tumours than the control group.

Discussion: The decision to ban a food additive based on an animal experiment is an example of erring on the side of caution. It is important to remember that such an experiment does not necessarily prove that the additive is actually dangerous for human consumption. The problem is that the experiment is not realistic: humans are not rats and typical doses of the additive are usually much smaller than the doses given to the rats.

Psychologists and other social scientists often have to devise ingenious experiments to investigate psychological responses to various factors. The difficulty with some of these experiments is that they are (necessarily) somewhat artificial - e.g. they are conducted in a laboratory, the subjects are aware that they are participating in a psychological experiment, etc. As a result, one needs to be careful about generalising the findings of such experiments to real-world situations.

Dropouts, nonadherers, and refusals in experiments with human subjects

Experiments with human subjects can be compromised by human behaviour! When this happens, statisticians and researchers have to figure out how to make appropriate adjustments in order to try to reduce any bias that may result from human behaviour. Some typical problems include:

Dropouts: Experiments that continue over a long period of time often have subjects that drop out before the end of the experiment. It is very important that researchers try to determine the reasons that participants drop out. In particular, the researchers should try to determine whether the reason for dropping out is related to a feature of the experiment. For example, perhaps the subjects receiving one particular treatment experienced unpleasant side effects and as a result decided to stop participating. Clearly, their reason for dropping out is very relevant to the experiment and as a result the results of the experiment may be biased because the dropouts did not complete the experiment.

Nonadherers: A subject who participates in an experiment but who doesn't follow the experimental treatment is called a nonadherer. There are many reasons why a subject may break the rules. For example, an experiment might require participants to take a medication according to a very careful schedule over several weeks. The difficulty with such an experiment is that people sometimes aren't very good about remembering to take medication. If subjects are not taking the medication according to the experimental guidelines it will be difficult to determine what the true effect of the medication is!

Refusals: Human subjects have to agree to participate in experiments and that means that individuals can refuse to participate! Now if there is no particular reason or pattern to the refusals, then non-participation of some of the selected subjects may not make any difference to the validity of the experimental results. On the other hand, if those who refuse to participate differ in some systematic way from those who participate then bias can result.

    More complicated experimental designs

There are a variety of ways that randomised comparative experiments can be developed to make more complicated comparisons. We describe some common variants below.

    Completely randomized design with multiple factors/levels


Randomised comparative experiments can be used to investigate the combined effect of more than one variable on the response variable. Variables can also be set at different levels in order to investigate the effect of the level of a variable (e.g. dose of a certain drug) on the response variable. Here is an example of an experiment with two explanatory variables that are set at various levels:

Example 1.2.8 Clothing manufacturers usually recommend both the temperature (i.e. 30°, 40°, etc.) and the cycle setting (Cotton wash, Synthetic wash, etc.) at which a garment should be washed. To determine the optimal temperature and cycle setting for a particular material, we can perform a randomised comparative experiment with multiple factors and levels. In this case the factors (i.e. explanatory variables) are temperature and cycle setting and the levels are the various settings for temperature and cycle. All possible combinations of temperature and cycle settings give us a total of 20 different treatments as shown in the table below (labelled by Roman numerals):

                30°     40°     50°     60°     90°
    Cotton      I       II      III     IV      V
    Synthetic   VI      VII     VIII    IX      X
    Delicate    XI      XII     XIII    XIV     XV
    Wool        XVI     XVII    XVIII   XIX     XX

To carry out the experiment, the researcher obtains 200 pieces of the same fabric which have all been stained with the same substance, and randomly allocates 10 pieces of fabric to each of the 20 treatments. At the end of the experiment the washed pieces of fabric are examined to see how well they have been cleaned, whether the dye in the fabric has run, etc.

Remark: This experiment allows the manufacturer to discover how the interaction between the two explanatory variables (temperature and cycle setting) affects the response variable(s).
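
A minimal sketch of how the 20 treatments could be formed and the 200 fabric pieces allocated is given below. This is my own illustration, not part of the notes; the piece numbering and random seed are arbitrary.

    import itertools
    import random

    temperatures = [30, 40, 50, 60, 90]                     # degrees
    cycles = ["Cotton", "Synthetic", "Delicate", "Wool"]

    # The 20 treatments are all combinations of cycle setting and temperature.
    treatments = list(itertools.product(cycles, temperatures))
    assert len(treatments) == 20

    # Randomly allocate the 200 stained fabric pieces, 10 to each treatment.
    pieces = list(range(1, 201))
    random.seed(5)
    random.shuffle(pieces)

    allocation = {t: sorted(pieces[10 * i: 10 * (i + 1)]) for i, t in enumerate(treatments)}

    print(allocation[("Wool", 90)])   # the 10 pieces washed on the Wool cycle at 90 degrees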

    Randomized Block Designs

Matching subjects in various ways can be used in conjunction with randomization to produce more precise results than would be obtained by a simple randomised comparative experiment. This is particularly useful in the design of experiments where it is thought that extraneous variables (i.e. variables that are not part of the treatment) may have a big impact on the response variable. In order to control the effects of these extraneous variables in the experiment a block design can be used:

A block is a group of experimental units or subjects that are similar with respect to some extraneous variables that are thought to affect the response to the treatment in the experiment.

In a randomised block design experiment, the subjects are first grouped into blocks and then, within the blocks, the subjects are randomly assigned to treatments.

Note: In a randomised block design, the allocation of subjects to blocks is not random! The subjects or units are grouped together according to some characteristics that they have in common. After the subjects have been put in blocks, the subjects within a block are randomly allocated to treatments. Here's a simple example of a randomised block design:

Example 1.2.9 A pharmaceutical company wishes to compare the effectiveness of a new drug for reducing levels of LDL cholesterol to the effectiveness of two commonly used treatments for high levels of LDL cholesterol. It is thought that the effectiveness of any drug for reducing LDL levels is affected both by the gender of the patient and by the initial level of LDL in the bloodstream. A total of 600 men and 400 women have agreed to participate in the clinical trial of these treatments. In order to control for these extraneous variables, the men are divided into blocks of men with similar levels of LDL cholesterol and the women are divided into blocks of women with similar levels of cholesterol. Within each block the subjects are randomly allocated to treatments. Also, because this is an experiment with human subjects, the experiment is double-blind - i.e. the subjects do not know which treatment they are receiving and the staff running the experiment do not know which treatment a patient has received.

Discussion: Blocking in the experiment above allows a researcher to get a clearer picture of the differences between the treatments. This is because the blocks have been chosen to equalise important (and unavoidable) sources of variation between the subjects. Less important sources of variation are then averaged out by randomly allocating treatments within the blocks. In addition, by grouping similar subjects together before randomly allocating treatments, the researcher can also separately investigate the responses of subjects in each block to the different treatments.
Question: What would have happened if the researcher had not chosen to use a block design for this experiment?
Answer: If the researcher used a simpler design, e.g. a double-blind experiment where the subjects are randomly assigned to treatment groups with no regard to gender or initial levels of LDL, the data from the experiment could still be used to investigate differences between the responses to the treatments but it would be harder to make precise statements about the magnitude of these differences because each treatment group contains a mixture of (dissimilar) subjects.
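
The two steps of a randomised block design - non-random grouping into blocks, then random allocation within each block - can be sketched as follows. This code is not part of the notes; the participant records, the LDL bands used to form blocks, and the treatment names are all invented for illustration.

    import random
    from collections import defaultdict

    # Invented participant records: (id, gender, initial LDL cholesterol level).
    random.seed(6)
    participants = [(f"p{i:04d}", "M" if i <= 600 else "F", round(random.uniform(3.0, 6.0), 1))
                    for i in range(1, 1001)]

    def ldl_band(ldl):
        # Crude banding of the initial LDL level, used only to form blocks.
        if ldl < 4.0:
            return "low"
        if ldl < 5.0:
            return "medium"
        return "high"

    # Step 1 (not random): group subjects into blocks of similar people.
    blocks = defaultdict(list)
    for pid, gender, ldl in participants:
        blocks[(gender, ldl_band(ldl))].append(pid)

    # Step 2 (random): within each block, allocate subjects to the three treatments.
    treatments = ["new drug", "standard treatment 1", "standard treatment 2"]
    assignment = {}
    for members in blocks.values():
        random.shuffle(members)
        for i, pid in enumerate(members):
            assignment[pid] = treatments[i % len(treatments)]

    print(assignment["p0001"])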

A matched pairs design is a special case of a block design. In a matched pairs design only two treatments are compared. The blocks consist of pairs of subjects that are matched as closely as possible and the treatments are randomly allocated. Sometimes a pair can consist of a single subject who gets both treatments. In such experiments, the order that the treatments are given is randomised (since the order of the treatments may influence the subject's response). Here is an example of such an experiment.

Example 1.2.10 A large food manufacturer has developed a new recipe for one of the pasta sauces that it manufactures. In order to test whether this new product will be more popular with consumers than the old product, the company asks 50 of its employees to participate in a matched pairs experiment. In the experiment the employee sits in a cubicle which has a small sliding door that connects to the company's product development kitchen. The employee is given a small dish of pasta with sauce to taste and a questionnaire to fill out. After the employee has tasted the first dish and completed the form, the dish and form are removed and the employee is given a second dish of pasta to taste and a second questionnaire to fill out. At the end of the tasting session, the employee is asked which sauce they preferred. The order in which the employees are given the two different sauces is random (i.e. essentially by flipping a coin).

Discussion: In the experiment described above each employee is a (perfectly!) matched pair since all extraneous variables (such as personal food preferences, etc.) are the same for each treatment. The order in which the subjects taste the food is randomised to average out any effect that order might have on the response variables.
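
The "coin flip" that randomises the tasting order could be carried out as in the short sketch below; again this is my own illustration (the employee labels and recipe names are invented), not part of the notes.

    import random

    employees = [f"employee_{i:02d}" for i in range(1, 51)]

    # For each employee, a coin flip decides whether the old or the new sauce
    # is tasted first; the second dish is then the other sauce.
    random.seed(7)
    tasting_order = {}
    for e in employees:
        first = random.choice(["old recipe", "new recipe"])
        second = "new recipe" if first == "old recipe" else "old recipe"
        tasting_order[e] = (first, second)

    print(tasting_order["employee_01"])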


    Some comments on observational studies

We have seen in this section that properly designed randomised comparative experiments give us a powerful tool for trying to answer questions about cause and effect. Unfortunately, there are situations where it is either not possible (or not ethical!) to set up a randomised comparative experiment to investigate questions of cause and effect.

Example 1.2.11 It is not possible to design a randomised comparative experiment to establish that smoking increases the risk of heart attack. The problem is that if we suspect that smoking is harmful to human health, it would be unethical to investigate the effects of smoking by randomly dividing a group of healthy subjects into two treatment groups and then requiring one group to take up smoking in order to discover if the smokers are more likely to have a heart attack than non-smokers!

Example 1.2.12 Much research is done to investigate various differences between men and women. In such studies, gender is the explanatory variable. In a randomized comparative experiment we would like to randomly allocate participants to treatments (i.e. levels of the explanatory variable) but we cannot randomly assign people to be either male or female! So, in this case, we cannot design a truly randomized comparative experiment to investigate differences between men and women.

Discussion: In the examples above (and in many other situations where it is not practical to perform a randomised comparative experiment) we have to rely on data from observational studies and this makes establishing causation much more difficult. Nevertheless, we can use some of the principles of good experimental design to improve an observational study. In particular, we can use comparison and matching. For example, to study whether smoking increases the risk of heart attack we could start with a randomly selected group of smokers. We would then select from a large group of non-smokers individuals who match the individuals in the smoking group with respect to any extraneous variable that we think might also be a risk factor for heart attacks (e.g. age, gender, weight, blood cholesterol level, etc.). Of course, there may be other extraneous variables that affect the risk of a heart attack of which we are unaware and for which we have not matched the groups! In addition, a statistician may also try to make statistical adjustments for confounding variables (such as weight, age, etc.) in order to make a fairer comparison between smokers and non-smokers. Of course, statisticians need to be careful about making such adjustments as this can introduce other problems of bias.
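
A toy sketch of this kind of matching is given below; the records are invented, and pairing each smoker with the closest non-smoker of the same gender by age is just one simple way (out of many) that matching could be done.

    # Invented records: (id, gender, age). Each smoker is paired with the unused
    # non-smoker of the same gender whose age is closest.
    smokers = [("s1", "M", 52), ("s2", "F", 47), ("s3", "M", 63)]
    non_smokers = [("n1", "M", 50), ("n2", "F", 48), ("n3", "M", 61),
                   ("n4", "F", 59), ("n5", "M", 66)]

    matches = {}
    available = list(non_smokers)
    for sid, gender, age in smokers:
        candidates = [c for c in available if c[1] == gender]
        if not candidates:
            continue                      # no suitable match on this variable
        best = min(candidates, key=lambda c: abs(c[2] - age))
        matches[sid] = best[0]
        available.remove(best)

    print(matches)                        # {'s1': 'n1', 's2': 'n2', 's3': 'n3'}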

Moral: We need to be careful whenever we encounter the results of an observational study. The best observational studies will be based on comparison of matched groups and may have statistical adjustment for confounding variables. Nevertheless, adjustments can sometimes lead to bias and matching won't be able to control for unknown confounding variables. So, we need to be wary of any claims that an observational study has proved a cause-and-effect relationship!

    1.3 Measurement

In both experimentation and sampling we are interested in studying some property of the units in the study - e.g. political opinions, physical stamina, reading ability, intelligence, blood pressure, etc. In order to make concrete statements about the property, we (usually) attempt to find a way to measure the property numerically:

In statistical science, to measure a property means to assign numbers to units as a way to represent the property.

Note: Deciding how to measure a property is an important part of any statistical study. In other words, after we have decided how to sample (i.e. how to select the sample and how to contact the sample) or how to conduct the experiment (i.e. the experimental design, etc.), the next problem is to decide how to measure the property of interest.

    To take a measurement we must have:

An object to be measured.

A (well-defined) property of the object to be measured.

A measuring instrument that actually assigns a number to represent the property.

Example 1.3.1 Suppose that a researcher wants to investigate whether a certain treatment for asthma is effective. The property to be measured in the experiment is lung function before and after the treatment. To measure this property, the participant exhales into a peak flow meter and the level of peak flow is recorded.
Note: Once the researcher has decided how to measure lung function, the variable is defined in terms of the method of measurement. In this case, the variable is peak flow because that is what the researcher actually measures.

Now deciding how to measure a property is easiest when everyone clearly understands the property that we propose to measure (e.g. height, weight, etc.). The problems arise when the definition of the property to be measured is imprecise or disputed.

Example 1.3.2 Suppose that a psychologist wants to measure intelligence. In this case, there is an immediate problem because human intelligence is a complex property and there is no universally accepted definition of it. Without a clear understanding of intelligence, it is difficult for researchers to agree how to measure it. For example, there is much debate about whether the standard IQ test is an appropriate measure of a property that is as complex as intelligence!

Here's another example that illustrates some of the issues that arise when we try to measure properties in complicated situations.

Example 1.3.3 Suppose we wish to measure an individual's employment status. Before we can measure this property, we need to define what we mean when we say that someone is employed or unemployed or economically inactive.
Note: Different organisations may have different ideas about what it means to say that someone is employed! In the UK, the Office for National Statistics has adopted the following definitions:

1. A person (aged 16 or over) is employed if in the previous week they did at least one hour of paid work, or are temporarily away from a job (e.g. on holiday), or are on a government training scheme, or have done unpaid work for a family business.

    2. A person (aged 16 or over) is

