MAT 1272 STATISTICS LESSON 1
1.1 STATISTICS AND TYPES OF STATISTICS
WHAT IS STATISTICS?
STATISTICS
STATISTICS IS THE SCIENCE OF
COLLECTING, ANALYZING,
PRESENTING, AND INTERPRETING
DATA, AS WELL AS OF MAKING
DECISIONS BASED ON SUCH
ANALYSES.
READ THE EXAMPLES BELOW, THEN THINK OF ONE SIMILAR EXAMPLE OR SEARCH THE INTERNET FOR ONE. STATE THE SOURCE AND
THE DATE POSTED.
The following examples present some statistics:
1. During March 2014, a total of 664,000,000 hours were spent by Americans watching March Madness live on TV and/or streaming (Fortune Magazine, March 15, 2015).
2. Approximately 30% of Google's employees were female in July 2014 (USA TODAY, July 24, 2014).
3. According to an estimate, an average family of four living in the United States needs $130,357 a year to live the American dream (USA TODAY, July 7, 2014).
4. Chicago's O'Hare Airport was the busiest airport in 2014, with a total of 881,933 flight arrivals and departures.
5. In 2013, author James Patterson earned $90 million from the sale of his books (Forbes, September 29, 2014).
6. About 22.8% of U.S. adults do not have a religious affiliation (Time, May 25, 2015).
7. Yahoo CEO Marissa Mayer was the highest paid female CEO in America in 2014, with a total compensation of $42.1 million.
STATISTICAL METHODS HELP US MAKE SCIENTIFIC AND INTELLIGENT DECISIONS
• DECISIONS MADE BY USING STATISTICAL
METHODS ARE CALLED EDUCATED
GUESSES
• DECISIONS MADE WITHOUT USING
STATISTICAL (OR SCIENTIFIC) METHODS
ARE PURE GUESSES AND, HENCE, MAY
PROVE TO BE UNRELIABLE.
• FOR EXAMPLE, OPENING A LARGE STORE IN
AN AREA WITH OR WITHOUT ASSESSING
THE NEED FOR IT MAY AFFECT ITS SUCCESS.
• LIKE ALMOST ALL FIELDS OF STUDY,
STATISTICS HAS TWO ASPECTS:
• THEORETICAL OR MATHEMATICAL
STATISTICS DEALS WITH THE
DEVELOPMENT, DERIVATION, AND PROOF
OF STATISTICAL THEOREMS, FORMULAS,
RULES, AND LAWS.
• APPLIED STATISTICS INVOLVES THE
APPLICATIONS OF THOSE THEOREMS,
FORMULAS, RULES, AND LAWS TO SOLVE
REAL-WORLD PROBLEMS.
• THIS COURSE IS CONCERNED WITH
APPLIED STATISTICS AND NOT WITH
THEORETICAL STATISTICS. (EDUCATED
GUESS, DECISION MAKING DESIGNED FOR
SUCCESS).
The accompanying chart shows the lobbying spending by five selected companies during 2014. Many companies spend millions of dollars to win favors in Washington. According to Fortune Magazine of June 1, 2015, “Comcast has remained one of the biggest corporate lobbyists in the country.” In 2014, Comcast spent $17 million, Google spent $16.8 million, AT&T spent $14.2 million, Verizon spent $13.3 million, and Time Warner Cable spent $7.8 million on lobbying. These numbers simply describe the total amounts spent by these companies on lobbying. We are not drawing any inferences, decisions, or predictions from these data. Hence, this data set and its presentation are an example of descriptive statistics.
Case study 1-1
TYPES OF STATISTICS (DESCRIPTIVE & INFERENTIAL STATISTICS)
DESCRIPTIVE STATISTICS CASE STUDY 1-1
DESCRIPTIVE STATISTICS CHARACTERISTICS
• DESCRIPTIVE STATISTICS CONSISTS OF METHODS FOR ORGANIZING, DISPLAYING, AND DESCRIBING DATA BY USING TABLES, GRAPHS, AND SUMMARY MEASURES.
• CHAPTERS 2 AND 3 DISCUSS DESCRIPTIVE STATISTICAL METHODS. IN CHAPTER 2, WE LEARN HOW TO CONSTRUCT TABLES AND HOW TO GRAPH DATA. IN CHAPTER 3, WE LEARN HOW TO CALCULATE NUMERICAL SUMMARY MEASURES, SUCH AS AVERAGES.
• CASE STUDY 1-1 PRESENTS AN EXAMPLE OF DESCRIPTIVE STATISTICS.
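As a rough illustration of descriptive statistics, the sketch below summarizes the five lobbying figures quoted in Case Study 1-1. The variable names are illustrative assumptions, and the choice of summary measures (total, average, largest) is ours rather than the case study's. No inference about other companies is drawn.

```python
from statistics import mean

# Lobbying spending in 2014, in millions of dollars (figures quoted in Case Study 1-1).
lobbying_spending = {
    "Comcast": 17.0,
    "Google": 16.8,
    "AT&T": 14.2,
    "Verizon": 13.3,
    "Time Warner Cable": 7.8,
}

amounts = list(lobbying_spending.values())

# Summary measures that only describe these five values.
total_spending = sum(amounts)
average_spending = mean(amounts)
largest_spender = max(lobbying_spending, key=lobbying_spending.get)

print(f"Total: ${total_spending:.1f} million")
print(f"Average: ${average_spending:.2f} million")
print(f"Largest spender: {largest_spender}")
```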
A poll of 176,903 American adults, aged 18 and older, was conducted January 2 to December 30, 2014, as part of the Gallup-Healthways Well-Being Index survey. Gallup and Healthways have been “tracking Americans' life evaluations daily” since 2008. According to this poll, in 2014, Americans' outlook on life was the best in seven years, as 54.1% “rated their lives highly enough to be considered thriving,” 42.1% said they were struggling, and 3.8% mentioned that they were suffering. As mentioned in the chart, the margin of sampling error was ± 1%. In Chapter 8, we will discuss the concept of margin of error, which can be combined with these percentages when making inferences. As we notice, the results described in the chart are obtained from a poll of 176,903 adults. We will learn in later chapters how to apply these results to the entire population of adults. Such decision making about the population based on sample results is called inferential statistics.
Case study 1-2
INFERENTIAL STATISTICS
INFERENTIAL STATISTICS CASE STUDY 1-2
INFERENTIAL STATISTICS CHARACTERISTICS
• INFERENTIAL STATISTICS CONSISTS OF METHODS THAT USE SAMPLE RESULTS TO HELP MAKE DECISIONS OR PREDICTIONS ABOUT A POPULATION.
• CASE STUDY 1-2 PRESENTS AN EXAMPLE OF INFERENTIAL STATISTICS. IT SHOWS THE RESULTS OF A SURVEY IN WHICH AMERICAN ADULTS WERE ASKED ABOUT THEIR OPINIONS ABOUT THEIR LIVES.
• CHAPTERS 8 THROUGH 15 AND PARTS OF CHAPTER 7 DEAL WITH INFERENTIAL STATISTICS.
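One simple way to read the ± 1% margin of sampling error in Case Study 1-2 is to attach it to each sample percentage. The sketch below does only that arithmetic. The function name is an illustrative assumption, and the formal treatment of margin of error is left to Chapter 8.

```python
# Poll percentages and margin of sampling error quoted in Case Study 1-2.
poll_percentages = {"thriving": 54.1, "struggling": 42.1, "suffering": 3.8}
margin_of_error = 1.0  # in percentage points


def interval(percentage, margin):
    """Return (low, high): the sample percentage minus and plus the margin."""
    return (round(percentage - margin, 1), round(percentage + margin, 1))


# One (low, high) pair per response category.
intervals = {
    category: interval(percentage, margin_of_error)
    for category, percentage in poll_percentages.items()
}

for category, (low, high) in intervals.items():
    print(f"{category}: {low}% to {high}%")
```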
PROBABILITY, WHICH GIVES A MEASUREMENT OF THE LIKELIHOOD THAT A CERTAIN OUTCOME WILL OCCUR, ACTS AS A LINK BETWEEN DESCRIPTIVE
AND INFERENTIAL STATISTICS. PROBABILITY IS USED TO MAKE STATEMENTS ABOUT THE OCCURRENCE
OR NONOCCURRENCE OF AN EVENT UNDER UNCERTAIN CONDITIONS. PROBABILITY AND
PROBABILITY DISTRIBUTIONS ARE DISCUSSED IN CHAPTERS 4 THROUGH 6 AND PARTS OF CHAPTER 7.
1.2 BASIC TERMS
THIS SECTION EXPLAINS THE MEANING OF AN
ELEMENT (OR MEMBER), A VARIABLE, AN OBSERVATION, AND A DATA SET.
WORLD'S EIGHT RICHEST PERSONS AS OF MARCH 2015 DESCRIPTION
• EACH PERSON LISTED IN THIS TABLE IS CALLED AN ELEMENT OR A MEMBER OF THIS GROUP (8 ELEMENTS)
• NOTE THAT ELEMENTS ARE ALSO CALLED OBSERVATIONAL UNITS.
• AN ELEMENT OR MEMBER OF A SAMPLE OR POPULATION IS A SPECIFIC SUBJECT OR OBJECT (FOR EXAMPLE, A PERSON, FIRM, ITEM, STATE, OR COUNTRY) ABOUT WHICH THE INFORMATION IS COLLECTED.
• A VARIABLE IS A CHARACTERISTIC UNDER STUDY THAT ASSUMES DIFFERENT VALUES FOR DIFFERENT ELEMENTS. IN CONTRAST TO A VARIABLE, THE VALUE OF A CONSTANT IS FIXED.
• THE VALUE OF A VARIABLE FOR AN ELEMENT IS CALLED AN OBSERVATION OR MEASUREMENT.
• A DATA SET IS A COLLECTION OF OBSERVATIONS ON ONE OR MORE VARIABLES.
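The four terms above can be pictured with a small data structure. The sketch below uses made-up students and values (hypothetical data, not the table from the slide) to show where the elements, variables, and observations sit inside a data set.

```python
# Hypothetical data set: each key is an element, each inner key is a variable,
# and each stored value is one observation.
data_set = {
    "Student A": {"age": 19, "credits": 15},
    "Student B": {"age": 22, "credits": 12},
    "Student C": {"age": 20, "credits": 18},
}

# Elements (members): the subjects about which information is collected.
elements = list(data_set)

# Variables: the characteristics recorded for every element.
variables = list(next(iter(data_set.values())))

# Observations: the value of one variable for one element.
observations = [
    (element, variable, value)
    for element, record in data_set.items()
    for variable, value in record.items()
]

print(elements)
print(variables)
print(len(observations), "observations")
```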
APPLICATIONS
8/29/2017Sample Footer Text 13
QUANTITATIVE VARIABLES
A VARIABLE THAT CAN BE
MEASURED NUMERICALLY IS
CALLED A QUANTITATIVE
VARIABLE. THE DATA
COLLECTED ON A
QUANTITATIVE VARIABLE
ARE CALLED QUANTITATIVE
DATA.
DISCRETE VARIABLES
A VARIABLE WHOSE
VALUES ARE COUNTABLE
IS CALLED A DISCRETE
VARIABLE. IN OTHER
WORDS, A DISCRETE
VARIABLE CAN ASSUME
ONLY CERTAIN VALUES
WITH NO INTERMEDIATE
VALUES.
CONTINUOUS VARIABLES
A VARIABLE THAT CAN
ASSUME ANY NUMERICAL
VALUE OVER A CERTAIN
INTERVAL OR INTERVALS
IS CALLED
A CONTINUOUS
VARIABLE.
1.3 Types of Variables 1.3.1 Quantitative Variables 1.3.2 Qualitative or Categorical Variables
1.3.2 QUALITATIVE OR CATEGORICAL VARIABLES
A variable that cannot assume a numerical value but can be classified into two or more nonnumeric categories is called a qualitative or categorical variable. The data collected on such a variable are called qualitative data.
For example, the status of an undergraduate college student is a qualitative variable because a student can fall into any one of four categories: freshman, sophomore, junior, or senior. Other examples of qualitative variables are the gender of a person, the make of a computer, the opinions of people, and the make of a car. (How do you like this PowerPoint presentation?)
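The distinction between qualitative, discrete, and continuous variables can be sketched as a rule of thumb in code. The function below is an assumption of this sketch, not a formal test: it treats text values as qualitative, whole-number counts as discrete, and other numbers as continuous. The survey values are hypothetical. Real data need judgment, since a code such as a ZIP code is stored as a number but is still categorical.

```python
def classify_variable(values):
    """Classify a variable from its recorded values (rough rule of thumb)."""
    if all(isinstance(v, str) for v in values):
        return "qualitative"
    if all(isinstance(v, int) for v in values):
        return "quantitative, discrete"
    if all(isinstance(v, (int, float)) for v in values):
        return "quantitative, continuous"
    raise ValueError("mixed or unsupported value types")


# Hypothetical survey responses for three variables.
survey = {
    "class_standing": ["freshman", "junior", "senior"],
    "number_of_siblings": [0, 2, 1],
    "height_in_inches": [64.5, 70.25, 68.0],
}

classification = {name: classify_variable(vals) for name, vals in survey.items()}

for name, kind in classification.items():
    print(f"{name}: {kind}")
```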
1.10 A survey of families living in a certain city was conducted to collect information on the following variables: age of the oldest person in the family, number of family members, number of males in the family, number of females in the family, whether or not they own a house, income of the family, whether or not the family took vacations during the past one year, whether or not they are happy with their financial situation, and the amount of their monthly mortgage or rent.
(a) Which of these variables are qualitative variables?
(b) Which of these variables are quantitative variables?
(c) Which of the quantitative variables of part b are discrete variables?
(d) Which of the quantitative variables of part b are continuous variables?
1.5 POPULATION VERSUS SAMPLE
Suppose a statistician is interested in knowing the following:
1. The percentage of all voters in a city who will vote for a particular candidate in an election
2. Last year's gross sales of all companies in New York City
3. The prices of all homes in California
We will encounter the terms population and sample on almost every page of the text. Consequently, understanding the meaning of each of these two terms and the difference between them is crucial.
Population or Target Population
A population consists of all elements (individuals, items, or objects) whose characteristics are being studied. The population that is being studied is also called the target population.
SAMPLE
• A PORTION OF THE POPULATION
SELECTED FOR STUDY IS REFERRED TO
AS A SAMPLE.
POPULATION
• FOR EXAMPLE, THE ELECTION POLLS
CONDUCTED IN THE UNITED STATES TO
ESTIMATE THE PERCENTAGES OF VOTERS
WHO FAVOR VARIOUS CANDIDATES IN ANY
PRESIDENTIAL ELECTION ARE BASED ON
ONLY A FEW HUNDRED OR A FEW THOUSAND
VOTERS SELECTED FROM ACROSS THE
COUNTRY.
• IN THIS CASE, THE POPULATION CONSISTS OF
ALL REGISTERED VOTERS IN THE UNITED
STATES.
SAMPLE
• THE SAMPLE IS MADE UP OF A FEW
HUNDRED OR A FEW THOUSAND VOTERS
WHO ARE INCLUDED IN AN OPINION
POLL. THUS, THE COLLECTION OF A
NUMBER OF ELEMENTS SELECTED
FROM A POPULATION IS CALLED
A SAMPLE.
CENSUS AND SAMPLE SURVEY
A survey that includes every member of the population is called a census. A survey that includes only a portion of the population is called a sample survey.
The purpose of conducting a sample survey is to make decisions about the corresponding population. It is important that the results obtained from a sample survey closely match the results that we would obtain by conducting a census.
When we collect information on all elements of the target population, it is called a census. Often the target population is very large. Hence, in practice, a census is rarely taken because it is expensive and time-consuming.
Usually, to conduct a survey, we select a sample and collect the required information from the elements included in that sample. We then make decisions based on this sample information.
REPRESENTATIVE SAMPLE
A SAMPLE THAT REPRESENTS THE CHARACTERISTICS
OF THE POPULATION AS CLOSELY AS POSSIBLE IS CALLED A REPRESENTATIVE SAMPLE.
As an example, to find the average income of families living in New York City by conducting a sample survey, the sample must contain families who belong to different income groups in almost the same proportion as they exist in the population. Such a sample is called a representative sample.
A sample may be selected with or without replacement. In sampling with replacement, each time we select an element from the population, we put it back in the population before we select the next element. As a result, we may select the same item more than once in such a sample.
Sampling without replacement occurs when the selected element is not replaced in the population. In this case, each time we select an item, the size of the population is reduced by one element. Thus, we cannot select the same item more than once in this type of sampling. For example, in an opinion poll of voters, the same voter is not selected more than once. Therefore, this is an example of sampling without replacement.
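The difference between the two sampling schemes can be sketched with Python's standard random module. The ten-voter population and the seed are hypothetical choices for illustration. random.choices draws with replacement, so repeats are possible, while random.sample draws without replacement, so every selected voter is distinct.

```python
import random

rng = random.Random(1272)  # fixed seed so the sketch is repeatable

# Hypothetical population of ten voters.
population = [f"voter_{i}" for i in range(1, 11)]

# With replacement: each draw is made from the full population,
# so the same voter may appear more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: a selected voter is not put back,
# so no voter can appear twice.
without_replacement = rng.sample(population, k=5)

print("With replacement:   ", with_replacement)
print("Without replacement:", without_replacement)
```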
THREE OF THE MAIN REASONS FOR CONDUCTING A SAMPLE SURVEY INSTEAD OF
A CENSUS
TIME
IN MOST CASES, THE
SIZE OF THE
POPULATION IS QUITE
LARGE.
CONSEQUENTLY,
CONDUCTING A
CENSUS TAKES A LONG
TIME, WHEREAS A
SAMPLE SURVEY CAN
BE CONDUCTED VERY
QUICKLY.
COST
THE COST OF
COLLECTING
INFORMATION FROM
ALL MEMBERS OF A
POPULATION MAY
EASILY FALL OUTSIDE
THE LIMITED BUDGET OF
MOST, IF NOT ALL,
SURVEYS. CONSEQUENTLY,
CONDUCTING A SAMPLE
SURVEY MAY BE THE
BEST APPROACH.
IMPOSSIBILITY OF CONDUCTING A CENSUS
SOMETIMES IT IS IMPOSSIBLE TO CONDUCT
A CENSUS.
FIRST, IT MAY NOT BE POSSIBLE TO IDENTIFY
AND ACCESS EACH MEMBER OF THE
POPULATION (EX. A SURVEY ABOUT
HOMELESS PEOPLE).
SECOND, SOMETIMES CONDUCTING A
SURVEY MEANS DESTROYING THE ITEMS
INCLUDED IN THE SURVEY (EX. ESTIMATING
THE MEAN LIFE OF LIGHTBULBS WOULD
NECESSITATE BURNING OUT MANY OF
THEM).
1.5.2 RANDOM AND NONRANDOM SAMPLES
A random sample is a sample drawn in such a way that each member of the population has some chance of being selected in the sample. In a nonrandom sample, some members of the population may not have any chance of being selected in the sample.
RANDOM AND NONRANDOM SAMPLES
• SUPPOSE WE HAVE A LIST OF
100 STUDENTS AND WE WANT
TO SELECT 10 OF THEM. IF WE
WRITE THE NAMES OF ALL 100
STUDENTS ON PIECES OF
PAPER, PUT THEM IN A HAT,
MIX THEM, AND THEN DRAW
10 NAMES, THE RESULT WILL
BE A RANDOM SAMPLE OF 10
STUDENTS.
• A RANDOM SAMPLE IS
USUALLY A REPRESENTATIVE
SAMPLE.
• HOWEVER, IF WE ARRANGE THE NAMES OF THESE 100 STUDENTS ALPHABETICALLY AND PICK THE FIRST 10 NAMES, IT WILL BE A NONRANDOM SAMPLE BECAUSE THE STUDENTS WHO ARE NOT AMONG THE FIRST 10 HAVE NO CHANCE OF BEING SELECTED IN THE SAMPLE.
• TWO TYPES OF NONRANDOM SAMPLES ARE A CONVENIENCE SAMPLE AND A JUDGMENT SAMPLE.
• IN A CONVENIENCE SAMPLE, THE MOST ACCESSIBLE MEMBERS OF THE POPULATION ARE SELECTED TO OBTAIN THE RESULTS QUICKLY. FOR EXAMPLE, AN OPINION POLL MAY BE CONDUCTED IN A FEW HOURS BY COLLECTING INFORMATION FROM CERTAIN SHOPPERS AT A SINGLE SHOPPING MALL.
• IN A JUDGMENT SAMPLE, THE MEMBERS ARE SELECTED FROM THE POPULATION BASED ON THE JUDGMENT AND PRIOR KNOWLEDGE OF AN EXPERT.
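The 100-student example above can be sketched as follows. The student labels and the seed are hypothetical. Drawing names from a hat corresponds to random.sample, while taking the first 10 names from the alphabetical list gives the nonrandom sample, in which the other 90 students have no chance of selection.

```python
import random

# Hypothetical alphabetical list of 100 students.
students = sorted(f"student_{i:03d}" for i in range(1, 101))

rng = random.Random(2017)  # fixed seed so the sketch is repeatable

# Random sample: every student has a chance of being drawn (names in a hat).
random_sample = rng.sample(students, k=10)

# Nonrandom sample: only the first 10 names on the list can ever be chosen.
nonrandom_sample = students[:10]

print("Random sample:   ", random_sample)
print("Nonrandom sample:", nonrandom_sample)
```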
THE SO-CALLED PSEUDO POLLS ARE EXAMPLES OF NONREPRESENTATIVE SAMPLES.
For example, a poll conducted by a television station that gives two separate telephone numbers for yes and no votes is not based on a representative sample.
Quota Sample
To select a quota sample, we divide the target population into different subpopulations based on certain characteristics. As an example of a quota sample, suppose we want to select a sample of 1000 persons from a city whose population has 48% men and 52% women. To select a quota sample, we choose 480 men from the male population and 520 women from the female population. The sample selected in this way will contain exactly 48% men and 52% women.
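The quota arithmetic in this example is the sample size multiplied by each group's share of the population. The sketch below reproduces it. The function name is an illustrative assumption, and with other proportions the rounded quotas may not add up exactly to the sample size.

```python
def quota_sizes(sample_size, proportions):
    """Return how many persons to select from each group."""
    return {
        group: round(sample_size * share)
        for group, share in proportions.items()
    }


# Figures from the example: 1000 persons, 48% men and 52% women.
quotas = quota_sizes(1000, {"men": 0.48, "women": 0.52})

print(quotas)
```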
• SAMPLING ERROR
THE SAMPLING ERROR IS THE
DIFFERENCE BETWEEN THE RESULT
OBTAINED FROM A SAMPLE SURVEY
AND THE RESULT THAT WOULD HAVE
BEEN OBTAINED IF THE WHOLE
POPULATION HAD BEEN INCLUDED IN
THE SURVEY.
• NONSAMPLING ERRORS
• THE ERRORS THAT OCCUR IN THE
COLLECTION, RECORDING, AND
TABULATION OF DATA ARE
CALLED NONSAMPLING
ERRORS OR BIASES.
1.5.3 Sampling and Nonsampling Errors
SELECTION ERROR OR BIAS
THE LIST OF MEMBERS OF THE TARGET POPULATION THAT IS USED TO SELECT A SAMPLE IS CALLED THE SAMPLING FRAME. THE ERROR THAT OCCURS BECAUSE THE SAMPLING FRAME IS NOT REPRESENTATIVE OF THE POPULATION IS CALLED THE SELECTION ERROR OR BIAS.
Nonresponse Error or Bias
The error that occurs because many of the people included in the sample do not respond to a survey is called the nonresponse error or bias.
Response Error or Bias
The response error or bias occurs when people included in the survey do not provide correct answers.
Voluntary Response Error or Bias
Voluntary response error or bias occurs when a survey is not conducted on a randomly selected sample but on a questionnaire published in a magazine or newspaper and people are invited to respond to that questionnaire.
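The sampling error defined earlier in this section can be illustrated numerically. The sketch below builds a hypothetical population of 5,000 family incomes (made-up values), draws a random sample of 100, and takes the difference between the sample mean and the population mean. It assumes no nonsampling errors are present, so the whole difference is sampling error.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: annual incomes of 5,000 families (made-up values).
population_incomes = [rng.randint(20_000, 150_000) for _ in range(5_000)]

# A random sample of 100 families drawn without replacement.
sample_incomes = rng.sample(population_incomes, k=100)

# Result a census would give versus the result from the sample survey.
population_mean = mean(population_incomes)
sample_mean = mean(sample_incomes)

# Sampling error: sample result minus the result for the whole population.
sampling_error = sample_mean - population_mean

print(f"Population mean: {population_mean:.2f}")
print(f"Sample mean:     {sample_mean:.2f}")
print(f"Sampling error:  {sampling_error:.2f}")
```

A different seed gives a different sample and therefore a different sampling error, which is why the error is described as occurring by chance.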
Stratified Random Sample
In a stratified random sample, we first divide the population into subpopulations, which are called strata. Then, one sample is selected from each of these strata. The collection of all samples from all strata gives the stratified random sample.
Suppose we need to select a sample from the population of a city, and we want households with different income levels to be proportionately represented in the sample. For example, we may form three groups of low-, medium-, and high-income households. We will now have three subpopulations, which are usually called strata. We then select one sample from each subpopulation or stratum. The collection of all three samples selected from the three strata gives the required sample, called the stratified random sample.
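A minimal sketch of the income example, assuming a hypothetical city of 1,000 households and a 10% sampling fraction applied to every stratum (proportional allocation is an assumption of this sketch). The function and variable names are illustrative.

```python
import random
from collections import Counter

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical households, each tagged with its income stratum.
households = (
    [("low", i) for i in range(500)]
    + [("medium", i) for i in range(300)]
    + [("high", i) for i in range(200)]
)


def stratified_sample(units, stratum_of, fraction, rng):
    """Draw a random sample of the same fraction from every stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample


sample = stratified_sample(households, lambda h: h[0], 0.10, rng)

# Count how many sampled households fall in each stratum.
counts = Counter(stratum for stratum, _ in sample)
print(counts)
```

Because each stratum contributes the same fraction, the sample keeps the 50/30/20 split of the hypothetical population.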
Cluster Sampling
In cluster sampling, the whole population is first divided into (geographical) groups called clusters. Each cluster is representative of the population. Then a random sample of clusters is selected. Finally, a random sample of elements from each of the selected clusters is selected. In other words, we divide the population into different geographical groups or clusters and, as a first step, select a random sample of certain clusters from all clusters. We then take a random sample of certain elements from each selected cluster.
For example, suppose we are to conduct a survey of households in the state of New York. First, we divide the whole state of New York into, say, 40 regions, which are called clusters or primary units. We make sure that all clusters are similar and, hence, representative of the population. We then select at random, say, 5 clusters from 40. Next, we randomly select certain households from each of these 5 clusters and conduct a survey of these selected households. This is called cluster sampling.
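The two stages of the New York example can be sketched as below. The 40 regions and 5 selected clusters follow the text. The 50 households per region and the 10 households drawn per selected cluster are hypothetical numbers chosen only for illustration.

```python
import random

rng = random.Random(40)  # fixed seed so the sketch is repeatable

# Hypothetical population: 40 regions (clusters) with 50 households each.
clusters = {
    f"region_{r:02d}": [f"r{r:02d}_household_{h}" for h in range(50)]
    for r in range(1, 41)
}

# Stage 1: select 5 of the 40 clusters at random.
selected_clusters = rng.sample(sorted(clusters), k=5)

# Stage 2: select 10 households at random from each selected cluster.
survey_sample = []
for name in selected_clusters:
    survey_sample.extend(rng.sample(clusters[name], k=10))

print("Selected clusters:", selected_clusters)
print("Households surveyed:", len(survey_sample))
```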
1.13 BRIEFLY EXPLAIN THE TERMS POPULATION, SAMPLE, REPRESENTATIVE SAMPLE, SAMPLING WITH REPLACEMENT, AND SAMPLING WITHOUT REPLACEMENT.
• A census is a survey that includes every member of the population. A survey based on a portion of the population is called a sample survey. A sample survey is preferred over a census for the following reasons:
1. Conducting a census is very expensive because the size of the population is often very large.
2. Conducting a census is very time-consuming.
3. In many cases it is impossible to identify each element of the target population.
1.16 Explain the following.
(a) Random sample
(b) Nonrandom sample
(c) Convenience sample
(d) Judgment sample
(e) Quota sample
1.17 Explain the following four sampling techniques.
(a) Simple random sampling
(b) Systematic random sampling
(c) Stratified random sampling
(d) Cluster sampling
(a) Simple random sampling
(b) Systematic random sampling
(c) In a stratified random sample, we first divide the population into subpopulations, which are called strata. Then, one sample is selected from each of these strata. The collection of all samples from all strata gives the stratified random sample.
(d) In cluster sampling, the whole population is divided into (geographical) groups called clusters. Each cluster is representative of the population. Then, a random sample of clusters is selected. Finally, a random sample of elements of each of the selected clusters is selected.
1.5.1 WHY SAMPLE?