STA 291 Summer 2010 Lecture 3 Dustin Lueker

Sampling Plans
• Simple Random Sampling (SRS)
  ◦ Each possible sample of a given size has the same probability of being selected
• Stratified Random Sampling
  ◦ The population can be divided into a set of non-overlapping subgroups (the strata)
  ◦ SRSs are drawn from each stratum
• Cluster Sampling
  ◦ The population can be divided into a set of non-overlapping subgroups (the clusters)
  ◦ The clusters are then selected at random, and all individuals in the selected clusters are included in the sample
• Systematic Sampling
  ◦ Useful when the population exists as a list
  ◦ A value K is specified. Then one of the first K individuals is selected at random, after which every Kth observation is included in the sample
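The four sampling plans can be sketched in a few lines of Python. This is only an illustration, not part of the lecture: the population of 100 numbered individuals, the two subgroups "A" and "B", the seed, and all function names are assumptions made up for the example.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every possible sample of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Draw an SRS of size n_per_stratum from each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random and keep every individual in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

def systematic_sample(population, k, rng):
    """Pick one of the first k individuals at random, then every k-th after it."""
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(291)                 # fixed seed so the sketch is repeatable
population = list(range(1, 101))         # hypothetical list of 100 individuals
groups = {"A": population[:50], "B": population[50:]}  # hypothetical subgroups

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(groups, 5, rng)
clus = cluster_sample(groups, 1, rng)
syst = systematic_sample(population, 10, rng)
print(srs, strat, len(clus), syst, sep="\n")
```

The same subgroups serve as strata in one call and as clusters in the other, which highlights the difference: stratified sampling takes a few individuals from every subgroup, while cluster sampling takes every individual from a few subgroups.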

Descriptive Statistics
• Summarize data
  ◦ Condense the information from the dataset
    ▪ Graphs
    ▪ Tables
    ▪ Numbers
• Interval data
  ◦ Histogram
• Nominal/Ordinal data
  ◦ Bar chart
  ◦ Pie chart

Data Table: Murder Rates
• Difficult to see the “big picture” from these numbers
  ◦ We want to try to condense the data

State          Rate     State          Rate
Alabama        11.6     Alaska          9.0
Arizona         8.6     Arkansas       10.2
California     13.1     Colorado        5.8
Connecticut     6.3     Delaware        5.0
DC             78.5     Florida         8.9
Georgia        11.4     Hawaii          3.8
…              …        …              …

Frequency Distribution
• A listing of intervals of possible values for a variable
• Together with a tabulation of the number of observations in each interval

Frequency Distribution

Murder Rate    Frequency
0-2.9               5
3-5.9              16
6-8.9              12
9-11.9             12
12-14.9             4
15-17.9             0
18-20.9             1
>21                 1
Total              51
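As a rough sketch of how such a table is tallied, the snippet below counts raw values into the same classes. It uses only the twelve rows visible on the data slide, so its counts will not match the full 51-row table. The helper name `class_index` is invented for the example, and placing a value of exactly 21 in the last class is an assumption.

```python
# Only the twelve rows shown on the data slide; the remaining rows are elided there.
rates = {"Alabama": 11.6, "Alaska": 9.0, "Arizona": 8.6, "Arkansas": 10.2,
         "California": 13.1, "Colorado": 5.8, "Connecticut": 6.3, "Delaware": 5.0,
         "DC": 78.5, "Florida": 8.9, "Georgia": 11.4, "Hawaii": 3.8}

labels = ["0-2.9", "3-5.9", "6-8.9", "9-11.9",
          "12-14.9", "15-17.9", "18-20.9", ">21"]

def class_index(rate, width=3, n_classes=len(labels)):
    """Index of the class containing rate; the last class is open-ended."""
    return min(int(rate // width), n_classes - 1)

# Tally one observation at a time into its class.
frequency = [0] * len(labels)
for rate in rates.values():
    frequency[class_index(rate)] += 1

for label, count in zip(labels, frequency):
    print(f"{label:>8} {count:>3}")
print(f"{'Total':>8} {sum(frequency):>3}")
```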

Frequency Distribution
• Conditions for intervals
  ◦ Equal length
  ◦ Mutually exclusive
    ▪ Any observation can only fall into one interval
  ◦ Collectively exhaustive
    ▪ All observations fall into an interval
• Rule of thumb:
  ◦ If you have n observations, then the number of intervals should approximately equal $\sqrt{n}$
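A small sketch of both ideas, assuming the rule of thumb is read as "about the square root of n" intervals. The function names and the half-open interval convention [lower, upper) are choices made for this example, not something the slide specifies.

```python
import math

def suggested_intervals(n):
    """Rule of thumb (as read here): use about sqrt(n) intervals."""
    return round(math.sqrt(n))

def intervals_containing(value, edges):
    """Number of half-open intervals [edges[i], edges[i+1]) that contain value."""
    return sum(lo <= value < hi for lo, hi in zip(edges, edges[1:]))

k = suggested_intervals(51)   # 51 observations in the murder rate data
print(k)

# Edges for classes of width 3, with an open-ended last class.
edges = [0, 3, 6, 9, 12, 15, 18, 21, math.inf]

# Mutually exclusive and collectively exhaustive: each value lies in exactly one interval.
checks = [intervals_containing(v, edges) for v in (0, 2.9, 3, 11.6, 20.9, 78.5)]
print(checks)
```

For n = 51 the rule suggests about 7 intervals, close to the 8 classes used in the murder rate table.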

Relative Frequencies
• Relative frequency for an interval
  ◦ Proportion of sample observations that fall in that interval
• Sometimes percentages are preferred to relative frequencies

Frequency, Relative Frequency, and Percentage Distribution

Murder Rate    Frequency    Relative Frequency    Percentage
0-2.9               5             .10                 10
3-5.9              16             .31                 31
6-8.9              12             .24                 24
9-11.9             12             .24                 24
12-14.9             4             .08                  8
15-17.9             0              0                   0
18-20.9             1             .02                  2
>21                 1             .02                  2
Total              51              1                 100
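The relative frequency and percentage columns follow directly from the counts: divide each frequency by the total, then multiply by 100. A minimal sketch using the frequencies from the table (variable names are chosen for the example):

```python
labels = ["0-2.9", "3-5.9", "6-8.9", "9-11.9",
          "12-14.9", "15-17.9", "18-20.9", ">21"]
frequency = [5, 16, 12, 12, 4, 0, 1, 1]

n = sum(frequency)                            # total number of observations
relative = [f / n for f in frequency]         # proportion in each interval
percentage = [100 * r for r in relative]      # same information as a percentage

for label, f, r, p in zip(labels, frequency, relative, percentage):
    print(f"{label:>8} {f:>3} {r:.2f} {p:>3.0f}")
```

Because each entry is rounded separately, the rounded relative frequencies need not add up to exactly 1 even though the unrounded ones do.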

Frequency Distributions
• Notice that we had to group the observations into intervals because the variable is measured on a continuous scale
  ◦ For discrete data, grouping may not be necessary
    ▪ Except when there are many categories
• Intervals are sometimes called classes
  ◦ Class Cumulative Frequency
    ▪ Number of observations that fall in the class and in smaller classes
  ◦ Class Relative Cumulative Frequency
    ▪ Proportion of observations that fall in the class and in smaller classes

Frequency and Cumulative Frequency

Murder Rate    Frequency    Relative     Cumulative    Relative Cumulative
                            Frequency    Frequency     Frequency
0-2.9               5         .10             5             .10
3-5.9              16         .31            21             .41
6-8.9              12         .24            33             .65
9-11.9             12         .24            45             .89
12-14.9             4         .08            49             .97
15-17.9             0          0             49             .97
18-20.9             1         .02            50             .99
>21                 1         .02            51              1
Total              51          1             51              1
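Cumulative frequencies are running totals of the class frequencies, and relative cumulative frequencies divide those totals by n. A short sketch using the counts from the table (the variable names are assumptions for the example):

```python
from itertools import accumulate

frequency = [5, 16, 12, 12, 4, 0, 1, 1]
n = sum(frequency)

# Running total: observations in this class and in all smaller classes.
cumulative = list(accumulate(frequency))

# The same running total expressed as a proportion of all observations.
relative_cumulative = [c / n for c in cumulative]

for c, rc in zip(cumulative, relative_cumulative):
    print(f"{c:>3} {rc:.2f}")
```

Computed from the counts, 45/51 rounds to .88, 49/51 to .96 and 50/51 to .98. The table shows .89, .97 and .99 in those rows, which appears to come from adding up relative frequencies that were already rounded.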

Histogram (Interval Data)
• Use the numbers from the frequency distribution to create a graph
  ◦ Draw a bar over each interval; the height of the bar represents the relative frequency for that interval
  ◦ Bars should be touching
    ▪ Equally extend the width of the bar at the upper and lower limits so that the bars are touching
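One way to read "equally extend the width of the bar" is to widen every class by half the gap between neighbouring class limits, so that 0-2.9 and 3-5.9 meet at 2.95. The sketch below does this for the classes of the murder rate table. The function name is invented, the open-ended last class is left out, and `Decimal` is used only to avoid floating-point noise in the printed boundaries.

```python
from decimal import Decimal

# Class limits as printed in the frequency table (open-ended last class omitted).
limits = [("0", "2.9"), ("3", "5.9"), ("6", "8.9"), ("9", "11.9"),
          ("12", "14.9"), ("15", "17.9"), ("18", "20.9")]

def class_boundaries(limits):
    """Widen each class by half the gap between neighbouring class limits."""
    lims = [(Decimal(lo), Decimal(hi)) for lo, hi in limits]
    gap = lims[1][0] - lims[0][1]      # 3 - 2.9 = 0.1
    half = gap / 2
    return [(lo - half, hi + half) for lo, hi in lims]

bounds = class_boundaries(limits)
for lo, hi in bounds:
    print(f"bar from {lo} to {hi}")
```

With these boundaries each bar ends exactly where the next one begins, so the bars touch.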


Histogram


Histogram w/o DC

Bar Graph (Nominal/Ordinal Data)
• Histogram: for interval (quantitative) data
• Bar graph is almost the same, but for qualitative data
• Difference:
  ◦ The bars are usually separated to emphasize that the variable is categorical rather than quantitative
  ◦ For nominal variables (no natural ordering), order the bars by frequency, except possibly for a category “other” that is always last

Pie Chart (Nominal/Ordinal Data)
• First Step
  ◦ Create a frequency distribution

Highest Degree Obtained    Frequency (Number of Employees)
Grade School                     15
High School                     200
Bachelor’s                      185
Master’s                         55
Doctorate                        70
Other                            25
Total                           550

We could display this data in a bar chart…
• Bar graph
  ◦ If the data is ordinal, classes are presented in the natural ordering

[Figure: bar chart of the degree frequencies; categories Grade School, High School, Bachelor's, Master's, Doctorate, Other; vertical axis from 0 to 250]
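The two ordering rules for bar graphs can be sketched with the degree counts from the frequency table. This is an illustration only: the function names and the text rendering with `#` characters are assumptions, and treating the degree variable as nominal in the second ordering is done purely to show the rule.

```python
counts = {"Grade School": 15, "High School": 200, "Bachelor's": 185,
          "Master's": 55, "Doctorate": 70, "Other": 25}

def nominal_order(freq, last="Other"):
    """Order categories by decreasing frequency, keeping `last` at the end."""
    main = sorted((c for c in freq if c != last), key=lambda c: -freq[c])
    return main + ([last] if last in freq else [])

def text_bars(freq, order, unit=5):
    """One separate bar per category; each '#' stands for `unit` employees."""
    return [f"{c:<12} {'#' * (freq[c] // unit)} {freq[c]}" for c in order]

natural = list(counts)                 # ordinal data: keep the natural ordering
by_frequency = nominal_order(counts)   # nominal data: largest first, "Other" last

print("\n".join(text_bars(counts, natural)))
print(by_frequency)
```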

Pie Chart
• Pie is divided into slices
  ◦ Area of each slice is proportional to the frequency of each class

Highest Degree    Relative Frequency    Angle in degrees (= Rel. Freq. x 360)
Grade School      15/550 = .027            9.72
High School       200/550 = .364         131.04
Bachelor’s        185/550 = .336         120.96
Master’s          55/550 = .1             36.0
Doctorate         70/550 = .127           45.72
Other             25/550 = .045           16.2
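The angle column can be reproduced by multiplying each relative frequency by 360. The sketch below follows the table's arithmetic, which rounds the relative frequency to three decimals first, and also computes the unrounded angles for comparison. Variable names are assumptions for the example.

```python
counts = {"Grade School": 15, "High School": 200, "Bachelor's": 185,
          "Master's": 55, "Doctorate": 70, "Other": 25}
total = sum(counts.values())

table_angle = {}   # angle from the rounded relative frequency, as in the table
exact_angle = {}   # angle from the unrounded relative frequency
for category, count in counts.items():
    rel = round(count / total, 3)
    table_angle[category] = round(rel * 360, 2)
    exact_angle[category] = count / total * 360

for category in counts:
    print(f"{category:<12} {table_angle[category]:>7.2f} {exact_angle[category]:>7.2f}")
```

The unrounded angles add up to 360 degrees. The angles built from rounded relative frequencies add up to 359.64, so in practice it is safer to compute slice angles from the counts directly.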

Pie Chart for Highest Degree Achieved

[Figure: pie chart with slices labeled Grade School, High School, Bachelor's, Master's, Doctorate, Other]

Stem and Leaf Plot
• Write the observations ordered from smallest to largest
  ◦ Looks like a histogram sideways
  ◦ Contains more information than a histogram, because every single observation can be recovered
• Each observation represented by a stem and leaf
  ◦ Stem = leading digit(s)
  ◦ Leaf = final digit

Stem and Leaf Plot

Stem  Leaf          #
  20  3             1
  19
  18
  17
  16
  15
  14
  13  135           3
  12  7             1
  11  334469        6
  10  2234          4
   9  08            2
   8  03469         5
   7  5             1
   6  034689        6
   5  0238          4
   4  46            2
   3  0144468999   10
   2  039           3
   1  67            2
      ----+----+----+----+
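A stem and leaf plot can be built by splitting each value into its leading digits and its final digit. The sketch below does that for the eleven state rates visible on the data slide (DC is left out here, as an assumption for the example) and then rebuilds the observations from the plot to show that nothing is lost. The function names are invented.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted leaves (final digit, in tenths)."""
    plot = defaultdict(str)
    for tenths in sorted(round(v * 10) for v in values):
        stem, leaf = divmod(tenths, 10)
        plot[stem] += str(leaf)
    return dict(plot)

def recover(plot):
    """Rebuild the sorted observations from the stems and leaves."""
    return sorted((stem * 10 + int(leaf)) / 10
                  for stem, leaves in plot.items() for leaf in leaves)

# Rates shown on the data slide, excluding DC; the other rows are elided there.
values = [11.6, 9.0, 8.6, 10.2, 13.1, 5.8, 6.3, 5.0, 8.9, 11.4, 3.8]

plot = stem_and_leaf(values)
for stem in range(max(plot), min(plot) - 1, -1):   # largest stem on top
    leaves = plot.get(stem, "")
    print(f"{stem:>4}  {leaves:<6} {len(leaves) or ''}")
```

Stems with no observations are still printed, which keeps the sideways-histogram shape of the plot.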

Stem and Leaf Plot
• Useful for small data sets
  ◦ Fewer than 100 observations
• Can also be used to compare groups
  ◦ Back-to-Back Stem and Leaf Plots, using the same stems for both groups
• Murder Rate Data from U.S. and Canada
  ◦ Note: it doesn’t really matter whether the smallest stem is at top or bottom of the table

Stem and Leaf Plot

PRESIDENT    AGE    PRESIDENT    AGE    PRESIDENT     AGE
Washington   67     Fillmore     74     Roosevelt     60
Adams        90     Pierce       64     Taft          72
Jefferson    83     Buchanan     77     Wilson        67
Madison      85     Lincoln      56     Harding       57
Monroe       73     Johnson      66     Coolidge      60
Adams        80     Grant        63     Hoover        90
Jackson      78     Hayes        70     Roosevelt     63
Van Buren    79     Garfield     49     Truman        88
Harrison     68     Arthur       56     Eisenhower    78
Tyler        71     Cleveland    71     Kennedy       46
Polk         53     Harrison     67     Johnson       64
Taylor       65     McKinley     58     Nixon         81
                                        Reagan        93
                                        Ford          93

Stem  Leaf

Summary of Graphical and Tabular Techniques
• Discrete data
  ◦ Frequency distribution
• Continuous data
  ◦ Grouped frequency distribution
• Small data sets
  ◦ Stem and leaf plot
• Interval data
  ◦ Histogram
• Categorical data
  ◦ Bar chart
  ◦ Pie chart
• Grouping intervals should be of same length, but may be dictated more by subject-matter considerations

Good Graphics
• Present large data sets concisely and coherently
• Can replace a thousand words and still be clearly understood and comprehended
• Encourage the viewer to compare two or more variables
• Do not replace substance by form
• Do not distort what the data reveal

Bad Graphics
• Don’t have a scale on the axis
• Have a misleading caption
• Distort by using absolute values where relative/proportional values are more appropriate
• Distort by stretching/shrinking the vertical or horizontal axis
• Use bar charts with bars of unequal width

Sample/Population Distribution
• Frequency distributions and histograms exist for the population as well as for the sample
• Population distribution vs. sample distribution
• As the sample size increases, the sample distribution looks more and more like the population distribution
  ◦ This will be explored further later on in the course
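A small simulation can make the last point concrete. The population here is a hypothetical fair six-sided die, so every face has population proportion 1/6. The die, the seed, the sample sizes and the function names are all assumptions chosen for the illustration.

```python
import random
from collections import Counter

faces = [1, 2, 3, 4, 5, 6]   # hypothetical population: a fair die, each proportion 1/6

def sample_distribution(n, rng):
    """Relative frequency of each face in a random sample of size n."""
    counts = Counter(rng.choice(faces) for _ in range(n))
    return {face: counts[face] / n for face in faces}

def max_gap(dist):
    """Largest distance between a sample proportion and the population proportion."""
    return max(abs(p - 1 / 6) for p in dist.values())

rng = random.Random(291)     # fixed seed so the sketch is repeatable
gaps = {n: max_gap(sample_distribution(n, rng)) for n in (10, 100, 10_000, 100_000)}
for n, gap in gaps.items():
    print(f"n = {n:>7}: largest gap = {gap:.4f}")
```

For large samples the gap between the sample relative frequencies and 1/6 tends to become small, which is the behaviour the slide describes.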

Population Distribution
• The population distribution for a continuous variable is usually represented by a smooth curve
  ◦ Like a histogram that gets finer and finer
  ◦ Similar to the idea of using smaller and smaller rectangles to calculate the area under a curve when learning how to integrate
• Symmetric distributions
  ◦ Bell-shaped
  ◦ U-shaped
  ◦ Uniform
• Not symmetric distributions:
  ◦ Left-skewed
  ◦ Right-skewed
  ◦ Skewed
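The rectangle analogy can be tried numerically. The sketch below takes the standard normal density as an example of a bell-shaped curve (an assumption, since the slide names no particular curve) and adds up the areas of equal-width rectangles under it. The function names and the interval from -6 to 6 are also choices made for the example.

```python
import math

def bell_curve(x):
    """Standard normal density, used here as an example of a smooth bell-shaped curve."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def rectangle_area(f, lo, hi, n_rect):
    """Total area of n_rect equal-width rectangles, heights taken at interval midpoints."""
    width = (hi - lo) / n_rect
    return sum(f(lo + (i + 0.5) * width) for i in range(n_rect)) * width

coarse = rectangle_area(bell_curve, -6, 6, 4)      # a few wide bars
fine = rectangle_area(bell_curve, -6, 6, 1000)     # many narrow bars
print(f"{coarse:.4f} {fine:.6f}")
```

With only four wide rectangles the total area is noticeably below 1. With a thousand narrow ones it is very close to 1, just as a finer and finer histogram approaches the smooth curve.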

Skewness

[Figure: three distribution shapes, labeled Symmetric, Right-skewed, Left-skewed]

