SESSION 5 & 6

Date post: 22-Feb-2016
Upload: tasha
Page 1: SESSION 5 & 6

SESSION 5 & 6

Last Update: 23rd February 2011

Introduction to Statistics

Page 2: SESSION 5 & 6

Lecturer: Florian Boehlandt
University: University of Stellenbosch Business School
Domain: http://www.hedge-fund-analysis.net/pages/vega.php

Page 3: SESSION 5 & 6

Learning Objectives Part 1

1. Sampling (Random Sampling)
2. Sampling Error
3. Nonsampling Error

Page 4: SESSION 5 & 6

Sampling

• Why? Cost!
• The sample proportions are used as an estimate for the population proportions
• Examples:
– Nielsen ratings (1,000 television viewers)
– Quality Management (destroy items?)

Page 5: SESSION 5 & 6

Terminology

• Target Population: the population about which statisticians want to draw inferences

• Sampled Population: The actual population from which the sample is taken

• The sample statistic is a good estimator of the population parameter if target population = sampled population

Page 6: SESSION 5 & 6

Terminology

• Self-selected samples are almost always biased, because the individuals who choose to participate are usually more keenly interested in the issue than non-participants (SLOP = self-selected opinion poll)

Page 7: SESSION 5 & 6

Sampling Plan

• A simple random sample is a sample selected in such a way that every possible sample with the same number of observations is equally likely to be chosen.

• A stratified random sample is obtained by separating the population into mutually exclusive sets (strata), and then drawing simple random samples from each stratum.

Page 8: SESSION 5 & 6

Sampling Plan

• A cluster sample is a simple random sample of groups or clusters of elements

Page 9: SESSION 5 & 6

Simple Random Sampling

• Concept: Raffle. Each element of the chosen population is assigned a unique number and then ‘drawn from a hat’
+ Social security numbers
+ Student numbers
– Telephone numbers

• A random number table / random number generator (Excel: RAND) can be used to select sample numbers.
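The raffle idea can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 numbered elements, the function name `simple_random_sample` and the seed are assumptions, and Python's `random.sample` stands in for the random number table or Excel's RAND.

```python
import random


def simple_random_sample(population_ids, n, seed=None):
    """Draw n distinct IDs so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population_ids, n)


# Hypothetical population: elements numbered 1 to 1,000
population_ids = list(range(1, 1001))
srs = simple_random_sample(population_ids, 50, seed=42)
print(sorted(srs)[:10])
```

Sampling is done without replacement, so no element can be drawn twice, just as a ticket drawn from a hat is not put back.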

Page 10: SESSION 5 & 6

Simple Random Sampling

• Example Tax Returns (Keller 2006: p. 148)

Page 11: SESSION 5 & 6

Stratified Random Sampling

• Concept: Increase the amount of information about the population

• Examples of criteria separating the population into strata:
– Gender
– Age
– Occupation
– Household Income

Page 12: SESSION 5 & 6

Stratified Random Sampling

• Example: Proposed Tax Increase
1. Draw random samples from four income groups according to their proportions in the population
2. Make adjustments before making inferences about the entire population

Stratum   Income ‘000s   Population %   Sample
1         Under 25       25%            250
2         25-49          40%            400
3         50-75          30%            300
4         Over 75        5%             50
Total                                   1,000
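The sample sizes in the table follow from allocating the total sample of 1,000 in proportion to each stratum's share of the population. A minimal sketch, in which the function name `proportional_allocation` and the dictionary layout are assumptions:

```python
def proportional_allocation(population_pct, total_sample):
    """Split total_sample across strata in proportion to their population share."""
    return {
        stratum: pct * total_sample // 100  # integer percentages avoid float rounding
        for stratum, pct in population_pct.items()
    }


# Population shares (in percent) taken from the table above
population_pct = {"Under 25": 25, "25-49": 40, "50-75": 30, "Over 75": 5}
allocation = proportional_allocation(population_pct, 1000)

for stratum, size in allocation.items():
    print(f"{stratum:>8}: {size}")
```

Within each stratum, the allocated number of observations would then be drawn as a simple random sample.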

Page 13: SESSION 5 & 6

Systematic Sampling

• Concept: sample members are chosen in a regular manner working progressively through the list

• Example: Vega students. To select 500 students from Vega’s 8,500 enrolled students: 8,500 / 500 = 17. Thus, every 17th student would be selected
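The Vega example can be sketched as follows. The random starting position within the first 17 students is an assumption (the slide only states that every 17th student is selected), and the student list is a hypothetical list of numbers.

```python
import random


def systematic_sample(population, n, seed=None):
    """Select every k-th element, where k = population size // sample size."""
    k = len(population) // n                   # sampling interval, 8,500 // 500 = 17
    start = random.Random(seed).randrange(k)   # assumed: random start among the first k
    return [population[start + i * k] for i in range(n)]


# Hypothetical enrolment list: students numbered 1 to 8,500
students = list(range(1, 8501))
selected = systematic_sample(students, 500, seed=3)
print(selected[:5])
```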

Page 14: SESSION 5 & 6

Cluster Sampling

• Concept: Useful when it is difficult or costly to develop a complete list of population members (i.e. making it difficult to draw a simple random sample) or when the population elements are widely dispersed (geographically)

• Example: Each block within a city represents a cluster. A sample of clusters could then be selected and every household within these clusters is questioned (sampling error? sample size)
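The city-block example might look like this in code. The city layout (20 blocks of 5 households each), the naming scheme and the function `cluster_sample` are all hypothetical and serve only to illustrate the two stages: draw a simple random sample of clusters, then question every household in the chosen clusters.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and return every element inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # simple random sample of clusters
    elements = [e for name in chosen for e in clusters[name]]
    return chosen, elements


# Hypothetical city: 20 blocks with 5 households each
city = {
    f"block-{b}": [f"block-{b}/house-{h}" for h in range(1, 6)]
    for b in range(1, 21)
}
chosen_blocks, surveyed = cluster_sample(city, 4, seed=7)
print(chosen_blocks, len(surveyed))
```

Only a list of blocks is needed up front, not a list of every household in the city.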

Page 15: SESSION 5 & 6

Sampling Error

• Sampling error refers to the differences between the sample and the population that exist because of the observations that happened to be selected for the sample. The value of the sample mean will deviate from the population mean simply by chance

• The difference between the true (unknown) value of the population mean μ and its estimate (the sample mean x-bar) is the sampling error

• The only way to reduce the expected size of the sampling error is to increase the sample size n
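A small simulation can make the effect of the sample size visible. Everything here is an assumption made for illustration: the artificial population of 10,000 normally distributed values, the sample sizes 10 and 1,000, and the 200 repetitions. The sketch compares the average absolute difference between the sample mean and the population mean for the two sample sizes.

```python
import random
import statistics


def mean_abs_sampling_error(population, n, repetitions, rng):
    """Average |sample mean - population mean| over repeated simple random samples."""
    mu = statistics.mean(population)
    errors = []
    for _ in range(repetitions):
        sample = rng.sample(population, n)
        errors.append(abs(statistics.mean(sample) - mu))
    return statistics.mean(errors)


rng = random.Random(1)
# Hypothetical population: 10,000 values with mean about 50, std. dev. about 10
population = [rng.gauss(50, 10) for _ in range(10_000)]

error_small_n = mean_abs_sampling_error(population, 10, 200, rng)
error_large_n = mean_abs_sampling_error(population, 1000, 200, rng)
print(f"n = 10:   {error_small_n:.3f}")
print(f"n = 1000: {error_large_n:.3f}")
```

Any single sample can still land far from the population mean by chance; it is the typical size of the error that shrinks as n grows.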

Page 16: SESSION 5 & 6

Nonsampling Error

• Nonsampling errors are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly

• Nonsampling errors are more serious than sampling errors, because taking a larger sample won’t diminish the size, or possibility of occurrence, of this error

Page 17: SESSION 5 & 6

Types of Nonsampling Error

• Errors in data acquisition: incorrect measurements/responses, inaccurate recording

• Nonresponse error: refers to bias introduced when responses are not obtained from some members of the sample (not representative of target population); self-administered surveys

• Selection bias: Some members of the target population cannot possibly be included in the sample (e.g. members have no phone)

Page 18: SESSION 5 & 6

Learning Objectives Part 2

4. Frequency Tables
5. Histograms
6. Class Intervals and Width

Page 19: SESSION 5 & 6

Frequency Tables – Data Types

[Flow chart]

• Nominal Data / Ordinal Data → Categories → Count the number of times each category of the variable occurs → Frequency Distribution → Bar Chart

• Interval Data → Class Intervals → Count the number of observations that fall into each of a series of intervals → Frequency Distribution → Histogram

Page 20: SESSION 5 & 6

Frequency Tables – Data Types

• There are times when a data set contains a large number of values (even when the data type is nominal) that would result in a table with too many rows to be convenient. We can overcome this problem by grouping the data into fewer categories or classes and then compiling a grouped frequency distribution.

Page 21: SESSION 5 & 6

Frequency Tables – Data Types

[Flow chart]

• Ungrouped Data → Categories → Count the number of times each category of the variable occurs → Frequency Distribution → Bar Chart

• Grouped Data → Class Intervals → Count the number of observations that fall into each of a series of intervals → Frequency Distribution → Histogram

Page 22: SESSION 5 & 6

Frequency Tables – Data Types

• Example 1: Coffee refills
Data type nominal; data ungrouped → Categories

• Example 2: Class marks out of 100
Data type nominal; BUT: data may be grouped → Class intervals (approximately interval)

• Example 3: Waiting times at supermarket cashiers
Data type interval → Class intervals

Page 23: SESSION 5 & 6

Number of Categories

Nominal / not grouped:
1. Determine maximum and minimum observation
2. Define categories including all distinct (integer) observations in between

Example tossing two dice:
Min: 2
Max: 12
Other possible outcomes: 3 4 5 6 7 8 9 10 11 (all outcomes accounted for)
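The two steps can be sketched as a small frequency table builder. The function name `frequency_table` and the simulated 100 tosses are assumptions; the point is that every integer between the minimum and maximum gets its own category, even if it was never observed.

```python
import random
from collections import Counter


def frequency_table(observations):
    """Count occurrences for every integer category from min to max."""
    counts = Counter(observations)
    lowest, highest = min(observations), max(observations)
    # Counter returns 0 for categories that never occurred
    return {value: counts[value] for value in range(lowest, highest + 1)}


# Hypothetical data: 100 simulated tosses of two dice
rng = random.Random(5)
tosses = [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(100)]

for outcome, frequency in frequency_table(tosses).items():
    print(f"{outcome:>2}: {frequency}")
```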

Page 24: SESSION 5 & 6

Number of Class Intervals

Interval or grouped data:
The more observations there are the larger the number of class intervals required.

Sturges’ Formula
Number of class intervals = 1 + 3.3 log10(n) OR
Number of class intervals = 1 + 1.4 ln(n)

Example n = 50:
Number of class intervals = 1 + 3.3 log10(50) = 1 + 3.3 * 1.70 = 6.61 ≈ 7
Number of class intervals = 1 + 1.4 ln(50) = 1 + 1.4 * 3.91 = 6.48 ≈ 6
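Both versions of the formula are easy to check numerically. The function names below are assumptions. Note that the two versions are not exactly equivalent: 3.3 / ln(10) is about 1.43, so the coefficient 1.4 is a rounded value, which is why the example lands on either side of 6.5.

```python
import math


def sturges_log10(n):
    """Number of class intervals using the base-10 form of Sturges' formula."""
    return 1 + 3.3 * math.log10(n)


def sturges_ln(n):
    """Number of class intervals using the natural-log form with coefficient 1.4."""
    return 1 + 1.4 * math.log(n)


n = 50
print(round(sturges_log10(n), 2), round(sturges_log10(n)))
print(round(sturges_ln(n), 2), round(sturges_ln(n)))
```

Since the result is only a guideline, either 6 or 7 class intervals would be a reasonable choice for 50 observations.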

Page 25: SESSION 5 & 6

Excursion Logarithms

• The logarithm of a number to a given base is the exponent to which the base must be raised in order to produce that number. (Example: 10^1.70 ≈ 50)

• The natural logarithm is the logarithm to the base e, where e is an irrational constant approximately equal to 2.718. The natural logarithm of a number x (written as ln(x)) is the power to which e would have to be raised to equal x. (Example: e^3.91 ≈ 50)

• The mathematical constant e (Euler’s number) is the unique real number such that the value of the derivative d/dx (slope of the tangent line) of the function f(x) = e^x at the point x = 0 is equal to 1. The function f(x) = e^x is called the exponential function.
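The two examples can be verified with Python's `math` module. This is just a numerical check; the rounded exponents 1.70 and 3.91 reproduce 50 only approximately.

```python
import math

# Raising the base to the (rounded) logarithm gives back roughly 50
power_of_ten = 10 ** 1.70
power_of_e = math.exp(3.91)
print(power_of_ten, power_of_e)

# The unrounded logarithms of 50
log10_of_50 = math.log10(50)
ln_of_50 = math.log(50)
print(log10_of_50, ln_of_50)

# Change of base: ln(x) = log10(x) * ln(10)
print(math.isclose(ln_of_50, log10_of_50 * math.log(10)))
```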

Page 26: SESSION 5 & 6

Class Interval Width

Class width:
1. Subtract smallest observation from largest observation
2. Divide by number of classes (Sturges)
3. Round class width to convenient value
4. Select a lower limit so that the first class interval contains the smallest observation. Determine all other intervals consecutively by adding (multiples) of the class width
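The four steps can be sketched as below. Several choices are assumptions made for the sake of a runnable example: "convenient value" is taken to mean rounding the width up to the next whole number, the lower limit is the smallest observation rounded down, and each interval includes its lower limit but not its upper limit. The data are hypothetical.

```python
import math


def class_limits(observations):
    """Return (class width, list of (lower, upper) limits) following the four steps."""
    n = len(observations)
    smallest, largest = min(observations), max(observations)
    data_range = largest - smallest                     # step 1: largest minus smallest
    n_classes = round(1 + 3.3 * math.log10(n))          # Sturges' formula
    width = max(1, math.ceil(data_range / n_classes))   # steps 2-3: divide, round up (assumed)
    lower = math.floor(smallest)                        # step 4: first lower limit (assumed)
    limits = []
    while lower <= largest:                             # add intervals until the maximum is covered
        limits.append((lower, lower + width))
        lower += width
    return width, limits


# Hypothetical data: 50 observations between 3 and 37
observations = [3, 37] + [20] * 48
width, limits = class_limits(observations)
print(width)
print(limits)
```

Because the width is rounded, the final number of intervals can differ slightly from the Sturges value.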

Page 27: SESSION 5 & 6

Definitions

• Class Mark or Class Midpoint: Obtained by adding the lower class limit to the upper class limit and dividing by two (used for the frequency polygon)

• Width of a class interval or class length: The difference between the upper class limit and the lower class limit. Usually, all classes are of equal width / length (Sturges)

• Class Boundaries: The class limits are stated in such a way that there is no overlap between classes

Page 28: SESSION 5 & 6

Definitions

Class Boundaries: The class limits are stated in such a way that there is no overlap between classes. Limits are stated in this manner so that there cannot be any doubt as to which class a certain value (observation) is to be allocated. Since data is often rounded, the true class limits are not the same as the stated class limits.

Example:
Weights recorded to the nearest kilogram
Stated class interval: 60 – 62
True class interval or class boundaries: 59.5 – 62.5
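The weight example generalises: when data are rounded to the nearest unit, the true boundaries lie half a unit outside the stated limits. A minimal sketch, with the function names and the `unit` parameter as assumptions:

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """True class boundaries for data rounded to the nearest `unit`."""
    half = unit / 2
    return lower_limit - half, upper_limit + half


def class_mark(lower_limit, upper_limit):
    """Class midpoint: (lower limit + upper limit) / 2."""
    return (lower_limit + upper_limit) / 2


# Weights recorded to the nearest kilogram, stated class interval 60 - 62
print(class_boundaries(60, 62))
print(class_mark(60, 62))
```

The class mark is the same whether it is computed from the stated limits or from the true boundaries, since both are symmetric around the midpoint.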

