Lecture 1 1

Lecture I

Definition 1. Statistics is the science of

collecting, organizing, summarizing and analyzing

information in order to draw conclusions.

It is a process consisting of 3 parts.

May 14, 2012

First part: collecting data

• Identify the research objective: the group that

is to be studied is called the population. A member

of the population is called an individual, element,

or subject.

• Collect the information needed to answer the

questions posed: typically we look at a subset of the

population called a sample.

Example: what is a placebo, experimental group,

treatment, control group, double-blind

experiment, single-blind experiment

Second part: Analyze and present

the information

This step is called descriptive statistics, or

exploratory data analysis. It uses tables, charts,

graphs, etc. to describe the data collected.

Third part: Draw conclusions from

the information

This part is called inferential statistics.

We cannot learn everything about the population

just by looking at a sample! But we might be

able to say something with a certain level of

confidence.

Types of data

Definition 2. The characteristics of the

individuals within the population that we

have decided to study are called

variables.

Definition 3. A characteristic of a population is

called a parameter, while a characteristic of a

sample is called a statistic.

Definition 4. An observation is the set of

values of the variables for a given individual.
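Definitions 3 and 4 can be illustrated with a minimal Python sketch. The population, the variable names (`age`, `smoker`) and all values are made up for illustration; they are not from the lecture.

```python
import random
from statistics import mean

# Hypothetical population of 8 individuals. Each dictionary is one
# observation: the values of the variables (age, smoker) for one individual.
population = [
    {"age": 21, "smoker": "no"},
    {"age": 34, "smoker": "yes"},
    {"age": 45, "smoker": "no"},
    {"age": 28, "smoker": "no"},
    {"age": 52, "smoker": "yes"},
    {"age": 39, "smoker": "no"},
    {"age": 30, "smoker": "yes"},
    {"age": 47, "smoker": "no"},
]

# Parameter: a characteristic of the whole population.
mu = mean(p["age"] for p in population)

# Statistic: the same characteristic computed from a sample only.
rng = random.Random(0)  # fixed seed so that the run is repeatable
sample = rng.sample(population, 3)
x_bar = mean(p["age"] for p in sample)

print("population mean age (parameter):", mu)
print("sample mean age (statistic):", x_bar)
```

The parameter is a fixed number (here 37), while the statistic changes from sample to sample.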

Variables can be classified into two groups:

Definition 5. Qualitative or categorical

variables allow for classification of individuals

based on some attribute or characteristic.

Quantitative variables provide numerical

measures of individuals. Arithmetic operations

can be performed on the values of a quantitative

variable and provide meaningful results.

Examples:

The distribution of a variable tells us what

values it takes and how often it takes these values.

Quantitative variables can be classified into two

types:

Definition 6. A discrete variable is a

quantitative variable whose possible values can

be counted: 0, 1, 2, 3, 4, 5, and so on.

Examples:

A continuous variable is a quantitative variable

that has an infinite number of possible values that

are not countable.

Examples:

The list of observations a variable assumes is

called data. Data can be classified in the same

categories as variables.

Data can be obtained from four sources:

1. A census

2. Existing sources

3. Survey sampling

4. Designed experiments

Definition 7. A census is a list of all

individuals in a population along with certain

characteristics of each individual.

Existing data: Don’t collect data that have

already been collected.

Survey sampling is used when no attempt is made to

influence the value of the variable of interest.

Examples: Polling, ....

Data obtained from a survey sample lead to an

observational study. Such studies are sometimes

called ex post facto (after the fact) studies because

the value of the variable of interest has already

been established.

A designed experiment (or experimental

study) applies a treatment to individuals

(referred to as experimental units) and

attempts to isolate the effects of the treatment on

a response variable.

Observational studies are very useful tools for

determining whether there is a relation between

two variables, but a designed experiment is required

to isolate the cause of the relation.

If control is possible, an experiment should be

performed. If control is not possible or not necessary,

then observational studies are appropriate.

The design of experiments

We will discuss obtaining data through an

experiment.

A designed experiment is a controlled study in

which one or more treatments are applied to

experimental units. The experimenter then

observes the effect of varying these treatments on

a response variable. Control, manipulation,

randomization, and replication are the key

ingredients of a well-designed experiment.

The experimental unit, or subject, is the

equivalent of the individual in the sample. It is a

well-defined item to which a treatment

(condition) is applied.

A response variable is a quantitative or

qualitative variable that represents our variable of

interest.

A predictor variable is a characteristic

purported to explain differences in the response

variable.

Sampling

How can a researcher obtain accurate information

about the population through the sample while

minimizing the costs?

There are 5 types of sampling:

• simple random sampling

• stratified sampling

• systematic sampling

• cluster sampling

• convenience sampling

In the first four cases the sampling methods are

based on planned randomness.

The surveyor does not have a choice as to who is

in the study.

Simple random sampling

Definition 8. If the population is of size N and

we want a sample of size n (n < N), simple

random sampling is achieved if every possible

sample of size n is equally likely to

occur. The sample is then called a simple

random sample.
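As a worked instance of Definition 8, the number of possible samples can be counted directly. The values N = 5 and n = 2 are chosen only for illustration.

```latex
\[
  \binom{N}{n} = \frac{N!}{n!\,(N-n)!},
  \qquad
  \binom{5}{2} = \frac{5!}{2!\,3!} = 10 .
\]
% Under simple random sampling each of these 10 samples is equally likely:
\[
  P(\text{a particular sample of size } 2) = \frac{1}{\binom{5}{2}} = \frac{1}{10}.
\]
```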

Examples:

How do we obtain such a sample?

1. using a hat, if the population is small!

2. using random numbers, if the population is

large:

(a) number the individuals in the population

from 1 to N (this means that we need

the frame, the list of all individuals

in the population);

(b) select n random numbers from this list

using a table of random numbers or using

your calculator.
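Steps (a) and (b) can be sketched in Python, with the standard `random` module standing in for the table or calculator. The values of N and n and the seed are assumptions made for illustration.

```python
import random

N = 50  # population size (assumed for illustration)
n = 5   # desired sample size, n < N

# (a) Number the individuals in the frame from 1 to N.
frame = list(range(1, N + 1))

# (b) Select n distinct numbers. random.sample draws without replacement,
# so no individual can be selected twice.
rng = random.Random(2012)  # seeded only to make the run repeatable
chosen = sorted(rng.sample(frame, n))
print(chosen)
```

The individuals whose numbers appear in `chosen` form the simple random sample.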

Using the table:

• Select a starting point.

• Look for numbers that have as many digits as

N has.

• If a number is repeated, discard it.

• If a number is larger than N, discard it.

• Stop when you obtain n numbers.
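The table procedure above can be sketched as follows. The function name `sample_from_digit_table` and the string of digits are hypothetical; the digits stand in for one row of a printed table of random numbers. The sketch also discards 0, since the individuals are numbered from 1 to N.

```python
def sample_from_digit_table(digits, N, n, start=0):
    """Pick n distinct labels in 1..N by reading a string of random digits."""
    width = len(str(N))  # read numbers that have as many digits as N
    chosen = []
    pos = start          # the selected starting point
    while len(chosen) < n and pos + width <= len(digits):
        number = int(digits[pos:pos + width])
        pos += width
        if number == 0 or number > N:  # outside 1..N: discard
            continue
        if number in chosen:           # repeated: discard
            continue
        chosen.append(number)
    return chosen  # shorter than n only if the digits run out


table_row = "1907336582510471238809360719"  # made-up digits
print(sample_from_digit_table(table_row, N=30, n=4))  # prints [19, 7, 4, 23]
```

Reading two digits at a time gives 19, 07, 33, 65, 82, 51, 04, 71, 23; the numbers larger than 30 are discarded and the procedure stops once four labels have been obtained.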

Stratified Sampling

Definition 9. A stratified sample is obtained

by separating the population into nonoverlapping

groups called strata and then obtaining a simple

random sample from each stratum. The

individuals within each stratum should be

homogeneous (or similar) in some way.
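A minimal sketch of Definition 9 in Python. The helper `stratified_sample`, the student data and the choice of two individuals per stratum are assumptions made for illustration.

```python
import random


def stratified_sample(population, stratum_of, per_stratum, rng):
    """Take a simple random sample of size per_stratum from each stratum."""
    # Separate the population into nonoverlapping groups (strata).
    strata = {}
    for individual in population:
        strata.setdefault(stratum_of(individual), []).append(individual)
    # Draw a simple random sample from each stratum.
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample


# Hypothetical students, given as (id, class year).
students = [(i, "freshman" if i <= 6 else "senior") for i in range(1, 11)]
picked = stratified_sample(students, lambda s: s[1], 2, random.Random(1))
print(picked)
```

Every stratum is guaranteed to appear in the sample, which a simple random sample of the whole population does not guarantee.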

Definition 10. A systematic sample is

obtained by selecting every kth individual from

the population. The first individual is selected using a

random number between 1 and k.

• Does not require a frame!

• How do we obtain a systematic sample without

a frame? How do we establish k?
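A minimal sketch of Definition 10 in Python. The list `arrivals` stands in for individuals taken in the order in which they are encountered, so no frame is consulted. The rule k = N // n is an assumption (one common choice when N is known or can be estimated); the lecture leaves the choice of k as a question.

```python
import random


def systematic_sample(individuals, k, rng):
    """Select every k-th individual, starting at a random position in 1..k."""
    start = rng.randint(1, k)  # random number between 1 and k, inclusive
    # Positions are 1-based: start, start + k, start + 2k, ...
    return [individuals[i - 1] for i in range(start, len(individuals) + 1, k)]


N, n = 100, 10  # assumed population and sample sizes
k = N // n      # assumed rule for establishing k
arrivals = list(range(1, N + 1))
picked = systematic_sample(arrivals, k, random.Random(7))
print(picked)
```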

Cluster sampling

Definition 11. A cluster sample is obtained

by selecting all individuals within a randomly

selected collection or group of individuals.

How do we obtain a cluster sample?

• randomly select the clusters (using simple random

sampling, for example)

• survey all the individuals in the selected clusters.

Other questions:

• How do I cluster the population?

• How many individuals in a cluster?

• How many clusters do I sample?
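The two steps for obtaining a cluster sample can be sketched in Python. The helper `cluster_sample`, the city blocks and the number of clusters drawn are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole clusters, then survey everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)  # select the clusters
    sample = [person for name in chosen for person in clusters[name]]
    return chosen, sample


# Hypothetical population, already grouped into clusters (city blocks).
blocks = {
    "block A": ["a1", "a2", "a3"],
    "block B": ["b1", "b2"],
    "block C": ["c1", "c2", "c3", "c4"],
    "block D": ["d1", "d2"],
}
chosen, sample = cluster_sample(blocks, 2, random.Random(3))
print(chosen, sample)
```

Note that the sample size is not fixed in advance: it depends on the sizes of the clusters that happen to be selected.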

Sources of error in sampling

There are two types of errors:

• Sampling errors

• Nonsampling errors

Sampling error is the error that results from

using sampling to estimate information regarding

a population. This type of error occurs because a

sample gives incomplete information about the

population.

We can control the amount of sampling error

through an appropriately designed survey or

experiment.
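Sampling error can be made visible with a small simulation. This is only a sketch under assumptions: the population is artificial, and the sample size is used as one example of a design choice that affects the amount of sampling error.

```python
import random

rng = random.Random(14)  # seeded only to make the run repeatable

# Artificial population of 10,000 measurements.
population = [rng.uniform(0, 100) for _ in range(10_000)]
mu = sum(population) / len(population)  # the population mean (a parameter)


def average_sampling_error(n, repetitions=200):
    """Average |sample mean - population mean| over many simple random samples."""
    errors = []
    for _ in range(repetitions):
        sample = rng.sample(population, n)
        errors.append(abs(sum(sample) / n - mu))
    return sum(errors) / repetitions


small = average_sampling_error(10)
large = average_sampling_error(1000)
print("average error with n = 10:  ", small)
print("average error with n = 1000:", large)
```

The error does not vanish for either sample size, because a sample gives incomplete information, but on average it is much smaller for the larger samples.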

Nonsampling errors or selection bias are

errors that result from the survey process. They

are very difficult to control. Examples:

1) Incomplete frame (certain segments of the

population are underrepresented)

2) Nonresponse of the individuals selected

3) Inaccurate responses (trained interviewers are

needed to avoid this)

4) Data entry errors

5) Bias in the selection of individuals

6) Poorly designed questions: Do you use an open

question or a closed question?

7) Poorly worded questions: the question needs to

be balanced, not vague, and with the words in the

right order.

Organizing Categorical Data

We are interested in the number of individuals

that occur in each category.

Definition 12. A frequency distribution lists

the number of occurrences (or the count) for each

category of data. The relative frequency is the

proportion or percent of observations within

each category and is found using the formula

\[ \text{Relative frequency} = \frac{\text{frequency}}{\text{sum of all frequencies}} \]

A relative frequency distribution lists the

relative frequency of each category of data.
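A frequency distribution and a relative frequency distribution can be computed with a few lines of Python. The data below are made up for illustration.

```python
from collections import Counter

# Hypothetical categorical data: favorite color of 8 individuals.
colors = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]

# Frequency distribution: the count for each category.
frequency = Counter(colors)

# Relative frequency = frequency / sum of all frequencies.
total = sum(frequency.values())
relative = {category: count / total for category, count in frequency.items()}

for category, count in frequency.most_common():
    print(f"{category:<6} {count:>2}  {relative[category]:.3f}")
```

The relative frequencies (0.500, 0.375 and 0.125 here) always add up to 1.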

Examples:

Definition 13. A bar graph is constructed by

labeling each category of data on the horizontal

axis and the frequency or relative frequency of

the category on the vertical axis. A rectangle of

equal width is drawn for each category. The

height of the rectangle is equal to the category’s

frequency or relative frequency.

A Pareto chart is a bar graph whose bars are

drawn in decreasing order of frequency or relative

frequency.

Definition 14. A side-by-side bar graph is

used when we want to compare two sets of data.

Careful! We should use relative frequencies

when drawing a side-by-side bar graph! (Why?)
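One possible answer to the "Why?" can be seen in a small Python sketch: when the two data sets have different sizes, raw counts are not comparable. The two groups and their counts are made up for illustration.

```python
# Hypothetical answers to the same yes/no question in two groups
# of very different sizes.
group_a = {"yes": 30, "no": 70}    # 100 individuals
group_b = {"yes": 150, "no": 850}  # 1000 individuals


def relative_frequencies(freq):
    """Convert a frequency distribution into a relative frequency distribution."""
    total = sum(freq.values())
    return {category: count / total for category, count in freq.items()}


rel_a = relative_frequencies(group_a)
rel_b = relative_frequencies(group_b)

print("counts of yes:   ", group_a["yes"], "vs", group_b["yes"])
print("relative freq.:  ", rel_a["yes"], "vs", rel_b["yes"])
```

By counts, "yes" looks five times as common in the second group; by relative frequency it is half as common (0.15 against 0.30).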

Examples:

Definition 15. A pie chart is a circle divided

into sectors. Each sector represents a category of

data. The area of each sector is proportional to

the frequency of the category.

Remark: The size of the angle of each sector of

the pie chart is given by

\[ \text{percentage} \times 360^\circ \]

where the percentage is written as a decimal

(for example, 25% = 0.25).
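The angle of each sector can be computed directly from the frequencies. A minimal sketch in Python, with a hypothetical helper `sector_angles` and made-up counts:

```python
def sector_angles(frequencies):
    """Angle in degrees of each pie chart sector: relative frequency x 360."""
    total = sum(frequencies.values())
    return {category: count / total * 360 for category, count in frequencies.items()}


angles = sector_angles({"red": 4, "blue": 3, "green": 1})
print(angles)  # prints {'red': 180.0, 'blue': 135.0, 'green': 45.0}
```

The angles add up to 360 degrees, so the sectors fill the whole circle.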
