Unit 1: Introduction to data. Lecture 1: Data collection

Sta 101
Nicole Dalzell, Duke University
June 1, 2014

Introduction to Data

1 Introduction to Data
    Observations and variables
    Types of variables

2 Overview of data collection principles
    Scientific Inquiry
    Populations and Samples
    Sampling from a population
    Sampling bias
    Observational studies and experiments

3 Observational Data
    Cereal breakfast
    Sampling methods

4 Experiments
    Principles of experimental design

5 Recap

6 Syllabus & policies
    Logistics
    Goals and topics
    Details
    Support
    Policies
    Tips

7 To do

Sta 101

U1 - L1: Data coll., obs. studies, experiments N.Dalzell– Duke University


Introduction to Data

Statistics and Data

Statistics is the art and science of making inferences from data. It is the study of how best to collect, analyze and draw conclusions from data.

1 Collect Data
2 Describe Data (Visualization, Numerical Summaries)
3 Analyze Data

Data are a set of measurements taken on a set of individual units.

We often store and present data in data sets, comprised of variables measured on individual cases.

However, there are other ways to visualize data...

Sta 101 (N.Dalzell– Duke University) U1 - L1: Data coll., obs. studies, experiments June 1, 2014 2 / 60


Map based on Flickr tags (figure not reproduced). Red: tourists. Blue: locals. Yellow: either.

http://www.flickr.com/photos/walkingsf/4671594023/in/set-72157624209158632/


Observations and variables

Data matrix: each row is an observation, each column is a variable.

        type       price    ...    weight
1       small      15.9     ...    2705
2       midsize    33.9     ...    3560     ← observation
...     ...        ...      ...    ...
54      midsize    26.7     ...    3245
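As a minimal sketch, the data matrix above could be held in Python as a list of rows, one dictionary per observation. Only the three rows and three columns visible on the slide are included. The `id` key and the helper name `column` are illustrative assumptions, not part of the course material.

```python
# Each dictionary is one observation (a row); each key is a variable (a column).
# Values are the three rows visible on the slide; the elided rows are omitted.
cars = [
    {"id": 1, "type": "small", "price": 15.9, "weight": 2705},
    {"id": 2, "type": "midsize", "price": 33.9, "weight": 3560},
    {"id": 54, "type": "midsize", "price": 26.7, "weight": 3245},
]


def column(rows, variable):
    """Return all recorded values of one variable (one column of the matrix)."""
    return [row[variable] for row in rows]


observation = cars[1]           # one row: every variable for a single car
prices = column(cars, "price")  # one column: one variable across all cars

print(observation)
print(prices)
```

Reading across a dictionary gives an observation; collecting one key over all dictionaries gives a variable.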


Types of variables

all variables
    numerical
        continuous (measured)
        discrete (counted)
    categorical
        regular categorical (unordered categories)
        ordinal (ordered categories)


Types of variables (cont.)

type: small, midsize or large. (categorical, ordinal)

price: average price in $1000’s (numerical, continuous)

mpgCity: city mileage per gallon (numerical, continuous)

drivetrain: front, rear, 4WD (categorical)

passengers: passenger capacity (numerical, discrete)

weight: car weight in pounds (numerical, continuous)
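The classification above can be written down as a small codebook. This is a sketch only: the `VarType` enum, the `codebook` dictionary and the `is_numerical` helper are hypothetical names introduced here, and the types are simply the ones listed on the slide. They are recorded by hand rather than inferred from the values, since a whole-number column such as weight is still a measurement and so counts as continuous.

```python
from enum import Enum


class VarType(Enum):
    CONTINUOUS = "numerical, continuous"
    DISCRETE = "numerical, discrete"
    CATEGORICAL = "categorical"
    ORDINAL = "categorical, ordinal"


# Variable types exactly as classified on the slide.
codebook = {
    "type": VarType.ORDINAL,
    "price": VarType.CONTINUOUS,
    "mpgCity": VarType.CONTINUOUS,
    "drivetrain": VarType.CATEGORICAL,
    "passengers": VarType.DISCRETE,
    "weight": VarType.CONTINUOUS,
}


def is_numerical(var_type):
    """True for measured (continuous) or counted (discrete) variables."""
    return var_type in (VarType.CONTINUOUS, VarType.DISCRETE)


numerical = [name for name, t in codebook.items() if is_numerical(t)]
categorical = [name for name, t in codebook.items() if not is_numerical(t)]

print("numerical:", numerical)
print("categorical:", categorical)
```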


Overview of data collection principles Scientific Inquiry

Process of Scientific Inquiry

Statistics is the art and science of making inferences from data. It is the study of how best to collect, analyze and draw conclusions from data.

So, how do we proceed? Four steps:

1 Identify a hypothesis or research question
2 Collect relevant data
3 Analyze the data
4 Form a conclusion


Identify a Hypothesis

A well-formed hypothesis will clearly identify a population and associated parameters of interest.

Population: group of individuals or subjects about whom we can make inferences.
Parameters: “True” values of characteristics in the population we want to study.

Example Research Question:
Do most university faculty in the United States consider themselves to be Republicans?

Population ?
Parameter ?

http://www.studentsforacademicfreedom.org/news/1898/lackdiversity.html


Collect Data

A sample is the group of individuals taken from the population. The number of individuals in the sample is usually denoted with the letter n. We record the value of several variables for each individual in the sample. A statistic is any function of the data collected in the sample (e.g., mean, median, etc.).

Example Data Collection:
To answer their research question, the researchers took a random sample of 100 Duke faculty. They calculated the percentage of faculty who said that they are Republican.

n ?
Statistic ?
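To make n and the statistic concrete, here is a small sketch in Python. The population is entirely made up (2,000 hypothetical faculty, 300 of them labelled Republican), as are the names `population` and `sample_proportion`. None of these numbers come from the lecture. The point is only that n is the sample size and the statistic is a function of the sampled values, while the parameter is the same quantity computed over the whole population.

```python
import random


def sample_proportion(values, category):
    """A statistic: the share of values equal to `category`."""
    return sum(1 for v in values if v == category) / len(values)


# Hypothetical population; the counts are invented for illustration only.
population = ["Republican"] * 300 + ["Other"] * 1700

rng = random.Random(0)                 # fixed seed so the run is repeatable
sample = rng.sample(population, 100)   # simple random sample without replacement

n = len(sample)                                       # sample size
p_hat = sample_proportion(sample, "Republican")       # statistic (from the sample)
parameter = sample_proportion(population, "Republican")  # parameter (usually unknown)

print(f"n = {n}")
print(f"sample proportion = {p_hat:.2f}")
print(f"population proportion = {parameter:.2f}")
```

In practice the parameter is not observable, which is why the statistic is used to estimate it.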


Populations and samples

http://well.blogs.nytimes.com/2012/08/29/finding-your-ideal-running-form

Research question: Can people become better, more efficient runners on their own, merely by running?

Population of interest: All people

Sample: Group of adult women who recently joined a running group

Population to which results can be generalized: Adult women, if the data are randomly sampled


Data Collection (cont.)

Be aware that there exist “bad” samples.
“There are three kinds of lies: lies, damned lies, and statistics.”

If poor sampling techniques are utilized, then the observed statistics will not be applicable to the true population of interest.

Example Data Collection:

Raise your hand if you have been on an airplane in the past two years.
What does this tell us about how many 17-23 year olds have ridden an airplane in the past two years?


Overview of data collection principles Sampling from a population

Census

Wouldn’t it be better to just include everyone and “sample” the entire population, i.e. conduct a census?

Some individuals are hard to locate or hard to measure. And these difficult-to-find people may have certain characteristics that distinguish them from the rest of the population.

Populations rarely stand still. Even if you could take a census, the population changes constantly, so it’s never possible to get a perfect measure.

http://www.npr.org/templates/story/story.php?storyId=125380052


Exploratory analysis to inference

Sampling is natural.

Think about sampling something you are cooking - you taste (examine) a small part of what you’re cooking to get an idea about the dish as a whole.

When you taste a spoonful of soup and decide the spoonful you tasted isn’t salty enough, that’s exploratory analysis.

If you generalize and conclude that your entire soup needs salt, that’s an inference.

For your inference to be valid, the spoonful you tasted (the sample) needs to be representative of the entire pot (the population).

If your spoonful comes only from the surface and the salt is collected at the bottom of the pot, what you tasted is probably not representative of the whole pot.

If you first stir the soup thoroughly before you taste, your spoonful will more likely be representative of the whole pot.
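The soup analogy can be imitated with a short simulation. This is a sketch under invented assumptions: the pot is modelled as 1,000 "drops" whose saltiness rises steadily from 0 at the surface to 1 at the bottom, and the function names are hypothetical. It compares a spoonful taken only from the surface with one taken after stirring, where stirring is modelled as simple random sampling.

```python
import random
from statistics import mean

# Hypothetical pot: 1000 "drops" ordered from surface (index 0) to bottom.
# Salt has settled, so saltiness grows with depth (made-up values from 0 to 1).
pot = [depth / 999 for depth in range(1000)]


def surface_spoonful(pot, size):
    """Unstirred: take only the drops at the top of the pot."""
    return pot[:size]


def stirred_spoonful(pot, size, rng):
    """Stirred: every drop is equally likely to land in the spoon."""
    return rng.sample(pot, size)


rng = random.Random(1)  # fixed seed so the run is repeatable
pot_mean = mean(pot)
surface_mean = mean(surface_spoonful(pot, 50))
stirred_mean = mean(stirred_spoonful(pot, 50, rng))

print(f"whole pot:        {pot_mean:.3f}")
print(f"surface spoonful: {surface_mean:.3f}")
print(f"stirred spoonful: {stirred_mean:.3f}")
```

The surface spoonful badly understates the saltiness of the pot because it is not representative, while the stirred spoonful lands much closer to the pot-wide average.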

Sta 101 (N.Dalzell– Duke University) U1 - L1: Data coll., obs. studies, experiments June 1, 2014 13 / 60

Page 43: Unit 1: Introduction to data Lecture 1: Data collection ...1) Unit 1/… · Introduction to Data Statistics and Data Statistics is the art and science of making inferences from data.

Overview of data collection principles Sampling from a population

Exploratory analysis to inference

Sampling is natural.Think about sampling something you are cooking - you taste(examine) a small part of what you’re cooking to get an ideaabout the dish as a whole.

When you taste a spoonful of soup and decide the spoonful youtasted isn’t salty enough, that’s exploratory analysis.If you generalize and conclude that your entire soup needs salt,that’s an inference.For your inference to be valid, the spoonful you tasted (thesample) needs to be representative of the entire pot (thepopulation).

If your spoonful comes only from the surface and the salt iscollected at the bottom of the pot, what you tasted is probably notrepresentative of the whole pot.If you first stir the soup thoroughly before you taste, your spoonfulwill more likely be representative of the whole pot.

Overview of data collection principles Sampling bias

A few sources of bias

Non-response: If only a (non-random) fraction of the randomly sampled people choose to respond to a survey, the sample may no longer be representative of the population.

Voluntary response: Occurs when the sample consists of people who volunteer to respond because they have strong opinions on the issue, and hence is not representative of the population.

edition.com, Aug 29, 2013

Convenience sample: Individuals who are easily accessible are more likely to be included in the sample.

What type of bias do reviews on Amazon.com have? What about reviews on RateMyProfessor.com?

Landon vs. FDR

A historical example of a biased sample yielding misleading results:

In 1936, Landon sought the Republican presidential nomination opposing the re-election of FDR.

The Literary Digest Poll

The Literary Digest polled about 10 million Americans, and got responses from about 2.4 million.

The poll showed that Landon would likely be the overwhelming winner and FDR would get only 43% of the votes.

Election result: FDR won, with 62% of the votes.

The magazine was completely discredited because of the poll,and was soon discontinued.

The Literary Digest Poll - what went wrong?

The magazine had surveyed its own readers, registered automobile owners, and registered telephone users.

These groups had incomes well above the national average of the day (remember, this was the Great Depression era), which resulted in lists of voters far more likely to support Republicans than a truly typical voter of the time. In other words, the sample was not representative of the American population at the time.

Large samples are preferable, but...

The Literary Digest election poll was based on a sample size of 2.4 million, which is huge, but since the sample was biased, the sample did not yield an accurate prediction.

Back to the soup analogy: If the soup is not well stirred, it doesn’t matter how large a spoon you have, it will still not taste right. If the soup is well stirred, a small spoon will suffice to test the soup.
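
The point about sample size versus bias can be illustrated with a small simulation. The sketch below is not from the lecture. The 62% support rate loosely echoes the election result quoted earlier, but the population size, the inclusion probabilities, and all variable names are hypothetical choices made only for illustration.

```python
import random

rng = random.Random(101)  # fixed seed so the run is reproducible

# Hypothetical population of 200,000 voters; about 62% support candidate A.
N = 200_000
population = [1 if rng.random() < 0.62 else 0 for _ in range(N)]
true_share = sum(population) / N

# Biased sampling (assumption): supporters of A are reached with
# probability 0.2, everyone else with probability 0.5.
biased_sample = [v for v in population
                 if rng.random() < (0.2 if v == 1 else 0.5)]

# Small simple random sample: every voter is equally likely to be chosen.
random_sample = rng.sample(population, 1_000)

biased_share = sum(biased_sample) / len(biased_sample)
random_share = sum(random_sample) / len(random_sample)

print(f"true share of A: {true_share:.3f}")
print(f"biased sample (n={len(biased_sample)}): {biased_share:.3f}")
print(f"random sample (n={len(random_sample)}): {random_share:.3f}")
```

Under these assumptions the biased sample is many times larger than the random one, yet its estimate should land far from the true share, while the small random sample should land close to it.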

Participation question

A school district is considering whether it will no longer allow high school students to park at school after two recent accidents where students were severely injured. As a first step, they survey parents by mail, asking them whether or not the parents would object to this policy change. Of 6,000 surveys that go out, 1,200 are returned. Of these 1,200 surveys that were completed, 960 agreed with the policy change and 240 disagreed. Which of the following statements are true?

I. Some of the mailings may have never reached the parents.

II. The school district has strong support from parents to move forward with the policy approval.

III. It is possible that a majority of the parents of high school students disagree with the policy change.

IV. The survey results are unlikely to be biased because all parents were mailed a survey.

(a) Only I (b) I and II (c) I and III (d) III and IV (e) Only IV
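
The arithmetic behind this question can be checked directly. The sketch below uses only the counts given in the question. The variable names and the best-case and worst-case framing are illustrative assumptions, not part of the original slide.

```python
# Counts taken from the question above.
mailed = 6_000
returned = 1_200
agreed = 960
disagreed = 240
assert agreed + disagreed == returned

response_rate = returned / mailed            # share of surveys that came back
support_among_returned = agreed / returned   # support among respondents only

# Nothing is known about the non-respondents, so overall support could
# lie anywhere between these two extremes.
non_respondents = mailed - returned
lowest_possible_support = agreed / mailed                       # all non-respondents disagree
highest_possible_support = (agreed + non_respondents) / mailed  # all non-respondents agree

print(f"response rate: {response_rate:.0%}")
print(f"support among returned surveys: {support_among_returned:.0%}")
print(f"possible overall support: {lowest_possible_support:.0%} to {highest_possible_support:.0%}")
```

Because only one survey in five was returned, the range of overall support consistent with the data is very wide, which is why the returned surveys alone say little about parents as a whole.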

Overview of data collection principles Observational studies and experiments

Observational studies and experiments

An experimental study is a controlled study in which the researchers impose treatments upon the subjects.

Subjects are assigned to control and treatment groups using random assignment.

Experiments are the preferred method of data collection because their results can often be interpreted as causal, i.e., we can conclude that the treatments caused the response observed in the study.

Experiments are not always feasible or ethical.

An observational study is a study in which the researchers did not assign the subjects to treatments.

Observational studies retain the notion of treatment and control groups.

Observational studies still require the researcher to clearly define a research question. This requires identifying the response variable that will be measured on each subject in the study.
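
The random assignment step described for experiments above can be sketched in a few lines. This is a minimal illustration, not the lecture's own procedure: the function name `randomly_assign`, the subject IDs, and the equal split into two groups are all assumptions.

```python
import random

def randomly_assign(subjects, seed=None):
    """Split subjects into treatment and control groups at random."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical subject IDs, used only for illustration.
subjects = [f"subject_{i:02d}" for i in range(1, 21)]
groups = randomly_assign(subjects, seed=2014)

print("treatment:", sorted(groups["treatment"]))
print("control:  ", sorted(groups["control"]))
```

Because chance alone decides who lands in which group, the two groups should be similar on average in everything except the treatment.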

Experimental vs Observational Datasets (cont.)

Example: We want to consider the effect of drinking alcohol during pregnancy on rates of Fetal Alcohol Syndrome.

Research question (population/parameter)?

Should we use experimental or observational data?

What potential biases should we be cautious of?

Response Bias

Non-response Bias

Undercoverage Bias

Observational Data

Observational studies and experiments (Recap)

Observational study: Researchers collect data in a way that does not directly interfere with how the data arise, i.e. they merely “observe”, and can only establish an association between the explanatory and response variables.

Experiment: Researchers randomly assign subjects to various treatments in order to establish causal connections between the explanatory and response variables.

If you’re going to walk away with one thing from this class, let it be “correlation does not imply causation”.

http://xkcd.com/552/

Observational Data Cereal breakfast

What type of study is this, observational study or an experiment?

“Girls who regularly ate breakfast, particularly one that includes cereal, were slimmer than those who skipped the morning meal, according to a study that tracked nearly 2,400 girls for 10 years. [...] As part of the survey, the girls were asked once a year what they had eaten during the previous three days.”

This is an observational study since the researchers merely observed the behavior of the girls (subjects) as opposed to imposing treatments on them.

What is the conclusion of the study?

There is an association between girls eating breakfast and being slimmer.

Who sponsored the study?

General Mills.

3 possible explanations:

1 Eating breakfast causes girls to be thinner.

2 Being thin causes girls to eat breakfast.

3 A third variable is responsible for both. What could it be?

Extraneous variables that affect both the explanatory and the response variable, and that make it seem like there is a relationship between the two, are called confounding variables.
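
The third explanation can be made concrete with a simulation in which breakfast has no effect at all, yet breakfast eaters still look slimmer. Everything in this sketch is hypothetical: the confounder (being physically active), the use of BMI as the measure of slimness, and all probabilities and means are invented for illustration and are not taken from the study.

```python
import random
from statistics import mean

rng = random.Random(25)

rows = []
for _ in range(20_000):
    active = rng.random() < 0.5                  # hypothetical confounder
    # Assumption: active girls are more likely to eat breakfast.
    breakfast = rng.random() < (0.8 if active else 0.3)
    # BMI depends on activity only; breakfast has no effect by construction.
    bmi = rng.gauss(20.0 if active else 23.0, 2.0)
    rows.append((active, breakfast, bmi))

def mean_bmi(active=None, breakfast=None):
    """Mean BMI among rows matching the given filters (None = no filter)."""
    return mean(b for a, br, b in rows
                if (active is None or a == active)
                and (breakfast is None or br == breakfast))

overall_gap = mean_bmi(breakfast=True) - mean_bmi(breakfast=False)
gap_active = mean_bmi(True, True) - mean_bmi(True, False)
gap_inactive = mean_bmi(False, True) - mean_bmi(False, False)

print(f"overall gap (breakfast minus no breakfast): {overall_gap:+.2f}")
print(f"gap among active girls:   {gap_active:+.2f}")
print(f"gap among inactive girls: {gap_inactive:+.2f}")
```

In this setup the overall comparison should show a clear gap, while the gap within each activity group should be close to zero, which is the signature of a confounding variable.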

Project ideas - observational studies

1 numerical: Is the average number of hours Americans spend relaxing after work different than the European average of 3 hours/day?
[Data: Number of hours relaxing after work]

1 categorical: Estimate the percentage of North Carolina residents who live below the poverty line and are planning to vote Republican in the most recent presidential election.
[Data: Vote Republican - yes, no]

1 numerical and 1 categorical: Is there a relationship between mom’s working status during the first 5 years of the child’s life and the child’s education?
[Data: Number of years of education of child; Mom’s working status - yes, no]

2 categorical: Do racial minority groups in North Carolina have less access to health care coverage?
[Data: Ethnicity - white, minority; Health coverage - yes, no]

Observational Data Sampling methods

Obtaining good samples

Almost all statistical methods are based on the notion of implied randomness.

If observational data are not collected in a random framework from a population, these statistical methods (the estimates and the errors associated with the estimates) are not reliable.

The most commonly used random sampling techniques are simple, stratified, and cluster sampling.


Simple random sample

Randomly select cases from the population, where each case is equally likely to be selected.
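As a rough illustration of this definition, the sketch below draws a simple random sample with Python's standard `random` module. The population of 100 numbered cases, the sample size, and the function name `simple_random_sample` are assumptions made up for the example, not part of the course materials.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n cases without replacement; every case is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical population: case IDs 1..100.
population = list(range(1, 101))
srs = simple_random_sample(population, n=10, seed=1)
print(srs)
```

Because sampling is without replacement, no case can appear twice in the result.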

[Figure: simple random sample]


Stratified sample

Strata are homogeneous: take a simple random sample from each stratum.
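A minimal sketch of this idea, assuming each case already carries a stratum label: group the cases by stratum, then take a simple random sample inside every stratum. The household data, the `stratum_of` helper, and the per-stratum sample size are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(cases, stratum_of, n_per_stratum, seed=None):
    """Take a simple random sample of n_per_stratum cases from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_of(case)].append(case)  # group cases by stratum label
    return {name: rng.sample(strata[name], n_per_stratum) for name in sorted(strata)}

# Hypothetical data: 60 households spread evenly over 6 strata.
households = [(i, f"Stratum {i % 6 + 1}") for i in range(60)]
stratified = stratified_sample(households, stratum_of=lambda h: h[1],
                               n_per_stratum=3, seed=1)
for name, members in stratified.items():
    print(name, members)
```

Every stratum is guaranteed to be represented, which a simple random sample of the same total size does not guarantee.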

[Figure: stratified sample, Strata 1-6]


Cluster sample

Clusters are not necessarily homogeneous: take a simple random sample from a random sample of clusters. Usually preferred for economic reasons.
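The two-stage procedure can be sketched as follows: first pick some clusters at random, then take a simple random sample within each chosen cluster. The nine clusters of eight cases and the function name `cluster_sample` are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly pick n_clusters clusters, then sample n_per_cluster cases within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # stage 1: sample clusters
    return {name: rng.sample(clusters[name], n_per_cluster)  # stage 2: sample cases
            for name in chosen}

# Hypothetical data: 9 clusters of 8 cases each.
clusters = {f"Cluster {c}": [f"c{c}-case{i}" for i in range(8)] for c in range(1, 10)}
clustered = cluster_sample(clusters, n_clusters=3, n_per_cluster=4, seed=1)
for name, members in clustered.items():
    print(name, members)
```

Only the chosen clusters need to be visited, which is where the cost saving comes from.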

[Figure: cluster sample, Clusters 1-9]


Participation question

A city council has requested a household survey be conducted in a suburban area of their city. The area is broken into many distinct and unique neighborhoods, some including large homes, some with only apartments. Which approach would likely be the least effective?

(a) Simple random sampling

(b) Cluster sampling

(c) Stratified sampling


Experiments Principles of experimental design

Principles of experimental design

1 Control: Compare treatment of interest to a control group.

2 Randomize: Randomly assign subjects to treatments.

3 Replicate: Within a study, replicate by collecting a sufficiently large sample. Or replicate the entire study.

4 Block: If there are variables that are known or suspected to affect the response variable, first group subjects into blocks based on these variables, and then randomize cases within each block to treatment groups.


More on blocking

We would like to design an experiment to investigate if energy gels make you run faster:

Treatment: energy gel
Control: no energy gel

It is suspected that energy gels might affect pro and amateur athletes differently, therefore we block for pro status:

Divide the sample into pro and amateur
Randomly assign pro athletes to treatment and control groups
Randomly assign amateur athletes to treatment and control groups
Pro/amateur status is equally represented in the resulting treatment and control groups

Why is this important? Can you think of other variables to block for?
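The blocking steps for the energy gel example can be sketched in code. This is only an illustration under assumed data: the 20 runners, their pro/amateur labels, and the even split within each block are hypothetical choices, not details from the lecture.

```python
import random

def blocked_assignment(subjects, block_of, seed=None):
    """Within each block, randomly send half the subjects to treatment, the rest to control."""
    rng = random.Random(seed)
    blocks = {}
    for subject in subjects:
        blocks.setdefault(block_of(subject), []).append(subject)
    groups = {"treatment": [], "control": []}
    for name in sorted(blocks):
        members = blocks[name][:]
        rng.shuffle(members)          # random order within the block
        half = len(members) // 2
        groups["treatment"] += members[:half]
        groups["control"] += members[half:]
    return groups

# Hypothetical sample: 8 pro and 12 amateur runners.
runners = [(f"runner{i}", "pro" if i < 8 else "amateur") for i in range(20)]
groups = blocked_assignment(runners, block_of=lambda r: r[1], seed=1)
for arm, members in groups.items():
    pros = sum(1 for r in members if r[1] == "pro")
    print(arm, "pro:", pros, "amateur:", len(members) - pros)
```

Each arm ends up with 4 pro and 6 amateur runners, so pro status cannot explain a difference between the groups.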


Participation question

A study is designed to test the effect of light level and noise level on exam performance of students. The researcher also believes that light and noise levels might have different effects on males and females, so wants to make sure both genders are represented equally under different conditions. Which of the below is correct?

(a) There are 3 explanatory variables (light, noise, gender) and 1 response variable (exam performance)

(b) There are 2 explanatory variables (light and noise), 1 blocking variable (gender), and 1 response variable (exam performance)

(c) There is 1 explanatory variable (gender) and 3 response variables (light, noise, exam performance)

(d) There are 2 blocking variables (light and noise), 1 explanatory variable (gender), and 1 response variable (exam performance)


Difference between blocking and explanatory variables

Factors are conditions we can impose on the experimental units.

Blocking variables are characteristics that the experimental units come with, that we would like to control for.

Blocking is like stratifying, except used in experimental settings when randomly assigning, as opposed to when sampling.


More experimental design terminology...

Placebo: fake treatment, often used as the control group for medical studies

Placebo effect: experimental units showing improvement simply because they believe they are receiving a special treatment

Blinding: when experimental units do not know whether they are in the control or treatment group

Double-blind: when both the experimental units and the researchers do not know who is in the control and who is in the treatment group


Project ideas - experiments

1 numerical and 1 categorical: Is there a relationship between memory and distraction? Randomly assign 20 students to two groups: one group memorizes a list of words while also listening to music, another group memorizes the same words in silence. Compare average number of words memorized in the two groups.
[Data: Number of words memorized; Group - treatment, control]

2 categorical: Is there a relationship between learning and distraction? Randomly assign a group of students to two groups: one group studies a concept while also listening to music, the other group studies in silence using the same materials. Then test whether or not they learned the concept.
[Data: Whether or not the students learned the concept - yes, no; Group - treatment, control]


Recap


Participation question

What is the main difference between observational studies and experiments?

(a) Experiments take place in a lab while observational studies do not need to.

(b) In an observational study we only look at what happened in the past.

(c) Most experiments use random assignment while observational studies do not.

(d) Observational studies are completely useless since no causal inference can be made based on their findings.


Step 3: Data Analysis

Appropriate analysis techniques will depend on the research question of interest. For example, different techniques are required for predicting stock prices vs. estimating the average height of Duke students.

Goal: At the end of this class you should be able to identify appropriate analysis techniques for a standard set of data types.


Step 4: Forming Conclusions

If we can’t make a conclusion or apply results, then what good was our study?

Communication is key. We need to help non-statisticians understand the results of our analyses in order to effectively aid in decision making and behavioral change.


Random assignment vs. random sampling

Random sampling + random assignment (ideal experiment): causal conclusion, generalized to the whole population.

Random sampling + no random assignment (most observational studies): no causal conclusion, correlation statement generalized to the whole population.

No random sampling + random assignment (most experiments): causal conclusion, only for the sample.

No random sampling + no random assignment (bad observational studies): no causal conclusion, correlation statement only for the sample.

Columns: random assignment gives causation, no random assignment gives correlation. Rows: random sampling gives generalizability, no random sampling gives no generalizability.
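The two-by-two grid above can also be read as a small lookup: random assignment decides whether a causal conclusion is possible, and random sampling decides how far the conclusion reaches. The function below is only a sketch of that reading; its name and the exact wording of the returned strings are assumptions.

```python
def scope_of_conclusions(random_sampling: bool, random_assignment: bool) -> str:
    """Describe what a study can conclude, following the grid above."""
    kind = "causal conclusion" if random_assignment else "correlation statement"
    reach = ("generalized to the whole population" if random_sampling
             else "only for the sample")
    return f"{kind}, {reach}"

# Walk through all four cells of the grid.
for sampling in (True, False):
    for assignment in (True, False):
        print(sampling, assignment, "->", scope_of_conclusions(sampling, assignment))
```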


Syllabus & policies Logistics

General Info

Instructor: Nicole Dalzell - [email protected] Chemistry 214

Lecture: MTuWThF 11:00 AM - 12:15 PM, Social Science 119

Lab: TuTh 1:30 PM - 3:00 PM, Social Sciences 229

Office hours: Tentative: MW 2:00 PM - 3:00 PM or by appointment


Required materials

Textbook: OpenIntro Statistics, Diez, Barr, Cetinkaya-Rundel, CreateSpace, 2nd Edition, 2012, ISBN: 978-1478217206

Calculator: (Optional) You might need a four-function calculator that can do square roots for this class. No limitation on the type of calculator you can use.


Webpage

http://stat.duke.edu/courses/Summer14/sta101.001-2/


Syllabus & policies Goals and topics

[Figure: course topic map. Labels: Design of studies; Exploratory data analysis; Probability; Inference; Bayesian inference; Frequentist inference (CLT & simulation); one mean & median; two means & medians; many means; one proportion; two proportions; many proportions; Modeling (numerical response); 1 explanatory; many explanatory; numerical; categorical]


Syllabus & policies Details

Course structure

Seven learning units.

Set of learning objectives and required and suggested readings, videos, etc. for each unit.

Prior to beginning the unit, complete the readings and familiarize yourselves with the learning objectives.

Begin a new unit with a readiness assessment: individual, then team.

Class time: split between lecture, discussion/application.

Computing labs.


Teams

3-5 students based on data from the survey and the pre-test

Heterogeneous with respect to stats exposure and homogeneous with respect to majors and/or interests - to the extent that it’s possible

Constant teams throughout semester

Peer evaluations


Readiness assessments (Quizzes) - beginning of unit

Objective: Encourage you to complete the reading assignment prior to coming to class and evaluate your conceptual understanding of the learning objectives.

10 multiple choice questions, at the beginning of a unit.

Conceptual questions addressing the learning objectives of the new unit, assessing familiarity and reasoning, not mastery.

Take the individual readiness assessment, and then re-take the same assessment in teams.

Your performance on both assessments factors into your final grade: score for each assessment is a weighted average of the individual (2/3) and team (1/3) scores.

Lowest score will be dropped.
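As a worked example of the weighting above, the sketch below combines individual (2/3) and team (1/3) scores and then drops the lowest combined score. The three score pairs are made up, and averaging the remaining scores is an assumption; the slides only say that the lowest score is dropped.

```python
def assessment_score(individual, team):
    """Weighted average of the individual (2/3) and team (1/3) scores."""
    return (2 / 3) * individual + (1 / 3) * team

def readiness_average(scores):
    """Drop the lowest combined score, then average the rest (averaging is assumed)."""
    combined = sorted(assessment_score(i, t) for i, t in scores)
    kept = combined[1:] if len(combined) > 1 else combined
    return sum(kept) / len(kept)

# Hypothetical (individual, team) scores out of 10 for three units.
unit_scores = [(6, 9), (9, 9), (3, 9)]
ra_average = readiness_average(unit_scores)
print(f"{ra_average:.2f}")  # combined scores are 7, 9 and 5; dropping 5 gives 8.00
```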


Class - duration of unit

Slides will be posted on the course webpage (under schedule) on the day of the course.

Discussion of concepts as well as hands-on activities and exercises to complement them (sit with your team).

Attend class to keep up with the pace and not fall behind, and to contribute to application activities completed in teams.

You are responsible for all the material covered in all components of the course, not just the class. Please ask questions in class, in office hours, or by e-mail if you are struggling (or just curious); do not wait until just before an exam, when it may be too late.


Participation questions: attendance and participation

Objective: Make you an active participant and help me pace the class.

On new material being discussed in class that day.

Credit for participation, regardless of whether you have the correct answer.

Up to two unexcused late arrivals or absences will not affect your participation grade.

While I might sometimes call on you during the class discussion, it is your responsibility to be an active participant without being called on.


Problem sets and labs

Problem sets:
Objective: Help you develop a more in-depth understanding of the material and help you prepare for exams and projects.

Individual: collaborate but don’t copy! Submit in class and show all work.

Labs:
Objective: Give you hands-on experience with data analysis using statistical software and provide you with tools for the projects.

If you haven’t yet done so, send me your gmail address as soon as possible to create an RStudio account.

In teams: turn in lab report on Sakai by the following day at 5 PM.

Lowest score dropped for both.


Project

Objective: Give you independent applied research experience using real data and statistical methods.

individual

statistical inference exploring the distributional characteristics of one variable or relationship between two variables

choose a research question, find data, analyze it, write up your results


Exams

Midterm: Wednesday, July 16th, in class

Final: Saturday, August 9th (9:00 AM - 12:00 PM) (Cumulative)

Exam dates cannot be changed. No make-up exams will be given. If you cannot take the exams on these dates you should drop this class.

You must bring a calculator to the exams (no cell phones, iPods, etc.) and you are also allowed to bring one sheet of notes (“cheat sheet”). This sheet must be no larger than 8 1/2” × 11” and must be prepared by you (no photocopies). You may use both sides of the sheet.

Syllabus & policies Details

Grading

In-class participation/activities: 5%

Quizzes: 5%

Problem sets: 15%

Labs: 10%

Project: 20%

Midterm: 20%

Final: 25%
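
As a rough illustration of how these weights combine, the sketch below computes a weighted course grade in Python. The function name `course_grade`, the component keys, and the example scores are all hypothetical and made up for illustration; the sketch also assumes each component has already been reduced to a single 0-100 score (for example, after dropping the lowest problem set and lab).

```python
# Course weights as listed on the Grading slide (they sum to 100%).
WEIGHTS = {
    "participation": 0.05,
    "quizzes": 0.05,
    "problem_sets": 0.15,
    "labs": 0.10,
    "project": 0.20,
    "midterm": 0.20,
    "final": 0.25,
}


def course_grade(scores):
    """Return the weighted average of component scores (each on a 0-100 scale)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student scores, invented for this example.
example = {
    "participation": 100,
    "quizzes": 90,
    "problem_sets": 85,
    "labs": 95,
    "project": 80,
    "midterm": 75,
    "final": 88,
}

print(round(course_grade(example), 2))
```

Because the final carries the largest single weight (25%), a 4-point change on the final moves the course grade by 1 point under this scheme.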

Syllabus & policies Support

Email

I will regularly send announcements by email, so make sure to check your email daily.

While email is the quickest way to reach me outside of class, it is much more efficient to answer most statistical questions in person.

Syllabus & policies Support

Discussion Forum on Sakai

Content-related questions should be posted on the Discussion Forum on Sakai.

Title your questions according to the guidelines on the forum.

Check if your question has already been answered before posting a new question.

I will be answering questions on the forum daily and all students are expected to answer questions as well.

“Watch” the forums to be notified when a new question is posted.

Syllabus & policies Support

Office hours

Instructor: Mondays and Wednesdays, 2:00 - 3:00 PM

You are highly encouraged to stop by with any questions or comments about the class, or just to say hi and introduce yourself.

Most problem sets are due on Tuesdays and Thursdays. Attempt all problems two days before they are due to make the most of OH (and lab sessions).

Syllabus & policies Policies

Policies

Late work policy for problem sets and lab reports:

late but submitted during class: lose 10% of points

after class on due date: lose 20% of points

next day: lose 40% of points

later than next day: lose all points

Late work policy for project: 10% off for each day (24-hour period) late.

No make-ups

Regrade requests: within one week, no regrade for number of points deducted for a mistake, no regrade after the final
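
The late-work penalties on this slide can be sketched as a small lookup in Python. This is only an illustration under stated assumptions: the category names and the functions `late_score` and `project_score` are hypothetical, the penalty is assumed to apply to the points earned, any started 24-hour period is assumed to count as a full day late for the project, and the project penalty is assumed to be capped at 100%. The slide does not spell out these details.

```python
import math

# Fraction of points lost on problem sets and lab reports, by lateness category.
LATE_PENALTY = {
    "on_time": 0.0,
    "late_during_class": 0.10,
    "after_class_on_due_date": 0.20,
    "next_day": 0.40,
    "later_than_next_day": 1.00,
}


def late_score(raw_points, category):
    """Points kept on a problem set or lab report after the late penalty."""
    return raw_points * (1.0 - LATE_PENALTY[category])


def project_score(raw_points, hours_late):
    """Points kept on the project: 10% off per 24-hour period late.

    Assumption: a partially elapsed 24-hour period counts as a full day,
    and the total penalty never exceeds 100%.
    """
    days_late = math.ceil(hours_late / 24) if hours_late > 0 else 0
    penalty = min(1.0, 0.10 * days_late)
    return raw_points * (1.0 - penalty)


# Hypothetical submissions, invented for this example.
print(late_score(50, "next_day"))
print(project_score(100, 30))
```

Under these assumptions, a project handed in 30 hours late falls in its second 24-hour period and loses 20% of its points.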

Syllabus & policies Policies

Academic Dishonesty

Any form of academic dishonesty will result in an immediate 0 on the given assignment and will be reported to the Office of Student Conduct. Additional penalties may also be assessed if deemed appropriate.

Some examples:

Use of disallowed materials (including any form of communication with classmates or accessing the web) during exams and readiness assessments.

Plagiarism of any kind.

Use of outside answer keys or solution manuals for the homework.

If you have any questions about whether something is or is not allowed, ask me beforehand.

Syllabus & policies Tips

Tips for success

1 Complete the reading before a new unit begins, and then review again after the unit is over.

2 Be an active participant during lectures and labs.

3 Ask questions - during class or office hours, or by email. Ask me and your classmates.

4 Do the problem sets - start early and make sure you attempt and understand all questions.

5 Start your project early and allow adequate time to complete it.

6 Give yourself plenty of time to prepare a good cheat sheet for exams. This requires going through the material and taking the time to review the concepts that you’re not comfortable with.

7 Do not procrastinate - don’t let a unit go by with unanswered questions as it will just make the following unit’s material even more difficult to follow.

To do

To do

1 Download or purchase the textbook: www.openintro.org

2 Read the syllabus and let me know if you have any questions.

3 Start reviewing the resources for Unit 1: http://stat.duke.edu/courses/Summer14/sta101.001-2/resources.html

4 Complete Lab 0 - this is just an introduction to RStudio.
