STA 291 Fall 2009
Lecture 1
Dustin Lueker
Topics
• Statistical terminology
• Descriptive methods
• Probability and distribution functions
• Estimation (confidence intervals)
• Hypothesis testing
• Inferential methods for two samples
• Simple linear regression and correlation
STA 291 Fall 2009 Lecture 1
Why study Statistics?
• Research in all fields is becoming more quantitative
  ◦ Look at research journals
  ◦ Most graduates will need to be familiar with basic statistical methodology and terminology
• Newspapers, advertising, surveys, etc.
  ◦ Many statements contain statistical arguments
• Computers make complex statistical methods easier to use
Lies, Damn Lies, and Statistics
• Many times statistics are used in an incorrect and misleading manner
• Purposely misused
  ◦ Companies/people wanting to further their agenda
    • Cooking the data
    • Completely making up data
    • Massaging the numbers
• Incidentally misused
  ◦ Using inappropriate methods
    • Vital to understand a method before using it
What is Statistics?
• Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data
• Applicable to a wide variety of academic disciplines
  ◦ Physical sciences
  ◦ Social sciences
  ◦ Humanities
• Statistics are used for making informed decisions
  ◦ Business
  ◦ Government
General Statistical Methodology
• Design
  ◦ Planning research studies
  ◦ How to best obtain the required data
  ◦ Assuring that our data are representative of the entire population
• Description
  ◦ Summarizing data
  ◦ Exploring patterns in the data
  ◦ Extracting/condensing information
• Inference
  ◦ Making predictions based on the data
  ◦ ‘Infer’ from sample to population
  ◦ Summarizing results
Basic Terminology
• Population
  ◦ Total set of all subjects of interest
    • Entire group of people, animals, products, etc. about which we want information
• Elementary Unit
  ◦ Any individual member of the population
• Sample
  ◦ Subset of the population from which the study actually collects information
  ◦ Used to draw conclusions about the whole population
Basic Terminology
• Variable
  ◦ A characteristic of a unit that can vary among subjects in the population/sample
    • Ex: gender, nationality, age, income, hair color, height, disease status, state of residence, grade in STA 291
• Parameter
  ◦ Numerical characteristic of the population
    • Calculated using the whole population
• Statistic
  ◦ Numerical characteristic of the sample
    • Calculated using the sample
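The difference between a parameter and a statistic can be sketched in a few lines of Python. This is only an illustration: the population of ages below is simulated, and the population size, the age range, the sample size of 200, and the seed are all assumptions chosen for the sketch, not values from the course.

```python
import random
import statistics

random.seed(291)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages (in years) of 10,000 subjects.
population = [random.randint(18, 80) for _ in range(10_000)]

# Parameter: a numerical characteristic computed from the whole population.
population_mean = statistics.mean(population)

# Statistic: the same characteristic computed from a sample (a subset).
sample = random.sample(population, 200)
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean age): {population_mean:.2f}")
print(f"Statistic (sample mean age):     {sample_mean:.2f}")
```

The two numbers will usually be close but not equal. In practice only the statistic is available, because the whole population is rarely measured.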
Data Collection and Sampling Theory
• Why take a sample? Why not take a census? Why not measure all of the units in the population?
  ◦ Accuracy
    • May not be able to find every unit in the population
  ◦ Time
    • Speed of response from units
  ◦ Money
  ◦ Infinite Population
  ◦ Destructive Sampling or Testing
Example
• University Health Services at UK conducts a survey about alcohol abuse among students
  ◦ 200 of the students are sampled and asked to complete a questionnaire
  ◦ One question is “have you regretted something you did while drinking?”
• What is the population? Sample?
‘Flavors’ of Statistics
• Descriptive Statistics
  ◦ Summarizing the information in a collection of data
• Inferential Statistics
  ◦ Using information from a sample to make conclusions/predictions about the population
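A small Python sketch can make the two flavors concrete for a survey of 200 students like the one in the earlier example. The count of "yes" answers is invented for illustration, and the normal-approximation 95% interval is one common textbook choice, assumed here rather than taken from the slides.

```python
import math

n = 200          # sample size, as in the survey example
yes_count = 90   # hypothetical count of "yes" answers (invented)

# Descriptive: summarize the sample itself.
p_hat = yes_count / n

# Inferential: approximate 95% interval for the population proportion,
# using the normal approximation (an assumed method for this sketch).
z = 1.96
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
lower = p_hat - z * standard_error
upper = p_hat + z * standard_error

print(f"Descriptive: {p_hat:.1%} of the sampled students said yes")
print(f"Inferential: between {lower:.1%} and {upper:.1%} of all students")
```

The first statement describes only the 200 sampled students. The second makes a claim about the whole population and therefore carries uncertainty.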
Example
• The Current Population Survey of about 60,000 households in the United States in 2002 distinguishes three types of families: Married-couple (MC), Female householder and no husband (FH), Male householder and no wife (MH)
• It indicated that 5.3% of “MC”, 26.5% of “FH”, and 12.1% of “MH” families have annual income below the poverty level
  ◦ Are these numbers statistics or parameters?
• The report says that the percentage of all “FH” families in the USA with income below the poverty level is at least 25.5% but no greater than 27.5%
  ◦ Is this an example of descriptive or inferential statistics?
Univariate vs. Multivariate
• Univariate data
  ◦ Consists of observations on a single attribute
• Multivariate data
  ◦ Consists of observations on several attributes
• Special case
  ◦ Bivariate data
    • Consists of observations on two attributes
Scales of Measurement
• Quantitative or Numerical
  ◦ Variables with numerical values associated with them
• Qualitative or Categorical
  ◦ Variables without numerical values associated with them
Qualitative Variables
• Nominal
  ◦ Gender, nationality, hair color, state of residence
    • Nominal variables have a scale of unordered categories
    • It does not make sense to say, for example, that green hair is greater/higher/better than orange hair
• Ordinal
  ◦ Disease status, company rating, grade in STA 291
    • Ordinal variables have a scale of ordered categories; they are often treated in a quantitative manner (A = 4.0, B = 3.0, etc.)
    • One unit can have more of a certain property than does another unit
Quantitative Variables
• Quantitative
  ◦ Age, income, height
    • Quantitative variables are measured numerically, that is, for each subject a number is observed
    • The scale for quantitative variables is called the interval scale
Example
• A study about oral hygiene and periodontal conditions among institutionalized elderly measured the following
  ◦ Nominal (Qualitative): Requires assistance from staff?
    • Yes
    • No
  ◦ Ordinal (Qualitative): Plaque score
    • No visible plaque
    • Small amounts of plaque
    • Moderate amounts of plaque
    • Abundant plaque
  ◦ Interval (Quantitative): Number of teeth
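The three scales in this example support different summaries, which the following Python sketch tries to show. The category labels come from the example, but every observation below is invented for illustration, and the choice of mode, median, and mean as the matching summaries is an assumption of the sketch.

```python
import statistics

# Ordered categories of the ordinal plaque score, lowest to highest.
PLAQUE_LEVELS = [
    "No visible plaque",
    "Small amounts of plaque",
    "Moderate amounts of plaque",
    "Abundant plaque",
]

# Hypothetical observations on five residents (invented data).
needs_assistance = ["Yes", "No", "No", "Yes", "No"]      # nominal
plaque_scores = [
    "Small amounts of plaque",
    "Abundant plaque",
    "No visible plaque",
    "Small amounts of plaque",
    "Moderate amounts of plaque",
]                                                        # ordinal
teeth_counts = [12, 20, 0, 17, 8]                        # quantitative

# Nominal: categories have no order, so only counting is meaningful.
most_common_answer = statistics.mode(needs_assistance)

# Ordinal: categories can be ranked, so a median category is meaningful.
ranks = sorted(PLAQUE_LEVELS.index(score) for score in plaque_scores)
median_plaque = PLAQUE_LEVELS[statistics.median_low(ranks)]

# Quantitative: arithmetic on the values is meaningful.
mean_teeth = statistics.mean(teeth_counts)

print(most_common_answer, "|", median_plaque, "|", mean_teeth)
```

Averaging the "Yes"/"No" answers, or ranking them, would not produce a meaningful number, which is why the scale has to be identified before a method is chosen.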
Example
• A birth registry database collects the following information on newborns
  ◦ Birthweight: in grams
  ◦ Infant’s Condition:
    • Excellent
    • Good
    • Fair
    • Poor
  ◦ Number of prenatal visits
  ◦ Ethnic background:
    • African-American
    • Caucasian
    • Hispanic
    • Native American
    • Other
• What are the appropriate scales?
  ◦ Quantitative (Interval)
  ◦ Qualitative (Ordinal, Nominal)
Importance of Different Types of Data
• Statistical methods vary for quantitative and qualitative variables
• Methods for quantitative data cannot be used to analyze qualitative data
• Quantitative variables can be treated in a less quantitative manner
  ◦ Height: measured in cm/in
    • Interval (Quantitative)
  ◦ Can be treated as Qualitative
    • Ordinal: Short, Average, Tall
    • Nominal: <60in or >72in, 60in-72in
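Treating height in a less quantitative manner amounts to binning the measurements. The Python sketch below is one way to do it; the function names are hypothetical, and using 60 and 72 inches as the cut points for Short/Average/Tall is an assumption borrowed from the nominal grouping, since the slide gives no cut points for the ordinal version.

```python
def height_ordinal(height_in: float) -> str:
    """Map a height in inches to an ordered category (assumed cut points)."""
    if height_in < 60:
        return "Short"
    if height_in > 72:
        return "Tall"
    return "Average"


def height_nominal(height_in: float) -> str:
    """Map a height in inches to one of two unordered groups."""
    if 60 <= height_in <= 72:
        return "60in-72in"
    return "<60in or >72in"


heights = [58.0, 66.5, 74.0]  # hypothetical measurements in inches
for h in heights:
    print(h, height_ordinal(h), height_nominal(h))
```

Each step from interval to ordinal to nominal discards information: 58 in and 74 in fall into the same nominal group even though they are far apart on the original scale.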
Other Notes on Variable Types
• Try to measure variables in as much detail as possible
  ◦ Quantitative
    • More detailed data can be analyzed in further depth
  ◦ Caution: Sometimes ordinal variables are treated as quantitative (ex: GPA)
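The GPA caution can be illustrated with a short Python sketch. The values A = 4.0 and B = 3.0 appear earlier in the lecture; the remaining grade points, the letter used for a failing grade, and the list of grades are assumptions made for this example.

```python
import statistics

# A and B as given earlier; C, D, and the failing grade E are assumed.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "E": 0.0}

grades = ["A", "B", "B", "C", "A", "D"]  # hypothetical transcript

# Treating the ordinal grades as quantitative: average the grade points.
# This assumes the gap between A and B equals the gap between C and D.
gpa = statistics.mean(GRADE_POINTS[g] for g in grades)

# Using only the order: sort best to worst and take the (lower) middle grade.
ordered = sorted(grades, key=GRADE_POINTS.get, reverse=True)
median_grade = ordered[(len(ordered) - 1) // 2]

print(f"GPA: {gpa:.2f}, median grade: {median_grade}")
```

The median grade relies only on the ordering of the categories, while the GPA relies on the assigned numbers being evenly spaced, which is the assumption the caution points at.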
Discrete Variables
• A variable is discrete if it can take on a finite number of values
  ◦ Gender
  ◦ Nationality
  ◦ Hair color
  ◦ Disease status
  ◦ Grade in STA 291
  ◦ Favorite MLB team
• Qualitative variables are discrete
Continuous Variables
• Continuous variables can take an infinite continuum of possible real number values
  ◦ Time spent studying for STA 291 per day
    • 43 minutes
    • 2 minutes
    • 27.487 minutes
    • 27.48682 minutes
  ◦ Can be subdivided into more accurate values
    • Therefore continuous
Examples
• Number of children in a family
• Distance a car travels on a tank of gas
• % grade on an exam
Discrete or Continuous
• Quantitative variables can be discrete or continuous
• Age, income, height?
  ◦ Depends on the scale
    • Age is potentially continuous, but usually measured in years (discrete)
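The point about age can be sketched in Python by measuring the same quantity on two scales. The function names and the dates are hypothetical, and dividing by 365.25 days is a rough approximation assumed for the sketch.

```python
from datetime import date


def age_fractional_years(birth: date, today: date) -> float:
    """Age on a fine scale: elapsed days divided by an average year length."""
    return (today - birth).days / 365.25


def age_completed_years(birth: date, today: date) -> int:
    """Age on the usual scale: number of birthdays already reached."""
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)


birth = date(1990, 3, 15)   # hypothetical birth date
today = date(2009, 8, 26)   # hypothetical date of measurement

print(age_fractional_years(birth, today))  # can be refined further
print(age_completed_years(birth, today))   # whole numbers only
```

The underlying quantity is the same in both functions; only the measurement scale decides whether the recorded variable behaves as continuous or discrete.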