Date post: | 03-Jun-2018 |
Upload: | aritra-ghosh |
8/12/2019 Introduction to Statistics - Chapter 3-5 Notes (2)
http://slidepdf.com/reader/full/introduction-to-statistics-chapter-3-5-notes-2 1/20
Introduction to Statistics
Lecture Notes
Chapters 3-5
Please sign in (SIGNATURES) as you come in to class. It will save my voice instead of my taking attendance (this is only to settle the class roster).
What’s up with the PowerPoint?
I don’t usually use slides, but am going to try to use these to save my voice somewhat.
Notes: Still working on getting the class roster settled. There has been some movement on the waitlist; I will keep in touch as things develop. Be sure you’ve signed in!
First homework is posted (on our course website), but isn’t due until next Friday (the 4th). The additional problem is NOT optional; "additional" just means it is not a book problem.
Handouts for Today
There is one handout on graphs/descriptive statistics going around. Save this to use tomorrow in class.
There is a second handout – the anonymous survey largely designed by the class on Monday. Please go ahead and take a few minutes to fill this out (no names!) and get it back to me. We’ll take a look at these data next week in lab.
If you missed class Monday, I have extra course
syllabuses at the front as well.
The “W”’s of a Data Set
Who – the observations. The population is the set of all objects for which you are interested in obtaining the value of some parameter. Since we usually can’t observe all objects, we take a sample: a subset of the overall population of objects to observe.
Note: There is NO such thing as a population sample or sample population.
What – the variables
Why – why was the data collected
How – how was the data collected (related to
design/sampling in chapters 12-13)
When/Where – more information that could be relevant
Chapters 3-5 Overview
Covers basic graphs and descriptive statistics for both categorical and quantitative variables
This is what you would do as a “preliminary analysis”
for a variable.
Recall: a data set can have multiple variables in it.
These chapters focus on mostly univariate (single
variable) analyses. There is one comparative graph
– a side-by-side boxplot in Chapter 5.
3 Rules of Data Analysis
Rule 1 – Make a picture
Rule 2 – Make a picture (really, before you do anything else)
Rule 3 – Make a picture (really, we mean a well-chosen picture for your variables)
Categorical Variable Prelim Analysis
Frequency tables (one variable) – summarize counts by category
Contingency tables (2 or more variables) – summarize counts by category for multiple variables
Bar charts
Pie charts
Frequency
What is frequency?
Frequency is the number of objects/cases per category.
You can also look at relative frequency.
Relative frequency is the number of objects/cases per category divided by the total number of objects.
Hence it gives proportions for each category out of the total.
It is often converted to %.
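A minimal sketch of the two definitions above, assuming a small set of survey responses that is invented purely for illustration (the category names and counts are not from the class survey):

```python
from collections import Counter

# Hypothetical responses to a categorical survey question (invented for illustration).
responses = ["cat", "dog", "dog", "fish", "dog", "cat", "dog", "cat"]

# Frequency: number of cases in each category.
frequency = Counter(responses)

# Relative frequency: each count divided by the total number of cases.
total = sum(frequency.values())
relative_frequency = {category: count / total for category, count in frequency.items()}

# Report each category with its count and its relative frequency as a percentage.
for category in sorted(frequency):
    print(f"{category}: {frequency[category]} cases, {relative_frequency[category]:.1%}")
```

The relative frequencies always add up to 1 (100%), which is a quick check that no cases were dropped.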
Bar Charts
One bar per category – height is determined by frequency or relative frequency
Order of categories is arbitrary.
Does NOT let you talk about the shape of a
distribution.
“Area” principle – areas are supposed to be relative.
This is often violated when people try to make
graphs “cool” and go 3-D, etc. (see Example passed
around).
Pie Charts
Take 100% of cases and divide up 360 degrees based on relative frequencies.
We will look at bar charts over pie charts.
Note that for bar charts you do not need to create
bars for 100% of the cases. You could look at the top
three risk factors for a disease, etc. However, we
usually do have 100% of cases shown.
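The 360-degree split for a pie chart can be sketched as follows. The category labels and counts are hypothetical, chosen only so the arithmetic is easy to follow:

```python
# Hypothetical category counts (invented for illustration).
counts = {"A": 10, "B": 30, "C": 60}

total = sum(counts.values())

# Each slice's angle is its relative frequency times 360 degrees.
angles = {category: 360 * count / total for category, count in counts.items()}

for category, angle in angles.items():
    print(f"{category}: {angle} degrees")
```

Because the relative frequencies sum to 1, the angles sum to 360 only when all of the cases are included, which is why a pie chart (unlike a bar chart) needs 100% of the cases.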
Contingency Tables - Example
See first page of the handout.
Totals for rows/columns give marginal distributions for each variable.
You can also look at conditional distributions. Fix a row or column and work solely within that row or column.
Concept of independence (will formalize later):
If the distribution of one variable is the same for all
categories of another variable, then the two variables are
independent.
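As a rough illustration of marginal distributions, conditional distributions, and the informal independence check, here is a sketch on a made-up 2x2 table. The variables (class year, bike ownership) and all counts are assumptions for this example, not the handout's data:

```python
# Hypothetical 2x2 contingency table of counts (invented for illustration):
# rows are one variable (class year), columns are another (owns a bike).
table = {
    "first-year": {"yes": 30, "no": 20},
    "senior": {"yes": 60, "no": 40},
}
columns = ["yes", "no"]

grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of the row variable: row totals over the grand total.
row_marginal = {r: sum(cols.values()) / grand_total for r, cols in table.items()}

# Marginal distribution of the column variable: column totals over the grand total.
col_marginal = {c: sum(table[r][c] for r in table) / grand_total for c in columns}

# Conditional distribution of the column variable given each row:
# fix a row and divide its counts by that row's total.
conditional = {
    r: {c: cols[c] / sum(cols.values()) for c in columns} for r, cols in table.items()
}

# Informal independence check: every conditional distribution matches the marginal.
independent = all(
    abs(conditional[r][c] - col_marginal[c]) < 1e-9 for r in table for c in columns
)

print(row_marginal)
print(col_marginal)
print(conditional)
print("Looks independent:", independent)
```

In this made-up table 60% of each class year owns a bike, so the conditional distributions agree and the check passes. With real sample data the match is rarely exact, which is part of why the idea gets formalized later.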
On Your Own
Text has some discussion of segmented bar charts and side-by-side bar charts (feel free to read or skip)
Simpson’s Paradox
Something that can happen when you aggregate categorical data
Looking at overall averages or percentages can be misleading
Can get different results looking at the breakdown
Berkeley Discrimination Data Example (see bottom of page one of the handout)
Claims of Sexual Discrimination in 1973 Graduate School Admissions
Overall, 44.28% of males who applied were admitted, while only 34.58% of females were admitted.
Look what happens when you break down by the 6 largest departments though! (Try this on your own or with a partner.) Is there evidence of discrimination against females at the dept. level? What is going on?
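To see how an aggregate comparison can reverse when the data are broken down, here is a sketch with invented counts. These are NOT the Berkeley numbers; the two departments and every count are assumptions made up so the arithmetic is easy to check by hand:

```python
# Invented counts (NOT the Berkeley data): (admitted, applied) by department and group.
admissions = {
    "Dept A": {"men": (80, 100), "women": (18, 20)},
    "Dept B": {"men": (4, 20), "women": (30, 100)},
}


def rate(admitted, applied):
    """Admission rate as a proportion."""
    return admitted / applied


# Rates within each department.
by_dept = {
    dept: {group: rate(*counts) for group, counts in groups.items()}
    for dept, groups in admissions.items()
}

# Aggregate rates: pool admitted and applied counts across departments.
overall = {}
for group in ("men", "women"):
    admitted = sum(admissions[d][group][0] for d in admissions)
    applied = sum(admissions[d][group][1] for d in admissions)
    overall[group] = rate(admitted, applied)

for dept, rates in by_dept.items():
    print(dept, {g: f"{r:.0%}" for g, r in rates.items()})
print("Overall", {g: f"{r:.0%}" for g, r in overall.items()})
```

In this toy example women have the higher rate in each department but the lower rate overall, because most women applied to the department that admits few applicants. Whether something similar is happening in the handout data is for you to work out from the handout.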
Quantitative Variables Preliminary Analysis
Graphs:
Dot plot – won’t use much – read about on your own
Stem and leaf – won’t use much – read about on your own
Histogram
Boxplot (chapter 5)
Qqplot (Friday or next week)
Time plot (Friday or next week)
Descriptive statistics:
Measures of center: mean, median
Measures of spread: standard deviation, IQR, range
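A small sketch of the measures of center and spread listed above, using Python's standard `statistics` module on a made-up data set. The values are hypothetical, and note that quartile conventions differ between textbooks and software, so the IQR here may not match a by-hand answer done with another rule:

```python
import statistics

# Hypothetical quantitative observations (invented for illustration).
data = [2, 4, 4, 5, 7, 9, 11]

# Measures of center.
mean = statistics.mean(data)
median = statistics.median(data)

# Measures of spread.
std_dev = statistics.stdev(data)            # sample standard deviation (divides by n - 1)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartiles, default "exclusive" method
iqr = q3 - q1                               # interquartile range
data_range = max(data) - min(data)

print(f"mean={mean}, median={median}")
print(f"sd={std_dev:.2f}, IQR={iqr}, range={data_range}")
```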
Describing the distribution of a quantitative
variable
You should focus on three things when describing the distribution of a quantitative variable:
Shape – unimodal (one peak), bimodal (two peaks), multimodal (many peaks), bell-shaped, skewed left (tail to the left), skewed right (tail to the right), symmetric, uniform (no peaks, basically flat)
Center – estimate the center (or use a descriptive statistic)
If multiple peaks, report the peak locations
Spread – estimate the spread (can use a descriptive statistic)
Dot Plot – On Your Own
Most basic quantitative graph
Use for a low number of observations (<50)
Basically use a number line and place a dot above it for each value you have observed.
[Figure: example dot plot from Wikipedia; image not reproduced]
Stem and Leaf – On Your Own
Your book discusses lots of options for these, including split leaves (which is something R/Rcmdr will do).
Basics: You take your values and set a stem – maybe tens. Then the leaves are the ones place. For each stem, you list the leaves that coincide in numeric order.
Usually works decently for fewer than 100 observations
Try it. Suppose you have scores on a pre-test for an at-risk youth group as follows:
5, 11, 13, 21, 34, 36, 45, 47, 48, 48, 49
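If you want to check your by-hand attempt, the basic recipe (tens as stems, ones as leaves) can be sketched in Python as below. This is only one simple layout, without split leaves, and is not how R/Rcmdr formats its output:

```python
from collections import defaultdict

# Pre-test scores from the slide.
scores = [5, 11, 13, 21, 34, 36, 45, 47, 48, 48, 49]

# Stem = tens digit, leaf = ones digit; sorting first keeps leaves in numeric order.
leaves_by_stem = defaultdict(list)
for score in sorted(scores):
    stem, leaf = divmod(score, 10)
    leaves_by_stem[stem].append(leaf)

# Print every stem from smallest to largest, including any stem with no leaves.
for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
    leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
    print(f"{stem} | {leaves}")
```

Listing empty stems matters: leaving them out would hide gaps in the distribution.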
Histogram
Take the quantitative variable and break it up into “piles” or “bins” (usually the same width).
Count the number of observations in each bin or pile.
Plot the frequencies per bin.
Usually no spaces between bins (if there is a space, it is a gap in the data – NOT like a bar chart).
You DO need to know the boundaries. (5,10], (10,15] as bins IS different from [5,10), [10,15): a value of exactly 10 falls in the first bin under the former and in the second bin under the latter. (If anyone needs me to explain open/closed brackets, please ask.)
Technology lets us vary the width of bins (effectively the number of bins).
You can also use unequal bin widths, but then you need something called density, not frequency.
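The sketch below illustrates why the bin boundaries matter and what density means. The observations are hypothetical, the helper `count_bins` is written just for this example, and density is taken here to be relative frequency divided by bin width (a common convention, assumed rather than taken from the text):

```python
# Hypothetical observations (invented); note that 10 sits exactly on a bin boundary.
values = [6, 7, 10, 10, 12, 14, 15]
edges = [5, 10, 15]


def count_bins(values, edges, closed="right"):
    """Count observations per bin; closed='right' means (a, b], 'left' means [a, b)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            lo, hi = edges[i], edges[i + 1]
            inside = lo < v <= hi if closed == "right" else lo <= v < hi
            if inside:
                counts[i] += 1
                break
    return counts


right_closed = count_bins(values, edges, closed="right")  # bins (5,10], (10,15]
left_closed = count_bins(values, edges, closed="left")    # bins [5,10), [10,15)

# Density for a bin = relative frequency / bin width, so the bar AREAS sum to 1.
n = len(values)
density = [c / n / (edges[i + 1] - edges[i]) for i, c in enumerate(right_closed)]

print("(a, b] counts:", right_closed)
print("[a, b) counts:", left_closed)
print("density:", density)
```

Same data, same bin edges, different counts: the two 10s move between bins, and under [5,10), [10,15) the value 15 is not counted at all unless another bin is added.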
Examples
See page 2 of the handout
Try to describe the shape of each histogram
Then see page 3 of the handout
We’re going to create a histogram by hand if there is time
If no time, you can do this on your own.
Cookie Lab
Time Permitting (otherwise, Friday)
The last page (to turn in) is not due till the end of class tomorrow. So don’t worry if we don’t get to it today. You can look at it tonight or tomorrow in class (I’ll give the last five minutes of class for you to work on it).