Date posted: 01-Jan-2016 | Category: Documents | Uploaded by: sarah-chase
Psyc 235: Introduction to Statistics
Lecture Format
• New content/conceptual information
• Questions and working through problems
What you should have accomplished so far…
• ALEKS account set up
• Completed the first assessment
• Worked through the first section of material
• Spent 5+ hours on ALEKS
• Watched the video “What is statistics?”
Any questions/problems so far?
From Last week:
• Definition of Statistics…
C Collecting …
O Organizing …
D Displaying …
I Interpreting …
A Analyzing …
Data
What is Data?
• Data is the generic term for numerical information that has been obtained on a set of objects/individuals etc.
• Variable: some characteristic of the objects/individuals (e.g., height)
• Data: the values of a variable for a certain set of objects/individuals
Two branches of statistics:
• Descriptive statistics: describe the set of data you have.
• Inferential statistics: given the data you have about these people, does this say anything about other people?
Today: Descriptive Statistics
• Graphical presentations of distributions
  - Histograms
  - Frequency polygons
  - Cumulative distributions
  - Box-and-whisker plots
• Descriptive measures of data
  - Measures of central tendency
  - Measures of dispersion
Organizing Data
• Data from last week
• Frequency table
Time Awake | Number of Students
6:30-7:00 | 1
7:00-7:30 | 1
7:30-8:00 | 3
8:00-8:30 | 2
8:30-9:00 | 4
9:00-9:30 | 5
9:30-10:00 | 7
10:00-10:30 | 4
10:30-11:00 | 3
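Raw data (30 wake-up times, in increasing order):
6:55, 7:00, 7:30, 7:30, 7:45, 8:00, 8:25, 8:30, 8:45, 8:45, 8:50, 9:00, 9:00, 9:00, 9:15, 9:25, 9:30, 9:30, 9:30, 9:30, 9:30, 9:45, 9:45, 10:00, 10:00, 10:15, 10:25, 10:30, 10:45, 10:50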
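A minimal Python sketch of building the frequency table from the raw wake-up times. The helper names (`to_minutes`, `to_clock`, `frequency_table`) are hypothetical, and the binning rule is an assumption: the slide does not say which bin a boundary time such as 7:30 belongs to, but placing each time in the half-open bin [start, start + 30 min) reproduces the counts in the table.

```python
# The 30 wake-up times from the slide, in increasing order.
WAKE_TIMES = [
    "6:55", "7:00", "7:30", "7:30", "7:45", "8:00", "8:25", "8:30", "8:45", "8:45",
    "8:50", "9:00", "9:00", "9:00", "9:15", "9:25", "9:30", "9:30", "9:30", "9:30",
    "9:30", "9:45", "9:45", "10:00", "10:00", "10:15", "10:25", "10:30", "10:45", "10:50",
]


def to_minutes(clock: str) -> int:
    """Convert an 'H:MM' clock time into minutes after midnight."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)


def to_clock(minutes: int) -> str:
    """Convert minutes after midnight back into an 'H:MM' string."""
    return f"{minutes // 60}:{minutes % 60:02d}"


def frequency_table(times, start="6:30", end="11:00", width=30):
    """Count how many times fall in each half-open bin [lo, lo + width)."""
    values = [to_minutes(t) for t in times]
    table = []
    for lo in range(to_minutes(start), to_minutes(end), width):
        count = sum(1 for v in values if lo <= v < lo + width)
        table.append((f"{to_clock(lo)}-{to_clock(lo + width)}", count))
    return table


for label, count in frequency_table(WAKE_TIMES):
    print(f"{label:>11} | {count}")
```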
Histograms
Note: Use a histogram to spot patterns in the data (skew, etc.).
[Figure: histogram of wake-up times; x-axis: Wake-Up Time (30-minute bins from 6:30 to 11:00); y-axis: Number of Students (0-8)]
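Without plotting software, the same counts can be drawn as a text histogram. This is a sketch with a hypothetical helper (`text_histogram`); the counts are copied from the frequency table of wake-up times.

```python
# Bin labels and counts copied from the frequency table of wake-up times.
BINS = ["6:30-7:00", "7:00-7:30", "7:30-8:00", "8:00-8:30", "8:30-9:00",
        "9:00-9:30", "9:30-10:00", "10:00-10:30", "10:30-11:00"]
COUNTS = [1, 1, 3, 2, 4, 5, 7, 4, 3]


def text_histogram(labels, counts, symbol="#"):
    """Return one row per bin: the right-aligned label, then one symbol per observation."""
    width = max(len(label) for label in labels)
    return [f"{label:>{width}} | {symbol * count}" for label, count in zip(labels, counts)]


for row in text_histogram(BINS, COUNTS):
    print(row)
```

Reading the bars top to bottom, the tallest is 9:30-10:00 and the counts trail off more gradually toward the earlier times, which suggests a mild left (negative) skew.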
Frequency Polygon
Time Awake | Number of Students | Relative Frequency
6:30-7:00 | 1 | 0.0333
7:00-7:30 | 1 | 0.0333
7:30-8:00 | 3 | 0.1000
8:00-8:30 | 2 | 0.0667
8:30-9:00 | 4 | 0.1333
9:00-9:30 | 5 | 0.1667
9:30-10:00 | 7 | 0.2333
10:00-10:30 | 4 | 0.1333
10:30-11:00 | 3 | 0.1000
Total | 30 | 1.0000
[Figure: frequency polygon; x-axis: Time Awake (30-minute bins from 6:30 to 11:00); y-axis: Proportion of Students (0-0.25)]
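The proportions plotted in the frequency polygon are each bin's count divided by the total number of students (30). A minimal sketch, with a hypothetical helper name and the counts copied from the frequency table:

```python
# Bin labels and counts copied from the frequency table of wake-up times.
BINS = ["6:30-7:00", "7:00-7:30", "7:30-8:00", "8:00-8:30", "8:30-9:00",
        "9:00-9:30", "9:30-10:00", "10:00-10:30", "10:30-11:00"]
COUNTS = [1, 1, 3, 2, 4, 5, 7, 4, 3]


def relative_frequencies(counts):
    """Divide each bin's count by the total number of observations."""
    total = sum(counts)
    return [count / total for count in counts]


for label, share in zip(BINS, relative_frequencies(COUNTS)):
    print(f"{label:>11} | {share:.4f}")
```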
Cumulative Frequency
[Figure: cumulative frequency plot; x-axis: Time Awake (30-minute bins from 6:30 to 11:00); y-axis from 0 to 1.2]
Time Awake | Number of Students | Relative Frequency | Cumulative Relative Frequency
6:30-7:00 | 1 | 0.0333 | 0.0333
7:00-7:30 | 1 | 0.0333 | 0.0667
7:30-8:00 | 3 | 0.1000 | 0.1667
8:00-8:30 | 2 | 0.0667 | 0.2333
8:30-9:00 | 4 | 0.1333 | 0.3667
9:00-9:30 | 5 | 0.1667 | 0.5333
9:30-10:00 | 7 | 0.2333 | 0.7667
10:00-10:30 | 4 | 0.1333 | 0.9000
10:30-11:00 | 3 | 0.1000 | 1.0000
Total | 30 | 1.0000 |
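Each cumulative value is the running total of the counts up to and including that bin, divided by the overall total. A minimal sketch using `itertools.accumulate` for the running total (the helper name is hypothetical):

```python
from itertools import accumulate

# Counts copied from the frequency table of wake-up times (30 students).
COUNTS = [1, 1, 3, 2, 4, 5, 7, 4, 3]


def cumulative_relative(counts):
    """Running total of the counts divided by the overall total."""
    total = sum(counts)
    return [running / total for running in accumulate(counts)]


print([f"{c:.4f}" for c in cumulative_relative(COUNTS)])
```

The last value is always 1, since by the final bin every observation has been counted.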
Box and Whisker Plots
• A graphical representation of the quartiles, which split the data into 4 equally sized groups.
• If there is an even number of observations, let the “top” be the top half and the “bottom” be the bottom half.
• If there is an odd number of observations, let the “top” be everything above the median and the “bottom” be everything below the median.
• The first quartile is the “median of the bottom”. The third quartile is the “median of the top”.
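The median-of-halves rule above can be sketched in Python as follows. The helper names are hypothetical, and this is only one of several quartile conventions in use (software packages often interpolate differently), so results from other tools may differ slightly. It assumes at least two observations.

```python
from statistics import median

# The 30 wake-up times from the slide, in increasing order.
WAKE_TIMES = [
    "6:55", "7:00", "7:30", "7:30", "7:45", "8:00", "8:25", "8:30", "8:45", "8:45",
    "8:50", "9:00", "9:00", "9:00", "9:15", "9:25", "9:30", "9:30", "9:30", "9:30",
    "9:30", "9:45", "9:45", "10:00", "10:00", "10:15", "10:25", "10:30", "10:45", "10:50",
]


def to_minutes(clock: str) -> int:
    """Convert an 'H:MM' clock time into minutes after midnight."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)


def to_clock(minutes) -> str:
    """Convert minutes after midnight (rounded to a whole minute) into 'H:MM'."""
    m = round(minutes)
    return f"{m // 60}:{m % 60:02d}"


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    bottom = data[:half]  # even n: bottom half; odd n: everything below the median
    top = data[half + 1:] if len(data) % 2 else data[half:]  # odd n skips the median itself
    return median(bottom), median(data), median(top)


q1, med, q3 = quartiles(to_minutes(t) for t in WAKE_TIMES)
print(to_clock(q1), to_clock(med), to_clock(q3))  # 8:30 9:20 9:45
```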
Box-and-Whisker Example
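Data (30 wake-up times, in increasing order):
6:55, 7:00, 7:30, 7:30, 7:45, 8:00, 8:25, 8:30, 8:45, 8:45, 8:50, 9:00, 9:00, 9:00, 9:15, 9:25, 9:30, 9:30, 9:30, 9:30, 9:30, 9:45, 9:45, 10:00, 10:00, 10:15, 10:25, 10:30, 10:45, 10:50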
• Median: 9:20
• 1st quartile: 8:30
• 3rd quartile: 9:45
Again, note the information you can obtain by looking at this graphical representation of the data.
Graphical Presentations of Data
• Listed data: all data available
• Frequency table: data frequency for each cell available
• Histograms: data frequency for each bin available
• Frequency polygons: data frequency for each bin available
• Box-and-whisker plots: summary information and data range available
• Often: just summarize key features of the distribution
(Moving down this list, each presentation carries less and less information.)
Describing Distributions
Summary Measures
• Measures of central tendency: the “average”, “location”, or “center” of the distribution.
• Measures of dispersion: the “spread” or “variability” of the distribution.
Measures of Central Tendency
• Mean
• Median
• Mode
You may already be familiar with these concepts, but I want you to think of them in relation to describing data.
Mode
• Most frequent observation or observation class
• There can be several distinct modes
• The mode is the “best guess” in a single-shot guessing game
[Figure: bar chart for a guessing game with options A, B, C, D (bar values 12, 35, 5, 19)]
Mode (example data)
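Data (30 wake-up times, in increasing order):
6:55, 7:00, 7:30, 7:30, 7:45, 8:00, 8:25, 8:30, 8:45, 8:45, 8:50, 9:00, 9:00, 9:00, 9:15, 9:25, 9:30, 9:30, 9:30, 9:30, 9:30, 9:45, 9:45, 10:00, 10:00, 10:15, 10:25, 10:30, 10:45, 10:50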
Mode?
9:30
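Counting each distinct value is all the mode needs. A minimal sketch using `collections.Counter`; the helper `modes` is hypothetical and returns every tied value, since there can be several modes.

```python
from collections import Counter

# The 30 wake-up times from the slide, in increasing order.
WAKE_TIMES = [
    "6:55", "7:00", "7:30", "7:30", "7:45", "8:00", "8:25", "8:30", "8:45", "8:45",
    "8:50", "9:00", "9:00", "9:00", "9:15", "9:25", "9:30", "9:30", "9:30", "9:30",
    "9:30", "9:45", "9:45", "10:00", "10:00", "10:15", "10:25", "10:30", "10:45", "10:50",
]


def modes(values):
    """Return every most-frequent value (there may be several) and its count."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest], highest


print(modes(WAKE_TIMES))  # (['9:30'], 5)
```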
Median
• Any value M for which at least 50% of all observations are at or above M and at least 50% are at or below M.
• Resistant measure of central tendency (not heavily influenced by extreme values)
Calculating the Median
Order all observations from smallest to largest.
If the number of observations n is odd, the median is the “middle” observation, namely the [(n + 1)/2]th observation. For n = 61, it is the 31st.
If n is even, then to get a unique value, take the average of the (n/2)th and the (n/2 + 1)th observations. For n = 60, it is the average of the 30th and the 31st observations.
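The two-case rule translates directly into code. In this sketch the helper `median_by_rule` is hypothetical, the stand-in data are the integers 1 through n (so each position equals its value), and the list is assumed to be non-empty. Python indexes from 0, so the 1-based positions from the rule are shifted down by one.

```python
def median_by_rule(values):
    """Median via the odd/even rule, using 1-based positions in the sorted data."""
    data = sorted(values)
    n = len(data)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th observation; subtract 1 for 0-based indexing.
        return data[(n + 1) // 2 - 1]
    # Even n: average the (n / 2)th and (n / 2 + 1)th observations.
    return (data[n // 2 - 1] + data[n // 2]) / 2


print(median_by_rule(range(1, 62)))  # 31   (n = 61: the 31st value)
print(median_by_rule(range(1, 61)))  # 30.5 (n = 60: average of the 30th and 31st)
```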
Median (example data)
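Data (30 wake-up times, in increasing order):
6:55, 7:00, 7:30, 7:30, 7:45, 8:00, 8:25, 8:30, 8:45, 8:45, 8:50, 9:00, 9:00, 9:00, 9:15, 9:25, 9:30, 9:30, 9:30, 9:30, 9:30, 9:45, 9:45, 10:00, 10:00, 10:15, 10:25, 10:30, 10:45, 10:50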
Since there is an even number of data points (n = 30), take the average of the middle two values, the 15th (9:15) and the 16th (9:25):
Median = (9:15 + 9:25) / 2 = 9:20
Mean
• Sum up all observations (say, n of them) and divide the total by n.
• Extreme values strongly influence the mean.
• The mean is the balance point of the distribution (its center of gravity).
Calculating the mean
• Suppose that we collect n observations.
• Let $X_1, X_2, X_3, \ldots, X_n$ denote the individual observations.
• Mean: sum up all n observations and divide the total by n:
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right)$$
Mathematical Notation
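$$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n \qquad \text{(often abbreviated } \textstyle\sum X_i\text{)}$$
$$\text{Mean} = \bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\left(X_1 + X_2 + \cdots + X_n\right) = \frac{1}{n}\sum_{i=1}^{n} X_i$$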
Mean (example data)
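Convert each time to decimal hours (hours + minutes/60):
6:55 → 6.92, 7:00 → 7.00
7:30 → 7.50, 7:30 → 7.50, 7:45 → 7.75
8:00 → 8.00, 8:25 → 8.42, 8:30 → 8.50, 8:45 → 8.75, 8:45 → 8.75, 8:50 → 8.83
9:00 → 9.00, 9:00 → 9.00, 9:00 → 9.00
9:15 → 9.25, 9:25 → 9.42, 9:30 → 9.50 (×5), 9:45 → 9.75 (×2)
10:00 → 10.00, 10:00 → 10.00
10:15 → 10.25, 10:25 → 10.42, 10:30 → 10.50, 10:45 → 10.75, 10:50 → 10.83
$$\sum X = 273.34$$
$$\bar{X} = 273.34 / 30 \approx 9.11$$
Transform back into the time scale: 9.11 hours = 9 hours + 0.11 × 60 minutes ≈ 9 hours 7 minutes, so $\bar{X} \approx$ 9:07.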
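Working in whole minutes after midnight avoids rounding each conversion to two decimal places. In this sketch (helper names hypothetical) the exact total is 16,400 minutes, so the mean is 546⅔ minutes after midnight, i.e. 9 hours and about 6.67 minutes, which rounds to 9:07.

```python
# The 30 wake-up times from the slide, in increasing order.
WAKE_TIMES = [
    "6:55", "7:00", "7:30", "7:30", "7:45", "8:00", "8:25", "8:30", "8:45", "8:45",
    "8:50", "9:00", "9:00", "9:00", "9:15", "9:25", "9:30", "9:30", "9:30", "9:30",
    "9:30", "9:45", "9:45", "10:00", "10:00", "10:15", "10:25", "10:30", "10:45", "10:50",
]


def to_minutes(clock: str) -> int:
    """Convert an 'H:MM' clock time into minutes after midnight."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)


def mean(values):
    """Sum all observations and divide by how many there are."""
    values = list(values)
    return sum(values) / len(values)


minutes = [to_minutes(t) for t in WAKE_TIMES]
average = mean(minutes)                   # 16400 / 30 minutes after midnight
hours, mins = divmod(round(average), 60)  # round to the nearest whole minute
print(f"{average / 60:.2f} hours -> {hours}:{mins:02d}")  # 9.11 hours -> 9:07
```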
A few notes about summation, and implications for calculation of the mean
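Summing a constant $a$ over $n$ observations:
$$a + a + \cdots + a = \sum_{i=1}^{n} a = na$$
[Figure: five observations (1-5) on a 0-10 scale, all with the same value, with the mean marked]
If all data points have the same value, $a$, then the mean is also $a$, because:
$$\frac{1}{n}\sum_{i=1}^{n} a = \frac{1}{n}\,na = a$$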
Multiplying all values by a constant
$$aX_1 + aX_2 + \cdots + aX_n = a\left(X_1 + X_2 + \cdots + X_n\right), \quad\text{i.e.}\quad \sum_{i=1}^{n} aX_i = a\sum_{i=1}^{n} X_i$$
If we multiply each observation by 2, we obtain a new distribution that is stretched out by a factor of 2: every value, and every distance between values, doubles.
A multiplying constant affects the mean (and the “spread”).
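$$\frac{1}{n}\sum_{i=1}^{n} 2X_i = 2\cdot\frac{1}{n}\sum_{i=1}^{n} X_i = 2\bar{X}$$
[Figure: the distribution before and after multiplying each observation by 2, each plotted on a 1-10 axis]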
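A quick numerical check of the scaling rule on small, hypothetical data. The range (largest minus smallest value) is used here only as a simple stand-in for “spread”; the helper names are hypothetical.

```python
from math import isclose


def mean(values):
    return sum(values) / len(values)


def value_range(values):
    """Largest minus smallest value, a simple measure of spread."""
    return max(values) - min(values)


xs = [1, 2, 2, 3, 4]           # hypothetical example data
doubled = [2 * x for x in xs]  # multiply every observation by 2

assert isclose(mean(doubled), 2 * mean(xs))  # the mean doubles
print(mean(xs), mean(doubled))                # 2.4 4.8
print(value_range(xs), value_range(doubled))  # 3 6
```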
Adding a constant to all values
$$(X_1 + a) + (X_2 + a) + \cdots + (X_n + a) = (X_1 + X_2 + \cdots + X_n) + na$$
$$\sum_{i=1}^{n}(X_i + a) = \left(\sum_{i=1}^{n} X_i\right) + na$$
[Figure: a distribution plotted on a 1-10 axis]
If we add the constant 5 to each observation, then we obtain a new distribution that is shifted to the right by 5 units.
A shift affects the mean (but not the “spread”).
$$\frac{1}{n}\sum_{i=1}^{n}(X_i + 5) = \left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) + 5 = \bar{X} + 5$$
[Figure: a distribution plotted on a 1-10 axis]
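The same kind of check for a shift, again on hypothetical data and with the range as a simple stand-in for spread: the mean moves by exactly the added constant, while the distances between values stay the same.

```python
from math import isclose


def mean(values):
    return sum(values) / len(values)


def value_range(values):
    """Largest minus smallest value, a simple measure of spread."""
    return max(values) - min(values)


xs = [1, 2, 2, 3, 4]           # hypothetical example data
shifted = [x + 5 for x in xs]  # add the constant 5 to every observation

assert isclose(mean(shifted), mean(xs) + 5)     # the mean moves right by 5
assert value_range(shifted) == value_range(xs)  # the spread is unchanged
print(mean(xs), mean(shifted))                  # 2.4 7.4
```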
Combining two variables
)...()...(
)(...)()(
2121
2211
nn
nn
YYYXXX
YXYXYX
+++++++=++++++
⎟⎠
⎞⎜⎝
⎛+⎟
⎠
⎞⎜⎝
⎛=+ ∑∑∑
===
n
ii
n
ii
n
iii YXYX
111
)(
Adding two variables
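$$\sum_{i=1}^{n}(X_i + Y_i) = \left(\sum_{i=1}^{n} X_i\right) + \left(\sum_{i=1}^{n} Y_i\right)$$
$$\frac{1}{n}\sum_{i=1}^{n}(X_i + Y_i) = \left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) + \left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \bar{X} + \bar{Y}$$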
The mean of the sum of two variables is the sum of their means
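A numerical check of this rule, assuming two hypothetical variables measured on the same n individuals (for example, two scores per person), so that the observations pair up one-to-one.

```python
from math import isclose


def mean(values):
    return sum(values) / len(values)


xs = [1, 2, 2, 3, 4]                     # hypothetical first variable
ys = [10, 8, 9, 7, 6]                    # hypothetical second variable, same individuals
sums = [x + y for x, y in zip(xs, ys)]   # X_i + Y_i for each individual

assert isclose(mean(sums), mean(xs) + mean(ys))
print(mean(xs), mean(ys), mean(sums))    # 2.4 8.0 10.4
```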
Measures of Dispersion
• Population standard deviation
• Sample standard deviation
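If we want to know how much the values vary around the mean, we could calculate how much each value deviates from the mean and add up those deviations:
$$\sum_{i=1}^{n}(X_i - \bar{X}) = (X_1 - \bar{X}) + (X_2 - \bar{X}) + \cdots + (X_n - \bar{X})$$
Because of the way we calculate the mean, this sum is zero no matter what data you have, since $\sum X_i = n\bar{X}$:
$$\sum_{i=1}^{n}(X_i - \bar{X}) = \sum_{i=1}^{n} X_i - n\bar{X} = n\bar{X} - n\bar{X} = 0$$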
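Checking this on hypothetical data: the deviations cancel exactly for whole-number data, and up to floating-point rounding for decimal data, which is why a tolerance is used in the second check.

```python
from math import isclose


def mean(values):
    return sum(values) / len(values)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example data, mean = 5
m = mean(data)
deviations = [x - m for x in data]

print(deviations)       # [-3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0]
print(sum(deviations))  # 0.0

# With decimal data the sum is zero only up to floating-point rounding.
messy = [6.92, 7.5, 8.42, 9.75, 10.83]
m2 = mean(messy)
assert isclose(sum(x - m2 for x in messy), 0.0, abs_tol=1e-9)
```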
Population Standard Deviation
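(Here the n values are the whole population, so $\bar{X}$ is the population mean and we divide by n.)
• Variance:
$$\sigma^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n}$$
• Standard deviation:
$$\sigma = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n}}$$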
Sample Standard Deviation
• Variance:
$$s^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$$
• Standard deviation:
$$s = \sqrt{\frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}}$$
There are n − 1 “degrees of freedom” (if you know the mean and n − 1 of the observations, then you can figure out the nth observation).
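A minimal sketch contrasting the two versions: both sum the squared deviations from the mean, but the population version divides by n and the sample version by n − 1. The helper names and the data are hypothetical; the results are cross-checked against Python's standard `statistics.pstdev` (population) and `statistics.stdev` (sample).

```python
import math
import statistics


def squared_deviations(values):
    """Sum of (X_i - mean)^2 over all observations."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)


def population_sd(values):
    """Divide by n: the data are the whole population."""
    return math.sqrt(squared_deviations(values) / len(values))


def sample_sd(values):
    """Divide by n - 1 (the degrees of freedom): the data are a sample."""
    return math.sqrt(squared_deviations(values) / (len(values) - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example data
print(population_sd(data))  # 2.0
print(sample_sd(data))      # about 2.138

assert math.isclose(population_sd(data), statistics.pstdev(data))
assert math.isclose(sample_sd(data), statistics.stdev(data))
```

With the same data, the sample standard deviation is always a little larger than the population one, because it divides the same sum by a smaller number.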
Computational Formulas
• Note that there are computational formulas for the standard deviation.
• Look for them in ALEKS and write them down.
• Remember you can bring notes to your assessments
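One commonly taught computational form of the sample variance (compare it with the version you find in ALEKS) needs only $\sum X$ and $\sum X^2$: $s^2 = \left[\sum X^2 - (\sum X)^2 / n\right] / (n - 1)$. The sketch below, with hypothetical helper names and data, checks that it agrees with the definitional formula. It is convenient by hand, though in floating point it can lose precision when the values are large relative to their spread.

```python
import math


def sample_variance_definitional(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)


def sample_variance_computational(values):
    """(sum of X^2 - (sum of X)^2 / n) / (n - 1)."""
    n = len(values)
    total = sum(values)
    total_sq = sum(x * x for x in values)
    return (total_sq - total * total / n) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example data
print(sample_variance_definitional(data), sample_variance_computational(data))
assert math.isclose(sample_variance_definitional(data), sample_variance_computational(data))
```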
For Next Week…
• Keep working on ALEKS
• Finish the descriptive statistics section
• Watch the second video
• If you can, start the probability section before Jason’s lecture next week
• Remember: Office Hours and Lab are always available for you.