Describing data with graphics and numbers
Types of Data
• Categorical variables – also known as class variables or nominal variables
• Quantitative variables – also known as numerical variables
– either continuous or discrete
Graphing categorical variables
Ten most common causes of death in Americans between 15 and 19 years old in 1999.
Bar graphs
Graphing numerical variables
Heights of BIOL 300 students (cm)
165 168 163 173 170 163 170 155 152 190 170 168 142 160 154 165 156 177 173 165 165 175
155 166 168 165 180 165
Stem-and-leaf plot
19 | 0
18 | 0
17 | 0 0 0 3 3 5 7
16 | 0 3 3 5 5 5 5 5 5 6 8 8 8
15 | 2 4 5 5 6
14 | 2
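A short Python sketch can rebuild this stem-and-leaf plot from the 28 heights listed above. It assumes whole-centimetre heights, takes the stem as every digit except the last (value // 10) and the leaf as the last digit (value % 10); the name `stem_and_leaf` is hypothetical.

```python
# Heights of BIOL 300 students (cm), copied from the data above.
HEIGHTS = [165, 168, 163, 173, 170, 163, 170, 155, 152, 190, 170, 168, 142, 160,
           154, 165, 156, 177, 173, 165, 165, 175, 155, 166, 168, 165, 180, 165]


def stem_and_leaf(data):
    """Return stem-and-leaf rows, largest stem first (assumes non-negative integers)."""
    leaves_by_stem = {}
    for value in sorted(data):
        # Stem = all digits except the last; leaf = the last digit.
        leaves_by_stem.setdefault(value // 10, []).append(value % 10)
    rows = []
    # Walk every stem from largest to smallest so empty stems still get a row.
    for stem in range(max(leaves_by_stem), min(leaves_by_stem) - 1, -1):
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


for row in stem_and_leaf(HEIGHTS):
    print(row)
```

Sorting first means the leaves on each row come out in increasing order, which is what makes the plot readable as a sideways histogram.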
Frequency table
Height group (cm)    Frequency
141-150              1
151-160              6
161-170              15
171-180              5
181-190              1
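The frequency table can be reproduced from the raw heights with a few lines of Python. This sketch assumes heights are whole centimetres, so classes such as 141-150 can be treated as closed integer intervals; `frequency_table` and its parameters are hypothetical names.

```python
# Heights of BIOL 300 students (cm), copied from the data above.
HEIGHTS = [165, 168, 163, 173, 170, 163, 170, 155, 152, 190, 170, 168, 142, 160,
           154, 165, 156, 177, 173, 165, 165, 175, 155, 166, 168, 165, 180, 165]


def frequency_table(data, first_lower, width, n_classes):
    """Count observations in consecutive closed classes [low, low + width - 1]."""
    table = []
    for k in range(n_classes):
        low = first_lower + k * width
        high = low + width - 1
        count = sum(1 for value in data if low <= value <= high)
        table.append((low, high, count))
    return table


for low, high, count in frequency_table(HEIGHTS, 141, 10, 5):
    print(f"{low}-{high}  {count}")
```

A useful sanity check is that the frequencies add up to the number of students (28 here); if they do not, some observation fell outside every class.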
Histogram
[Figure: histogram (frequency distribution)]
Histogram with more data
Cumulative Frequency Distribution
[Figure: cumulative frequency (0 to 1) versus height (in cm) of Bio300 students, 150 to 210 cm]
[Figure: cumulative frequency (0 to 1) versus height (in cm) of Bio300 students, with the 50th percentile (median) and the 90th percentile marked]
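Percentiles can be read off a cumulative frequency distribution by finding where the curve first reaches the desired proportion. The sketch below applies that idea to the 28 heights listed earlier, not to the larger dataset in the figure. It assumes one particular convention (smallest observed value whose cumulative frequency is at least p); statistical software offers several other interpolation rules, which can give slightly different answers. Function names are hypothetical.

```python
# Heights of BIOL 300 students (cm), copied from the earlier data.
HEIGHTS = [165, 168, 163, 173, 170, 163, 170, 155, 152, 190, 170, 168, 142, 160,
           154, 165, 156, 177, 173, 165, 165, 175, 155, 166, 168, 165, 180, 165]


def cumulative_frequency(data, x):
    """Proportion of observations less than or equal to x."""
    return sum(1 for value in data if value <= x) / len(data)


def percentile(data, p):
    """Smallest observed value whose cumulative frequency is at least p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    for value in sorted(data):
        if cumulative_frequency(data, value) >= p:
            return value


print(percentile(HEIGHTS, 0.5), percentile(HEIGHTS, 0.9))  # 165 177
```

The cumulative frequency of the largest value is always 1, so the loop is guaranteed to return for any valid p.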
Associations between two categorical variables
Association between reproductive effort and avian malaria
Table 2.3A. Contingency table showing incidence of malaria in female great tits subjected to experimental egg removal.
                Control group   Egg removal group   Row total
Malaria               7                15               22
No malaria           28                15               43
Column total         35                30               65
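The totals in Table 2.3A and the within-group proportions (the relative frequencies a mosaic plot draws) can be computed directly from the four cell counts. In this minimal sketch the dictionary layout and the names `COUNTS`, `column_totals`, and `column_proportions` are assumptions made for illustration.

```python
# Cell counts from Table 2.3A: outcome -> treatment group -> number of birds.
COUNTS = {
    "Malaria": {"Control": 7, "Egg removal": 15},
    "No malaria": {"Control": 28, "Egg removal": 15},
}


def column_totals(counts):
    """Total number of birds in each treatment group (column totals)."""
    totals = {}
    for row in counts.values():
        for group, n in row.items():
            totals[group] = totals.get(group, 0) + n
    return totals


def column_proportions(counts):
    """Relative frequency of each outcome within each treatment group."""
    totals = column_totals(counts)
    return {outcome: {group: n / totals[group] for group, n in row.items()}
            for outcome, row in counts.items()}


print(column_totals(COUNTS))
print(column_proportions(COUNTS)["Malaria"])
```

Comparing proportions rather than raw counts matters because the two groups differ in size (35 vs. 30 birds): 20% of control birds had malaria compared with 50% of egg-removal birds.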
Mosaic plot
[Figure: mosaic plot with Treatment (Control, Egg removal) on the x-axis and relative frequency (0 to 1) on the y-axis]
Figure 2.3B. Mosaic plot for reproductive effort and avian malaria in great tits (Table 2.3A). Blue fill indicates diseased birds, whereas the white fill indicates birds free of malaria. n = 65 birds.
Grouped Bar Graph
[Figure: grouped bar graph with Malaria and No malaria bars side by side for the Control and Egg removal groups]
Associations between categorical and numerical variables
Multiple histograms
[Figure: histograms of protein length (0 to 1000) in two panels, Non-conserved and Conserved]
Associations between two numerical variables
Scatterplots
Evaluating Graphics
• Lie factor
• Chartjunk
• Efficiency
Don’t mislead with graphics
Better representation of truth
Lie Factor
• Lie factor = (size of effect shown in graphic) / (size of effect in data)
Lie Factor Example
Effect in graphic: 2.33 / 0.08 = 29.1
Effect in data: 6748 / 5844 = 1.15
Lie factor = 29.1 / 1.15 = 25.3
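The lie factor arithmetic can be sketched in Python, using the same convention as the example above: each "size of effect" is the ratio of the two values being compared. The function names are hypothetical. Carrying full precision gives about 25.2; the 25.3 above comes from rounding the two intermediate ratios (29.1 and 1.15) before dividing.

```python
def effect_ratio(larger, smaller):
    """Size of an effect expressed as the ratio of two values."""
    return larger / smaller


def lie_factor(effect_in_graphic, effect_in_data):
    """Lie factor = size of effect shown in graphic / size of effect in data."""
    return effect_in_graphic / effect_in_data


graphic = effect_ratio(2.33, 0.08)   # about 29.1
data = effect_ratio(6748, 5844)      # about 1.15
print(round(lie_factor(graphic, data), 2))
```

A lie factor near 1 means the graphic scales the effect faithfully; values far above 1 mean the graphic exaggerates it.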
Chartjunk
[Figure: example of chartjunk, an over-decorated chart of values (0 to 100) for East, West, and North by quarter (1st to 4th Qtr)]
Needless 3D Graphics
Summary: Graphical methods for frequency distributions
Type of data        Method
Categorical data    Bar graph
Numerical data      Histogram; cumulative frequency distribution
Summary: Associations between variables
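• Categorical explanatory variable, categorical response variable: contingency table, grouped bar graph, mosaic plot
• Categorical explanatory variable, numerical response variable: multiple histograms, cumulative frequency distributions
• Numerical explanatory variable, numerical response variable: scatter plot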
Great book on graphics
Describing data
Two common descriptions of data
• Location (or central tendency)
• Width (or spread)
Measures of location
Mean
Median
Mode
Mean
$$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$$
n is the size of the sample
Mean
Y1=56, Y2=72, Y3=18, Y4=42
Ȳ = (56 + 72 + 18 + 42) / 4 = 47
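The mean formula translates directly into code: add the observations and divide by the sample size n. This is a minimal sketch; `sample_mean` is a hypothetical name.

```python
def sample_mean(values):
    """Arithmetic mean: sum of the observations divided by the sample size n."""
    if not values:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / len(values)


print(sample_mean([56, 72, 18, 42]))  # 47.0
```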
Median
• The median is the middle measurement in a set of ordered data. With an even number of observations, it is the average of the two middle measurements.
The data:
18 28 24 25 36 14 34
can be put in order:
14 18 24 25 28 34 36
Median is 25.
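The procedure above (order the data, then take the middle value) can be sketched as follows. For an even number of observations the sketch uses the common convention of averaging the two middle values; `median` is a hypothetical name.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([18, 28, 24, 25, 36, 14, 34]))  # 25
```

Because only the middle position matters, changing the most extreme values (for example, replacing 36 with 360) leaves the median unchanged, which is why the median resists outliers better than the mean.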
[Figure: frequency histogram of mouse weight at 50 days old, in a line selected for small size, with the mean, median, and mode marked]
Mean vs. median in politics
• 2004 U.S. economy
• Republicans: times are good
– Mean income increasing by ~4% per year
• Democrats: times are bad
– Median family income fell
• Why?
Mean 169.3 cm
Median 170 cm
Mode 165-170 cm
[Figure: cumulative frequency distribution of height (in cm) of Bio300 students, 150 to 210 cm]
Measures of width
• Range
• Standard deviation
• Variance
• Coefficient of variation
Range
14 17 18 20 22 22 24 25 26 28 28 28 30 34 36
The range is 36-14 = 22
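The range needs only the two extreme observations. In this sketch the function is called `data_range` (a hypothetical name) to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


DATA = [14, 17, 18, 20, 22, 22, 24, 25, 26, 28, 28, 28, 30, 34, 36]
print(data_range(DATA))  # 22
```

Since it depends only on the minimum and maximum, the range ignores every other observation and is very sensitive to a single extreme value.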
Population Variance
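$$\sigma^2 = \frac{\sum_{i=1}^{N} (Y_i - \mu)^2}{N}$$
where μ is the population mean and N is the number of individuals in the population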
Sample variance
$$s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$$
n is the sample size
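The two variance formulas differ only in the divisor: N for a whole population, n - 1 for a sample. The sketch below uses the in-class exercise data (6, 1, 2) and, purely for comparison, also treats those three values as if they were an entire population. Function names are hypothetical.

```python
def population_variance(values):
    """sigma^2: sum of squared deviations from the mean, divided by N."""
    N = len(values)
    mu = sum(values) / N
    return sum((y - mu) ** 2 for y in values) / N


def sample_variance(values):
    """s^2: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    y_bar = sum(values) / n
    return sum((y - y_bar) ** 2 for y in values) / (n - 1)


print(sample_variance([6, 1, 2]))      # 7.0
print(population_variance([6, 1, 2]))  # smaller, because it divides by 3 rather than 2
```

Dividing by n - 1 compensates for measuring deviations from the sample mean Ȳ, which lies closer to the data than the unknown μ does; without the correction, s² would tend to underestimate σ².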
Shortcut for calculating sample variance
$$s^2 = \frac{n}{n-1}\left(\frac{\sum_{i=1}^{n} Y_i^2}{n} - \bar{Y}^2\right)$$
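The shortcut follows from the identity Σ(Yᵢ − Ȳ)² = ΣYᵢ² − nȲ². The sketch below checks numerically that it agrees with the definition; function names are hypothetical. One caution: when the values are large relative to their spread, subtracting two nearly equal quantities in the shortcut can lose floating-point precision, so software usually prefers the definitional form.

```python
def sample_variance(values):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    y_bar = sum(values) / n
    return sum((y - y_bar) ** 2 for y in values) / (n - 1)


def sample_variance_shortcut(values):
    """Shortcut: n / (n - 1) * (mean of the squares - square of the mean)."""
    n = len(values)
    y_bar = sum(values) / n
    mean_of_squares = sum(y * y for y in values) / n
    return n / (n - 1) * (mean_of_squares - y_bar ** 2)


data = [6, 1, 2]
print(sample_variance(data), sample_variance_shortcut(data))
```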
Standard deviation (SD)
• Positive square root of the variance
σ is the true standard deviation; s is the sample standard deviation
In class exercise
Calculate the variance and standard deviation of a sample
with the following data:
6, 1, 2
Answer
Variance = 7
Standard deviation = √7 ≈ 2.65
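The exercise can be checked with a short sketch: compute s² with the n - 1 divisor and take its positive square root. `sample_sd` is a hypothetical name.

```python
import math


def sample_sd(values):
    """Sample standard deviation: positive square root of the sample variance."""
    n = len(values)
    y_bar = sum(values) / n
    s2 = sum((y - y_bar) ** 2 for y in values) / (n - 1)
    return math.sqrt(s2)


print(round(sample_sd([6, 1, 2]), 3))  # 2.646
```

Unlike the variance, the standard deviation is in the same units as the data, which makes it easier to interpret.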
Coefficient of variation (CV)
$$CV = 100\,\frac{s}{\bar{Y}}$$
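The CV expresses the standard deviation as a percentage of the mean, so spreads can be compared across variables measured on different scales. This sketch reuses the in-class data (6, 1, 2) and assumes positive-valued data, since the CV is not meaningful when the mean is zero or near zero; `coefficient_of_variation` is a hypothetical name.

```python
import math


def coefficient_of_variation(values):
    """CV = 100 * s / Ybar, in percent (assumes positive-valued data)."""
    n = len(values)
    y_bar = sum(values) / n
    s = math.sqrt(sum((y - y_bar) ** 2 for y in values) / (n - 1))
    return 100 * s / y_bar


print(round(coefficient_of_variation([6, 1, 2]), 1))  # 88.2
```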
Equal means, different variances
[Figure: frequency distributions of a value with equal means and variances V = 1, V = 2, and V = 10]
Manipulating means
• The mean of the sum of two variables:
E[X + Y] = E[X]+ E[Y]
• The mean of the sum of a variable and a constant:
E[X + c] = E[X]+ c
• The mean of a product of a variable and a constant:
E[c X] = c E[X]
• The mean of a product of two variables:
E[X Y] = E[X] E[Y]
if X and Y are independent (more generally, exactly when X and Y are uncorrelated).
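These rules can be checked exactly with Python's `fractions` module for a small hypothetical pair of independent discrete variables (X uniform on {1, 2, 3}, Y uniform on {0, 10}). The last part uses a dependent pair, X uniform on {-1, 0, 1} with Y = X², for which E[XY] = E[X]E[Y] still holds; it illustrates that independence is sufficient for the product rule but not necessary. All names and distributions here are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical independent variables: X uniform on {1, 2, 3}, Y uniform on {0, 10}.
X_VALUES = [1, 2, 3]
Y_VALUES = [0, 10]
# Independence: P(x, y) = P(x) * P(y) for every pair.
JOINT = {(x, y): Fraction(1, len(X_VALUES) * len(Y_VALUES))
         for x, y in product(X_VALUES, Y_VALUES)}


def expect(g, joint=JOINT):
    """Exact expected value of g(x, y) under a joint distribution {(x, y): probability}."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


E_X = expect(lambda x, y: x)
E_Y = expect(lambda x, y: y)
c = 4

assert expect(lambda x, y: x + y) == E_X + E_Y  # E[X + Y] = E[X] + E[Y]
assert expect(lambda x, y: x + c) == E_X + c    # E[X + c] = E[X] + c
assert expect(lambda x, y: c * x) == c * E_X    # E[cX] = c E[X]
assert expect(lambda x, y: x * y) == E_X * E_Y  # holds here because X and Y are independent

# Dependent but uncorrelated: X uniform on {-1, 0, 1} and Y = X**2.
DEPENDENT = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}
e_x = expect(lambda x, y: x, DEPENDENT)
e_y = expect(lambda x, y: y, DEPENDENT)
print(expect(lambda x, y: x * y, DEPENDENT) == e_x * e_y)  # True, although Y depends on X
```

Exact fractions avoid floating-point rounding, so the equalities can be tested with `==`.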
Manipulating variance
• The variance of the sum of two variables:
Var[X + Y] = Var[X] + Var[Y]
if X and Y are independent (more generally, exactly when X and Y are uncorrelated).
• The variance of the sum of a variable and a constant:
Var[X + c] = Var[X]
• The variance of a product of a variable and a constant:
Var[c X] = c² Var[X]
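The variance rules can be verified the same exact way, using a hypothetical pair of independent discrete variables (X uniform on {1, 2, 3}, Y uniform on {0, 10}) and the definition Var[g] = E[(g − E[g])²]. The distributions and function names are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical independent variables: X uniform on {1, 2, 3}, Y uniform on {0, 10}.
X_VALUES = [1, 2, 3]
Y_VALUES = [0, 10]
JOINT = {(x, y): Fraction(1, len(X_VALUES) * len(Y_VALUES))
         for x, y in product(X_VALUES, Y_VALUES)}


def expect(g):
    """Exact expected value of g(x, y) under JOINT."""
    return sum(p * g(x, y) for (x, y), p in JOINT.items())


def variance(g):
    """Var[g] = E[(g - E[g])^2], computed exactly."""
    mean = expect(g)
    return expect(lambda x, y: (g(x, y) - mean) ** 2)


var_x = variance(lambda x, y: x)
var_y = variance(lambda x, y: y)
c = 4

assert variance(lambda x, y: x + y) == var_x + var_y  # independent, so variances add
assert variance(lambda x, y: x + c) == var_x          # adding a constant shifts, not spreads
assert variance(lambda x, y: c * x) == c ** 2 * var_x # scaling by c multiplies variance by c**2
print(var_x, var_y)  # 2/3 25
```

Note that Var[cX] grows with c², not c, because variance is measured in squared units.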
Parents’ heights
                                  Mean     Variance
Father height                     174.3    71.7
Mother height                     160.4    58.3
Father height + mother height     334.7    184.9
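The means in this table add as expected (174.3 + 160.4 = 334.7), but the variance of the summed heights (184.9) is larger than the sum of the two variances (71.7 + 58.3 = 130.0). The sketch below uses the general identity Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y] to back out the covariance these summary numbers imply. It assumes all three variances were computed on the same set of couples; the variable names are hypothetical.

```python
# Summary values from the table above.
VAR_FATHER = 71.7
VAR_MOTHER = 58.3
VAR_SUM = 184.9

# General rule: Var[X + Y] = Var[X] + Var[Y] + 2 Cov[X, Y].
sum_of_variances = VAR_FATHER + VAR_MOTHER
implied_covariance = (VAR_SUM - sum_of_variances) / 2

print(round(sum_of_variances, 1))    # 130.0
print(round(implied_covariance, 2))  # 27.45
```

A positive covariance means fathers' and mothers' heights tend to vary together, so the variance rule for independent variables does not apply to this pair.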