1. English 317: Technical Writing. Graphically Representing Data,
by Professor Karen Thompson. The following slides were prepared from
the first chapter of Edward Tufte's The Visual Display of
Quantitative Information.
2. History of graphically representing quantitative data.
Visually representing numbers is a relatively recent invention. It
was not until 1750-1800 that statistical graphics were invented.
William Playfair (1759-1823) developed almost all of the fundamental
graphical designs, replacing the conventional tables of numbers with
visual representations.
3. This is Playfair's chart comparing the price of wheat/bread
to wages paid to smiths, masons, and carpenters between 1565 and
1821.
4. Playfair's interpretation of his data: never before at any
period was wheat so cheap in proportion to mechanical labor. The
graphic depicted wages paid to mechanical labor; he did not collect
data on wages paid to non-mechanical labor (those working in
factories under terrible conditions). Graphics both report data and
reveal it, but they are only as effective as the data used to
derive them.
5. The statistician F.J. Anscombe demonstrated why it is
important to graph data before analyzing it. The quartet consists of
four sets of data that have identical simple statistical properties.
They are, however, very different when graphed. Anscombe's quartet
shows how graphics do not just report data, they reveal data.
Therefore, they are instruments for reasoning about quantitative
information.
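
The point about identical simple statistics can be checked
numerically. The sketch below is not from the slides: it uses the
published values of Anscombe's quartet, and the helper names (mean,
fit_line, summary) are made up for illustration. It computes the mean
of y and a least-squares line for each set, which agree to two
decimal places even though the plotted sets look nothing alike.

```python
# Anscombe's quartet (published values); sets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical summary table: set name -> (mean of y, slope, intercept),
# each rounded to two decimal places.
summary = {}
for name, (xs, ys) in quartet.items():
    slope, intercept = fit_line(xs, ys)
    summary[name] = (round(mean(ys), 2), round(slope, 2), round(intercept, 2))
    print(name, summary[name])
```

Every set yields the same rounded summary, so only a graph would show
that one set is a curve, one has a single outlier, and one has all
but one point at the same x value.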
6. One of the earliest uses of a map to chart patterns (a data
map) was by Dr. John Snow. He plotted the location of deaths from
cholera in London in September 1854. By analyzing the scatter of
dots (which marked deaths), Snow observed that cholera occurred
almost exclusively among those who lived near (and drank from) the
Broad Street water pump (circled on this map). By identifying the
source of contamination, he ended the epidemic that had killed over
500 people.
7. Example of a modern data map.
8. Data maps help us identify leads to causes but have their
flaws. The flaw inherent to data maps such as the cancer map is
that they equate the visual importance of each county with its
geographic area rather than with the number of people living in the
county. Our visual impression of the data is entangled with the
circumstances of geographic boundaries, shapes, and areas. An
additional shortcoming of the cancer map is that it was created
using death certificate reports on the cause of death. These
reports fall under the influence of diagnostic fashion prevailing
among doctors and coroners in particular places and times. It's not
that data maps are not useful; you just need to be careful when
interpreting what they mean.
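
To make the area-versus-population flaw concrete, here is a minimal
sketch. The two counties and all of their numbers are hypothetical,
invented only for illustration, and the shares function is likewise
an assumption rather than anything from the slides. It compares how
much of the map each county occupies with how much of the population
it holds.

```python
# Hypothetical counties, invented for illustration only:
# (name, area in square miles, population)
counties = [
    ("Large rural county", 10_000, 20_000),
    ("Small urban county", 500, 980_000),
]


def shares(rows):
    """Return {name: (share of total area, share of total population)}."""
    total_area = sum(area for _, area, _ in rows)
    total_pop = sum(pop for _, _, pop in rows)
    return {name: (area / total_area, pop / total_pop)
            for name, area, pop in rows}


county_shares = shares(counties)
for name, (area_share, pop_share) in county_shares.items():
    print(f"{name}: {area_share:.0%} of map area, {pop_share:.0%} of population")
```

In this made-up case the rural county fills about 95% of the map
while holding 2% of the people, so its shading would dominate the
viewer's impression.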
9. Relational Graphics. They link at least two variables and
encourage the viewer to assess the possible causal relationship
between the plotted variables. They confront causal theories that X
causes Y with empirical evidence as to the actual relationship
between X and Y.
10. Relational graphics did not appear in scientific writings
until the late 1700s. Here is a drawing by Johann Heinrich Lambert.
It shows the periodic variation in soil temperature in relation to
the depth under the surface.
11. Relational graphics evolved to include multiple variables.
The following slide shows the graphic in which Charles Joseph Minard
(1781-1870) depicts the losses suffered in 1812 by Napoleon's army
in Russia. Six variables are plotted: 1) size of the army, 2)
location of the army, 3) its location on a two-dimensional surface,
4) the direction of the army's movement when advancing, 5) the
direction of the army's movement in retreat, and 6) temperature on
various dates during the retreat from Moscow. The take-away from the
following graphic is that graphics can convey a high level of
complexity if handled well.
12. Narrative graphic of Napoleon's army invading Russia in
1812.
13. Tables can present words as well as numbers.

Date      Time        Power Level   Event
April 25  1:00 a.m.   3200 MW       Operators begin power descent
April 25  2:00 p.m.   1600 MW       Power descent delayed for 9 hours
                                    Emergency core-cooling system disconnected
April 25  11:10 p.m.  1600 MW       Operators switch off automatic control
                                    Power descent resumed
April 26  1:00 a.m.   30 MW         Power minimum reached
April 26  1:19 a.m.   200 MW        Operators pull rods beyond allowable limits
                                    Operators start two additional coolant pumps
                                    Operators violate coolant flow limits
April 26  1:23 a.m.   2,000,000 MW  Power surges by factor of 10,000 in 5 seconds

Table 1. Sequence of events in the Chernobyl accident
14. Notice how the line graph more effectively conveys the
story about glucose levels in diabetics compared to those without
diabetes.

Table 2. Blood glucose levels [Carlson, 1982].

Time (hour)  Normal (mg/dl*)  Diabetic (mg/dl)
midnight     100.3            175.8
2:00         93.6             165.7
4:00         88.2             159.4
6:00         100.5            72.1
8:00         138.6            271.0
10:00        102.4            224.6
noon         93.8             161.8
2:00         132.3            242.7
4:00         103.8            219.4
6:00         93.6             152.6
8:00         127.8            227.1
10:00        109.2            221.3

* milligrams/deciliter

[Figure: line graph of blood glucose level (mg/dl, 0 to 300) by
hour, with Diabetic and Non Diabetic curves and Breakfast, Lunch,
and Dinner marked.]
Figure 11. Blood glucose levels for diabetics compared to
individuals who do not have the disease [Carlson, 1982].
15. Pie Charts. They show proportions of parts to a whole. Unless
the proportions differ significantly, they can be difficult for an
audience to interpret. Notice in the next slide how the bar chart
makes it easier to see differences in the data. Even if the pie
segments were labeled with percentages, the bar chart would make it
easier to see the subtle differences in some of the data.
16. Example: Because pie charts do not represent subtle
differences well, some experts say to avoid them.
17. Use a pie chart when the differences in the data are clear
and the data can be divided into seven or fewer segments.
18. Line Graphs. They are most useful for displaying data that
change continuously over time. Example:
19. Scatter Plots. They show how strongly one variable is related
to another. The relationship between two variables is called their
correlation. Scatter plots usually consist of a large body of data.
The closer the plotted data points come to forming a straight line,
the higher the correlation between the two variables, or the
stronger the relationship.
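
The idea that points lying closer to a straight line mean a higher
correlation can be expressed with the Pearson correlation
coefficient. The following is a minimal sketch, not part of the
original slides: the function name pearson_r and both small data sets
are hypothetical, chosen only to contrast a perfect line with a
noisier pattern.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for illustration.
xs = [1, 2, 3, 4, 5]
on_a_line = [2, 4, 6, 8, 10]   # every point falls exactly on y = 2x
scattered = [2, 5, 4, 9, 8]    # same upward trend, but noisier

r_line = pearson_r(xs, on_a_line)
r_scattered = pearson_r(xs, scattered)
print(f"points on a line: r = {r_line:.2f}")
print(f"scattered points: r = {r_scattered:.2f}")
```

A coefficient near 1 (or -1) corresponds to points that nearly form
a line; the noisier set gives a smaller value. A high coefficient
alone does not show that one variable causes the other.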
20. Example: Scatter Plot
21. Graphical integrity. A number of factors can lead to
misleading visual aids. Some of these include: scales that do not
start at 0; some 3-D configurations; some pictographs; false
comparisons; and "doctored" photographs.
22. The Lie Factor. Graphic misrepresentation can be measured by
this equation:

Lie Factor = (size of the effect shown in graphic) / (size of effect in data)

If the lie factor is equal to one, then the graphic is doing a good
job of accurately representing the numbers. If the lie factor is
greater than 1.05 or less than 0.95, it indicates substantial
distortion of the data.
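
As a worked illustration of the equation, the sketch below measures
"size of effect" as the relative change between two values, which is
how Tufte computes it in his fuel economy example. The function names
are hypothetical. The numbers (a standard rising from 18 to 27.5
miles per gallon, drawn as lines 0.6 and 5.3 inches long) come from
that example in Tufte's book and are assumed, not confirmed, to
correspond to the graphic discussed in these slides.

```python
def relative_change(first, second):
    """Size of an effect, taken here as the fractional change from first to second."""
    return (second - first) / first


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Ratio of the effect shown in the graphic to the effect in the data."""
    return (relative_change(graphic_first, graphic_second)
            / relative_change(data_first, data_second))


def is_distorted(factor):
    """Apply the slide's thresholds: above 1.05 or below 0.95."""
    return factor > 1.05 or factor < 0.95


# Assumed inputs from Tufte's fuel economy example (see lead-in).
factor = lie_factor(graphic_first=0.6, graphic_second=5.3,
                    data_first=18.0, data_second=27.5)
print(f"lie factor = {factor:.1f}, distorted: {is_distorted(factor)}")
```

With these inputs the data change by about 53% while the drawn lines
change by about 783%, giving a lie factor of roughly 14.8.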
23. Example of a graphic with a high lie factor.
24. The lie factor in the prior graphic is high. The magnitude
of change from 1978 to 1985 is shown in the graph by the relative
lengths of the two lines, which yields a high lie factor. The
graphic also displays strange aspects of perspective. Most roads to
the future are shown as lying in front of us; this display reverses
the convention.
25. Guidelines for Graphics. Clearly label the X and Y axes, and
start the scales at zero whenever possible. Use 3-dimensional
graphics only when they improve the audience's ability to interpret
the data. If used, be consistent throughout the document. Refer to
each graphic in the text of the document. Interpret the significance
of the data. Label each graphic or visual (Figure 1, Table 2, etc.)
and write a caption.