Graphically Representing Data

1. English 317: Technical Writing. Graphically Representing Data, by Professor Karen Thompson. The following slides were prepared from the first chapter of Edward Tufte's The Visual Display of Quantitative Information.

2. History of graphically representing quantitative data. Visually representing numbers is a relatively recent invention. It was not until 1750-1800 that statistical graphics were invented. William Playfair (1759-1823) developed almost all of the fundamental graphical designs, replacing the conventional tables of numbers with visual representations.

3. This is Playfair's chart comparing the price of wheat/bread to wages paid to smiths, masons, and carpenters from 1565 to 1821.

4. Playfair's interpretation of his data: never before at any period was wheat so cheap in proportion to mechanical labor. The graphic depicted wages paid to mechanical labor; he did not collect data on wages paid to non-mechanical labor, such as those working in factories under terrible conditions. Graphics both report data and reveal it. A graphic is only as effective as the data used to derive it.

5. The statistician F.J. Anscombe demonstrated why it is important to graph data before analyzing it. Anscombe's quartet consists of four sets of data that have identical simple statistical properties. They are, however, very different when graphed. The quartet shows that graphics do not just report data; they reveal data. Therefore, they are instruments for reasoning about quantitative information.
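The claim about identical simple statistics can be checked numerically. The sketch below is not part of the slides: it assumes the commonly reproduced values of Anscombe's 1973 dataset (verify them against a trusted copy before reuse) and uses hypothetical helper names (`least_squares`, `summarize`). It computes the mean, sample variance, and least-squares line for each of the four sets.

```python
from statistics import mean, variance

# Anscombe's quartet as commonly reproduced from his 1973 paper.
# ASSUMPTION: these values are not in the slides; check them before reuse.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def least_squares(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


def summarize(xs, ys):
    """Collect the simple summary statistics for one data set."""
    slope, intercept = least_squares(xs, ys)
    return {
        "mean_x": mean(xs), "mean_y": mean(ys),
        "var_x": variance(xs), "var_y": variance(ys),
        "slope": slope, "intercept": intercept,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

Rounded to two decimals, the four summaries come out nearly the same even though the plotted shapes differ, which is why the summary statistics alone cannot stand in for the graph.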

6. One of the earliest uses of a map to chart patterns (a data map) was by Dr. John Snow. He plotted the locations of deaths from cholera in London in September 1854. By analyzing the scatter of dots (which marked deaths), Snow observed that cholera occurred almost exclusively among those who lived near (and drank from) the Broad Street water pump (circled on this map). By identifying the source of contamination, he ended an epidemic that had killed over 500 people.

7. Example of a modern data map.

8. Data maps help us identify leads to causes, but they have flaws. A flaw inherent to data maps such as the cancer map is that they equate the visual importance of each county with its geographic area rather than with the number of people living in the county. Our visual impression of the data is entangled with the circumstances of geographic boundaries, shapes, and areas. An additional shortcoming of the cancer map is that it was created from death certificate reports on the cause of death. These reports are influenced by the diagnostic fashion prevailing among doctors and coroners in particular places and times. It's not that data maps are not useful; you just need to be careful when interpreting what they mean.
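The area-versus-population problem can be made concrete with a small calculation. The sketch below uses three invented counties (the names, areas, and populations are hypothetical, not data from the cancer map) and compares each county's share of the map area with its share of the population.

```python
# Hypothetical counties: (name, area in square km, population). Not real data.
COUNTIES = [
    ("Large rural county", 9000, 20_000),
    ("Mid-sized county", 900, 180_000),
    ("Small urban county", 100, 800_000),
]


def shares(counties):
    """Return {name: (share of total area, share of total population)}."""
    total_area = sum(area for _, area, _ in counties)
    total_pop = sum(pop for _, _, pop in counties)
    return {
        name: (area / total_area, pop / total_pop)
        for name, area, pop in counties
    }


SHARES = shares(COUNTIES)

if __name__ == "__main__":
    for name, (area_share, pop_share) in SHARES.items():
        print(f"{name}: {area_share:.0%} of the map, {pop_share:.0%} of the people")
```

In this made-up case the rural county fills most of the map while holding a small fraction of the people, so its shading would dominate the viewer's impression out of proportion to the population it represents.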

9. Relational Graphics: Link at least two variables. Encourage the viewer to assess the possible causal relationship between the plotted variables. Present causal theories that X causes Y together with empirical evidence as to the actual relationship between X and Y.

10. Relational graphics did not appear in scientific writings until the late 1700s. Here is a drawing by Johann Heinrich Lambert. It shows the periodic variation in soil temperature in relation to the depth under the surface.

11. Relational graphics evolved to include multiple variables. The following slide shows a graphic by Charles Joseph Minard (1781-1870) depicting the losses suffered in 1812 by Napoleon's army in Russia. Six variables are plotted: 1) the size of the army, 2) and 3) the army's location on a two-dimensional surface, 4) the direction of the army's movement when advancing, 5) the direction of the army's movement in retreat, and 6) the temperature on various dates during the retreat from Moscow. The takeaway from the following graphic is that graphics can convey a high level of complexity when handled well.

12. Narrative graphic of Napoleon's army invading Russia in 1812.

13. Tables can present words as well as numbers.
Date      Time        Power Level    Event
April 25  1:00 a.m.   3200 MW        Operators begin power descent
April 25  2:00 p.m.   1600 MW        Power descent delayed for 9 hours; emergency core-cooling system disconnected
April 25  11:10 p.m.  1600 MW        Operators switch off automatic control; power descent resumed
April 26  1:00 a.m.   30 MW          Power minimum reached
April 26  1:19 a.m.   200 MW         Operators pull rods beyond allowable limits; operators start two additional coolant pumps; operators violate coolant flow limits
April 26  1:23 a.m.   2,000,000 MW   Power surges by factor of 10,000 in 5 seconds
Table 1. Sequence of events in the Chernobyl accident

14. Notice how the line graph conveys the story about glucose levels in diabetics, compared to those without diabetes, more effectively than the table does.
Table 2. Blood glucose levels [Carlson, 1982].
Time (hour)   Normal (mg/dl*)   Diabetic (mg/dl)
midnight      100.3             175.8
2:00          93.6              165.7
4:00          88.2              159.4
6:00          100.5             72.1
8:00          138.6             271.0
10:00         102.4             224.6
noon          93.8              161.8
2:00          132.3             242.7
4:00          103.8             219.4
6:00          93.6              152.6
8:00          127.8             227.1
10:00         109.2             221.3
* milligrams per deciliter
[Line graph: blood glucose level (mg/dl, 0 to 300) plotted against hour of day, with one curve labeled Diabetic and one labeled Non Diabetic, and Breakfast, Lunch, and Dinner marked.]
Figure 11. Blood glucose levels for diabetics compared to individuals who do not have the disease [Carlson, 1982].

15. Pie Charts: Show proportions of parts to a whole. Unless the proportions differ significantly, they can be difficult for an audience to interpret. Notice in the next slide how the bar chart makes it easier to see differences in the data. Even if the pie segments were labeled with percentages, the bar chart would make the subtle differences in some of the data easier to see.

16. Example: Because pie charts do not represent subtle differences well, some experts say to avoid them.

17. Use a pie chart when the differences in the data are clear and the data can be divided into seven segments or fewer.

18. Line Graphs: Most useful for displaying data that change continuously over time. Example:

19. Scatter Plots: Show how strongly one variable is related to another. The relationship between two variables is called their correlation. Scatter plots usually consist of a large body of data. The closer the plotted data points come to forming a straight line, the higher the correlation between the two variables, or the stronger the relationship.
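One common way to put a number on "how close to a straight line" is the Pearson correlation coefficient. The slides do not name a specific measure, so treating correlation as Pearson's r is an assumption here, and the two small data sets below are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, not taken from the slides.
X = [1, 2, 3, 4, 5]
Y_ON_LINE = [2, 4, 6, 8, 10]   # every point lies exactly on y = 2x
Y_SCATTERED = [2, 5, 3, 9, 6]  # same upward trend, but points stray from a line

R_ON_LINE = pearson_r(X, Y_ON_LINE)
R_SCATTERED = pearson_r(X, Y_SCATTERED)

if __name__ == "__main__":
    print(f"points on a line:  r = {R_ON_LINE:.2f}")
    print(f"scattered points:  r = {R_SCATTERED:.2f}")
```

The points that fall on a line give r of 1, while the scattered set gives roughly 0.69, matching the rule that a tighter line means a stronger relationship. A high r still says nothing about which variable, if either, causes the other.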

20. Example: Scatter Plot

21. Graphical integrity: A number of factors can lead to misleading visual aids. Some of these include: scales that do not start at 0; some 3-D configurations; some pictographs; false comparisons; "doctored" photographs.

22. The Lie Factor: Graphic misrepresentation can be measured by this equation: Lie Factor = (size of effect shown in graphic) / (size of effect in data). If the lie factor is equal to 1, the graphic is accurately representing the numbers. If the lie factor is greater than 1.05 or less than 0.95, it indicates substantial distortion of the data.
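The lie factor is easy to compute once "size of effect" is pinned down. The slide does not define it, so the sketch below assumes it means the relative change between two values, applied once to the drawn sizes and once to the underlying numbers. The function names and the example figures are hypothetical.

```python
def effect_size(first, second):
    """Relative change from first to second (ASSUMED meaning of 'size of effect')."""
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Effect shown in the graphic divided by the effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


def is_substantially_distorted(factor, low=0.95, high=1.05):
    """Apply the thresholds stated on the slide."""
    return factor < low or factor > high


# Hypothetical example: the data rise from 10 to 15 (a 50% increase),
# but the bars are drawn 2 cm and 6 cm tall (a 200% increase).
EXAMPLE = lie_factor(2.0, 6.0, 10.0, 15.0)

if __name__ == "__main__":
    print(f"lie factor = {EXAMPLE:.1f}")
    print("substantial distortion:", is_substantially_distorted(EXAMPLE))
```

Here the graphic shows an effect four times larger than the data support, so the lie factor is 4 and falls well outside the 0.95 to 1.05 band.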

23. Example of a graphic with a high lie factor.

24. The lie factor in the prior graphic is high. The magnitude of change from 1978 to 1985 is shown in the graph by the relative lengths of the two lines, which yields a high lie factor. The graphic also displays strange aspects of perspective: most roads to the future are shown as in front of us, but this display reverses the convention.

25. Guidelines for Graphics: Clearly label the X and Y axes, and start them at zero whenever possible. Use three-dimensional graphics only when they improve the audience's ability to interpret the data; if used, be consistent throughout the document. Refer to the graphic in the text of the document. Interpret the significance of the data. Label each graphic or visual (Figure 1, Table 2, etc.) and write a caption.

Date post:	01-Nov-2014
Upload:	university-of-idaho