000
Introduction to Thinking About Data I:
The Importance of Being Earnest–Why Numbers Matter in Public Health
MMED
African Institute for the Mathematical Sciences
Muizenberg, South Africa
May, 2017
Brian G Williams, PhD
Stellenbosch University
Slide Set Citation: DOI: 10.6084/m9.figshare.5043136
The ICI3D Figshare Collection
000
Goals
Learn to
1. See patterns in data
2. Formulate hypotheses
3. Test theories
The purpose of models is not to fit the data but to sharpen the questions. S. Karlin
000
The supreme goal of all theory is to make the
irreducible basic elements as simple and as
few as possible without having to surrender
the adequate representation of a single
datum of experience.
Albert Einstein: 1933
On the Method of Theoretical Physics The Herbert Spencer Lecture, Oxford (10 June 1933)
000
Three rules of good modelling
• Stay as close to the data as you can
• Put in as much biology as you can
• Keep it simple
000
Semmelweis
000
Ignaz Semmelweis: 1818–1865
Or why washing your hands matters
Junior doctor in Vienna General Hospital
Puerperal fever and maternal mortality
000
Ignaz Semmelweis
About one in fifteen mothers were dying of puerperal fever during childbirth
[Figure: maternal mortality (proportion of mothers dying, 0.00 to 0.15) by year, 1830 to 1860]
000
In 1840 maternal mortality in the red wards fell below that in the blue
wards. Medical students, who were doing autopsies before delivering
babies, were still working in the blue wards but had stopped delivering
in the red wards. In 1847 his colleague Jakob Kolletschka was cut with
a student's scalpel while performing a post-mortem and died with a
pathology similar to that of the mothers.
[Figure: maternal mortality (0.00 to 0.15) by year, 1830 to 1860, for the red and blue wards; annotations: "Only midwives in red wards", "Kolletschka dies of septicaemia"]
000
In 1848, aged 30 years, he made the medical students wash their hands in
chlorinated lime before they went into the maternity wards.
Mortality in the blue wards dropped to the same level as in the red wards.
[Figure: maternal mortality (0.00 to 0.15) by year, 1830 to 1860; annotation: "Semmelweis makes the medical students wash their hands"]
Problem → Pattern in the data → Think of an explanation → Do an intervention → See if it works
000
In 1849 he was fired for criticizing his superiors. In 1865 he died of septicaemia in an insane asylum, but it seems that the medical students must have gone on washing their hands.
[Figure: maternal mortality (0.00 to 0.15) by year, 1830 to 1860; annotation: "Semmelweis fired"]
000
We can now work out odds-ratios and put confidence limits on the estimates.
[Figure: odds ratio for maternal mortality (blue/red wards), scale 0 to 8, by year, 1830 to 1860; annotations: "Only midwives in red wards", "Semmelweis makes the medical students wash their hands"]
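As a rough illustration of how an odds ratio and its confidence limits can be worked out for one year, here is a minimal sketch. The counts are invented for illustration and are not the Vienna data; the function name `odds_ratio_ci` and the choice of the log (Woolf) method for the interval are assumptions, not necessarily what was used for the figure.

```python
import math

def odds_ratio_ci(deaths_blue, births_blue, deaths_red, births_red, z=1.96):
    """Odds ratio (blue/red) with an approximate 95% interval (log method)."""
    a, b = deaths_blue, births_blue - deaths_blue   # blue wards: died, survived
    c, d = deaths_red, births_red - deaths_red      # red wards: died, survived
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(odds ratio)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts for a single year (NOT taken from the slides).
or_, lower, upper = odds_ratio_ci(deaths_blue=100, births_blue=1000,
                                  deaths_red=30, births_red=1000)
print(f"odds ratio {or_:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

If the interval excludes 1, the difference between the wards in that year is unlikely to be due to chance alone.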
000
Snow
000
Cholera and its mode of transmission
000
William Farr: 1851
Miasma theory: Cholera was the result of
breathing polluted air
‘The amount of organic matter…[and] its
distribution will bear … resemblance to the
law regulating the mortality from cholera at
the various elevations’
000
Cholera mortality in London: 1849
[Figure and equation: cholera mortality m as a function of elevation h, with constants a and b]
000
John Snow: 1854
Cholera is a water-borne disease
000
[Map: Snow 1854, cases of cholera, around Oxford Street and Regent Street, with the workhouse marked]
000
[Map: Snow 1854, cases of cholera, around Oxford Street and Regent Street, with the workhouse and the water pumps marked]
000
Lancet 1858 Obituary column
DR JOHN SNOW. This well-known physician died at noon on the 16th instant, at his house in Sackville-street, from an attack of apoplexy. His researches on chloroform and other anaesthetics were appreciated by the profession.
000
In 1854 Filippo Pacini published ‘Microscopical observations and pathological deductions on cholera’, in which he reported a bacillus that he called Vibrio and described the organism and its relation to cholera. His work was recognized in 1965.
In 1884 Robert Koch became famous for his identification of the cholera bacillus [among other things] and is the acknowledged discoverer of the cholera organism.
Vibrio cholerae
000
Lancet 2014 Retraction
The Lancet wishes to correct, after an unduly
prolonged period of reflection, an impression that
it failed to recognise Dr Snow’s remarkable
achievements in the field of epidemiology and [in]
deducing the mode of transmission of cholera. …
Comments in 1855 such as ‘In riding his hobby
very hard, he has fallen down through a gully-hole
and has never since been able to get out again’
and ‘Has he any facts to show in proof? No!’
were perhaps … overly negative in tone.
000
Historical data
000
Infectious diseases in England and Wales
1900 to 1990
000
Deaths at Baragwanath by age, sex and HIV status, 2006-2009
[Figure panels: Men, Women; legend for each: Positive, Negative, Undecided]
000
Mendel
000
Mendel’s Peas
1822 to 1884
000
Question
• Peas have tall or short stems but never in between.
• We can breed true, so that tall plants only produce tall plants and short plants only produce short plants.
• What happens when we cross them to get the first filial or F1 generation?
• What happens when we cross the F1 plants to get the F2 generation?
000
Experimental design
Observation: Only two kinds of peas: Tall or short.
Experiment:
[Diagram: true-breeding Tall and Short lines are crossed]
F1: Tall
F2: Tall:Short
000
F2 Data
Character                    Dominants  Recessives  Ratio
Round v. wrinkled seeds           5474        1850   2.96
Yellow v. green seeds             6022        2001   3.01
Purple v. white flowers            705         224   3.15
Smooth v. constricted pods         882         299   2.95
Axial v. terminal flowers          651         207   3.14
Green v. yellow unripe pods        428         152   2.82
Tall v. dwarf stems                787         277   2.84
Total                            14949        5010   2.98
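One way to ask whether the F2 counts in the table above are consistent with a 3:1 ratio is a chi-square goodness-of-fit test with one degree of freedom. The sketch below uses only the counts from the table; the function name `chi_square_3_to_1` is an illustrative choice, and the test itself is an assumption about how one might check the ratio, not something stated on the slides.

```python
import math

# F2 counts (dominant, recessive) copied from the table above.
F2_COUNTS = {
    "Round v. wrinkled seeds": (5474, 1850),
    "Yellow v. green seeds": (6022, 2001),
    "Purple v. white flowers": (705, 224),
    "Smooth v. constricted pods": (882, 299),
    "Axial v. terminal flowers": (651, 207),
    "Green v. yellow unripe pods": (428, 152),
    "Tall v. dwarf stems": (787, 277),
}

def chi_square_3_to_1(dominant, recessive):
    """Chi-square statistic and p-value for a 3:1 ratio (1 degree of freedom)."""
    n = dominant + recessive
    expected_dom = 0.75 * n
    expected_rec = 0.25 * n
    chi2 = ((dominant - expected_dom) ** 2 / expected_dom
            + (recessive - expected_rec) ** 2 / expected_rec)
    # With one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

for character, (dom, rec) in F2_COUNTS.items():
    chi2, p = chi_square_3_to_1(dom, rec)
    print(f"{character:28s} ratio {dom / rec:.2f}  chi2 {chi2:.3f}  p {p:.3f}")

total_dom = sum(dom for dom, _ in F2_COUNTS.values())
total_rec = sum(rec for _, rec in F2_COUNTS.values())
total_chi2, total_p = chi_square_3_to_1(total_dom, total_rec)
print(f"{'Total':28s} ratio {total_dom / total_rec:.2f}  "
      f"chi2 {total_chi2:.3f}  p {total_p:.3f}")
```

A large p-value means the counts give no reason to reject the 3:1 hypothesis; it does not by itself prove the underlying biology.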
000
Theory!
• Gene has two alleles: T or t
• Breed TT and tt.
• TT × tt → Tt
• F1: all plants are tall, so T is dominant
• Tt × Tt → TT, Tt, tT or tt
• F2: 3 tall plants for each short plant.
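The 3:1 prediction follows from listing every combination of one allele from each parent. A minimal sketch, assuming each allele is passed on with equal probability; the helper names `cross` and `phenotype` are made up for this illustration.

```python
from collections import Counter
from itertools import product

def cross(parent1, parent2):
    """Count offspring genotypes: one allele from each parent, all equally likely."""
    return Counter(a + b for a, b in product(parent1, parent2))

def phenotype(genotype, dominant="T"):
    """A plant is tall if it carries at least one dominant allele."""
    return "tall" if dominant in genotype else "short"

f1 = cross("TT", "tt")   # every F1 plant is Tt
f2 = cross("Tt", "Tt")   # TT, Tt, tT and tt, one of each

f2_phenotypes = Counter()
for genotype, count in f2.items():
    f2_phenotypes[phenotype(genotype)] += count

print("F1 genotypes:", dict(f1))
print("F2 genotypes:", dict(f2))
print("F2 phenotypes:", dict(f2_phenotypes))
```

Three of the four equally likely F2 genotypes carry a T, which gives three tall plants for each short plant.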
000
Mendel did not understand that he had just discovered genetics.
Darwin must have read Mendel’s paper but even he did not understand that Mendel had just given him the mechanism underlying his theory of evolution.
Statistics helps us to define the question; the answer is always in the biology!
000
Advice to young epidemiologists
Never make a calculation until you know the answer.
Make an estimate before every calculation, try a
simple biological argument (R0, generation time,
selection, survival, control). Guess the answer to
every puzzle. Courage: no one else needs to know
what the guess is. Therefore, make it quickly, by
instinct. A right guess reinforces this instinct. A wrong
guess brings the refreshment of surprise. In either
case, life as an epidemiologist, however long, is more
fun.
Plagiarised from E.F. Taylor and J.A. Wheeler Space-time
Physics (1963)
000
Thank you for listening
000
Summary
1. Stay as close to the data as you can
2. Look for interesting patterns
3. Put in as much biology as you can
4. Keep it simple
5. Always remember that the purpose of models is not to fit the data but to sharpen the question
000
This presentation is made available through a Creative Commons Attribution license. Details of the license and permitted uses are available at
http://creativecommons.org/licenses/by/3.0/
© 2010 International Clinics on Infectious Disease Dynamics and Data
Williams BG. “Introduction to Thinking About Data” Clinic on the Meaningful Modeling of Epidemiological Data. DOI:10.6084/m9.figshare.5043136.
For further information or modifiable slides please contact [email protected].
See the entire ICI3D Figshare Collection. DOI: 10.6084/m9.figshare.c.3788224.