Date post: | 02-Jan-2016 |
Introduction to Biostatistics
Prof Haroon Saloojee
Division of Community Paediatrics
Introduction to Biostatistics: Lecture 1
Summarising your data 1
In God we trust
All others must bring data
The evidence-based clinician's motto
Challenges
Statistical ideas can be difficult and intimidating
Thus:
Statistical results are often "skipped over" when reading scientific literature
Data is often misinterpreted
Misinterpretation of Data
"Celebrating birthdays is healthy"
Statistics show that those who celebrate the most birthdays live the longest
You may think that
A Bar Chart is a map of the locations of the nearest taverns
A p-value is the result of a urinalysis
A t-test is a taste test between rooibos tea and Five Roses tea
Course Structure
"BIO-SADISTICS"
Four 45-minute lectures
PowerPoint presentations on student web site
Some text (content) also on web page
Plus additional internet "links"
Syllabus for the Course
SESSION 1: Summarizing your data 1
Types of data (quantitative and categorical variables)
Describing data: average (mean, median and mode)
Displaying data graphically (box plots, histograms, bar charts, pie diagrams)
Frequency distributions
SESSION 2: Summarizing your data 2
The normal distribution
Describing data: spread (range, variance, standard deviation, z score)
Quartiles, percentiles
Standard error of the mean
Confidence intervals
SESSION 3: Sampling principles
Study population
The sample
Random sampling
Non-random sampling
Sampling bias
Sample size and power
SESSION 4: Statistical tests and the concept of significance
Hypothesis testing
p value
Statistical versus clinical significance
Parametric versus non-parametric methods
Free textbook on-line
Statistics at Square One
http://bmj.bmjjournals.com/collections/statsbk/index.shtml
http://www.medstatsaag.com/mcqs.asp
Relevant topics
Handling data: 1, 4, 5, 6, 7
Sampling: 10, 11
Hypothesis testing: 17, 18
Today's Lecture
What types of data are there? (numerical vs categorical variables)
Describing data: measures of central tendency (mean, median and mode)
Summarising data graphically (histograms, box plots, bar charts, pie diagrams)
Types of data
Variable
  Categorical: Nominal, Ordinal
  Numerical: Discrete, Continuous
Types of Data
Numerical data: Discrete
Examples: No. of children, No. of asthma attacks in a week, No. of rooms in home
Types of Data
Numerical data: Continuous
Any value on the continuum is possible (even fractions or decimals)
Examples: Weight, Age, Temperature, Heart rate
Types of Data
Categorical data: Nominal
Mutually exclusive, unordered categories
Examples:
Sex (male, female)
Eye colour (brown, grey, green, blue)
Are you happy? (Yes, No)
Diarrhoea (present, absent)
Can summarize in:
Tables, using counts and percentages
Bar chart
Types of Data
Categorical data: Ordinal (ordered categories)
Examples:
Degree of agreement (Strongly agree, Agree, Disagree, Strongly disagree)
Severity of injury (Severe, Moderate, Mild)
Income level (High, Medium, Low)
PRACTICE
Discrete or Continuous?
mg of tar in cigarettes: Continuous
number of people in a car: Discrete
high to low temperature in any day: Continuous
weight: Continuous
time: Continuous
number of children in the average family: Discrete
Nominal or Ordinal?
Average, above average, below average: Ordinal
Colours of Smarties: Nominal
Grades (A, B, C, D, F): Ordinal
Data Summaries
It is ALWAYS a good idea to summarise your data
You become familiar with the data and the characteristics of the people that you are studying
You can also identify problems or errors with the data (data management issues)
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
Definitions
The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.
The median is the middle of a distribution: half the scores are above the median and half are below the median.
The mode is the most frequently occurring score in a distribution.
"It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable... on average."
J.M. Yancy
Sample Mean X̄
The Average or Arithmetic Mean
Add up data, then divide by sample size (n)
The sample size n is the number of observations (pieces of data)
Example: Systolic blood pressures (mmHg)
X1 = 120, X2 = 80, X3 = 90, X4 = 110, X5 = 95
n = 5
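As a minimal sketch, the arithmetic for the blood pressure example can be checked in a few lines of Python. The variable names are illustrative and do not come from the lecture.

```python
# Systolic blood pressures (mmHg) from the example above
pressures = [120, 80, 90, 110, 95]

n = len(pressures)        # sample size: number of observations
total = sum(pressures)    # add up the data
sample_mean = total / n   # divide by the sample size

print(f"n = {n}, sum = {total}, mean = {sample_mean} mmHg")
```

The five readings add up to 495, so the sample mean is 495 / 5 = 99 mmHg.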
Notation
Σ (sigma) denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
x̄ is pronounced 'x-bar' and denotes the mean of a set of sample values
µ is pronounced 'mu' and denotes the mean of all values in a population
Definitions
Mean: the value obtained by adding the scores and dividing the total by the number of scores
Sample: $\bar{x} = \frac{\sum x}{n}$
Population: $\mu = \frac{\sum x}{N}$
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values: one data point could make a great change in the sample mean
Why is it called the sample mean? To distinguish it from the population mean
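To illustrate the sensitivity to extreme values, the sketch below adds one extreme reading to the five blood pressures used earlier in the lecture. The value 250 is a hypothetical outlier chosen for illustration, not data from the lecture, and the median is shown only for comparison.

```python
from statistics import mean, median

pressures = [120, 80, 90, 110, 95]   # readings from the lecture example
with_outlier = pressures + [250]     # hypothetical extreme reading

mean_before = mean(pressures)        # 99
mean_after = mean(with_outlier)      # 745 / 6, about 124.2
median_before = median(pressures)    # 95
median_after = median(with_outlier)  # (95 + 110) / 2 = 102.5

print(f"mean:   {mean_before} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```

A single extra reading moves the mean by about 25 mmHg, while the median moves by 7.5 mmHg.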
Population Versus Sample
Population: the entire group you want information about
For example: the blood pressure of all 20-year-old male university students in South Africa
Sample: a part of the population from which we actually collect information and draw conclusions about the whole population
For example: a sample of blood pressures (n = 50) of 20-year-old male university students in South Africa
The sample mean X̄ is not the population mean µ
Population Versus Sample
We don't know the population mean µ but would like to know it
We draw a sample from the population
We calculate the sample mean X̄
How close is X̄ to µ?
Statistical theory will tell us how close X̄ is to µ
Statistical inference is the process of trying to draw conclusions about the population from the sample
Weighted Mean
$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$
Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
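A minimal sketch of the weighted mean, using a hypothetical course in which a test counts 30%, an assignment 20% and an exam 50%. The marks, the weights and the function name are assumptions made for illustration.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

marks = [60, 80, 70]     # hypothetical marks: test, assignment, exam
weights = [30, 20, 50]   # hypothetical weights in percent

final_mark = weighted_mean(marks, weights)
print(final_mark)        # (1800 + 1600 + 3500) / 100 = 69.0
```

The unweighted mean of the same marks is 70; the weighted mean is lower because the lowest and middle marks carry 80% of the weight between them.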
Geometric Means
These are histograms rotated 90° and box plots
Note how the log transformation gives a symmetric distribution
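The slide does not give a formula, so the sketch below relies on the standard definition: the geometric mean is the antilog of the mean of the logged values, which is why it goes hand in hand with the log transformation. The data are hypothetical and the function name is illustrative.

```python
import math

def geometric_mean(values):
    """Antilog of the arithmetic mean of the natural logs (positive values only)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)

skewed = [1, 10, 100]                     # hypothetical right-skewed data
arithmetic = sum(skewed) / len(skewed)    # 37.0, pulled up by the largest value
geometric = geometric_mean(skewed)        # about 10

print(f"arithmetic mean = {arithmetic}, geometric mean = {geometric:.2f}")
```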
Median
1 1 3 3 4 5 5 5 5 5
No exact middle: shared by two numbers
MEDIAN is 4.5: $\frac{4 + 5}{2} = 4.5$
5 5 5 3 1 5 1 4 3 5 2
1 1 2 3 3 4 5 5 5 5 5 (in order)
Exact middle: MEDIAN is 4
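The two cases above (an odd and an even number of values) can be sketched as a small function. This is an illustrative implementation, not code from the lecture.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    if not values:
        raise ValueError("median of empty data")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2   # middle shared by two numbers

odd_case = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]   # 11 values
even_case = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]     # 10 values

print(median(odd_case))    # 4
print(median(even_case))   # 4.5
```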
Mode
The score that occurs most frequently
Bimodal
Multimodal
No mode
The only measure of central tendency that can be used with nominal data
Examples
a. 5 5 5 3 1 5 1 4 3 5: Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9: Bimodal (2 and 6)
c. 2 3 6 7 8 9 10: No mode
d. 2 2 3 3 3 4: Mode is 3
e. 2 2 3 3 4 4 5 5: No mode
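The five examples can be reproduced with a short sketch. It assumes the convention used on the slide that a data set has no mode when every distinct value occurs equally often; the function name is illustrative.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    if not values:
        raise ValueError("mode of empty data")
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: if all distinct values tie, report no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

for label, data in examples.items():
    print(label, modes(data) or "no mode")
```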
Shapes of the Distribution
Distribution Characteristics
Example: Height of students in the class
Example: Serum cholesterol level
Example: Birth weight of newborn babies
Some visual ways to summarize data
Tables: frequency table
Graphs: histograms, bar graphs, box plots, line plots, scatter graphs
Charts: bar chart, pie diagram
Frequency Tables
Summarizes a variable with counts and percentages
The variable is categorical
Note that you can take a continuous variable and create categories with it
How do you create categories for a continuous variable?
Choose cutoffs that are biologically meaningful
Natural breaks in the data
Example of frequency table
When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data.
When the data are divided into groups (classes), they are called grouped data.
The classes have to be decided according to the range of the data and the size of the class.
The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table.
The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
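A minimal sketch of a grouped frequency table with cumulative frequencies follows. The birth weights, the class width of 500 g and the function name are hypothetical choices made for illustration; each class includes its lower limit and excludes its upper limit.

```python
def frequency_table(values, start, width, n_classes):
    """Return rows of (lower, upper, frequency, cumulative frequency)."""
    rows = []
    cumulative = 0
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        # Frequency: number of observations with lower <= value < upper
        frequency = sum(1 for v in values if lower <= v < upper)
        cumulative += frequency   # this class plus all classes before it
        rows.append((lower, upper, frequency, cumulative))
    return rows

# Hypothetical birth weights in grams (not data from the lecture)
birth_weights = [2100, 2450, 2800, 2950, 3100, 3200, 3300, 3450, 3600, 4100]
table = frequency_table(birth_weights, start=2000, width=500, n_classes=5)

print("Class (g)      Freq  Cum freq")
for lower, upper, frequency, cumulative in table:
    print(f"{lower}-{upper - 1:<9} {frequency:>4}  {cumulative:>8}")
```

The last cumulative frequency equals the total number of observations, which is a quick check that no value fell outside the chosen classes.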
Graphical Summaries
Histograms: continuous or ordinal data on horizontal axis
Bar graphs: nominal data; no order to horizontal axis
Box plots: continuous data
Histogram
A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.
Bar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of a set of mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch the other bars.
Difference between bar chart and histogram
Bar charts: for categories that are separate
Histograms: if you got categories by dividing up continuous data
Bars do not touch; histogram rectangles do touch
Line graph
If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resulting graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, infant mortality rate) are to be displayed, it is better done with line graphs than with histograms.
Scatter plot
Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, all the points fall on a straight line (the regression line).
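The lecture does not define a measure of correlation at this point. As an assumption, the sketch below uses Pearson's correlation coefficient r to show numerically what "perfectly correlated" and "wide scatter" mean; all data values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no variation has no correlation")
    return sxy / math.sqrt(sxx * syy)

heights = [150, 160, 170, 180]            # hypothetical X values
on_a_line = [2 * h + 1 for h in heights]  # every point lies on one straight line
scattered = [60, 52, 75, 58]              # hypothetical widely scattered Y values

r_line = pearson_r(heights, on_a_line)
r_scattered = pearson_r(heights, scattered)
print(f"points on a line: r = {r_line:.3f}")
print(f"wide scatter:     r = {r_scattered:.3f}")
```

Points that all lie on a rising straight line give r = 1, while the scattered values give an r much closer to 0.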
Survival curve
Pie chart
This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and represents the total population being shown.
Pictures of Data: Continuous Variables
Histograms
Means and medians do not tell the whole story
Differences in spread (variability)
Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
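The steps above can be sketched in plain Python with a text histogram. The heart rates, the choice of four bins and the function name are hypothetical; the largest value is placed in the last bin so that every observation is counted.

```python
def histogram_counts(values, n_bins):
    """Split the range into n_bins equal-width bins and count observations in each."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        raise ValueError("all values are identical; cannot form bins")
    width = (highest - lowest) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum would land one past the end, so clamp it into the last bin
        index = min(int((v - lowest) / width), n_bins - 1)
        counts[index] += 1
    edges = [lowest + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical heart rates in beats per minute (not data from the lecture)
heart_rates = [60, 62, 65, 70, 71, 72, 75, 80, 88, 100]
edges, counts = histogram_counts(heart_rates, n_bins=4)

# Draw the histogram sideways, one labelled row per bin
for i, count in enumerate(counts):
    print(f"{edges[i]:5.0f}-{edges[i + 1]:<5.0f} {'#' * count} ({count})")
```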
Pictures of Data: Histograms
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing the distribution of continuous data in multiple groups: can be plotted side by side
Box plot
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution.
The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
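The numbers behind a box plot can be sketched with Python's standard `statistics` module. The lengths of stay below are hypothetical, quartile values differ slightly between software packages depending on the interpolation method, and the 1.5 x IQR rule for flagging outlying values is a common convention assumed here rather than something stated in the lecture.

```python
from statistics import quantiles

# Hypothetical hospital lengths of stay in days (not data from the lecture)
stays = [1, 2, 2, 3, 4, 5, 6, 8, 30]

# Lower hinge (25th percentile), median, upper hinge (75th percentile)
q1, med, q3 = quantiles(stays, n=4, method="inclusive")

iqr = q3 - q1                   # height of the box
lower_fence = q1 - 1.5 * iqr    # assumed convention for outlying values
upper_fence = q3 + 1.5 * iqr
outliers = [d for d in stays if d < lower_fence or d > upper_fence]

print(f"box from {q1} to {q3}, median line at {med}")
print(f"outlying values: {outliers}")
```

For these data the box runs from 2 to 6 days with the median at 4, and the single 30-day stay is flagged as an outlying value.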
Hospital Length of Stay
Box plot: Length of Stay
Misuse of graphics
"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.
Common tricks used to mislead:
The problem of scaling
The Advertiser's Graph
The transformed graph
The chart with too much data
Which graph to use
Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics
Graph Name    Y-axis          X-axis
Histogram     Count           Category
Scatterplot   Continuous      Continuous
Dot Plot      Continuous      Category
Box Plot      Percentiles     Category
Line Plot     Mean or value   Category
Example of MCQ 1
The arithmetic mean of a set of values:
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram:
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Introduction to BiostatisticsIntroduction to BiostatisticsLecture 1Lecture 1
Summarising your data 1Summarising your data 1
In God we trust In God we trust
All others must bring dataAll others must bring data
The evidence-based clinicianrsquos The evidence-based clinicianrsquos mottomotto
ChallengesChallenges
Statistical ideas can be difficult and Statistical ideas can be difficult and intimidatingintimidating
ThusThus Statistical results are often ldquoskipped-overrdquo Statistical results are often ldquoskipped-overrdquo
when reading scientific literaturewhen reading scientific literature Data is often misinterpretedData is often misinterpreted
Misinterpretation of DataMisinterpretation of Data
ldquoldquoCelebrating birthdays is healthyrdquoCelebrating birthdays is healthyrdquo
Statistics show that those that celebrate the most Statistics show that those that celebrate the most birthdays live the longestbirthdays live the longest
You may think thatYou may think that
A Bar Chart is a map of the locations of A Bar Chart is a map of the locations of the nearest tavernsthe nearest taverns
A p-value is the result of a urinalysisA p-value is the result of a urinalysis
A t-test is a taste test between rooibos tea A t-test is a taste test between rooibos tea and Five Roses teaand Five Roses tea
Course StructureCourse Structure
ldquoBIO-SADISTICSrdquo
Four 45-minute lectures
PowerPoint presentations on student web site
Some text (content) also on web page
Plus additional internet ldquolinksrdquo
Syllabus for the Course
1048793 SESSION 1 Summarizing your data 1Types of data (quantitative and categorical variables) Describing data- average (mean median and mode)Displaying data graphically (box plots histograms bar charts pie diagrams) Frequency distributions
SESSION 2 Summarizing your data 2The normal distributionDescribing data ndash spread (range variance standard deviation z score)Quartiles percentilesStandard error of the mean Confidence intervals
SESSION 3SESSION 3 Sampling principles Sampling principles Study PopulationStudy PopulationThe sampleThe sampleRandom samplingRandom samplingNon random sampling Non random sampling Sampling biasSampling biasSample size and powerSample size and power
SESSION 4SESSION 4 Statistical tests and Statistical tests and the concept of significancethe concept of significance Hypothesis testingHypothesis testingp valuep value Statistical versus clinical Statistical versus clinical significancesignificanceParametric versus non-parametric Parametric versus non-parametric methodsmethods
Free textbook on-lineFree textbook on-line
Statistics at Square One
httpbmjbmjjournalscomcollectionsstatsbkindexshtml
httpwwwmedstatsaagcommcqsasp
Relevant topics
Handling data1 4 5 6 7
Sampling10 11
Hypothesis testing17 18
Todayrsquos LectureTodayrsquos Lecture
What types of data are thereWhat types of data are there
(numerical vs categorical variables) (numerical vs categorical variables)
Describing data - measures of central tendency Describing data - measures of central tendency (m(mean median and mode)ean median and mode)
Summarising data graphicallySummarising data graphically (histograms box (histograms box plots bar charts pie diagrams) plots bar charts pie diagrams)
Types of dataTypes of data
Variable
Categorical Numerical
Nominal Ordinal Discrete Continuous
Types of Data
Numerical dataDiscrete
Examples No of children No asthma attacks in a week No of rooms in home
Types of Data
Numerical dataContinuous
Any value on the continuum is possible (even fractions or Any value on the continuum is possible (even fractions or decimals)decimals)
Examples Weight Age Temperature Heart rate
Types of Data
Categorical dataNominal
Mutually exclusive unordered categoriesMutually exclusive unordered categories
ExamplesExamples Sex (male female)Sex (male female) Eye colour (brown grey green blue)Eye colour (brown grey green blue) Are you happy (Yes No)Are you happy (Yes No) Diarrhoea (Present absent)Diarrhoea (Present absent)
Can summarize inCan summarize in Tables ndash using counts and percentagesTables ndash using counts and percentages Bar ChartBar Chart
Types of Data
Categorical dataOrdinal (ordered categories)
Examples Degree of agreement
(Strongly Agree Agree Disagree Strongly disagree)
Severity of injury Severe Moderate Mild
Income level High medium low
PRACTICEPRACTICE
mg of tar in cigarettesmg of tar in cigarettes
number of people in a carnumber of people in a car
high to low temperature inhigh to low temperature in
any dayany day
weightweight
timetime
number of children in thenumber of children in the
average familyaverage family
Average above avg Average above avg below averagebelow average
Colours of SmartiesColours of Smarties
Grades (A B C D F)Grades (A B C D F)
Discrete or Continuous
Continuous
Discrete
Continuous
Continuous
Continuous
Discrete
Nominal or Ordinal
Ordinal
Nominal
Ordinal
Data SummariesData Summaries
It is ALWAYS a good idea to summarise It is ALWAYS a good idea to summarise your datayour data
You become familiar with the data and the You become familiar with the data and the characteristics of the people that you are characteristics of the people that you are studyingstudying
You can also identify problems or errors with You can also identify problems or errors with the data (data management issues)the data (data management issues)
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
DefinitionsDefinitions
The arithmetic The arithmetic meanmean is what is commonly called is what is commonly called the average The mean is the sum of all the the average The mean is the sum of all the scores divided by the number of scoresscores divided by the number of scores
The The medianmedian is the middle of a distribution half is the middle of a distribution half the scores are above the median and half are the scores are above the median and half are below the median below the median
The The modemode is the most frequently occurring score is the most frequently occurring score in a distribution in a distribution
ldquoldquoIt has been said that a fellow with one It has been said that a fellow with one leg frozen in ice and the other leg in leg frozen in ice and the other leg in boiling water is comfortablehellipboiling water is comfortablehellip
hellip hellipon averagerdquoon averagerdquo
JM YancyJM Yancy
Sample Mean X
The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)
1048793 ExampleSystolic blood pressures (mmHg)
X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5
micro is pronounced lsquomursquo and denotes the mean of all values in a population
Notation
x is pronounced lsquox-barrsquo and denotes the mean of a set of
Sample values
(sigma) denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
x =
Definitions
MeanMean
the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores
n x
Sample
Nmicro = xPopulation
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values - One data point could make a great change in
sample mean
Why is it called the sample meanndash To distinguish it from population mean
Population Versus Sample
Population - The entire group you want information about
ndash For example The blood pressure of all 20-year-old male university students in South Africa
Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
ndash For example Sample of blood pressures (n=50) of 20-year-old male university students in South Africa
The sample mean X is not the population mean micro
Population Versus Sample
We donrsquot know the population mean micro but would like to know itWe draw a sample from the populationWe calculate the sample mean XHow close is X to microStatistical theory will tell us how close X is to microStatistical inference is the process of trying to draw conclusions about the population from the sample
Weighted Mean
x =w
(w bull x)
Your grade in many courses are weighted means (averages) In other words some things count (are weighted) more than others
Geometric MeansGeometric Means
These are histograms rotated 90ordm and box plots
Note how the log transformation gives a symmetric distribution
bull 1 1 3 3 4 5 5 5 5 5 no exact middle -- shared by two numbers
MEDIAN is 45
4 + 5
2= 45
bull 5 5 5 3 1 5 1 4 3 5 2 bull 1 1 2 3 3 4 5 5 5 5 5 (in order)
exact middle MEDIAN is 4
Mode Mode
The score that occurs most frequentlyThe score that occurs most frequently
BimodalMultimodalNo Mode
The only measure of central tendency that can be used The only measure of central tendency that can be used with with nominal datadata
Examples
bull Mode is 5Mode is 5
bull Bimodal ndash 2 amp 6Bimodal ndash 2 amp 6
bull No ModeNo Mode
a 5 5 5 3 1 5 1 4 3 5
b 2 2 2 3 4 5 6 6 6 7 9
c 2 3 6 7 8 9 10
d 2 2 3 3 3 4
e 2 2 3 3 4 4 5 5
bull Mode is 3
bull No Mode
Shapes of the Distribution
Shapes of the Distribution
Distribution Characteristics
Shapes of the Distribution
Example Height of students in the class
Shapes of the Distribution
Example Serum cholesterol level
Shapes of the Distribution
Example Birth weight of newborn babies
Shapes of the Distribution
Some visual ways to summarize Some visual ways to summarize datadata
TablesTablesFrequency tableFrequency table
GraphsGraphsHistogramsHistograms
Bar graphsBar graphs
Box plotsBox plots
Line plotsLine plots
Scatter graphsScatter graphs
ChartsChartsBar chartBar chart
Pie diagramPie diagram
Frequency TablesFrequency Tables
Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages
The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable
and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable
Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data
Example of frequency tableExample of frequency table
When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class
Graphical SummariesGraphical Summaries
HistogramsHistograms Continuous or ordinal data on horizontal axisContinuous or ordinal data on horizontal axis
Bar GraphsBar Graphs Nominal dataNominal data
No order to horizontal axisNo order to horizontal axis
Box PlotsBox Plots Continuous dataContinuous data
HistogramHistogram
A histogram is a graphic representation of the frequency distribution of a variable Vertical rectangles (bars) are drawn in such a way that their bases lie
on a linear scale representing different intervals and their heights are
proportional to the frequencies of the values within each of the intervals
Bar ChartBar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of mutually exclusive categories The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis The heights of the bars correspond to the frequencies The bars should be of equal width and they should not be touching me other bars
Difference between bar chart and Difference between bar chart and histogramhistogram
Bar charts for categories that are separateBar charts for categories that are separate
Histograms if you got categories by Histograms if you got categories by dividing up continuous datadividing up continuous data
Bars do not touch histogram rectangles Bars do not touch histogram rectangles do touch do touch
Line graphLine graph
If the mid-points of the top of the bars of a histogram are connected together by a line and if the bars were omitted from the display the resultant graph will be a line
graph (also called a frequency polygon) Line graphs are good at showing trends over a period of time When trends of rates (eg death rate Infant Mortality Rate etc) are to be displayed it is better done with
line graphs rather than histograms
Scatter plotScatter plot
Also called a scattergram This a method of displaying the distribution of two variables in relation to each other another The value of one variables is measured on the X axis and the values of the other on the Y axis The variables have to be on a continuous scale Each plot thus has two values (coordinates) from the Y and X axis scales A wide scatter of the plots denotes poor correlation between the two variables If the two variables are perfectly correlated then all the plots will fall on the diagonal (regression line)
Survival curveSurvival curve
Pie chartPie chart
This is a circular diagram (can be shown as 2-D or 3-D) divided into segments each representing a category or subset of data (part of the whole) The amount for each category is proportional to the area of the sector (slice of the pie) The total area of the circle is 100 and it represents the total population that is being shown
Pictures of DataContinuous Variables
Histograms Means and medians do not tell whole story Differences in spread (variability) Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
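The steps above can be sketched in a few lines of code. This is a minimal illustration, assuming at least two distinct values so that the bin width is not zero; the function name `histogram_counts` and the length-of-stay values are hypothetical.

```python
def histogram_counts(values, n_bins):
    """Count observations in n_bins equal-width intervals spanning the data range."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than in a new one.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts

stay_days = [1, 2, 2, 3, 3, 3, 4, 5, 7, 9]  # hypothetical lengths of stay (days)
edges, counts = histogram_counts(stay_days, n_bins=4)

# Draw a rough text histogram: one row per bin, one star per observation.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:4.1f} to {right:4.1f} | {'*' * count}")
```

Changing `n_bins` changes the apparent shape of the distribution, so it is worth trying more than one bin width.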
Pictures of Data Histograms
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing the distribution of continuous data in multiple groups; can be plotted side by side
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
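The three values that define the box can be computed directly. The sketch below uses `statistics.quantiles` from the Python standard library (Python 3.8 or later); the scores are hypothetical, and the `"inclusive"` method is only one of several percentile conventions, so other software may report slightly different hinges.

```python
import statistics

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # hypothetical data

# quantiles(n=4) returns three cut points: the 25th, 50th and 75th percentiles.
lower_hinge, median, upper_hinge = statistics.quantiles(
    scores, n=4, method="inclusive"
)

# Scores that fall inside the box (between the two hinges).
inside_box = [s for s in scores if lower_hinge <= s <= upper_hinge]
print(lower_hinge, median, upper_hinge)
print(inside_box)
```

With small samples the box holds only roughly half of the scores, because individual values can sit exactly on a hinge.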
Hospital Length of Stay
Box plot Length of Stay
Misuse of graphics
"It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled." - M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.
Common tricks used to mislead:
The problem of scaling
The Advertiser's Graph
The transformed graph
The chart with too much data
Which graph to use
Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics
Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category
Example of MCQ 1
The arithmetic mean of a set of values:
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram:
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Syllabus for the Course
SESSION 1: Summarizing your data 1
Types of data (quantitative and categorical variables)
Describing data: average (mean, median and mode)
Displaying data graphically (box plots, histograms, bar charts, pie diagrams)
Frequency distributions
SESSION 2: Summarizing your data 2
The normal distribution
Describing data: spread (range, variance, standard deviation, z score)
Quartiles, percentiles
Standard error of the mean
Confidence intervals
SESSION 3: Sampling principles
Study population
The sample
Random sampling
Non-random sampling
Sampling bias
Sample size and power
SESSION 4: Statistical tests and the concept of significance
Hypothesis testing
p value
Statistical versus clinical significance
Parametric versus non-parametric methods
Free textbook on-line
Statistics at Square One
http://bmj.bmjjournals.com/collections/statsbk/index.shtml
http://www.medstatsaag.com/mcqs.asp
Relevant topics
Handling data: 1, 4, 5, 6, 7
Sampling: 10, 11
Hypothesis testing: 17, 18
Today's Lecture
What types of data are there? (numerical vs categorical variables)
Describing data: measures of central tendency (mean, median and mode)
Summarising data graphically (histograms, box plots, bar charts, pie diagrams)
Types of data
Variable
Categorical: Nominal, Ordinal
Numerical: Discrete, Continuous
Numerical data: Discrete
Examples: No. of children; No. of asthma attacks in a week; No. of rooms in home
Numerical data: Continuous
Any value on the continuum is possible (even fractions or decimals)
Examples: Weight; Age; Temperature; Heart rate
Categorical data: Nominal
Mutually exclusive, unordered categories
Examples: Sex (male, female); Eye colour (brown, grey, green, blue); Are you happy? (Yes, No); Diarrhoea (present, absent)
Can summarize in:
Tables, using counts and percentages
Bar chart
Categorical data: Ordinal (ordered categories)
Examples:
Degree of agreement (Strongly agree, Agree, Disagree, Strongly disagree)
Severity of injury (Severe, Moderate, Mild)
Income level (High, Medium, Low)
PRACTICE
Discrete or Continuous?
mg of tar in cigarettes: Continuous
number of people in a car: Discrete
high to low temperature in any day: Continuous
weight: Continuous
time: Continuous
number of children in the average family: Discrete
Nominal or Ordinal?
Average, above average, below average: Ordinal
Colours of Smarties: Nominal
Grades (A, B, C, D, F): Ordinal
Data Summaries
It is ALWAYS a good idea to summarise your data
You become familiar with the data and the characteristics of the people that you are studying
You can also identify problems or errors with the data (data management issues)
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
Definitions
The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.
The median is the middle of a distribution: half the scores are above the median and half are below the median.
The mode is the most frequently occurring score in a distribution.
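The three definitions can be tried out with the `statistics` module from the Python standard library. This is only a sketch; the scores are illustrative values chosen so that the three measures differ.

```python
import statistics

scores = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]  # illustrative scores

mean = statistics.mean(scores)      # sum of the scores divided by their number
median = statistics.median(scores)  # middle value once the scores are sorted
mode = statistics.mode(scores)      # most frequently occurring score
print(mean, median, mode)
```

Here the mean (39/11, about 3.5), the median (4) and the mode (5) all differ, which is typical when the scores are not symmetrically distributed.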
"It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable... on average."
J.M. Yancy
Sample Mean $\bar{X}$
The Average or Arithmetic Mean
Add up data, then divide by sample size (n)
The sample size n is the number of observations (pieces of data)
Example: Systolic blood pressures (mmHg)
$X_1 = 120$, $X_2 = 80$, $X_3 = 90$, $X_4 = 110$, $X_5 = 95$, $n = 5$
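For these five readings the calculation can be written out in full. The arithmetic below follows directly from the "add up, then divide by n" rule stated on the slide.

```latex
\bar{X} = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{n}
        = \frac{120 + 80 + 90 + 110 + 95}{5}
        = \frac{495}{5}
        = 99 \text{ mmHg}
```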
Notation
$\Sigma$ (sigma) denotes the summation of a set of values
$x$ is the variable usually used to represent the individual data values
$n$ represents the number of data values in a sample
$N$ represents the number of data values in a population
$\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set of sample values
$\mu$ is pronounced 'mu' and denotes the mean of all values in a population
Definitions
Mean: the value obtained by adding the scores and dividing the total by the number of scores
Sample: $\bar{x} = \dfrac{\sum x}{n}$
Population: $\mu = \dfrac{\sum x}{N}$
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values: one data point could make a great change in the sample mean
Why is it called the sample mean? To distinguish it from the population mean
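The sensitivity to extreme values is easy to demonstrate. The sketch below reuses the five systolic blood pressure readings from the example and adds one extreme reading of 250 mmHg, which is hypothetical and not from the lecture.

```python
import statistics

pressures = [120, 80, 90, 110, 95]   # the five readings from the example
with_outlier = pressures + [250]     # one hypothetical extreme reading added

mean_before = statistics.mean(pressures)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(pressures)
median_after = statistics.median(with_outlier)

print(mean_before, mean_after)       # the mean shifts by about 25 mmHg
print(median_before, median_after)   # the median shifts by 7.5 mmHg
```

A single extreme value moves the mean far more than the median, which is one reason the median is often preferred for skewed data.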
Population Versus Sample
Population: the entire group you want information about
For example: the blood pressure of all 20-year-old male university students in South Africa
Sample: a part of the population from which we actually collect information and draw conclusions about the whole population
For example: a sample of blood pressures (n=50) of 20-year-old male university students in South Africa
The sample mean $\bar{X}$ is not the population mean $\mu$
We don't know the population mean $\mu$ but would like to know it
We draw a sample from the population
We calculate the sample mean $\bar{X}$
How close is $\bar{X}$ to $\mu$?
Statistical theory will tell us how close $\bar{X}$ is to $\mu$
Statistical inference is the process of trying to draw conclusions about the population from the sample
Weighted Mean
$\bar{x} = \dfrac{\sum (w \cdot x)}{\sum w}$
Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
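A course grade is a convenient way to see the weighted mean at work. In this sketch the function name `weighted_mean`, the marks and the weights are all hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

marks = [60, 70, 80]   # hypothetical marks: assignment, test, exam
weights = [1, 1, 2]    # hypothetical weights: the exam counts double

course_grade = weighted_mean(marks, weights)
simple_average = sum(marks) / len(marks)
print(course_grade, simple_average)
```

Because the exam carries twice the weight, the weighted grade (72.5) is pulled towards the exam mark and sits above the simple average (70).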
Geometric Means
These are histograms rotated 90° and box plots
Note how the log transformation gives a symmetric distribution
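The slide does not give a formula, but the standard definition of the geometric mean ties in with the log transformation mentioned above: take logs, average them, then transform back. The sketch below assumes strictly positive data; the function name and the values are hypothetical.

```python
import math

def geometric_mean(values):
    """Back-transformed mean of the logs; defined only for positive values."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))

titres = [1, 10, 100]   # hypothetical right-skewed, positive data

g_mean = geometric_mean(titres)
a_mean = sum(titres) / len(titres)
print(g_mean, a_mean)
```

For right-skewed data like these, the geometric mean (about 10) is much less affected by the largest value than the arithmetic mean (37).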
Median
1 1 3 3 4 5 5 5 5 5: no exact middle, shared by two numbers
MEDIAN is $\dfrac{4 + 5}{2} = 4.5$
5 5 5 3 1 5 1 4 3 5 2
1 1 2 3 3 4 5 5 5 5 5 (in order): exact middle, MEDIAN is 4
Mode
The score that occurs most frequently
Bimodal
Multimodal
No mode
The only measure of central tendency that can be used with nominal data
Examples
a. 5 5 5 3 1 5 1 4 3 5: Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9: Bimodal, 2 and 6
c. 2 3 6 7 8 9 10: No mode
d. 2 2 3 3 3 4: Mode is 3
e. 2 2 3 3 4 4 5 5: No mode
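The five examples can be checked with a short helper. This is a sketch under one stated assumption: when every distinct value occurs equally often, the data are treated as having no mode, which is the convention used in the examples. The function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # If every distinct value is equally frequent, report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}
results = {label: modes(data) for label, data in examples.items()}
for label, found in results.items():
    print(label, found if found else "no mode")
```

A result with two values corresponds to a bimodal data set, and more than two to a multimodal one.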
Shapes of the Distribution
Distribution Characteristics
Example: Height of students in the class
Example: Serum cholesterol level
Example: Birth weight of newborn babies
Some visual ways to summarize data
Tables: Frequency table
Graphs: Histograms, Bar graphs, Box plots, Line plots, Scatter graphs
Charts: Bar chart, Pie diagram
Frequency Tables
Summarizes a variable with counts and percentages
The variable is categorical
Note that you can take a continuous variable and create categories with it
How do you create categories for a continuous variable?
Choose cutoffs that are biologically meaningful
Natural breaks in the data
Example of frequency table
When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data. When the data are divided into groups (classes), they are called grouped data. The classes have to be decided according to the range of the data and the size of class. The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table. The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
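A grouped frequency table with cumulative frequencies can be built as follows. This is a minimal sketch assuming whole-number data and classes of equal width; the function name `frequency_table` and the ages are hypothetical, and classes with no observations are simply left out.

```python
from collections import Counter

def frequency_table(values, class_width):
    """Group values into equal-width classes and add cumulative frequencies."""
    # Each value is assigned to the class that starts at a multiple of class_width.
    counts = Counter((v // class_width) * class_width for v in values)
    table = []
    cumulative = 0
    for start in sorted(counts):
        cumulative += counts[start]
        table.append((start, start + class_width - 1, counts[start], cumulative))
    return table

ages = [3, 7, 12, 15, 18, 21, 24, 25, 31, 38]  # hypothetical ages in whole years
table = frequency_table(ages, class_width=10)
for low, high, freq, cum in table:
    print(f"{low:2d}-{high:2d}  frequency {freq}  cumulative {cum}")
```

The cumulative frequency of the last class always equals the total number of observations, which is a quick check that no value was dropped.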
Graphical Summaries
Histograms: Continuous or ordinal data on horizontal axis
Bar Graphs: Nominal data; no order to horizontal axis
Box Plots: Continuous data
Histogram
A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.
Bar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation falls into one of a set of mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch the other bars.
ChallengesChallenges
Statistical ideas can be difficult and Statistical ideas can be difficult and intimidatingintimidating
ThusThus Statistical results are often ldquoskipped-overrdquo Statistical results are often ldquoskipped-overrdquo
when reading scientific literaturewhen reading scientific literature Data is often misinterpretedData is often misinterpreted
Misinterpretation of DataMisinterpretation of Data
ldquoldquoCelebrating birthdays is healthyrdquoCelebrating birthdays is healthyrdquo
Statistics show that those that celebrate the most Statistics show that those that celebrate the most birthdays live the longestbirthdays live the longest
You may think thatYou may think that
A Bar Chart is a map of the locations of A Bar Chart is a map of the locations of the nearest tavernsthe nearest taverns
A p-value is the result of a urinalysisA p-value is the result of a urinalysis
A t-test is a taste test between rooibos tea A t-test is a taste test between rooibos tea and Five Roses teaand Five Roses tea
Course StructureCourse Structure
ldquoBIO-SADISTICSrdquo
Four 45-minute lectures
PowerPoint presentations on student web site
Some text (content) also on web page
Plus additional internet ldquolinksrdquo
Syllabus for the Course
1048793 SESSION 1 Summarizing your data 1Types of data (quantitative and categorical variables) Describing data- average (mean median and mode)Displaying data graphically (box plots histograms bar charts pie diagrams) Frequency distributions
SESSION 2 Summarizing your data 2The normal distributionDescribing data ndash spread (range variance standard deviation z score)Quartiles percentilesStandard error of the mean Confidence intervals
SESSION 3SESSION 3 Sampling principles Sampling principles Study PopulationStudy PopulationThe sampleThe sampleRandom samplingRandom samplingNon random sampling Non random sampling Sampling biasSampling biasSample size and powerSample size and power
SESSION 4SESSION 4 Statistical tests and Statistical tests and the concept of significancethe concept of significance Hypothesis testingHypothesis testingp valuep value Statistical versus clinical Statistical versus clinical significancesignificanceParametric versus non-parametric Parametric versus non-parametric methodsmethods
Free textbook on-lineFree textbook on-line
Statistics at Square One
httpbmjbmjjournalscomcollectionsstatsbkindexshtml
httpwwwmedstatsaagcommcqsasp
Relevant topics
Handling data1 4 5 6 7
Sampling10 11
Hypothesis testing17 18
Todayrsquos LectureTodayrsquos Lecture
What types of data are thereWhat types of data are there
(numerical vs categorical variables) (numerical vs categorical variables)
Describing data - measures of central tendency Describing data - measures of central tendency (m(mean median and mode)ean median and mode)
Summarising data graphicallySummarising data graphically (histograms box (histograms box plots bar charts pie diagrams) plots bar charts pie diagrams)
Types of dataTypes of data
Variable
Categorical Numerical
Nominal Ordinal Discrete Continuous
Types of Data
Numerical dataDiscrete
Examples No of children No asthma attacks in a week No of rooms in home
Types of Data
Numerical dataContinuous
Any value on the continuum is possible (even fractions or Any value on the continuum is possible (even fractions or decimals)decimals)
Examples Weight Age Temperature Heart rate
Types of Data
Categorical dataNominal
Mutually exclusive unordered categoriesMutually exclusive unordered categories
ExamplesExamples Sex (male female)Sex (male female) Eye colour (brown grey green blue)Eye colour (brown grey green blue) Are you happy (Yes No)Are you happy (Yes No) Diarrhoea (Present absent)Diarrhoea (Present absent)
Can summarize inCan summarize in Tables ndash using counts and percentagesTables ndash using counts and percentages Bar ChartBar Chart
Types of Data
Categorical dataOrdinal (ordered categories)
Examples Degree of agreement
(Strongly Agree Agree Disagree Strongly disagree)
Severity of injury Severe Moderate Mild
Income level High medium low
PRACTICEPRACTICE
mg of tar in cigarettesmg of tar in cigarettes
number of people in a carnumber of people in a car
high to low temperature inhigh to low temperature in
any dayany day
weightweight
timetime
number of children in thenumber of children in the
average familyaverage family
Average above avg Average above avg below averagebelow average
Colours of SmartiesColours of Smarties
Grades (A B C D F)Grades (A B C D F)
Discrete or Continuous
Continuous
Discrete
Continuous
Continuous
Continuous
Discrete
Nominal or Ordinal
Ordinal
Nominal
Ordinal
Data SummariesData Summaries
It is ALWAYS a good idea to summarise It is ALWAYS a good idea to summarise your datayour data
You become familiar with the data and the You become familiar with the data and the characteristics of the people that you are characteristics of the people that you are studyingstudying
You can also identify problems or errors with You can also identify problems or errors with the data (data management issues)the data (data management issues)
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
DefinitionsDefinitions
The arithmetic The arithmetic meanmean is what is commonly called is what is commonly called the average The mean is the sum of all the the average The mean is the sum of all the scores divided by the number of scoresscores divided by the number of scores
The The medianmedian is the middle of a distribution half is the middle of a distribution half the scores are above the median and half are the scores are above the median and half are below the median below the median
The The modemode is the most frequently occurring score is the most frequently occurring score in a distribution in a distribution
ldquoldquoIt has been said that a fellow with one It has been said that a fellow with one leg frozen in ice and the other leg in leg frozen in ice and the other leg in boiling water is comfortablehellipboiling water is comfortablehellip
hellip hellipon averagerdquoon averagerdquo
JM YancyJM Yancy
Sample Mean X
The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)
1048793 ExampleSystolic blood pressures (mmHg)
X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5
micro is pronounced lsquomursquo and denotes the mean of all values in a population
Notation
x is pronounced lsquox-barrsquo and denotes the mean of a set of
Sample values
(sigma) denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
x =
Definitions
MeanMean
the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores
n x
Sample
Nmicro = xPopulation
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values - One data point could make a great change in
sample mean
Why is it called the sample meanndash To distinguish it from population mean
Population Versus Sample
Population - The entire group you want information about
ndash For example The blood pressure of all 20-year-old male university students in South Africa
Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
ndash For example Sample of blood pressures (n=50) of 20-year-old male university students in South Africa
The sample mean X is not the population mean micro
Population Versus Sample
We donrsquot know the population mean micro but would like to know itWe draw a sample from the populationWe calculate the sample mean XHow close is X to microStatistical theory will tell us how close X is to microStatistical inference is the process of trying to draw conclusions about the population from the sample
Weighted Mean
x =w
(w bull x)
Your grade in many courses are weighted means (averages) In other words some things count (are weighted) more than others
Geometric MeansGeometric Means
These are histograms rotated 90ordm and box plots
Note how the log transformation gives a symmetric distribution
bull 1 1 3 3 4 5 5 5 5 5 no exact middle -- shared by two numbers
MEDIAN is 45
4 + 5
2= 45
bull 5 5 5 3 1 5 1 4 3 5 2 bull 1 1 2 3 3 4 5 5 5 5 5 (in order)
exact middle MEDIAN is 4
Mode Mode
The score that occurs most frequentlyThe score that occurs most frequently
BimodalMultimodalNo Mode
The only measure of central tendency that can be used The only measure of central tendency that can be used with with nominal datadata
Examples
bull Mode is 5Mode is 5
bull Bimodal ndash 2 amp 6Bimodal ndash 2 amp 6
bull No ModeNo Mode
a 5 5 5 3 1 5 1 4 3 5
b 2 2 2 3 4 5 6 6 6 7 9
c 2 3 6 7 8 9 10
d 2 2 3 3 3 4
e 2 2 3 3 4 4 5 5
bull Mode is 3
bull No Mode
Shapes of the Distribution
Shapes of the Distribution
Distribution Characteristics
Shapes of the Distribution
Example Height of students in the class
Shapes of the Distribution
Example Serum cholesterol level
Shapes of the Distribution
Example Birth weight of newborn babies
Shapes of the Distribution
Some visual ways to summarize Some visual ways to summarize datadata
TablesTablesFrequency tableFrequency table
GraphsGraphsHistogramsHistograms
Bar graphsBar graphs
Box plotsBox plots
Line plotsLine plots
Scatter graphsScatter graphs
ChartsChartsBar chartBar chart
Pie diagramPie diagram
Frequency TablesFrequency Tables
Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages
The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable
and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable
Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data
Example of frequency tableExample of frequency table
When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class
Graphical Summaries
Histograms
  Continuous or ordinal data on horizontal axis
Bar Graphs
  Nominal data
  No order to horizontal axis
Box Plots
  Continuous data
Histogram
A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.
Bar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of a set of mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch the other bars.
Difference between bar chart and histogram
Bar charts for categories that are separate
Histograms if you got categories by dividing up continuous data
Bars do not touch; histogram rectangles do touch
Line graph
If the mid-points of the tops of the bars of a histogram are connected by a line, and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, it is better done with line graphs than with histograms.
Scatter plot
Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points fall on a straight line (the regression line).
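To put a number on "poor" versus "perfect" correlation, one common choice is Pearson's correlation coefficient r. The lecture does not name a specific measure at this point, so treat r as an assumption, and note that the data below are invented. The sketch shows r equal to 1 when every point lies on a straight line and r close to 0 when the points are widely scattered.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: x values and two possible sets of y values.
x = [1, 2, 3, 4, 5]
on_a_line = [2 * v + 70 for v in x]   # every point lies exactly on a line
scattered = [75, 71, 80, 72, 78]      # no clear pattern

r_perfect = pearson_r(x, on_a_line)
r_scattered = pearson_r(x, scattered)
print(f"points on a line: r = {r_perfect:.2f}")
print(f"wide scatter:     r = {r_scattered:.2f}")
```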
Survival curve
Pie chart
This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.
Pictures of Data: Continuous Variables
Histograms
  Means and medians do not tell the whole story
  Differences in spread (variability)
  Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
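The steps above can be sketched in plain Python. The heights are invented and `histogram_counts` is a hypothetical helper name. The sketch assumes the data are not all identical (otherwise the bin width would be zero), and it places the maximum value in the last bin.

```python
def histogram_counts(values, n_bins):
    """Split the range of the data into n_bins intervals of equal width
    and count the observations falling in each interval."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land one past the end, so clamp it.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical heights of students in cm (invented for illustration).
heights_cm = [150, 152, 158, 160, 161, 165, 166, 167, 170, 171, 175, 180]
edges, counts = histogram_counts(heights_cm, 3)

# A crude text histogram with labelled intervals.
for i, c in enumerate(counts):
    print(f"{edges[i]:.0f}-{edges[i + 1]:.0f} cm | {'#' * c}")
```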
Pictures of Data: Histograms
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing the distribution of continuous data in multiple groups; can be plotted side by side
Box plot
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
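The numbers behind a box plot can be computed with Python's standard `statistics` module. The length-of-stay values below are invented, and two choices are assumptions rather than part of the lecture: the "inclusive" percentile method (software packages differ slightly in how they compute hinges) and the widely used 1.5 x IQR rule for flagging outlying values.

```python
from statistics import quantiles

# Hypothetical hospital lengths of stay in days (invented for illustration).
length_of_stay = [1, 2, 2, 3, 3, 4, 5, 7, 21]

# Lower hinge (25th percentile), median, upper hinge (75th percentile).
q1, med, q3 = quantiles(length_of_stay, n=4, method="inclusive")
iqr = q3 - q1  # height of the box: the middle half of the data

# Assumed convention: flag points more than 1.5 * IQR beyond the box.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [v for v in length_of_stay if v < low_fence or v > high_fence]

print(f"box from {q1} to {q3}, median line at {med}")
print(f"outlying values: {outliers}")
```

With a long right tail like this, the median sits nearer the bottom of the box and the flagged value lies far above it, which is how a box plot shows skewness.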
Hospital Length of Stay
Box plot: Length of Stay
Misuse of graphics
It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled. - M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.
Common tricks used to mislead
  The problem of scaling
  The Advertiser's Graph
  The transformed graph
  The chart with too much data
Which graph to use
Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category
Example of MCQ 1
The arithmetic mean of a set of values
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Syllabus for the Course
SESSION 1: Summarizing your data 1
  Types of data (quantitative and categorical variables)
  Describing data - average (mean, median and mode)
  Displaying data graphically (box plots, histograms, bar charts, pie diagrams)
  Frequency distributions
SESSION 2: Summarizing your data 2
  The normal distribution
  Describing data - spread (range, variance, standard deviation, z score)
  Quartiles, percentiles
  Standard error of the mean
  Confidence intervals
SESSION 3: Sampling principles
  Study population
  The sample
  Random sampling
  Non-random sampling
  Sampling bias
  Sample size and power
SESSION 4: Statistical tests and the concept of significance
  Hypothesis testing
  p value
  Statistical versus clinical significance
  Parametric versus non-parametric methods
Free textbook on-line
Statistics at Square One
http://bmj.bmjjournals.com/collections/statsbk/index.shtml
http://www.medstatsaag.com/mcqs.asp
Relevant topics
  Handling data: 1, 4, 5, 6, 7
  Sampling: 10, 11
  Hypothesis testing: 17, 18
Today's Lecture
What types of data are there? (numerical vs categorical variables)
Describing data - measures of central tendency (mean, median and mode)
Summarising data graphically (histograms, box plots, bar charts, pie diagrams)
Types of data
Variable
  Categorical: Nominal, Ordinal
  Numerical: Discrete, Continuous
Types of Data
Numerical data: Discrete
Examples: No. of children, No. of asthma attacks in a week, No. of rooms in home
Types of Data
Numerical data: Continuous
Any value on the continuum is possible (even fractions or decimals)
Examples: Weight, Age, Temperature, Heart rate
Types of Data
Categorical data: Nominal
Mutually exclusive, unordered categories
Examples
  Sex (male, female)
  Eye colour (brown, grey, green, blue)
  Are you happy? (Yes, No)
  Diarrhoea (Present, absent)
Can summarize in
  Tables - using counts and percentages
  Bar Chart
Types of Data
Categorical data: Ordinal (ordered categories)
Examples
  Degree of agreement (Strongly agree, Agree, Disagree, Strongly disagree)
  Severity of injury (Severe, Moderate, Mild)
  Income level (High, Medium, Low)
PRACTICE
Discrete or Continuous?
  mg of tar in cigarettes: Continuous
  number of people in a car: Discrete
  high to low temperature in any day: Continuous
  weight: Continuous
  time: Continuous
  number of children in the average family: Discrete
Nominal or Ordinal?
  Average, above average, below average: Ordinal
  Colours of Smarties: Nominal
  Grades (A, B, C, D, F): Ordinal
Data Summaries
It is ALWAYS a good idea to summarise your data
You become familiar with the data and the characteristics of the people that you are studying
You can also identify problems or errors with the data (data management issues)
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
Definitions
The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.
The median is the middle of a distribution: half the scores are above the median and half are below the median.
The mode is the most frequently occurring score in a distribution.
"It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable... on average."
JM Yancy
Sample Mean $\bar{X}$
The Average or Arithmetic Mean
Add up the data, then divide by the sample size (n)
The sample size n is the number of observations (pieces of data)
Example: Systolic blood pressures (mmHg)
$X_1 = 120$, $X_2 = 80$, $X_3 = 90$, $X_4 = 110$, $X_5 = 95$, $n = 5$
$\bar{X} = \frac{120 + 80 + 90 + 110 + 95}{5} = \frac{495}{5} = 99$ mmHg
Notation
$\Sigma$ (sigma) denotes the summation of a set of values
$x$ is the variable usually used to represent the individual data values
$n$ represents the number of data values in a sample
$N$ represents the number of data values in a population
$\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set of sample values
$\mu$ is pronounced 'mu' and denotes the mean of all values in a population
Definitions
Mean: the value obtained by adding the scores and dividing the total by the number of scores
Sample: $\bar{x} = \frac{\sum x}{n}$
Population: $\mu = \frac{\sum x}{N}$
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values - one data point could make a great change in the sample mean
Why is it called the sample mean? To distinguish it from the population mean
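A short sketch of this sensitivity, using the five systolic blood pressures from the earlier example. The sixth value, 950, is a hypothetical data-entry error added only to show the effect. It drags the mean far from the bulk of the data while the median barely moves.

```python
from statistics import mean, median

# Systolic blood pressures (mmHg) from the lecture example.
pressures = [120, 80, 90, 110, 95]

# Hypothetical data-entry error: 95 typed as 950 for a sixth patient.
with_error = pressures + [950]

print(f"original:   mean = {mean(pressures):.1f}, median = {median(pressures):.1f}")
print(f"with error: mean = {mean(with_error):.1f}, median = {median(with_error):.1f}")
```

This is also one reason summarising data helps to identify errors: a mean that looks implausible is often the first sign of a bad value.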
Population Versus Sample
Population - The entire group you want information about
  For example: the blood pressure of all 20-year-old male university students in South Africa
Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
  For example: a sample of blood pressures (n = 50) of 20-year-old male university students in South Africa
The sample mean $\bar{X}$ is not the population mean $\mu$
Population Versus Sample
We don't know the population mean $\mu$ but would like to know it
We draw a sample from the population
We calculate the sample mean $\bar{X}$
How close is $\bar{X}$ to $\mu$?
Statistical theory will tell us how close $\bar{X}$ is to $\mu$
Statistical inference is the process of trying to draw conclusions about the population from the sample
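The idea can be tried out in a small simulation. Everything here is an assumption made for illustration: the "population" is 10,000 simulated blood pressures with a mean near 120 mmHg and a standard deviation near 12 mmHg, and the seed is fixed only so the run is repeatable. In real studies the population mean is unknown, which is exactly why the sample is needed.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Simulated population of systolic blood pressures (invented parameters).
population = [rng.gauss(120, 12) for _ in range(10_000)]
mu = mean(population)        # population mean (normally unknown)

# Draw a random sample of n = 50 without replacement.
sample = rng.sample(population, 50)
x_bar = mean(sample)         # sample mean

print(f"population mean = {mu:.1f} mmHg")
print(f"sample mean     = {x_bar:.1f} mmHg")
print(f"difference      = {x_bar - mu:+.1f} mmHg")
```

Re-running with a different seed gives a different sample mean each time; how much it varies around the population mean is what the standard error in Session 2 describes.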
Weighted Mean
$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$
Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
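As a worked illustration of the weighted mean, consider a course mark. The marks and weights below are invented: an assignment worth 20%, a test worth 30% and an exam worth 50%.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical marks (%) for assignment, test and exam.
marks = [60, 70, 80]
# Hypothetical weights: the exam counts most.
weights = [20, 30, 50]

final_mark = weighted_mean(marks, weights)
plain_mean = sum(marks) / len(marks)
print(f"weighted mean = {final_mark:.1f}, unweighted mean = {plain_mean:.1f}")
```

The weighted mean is higher than the plain mean here because the best mark carries the largest weight.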
Geometric Means
These are histograms rotated 90° and box plots
Note how the log transformation gives a symmetric distribution
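The link between the log transformation and the geometric mean can be made concrete: the geometric mean is the mean of the logged values, transformed back with the exponential. The sketch below computes it by hand and with the standard library's `statistics.geometric_mean` (Python 3.8 or later). The three values are invented to mimic right-skewed data such as titres.

```python
from math import exp, log
from statistics import geometric_mean, mean

def geo_mean(values):
    """Back-transform the arithmetic mean of the natural logs."""
    return exp(sum(log(v) for v in values) / len(values))

# Hypothetical right-skewed measurements (all must be positive).
titres = [1, 10, 100]

print(f"arithmetic mean = {mean(titres):.1f}")
print(f"geometric mean  = {geo_mean(titres):.1f}")
print(f"library result  = {geometric_mean(titres):.1f}")
```

On the log scale the three values are evenly spaced, so the geometric mean lands on the middle value, while the arithmetic mean is pulled towards the largest one.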
1 1 3 3 4 5 5 5 5 5
no exact middle: shared by two numbers
MEDIAN is $\frac{4 + 5}{2} = 4.5$
5 5 5 3 1 5 1 4 3 5 2
1 1 2 3 3 4 5 5 5 5 5 (in order)
exact middle: MEDIAN is 4
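The two cases above (an even and an odd number of values) can be written as one small function. This is a sketch equivalent to what `statistics.median` does, spelled out so the steps are visible.

```python
def median(values):
    """Middle value of the ordered data; if the count is even,
    the average of the two middle values."""
    ordered = sorted(values)          # step 1: put the data in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: one exact middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: share it

even_case = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]
odd_case = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]

print(median(even_case))
print(median(odd_case))
```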
Mode
The score that occurs most frequently
Bimodal
Multimodal
No Mode
The only measure of central tendency that can be used with nominal data
Examples
a. 5 5 5 3 1 5 1 4 3 5: Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9: Bimodal - 2 & 6
c. 2 3 6 7 8 9 10: No Mode
d. 2 2 3 3 3 4: Mode is 3
e. 2 2 3 3 4 4 5 5: No Mode
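The five examples can be checked with a short function. One caveat: the slide's convention that a data set has "No Mode" when every value occurs equally often is coded explicitly here as an assumption. Python's own `statistics.multimode` would instead return all the values in that situation. The function name `modes` is hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s) in sorted order, or an empty
    list when all distinct values occur equally often (no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    winners = sorted(v for v, c in counts.items() if c == top)
    # Assumed convention: a tie between ALL distinct values means no mode.
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}
results = {name: modes(data) for name, data in examples.items()}
for name, found in results.items():
    print(name, found if found else "no mode")
```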
You may think thatYou may think that
A Bar Chart is a map of the locations of A Bar Chart is a map of the locations of the nearest tavernsthe nearest taverns
A p-value is the result of a urinalysisA p-value is the result of a urinalysis
A t-test is a taste test between rooibos tea A t-test is a taste test between rooibos tea and Five Roses teaand Five Roses tea
Course StructureCourse Structure
ldquoBIO-SADISTICSrdquo
Four 45-minute lectures
PowerPoint presentations on student web site
Some text (content) also on web page
Plus additional internet ldquolinksrdquo
Syllabus for the Course
1048793 SESSION 1 Summarizing your data 1Types of data (quantitative and categorical variables) Describing data- average (mean median and mode)Displaying data graphically (box plots histograms bar charts pie diagrams) Frequency distributions
SESSION 2 Summarizing your data 2The normal distributionDescribing data ndash spread (range variance standard deviation z score)Quartiles percentilesStandard error of the mean Confidence intervals
SESSION 3SESSION 3 Sampling principles Sampling principles Study PopulationStudy PopulationThe sampleThe sampleRandom samplingRandom samplingNon random sampling Non random sampling Sampling biasSampling biasSample size and powerSample size and power
SESSION 4SESSION 4 Statistical tests and Statistical tests and the concept of significancethe concept of significance Hypothesis testingHypothesis testingp valuep value Statistical versus clinical Statistical versus clinical significancesignificanceParametric versus non-parametric Parametric versus non-parametric methodsmethods
Free textbook on-lineFree textbook on-line
Statistics at Square One
httpbmjbmjjournalscomcollectionsstatsbkindexshtml
httpwwwmedstatsaagcommcqsasp
Relevant topics
Handling data1 4 5 6 7
Sampling10 11
Hypothesis testing17 18
Todayrsquos LectureTodayrsquos Lecture
What types of data are thereWhat types of data are there
(numerical vs categorical variables) (numerical vs categorical variables)
Describing data - measures of central tendency Describing data - measures of central tendency (m(mean median and mode)ean median and mode)
Summarising data graphicallySummarising data graphically (histograms box (histograms box plots bar charts pie diagrams) plots bar charts pie diagrams)
Types of dataTypes of data
Variable
Categorical Numerical
Nominal Ordinal Discrete Continuous
Types of Data
Numerical dataDiscrete
Examples No of children No asthma attacks in a week No of rooms in home
Types of Data
Numerical dataContinuous
Any value on the continuum is possible (even fractions or Any value on the continuum is possible (even fractions or decimals)decimals)
Examples Weight Age Temperature Heart rate
Types of Data
Categorical dataNominal
Mutually exclusive unordered categoriesMutually exclusive unordered categories
ExamplesExamples Sex (male female)Sex (male female) Eye colour (brown grey green blue)Eye colour (brown grey green blue) Are you happy (Yes No)Are you happy (Yes No) Diarrhoea (Present absent)Diarrhoea (Present absent)
Can summarize inCan summarize in Tables ndash using counts and percentagesTables ndash using counts and percentages Bar ChartBar Chart
Types of Data
Categorical dataOrdinal (ordered categories)
Examples Degree of agreement
(Strongly Agree Agree Disagree Strongly disagree)
Severity of injury Severe Moderate Mild
Income level High medium low
PRACTICEPRACTICE
mg of tar in cigarettesmg of tar in cigarettes
number of people in a carnumber of people in a car
high to low temperature inhigh to low temperature in
any dayany day
weightweight
timetime
number of children in thenumber of children in the
average familyaverage family
Average above avg Average above avg below averagebelow average
Colours of SmartiesColours of Smarties
Grades (A B C D F)Grades (A B C D F)
Discrete or Continuous
Continuous
Discrete
Continuous
Continuous
Continuous
Discrete
Nominal or Ordinal
Ordinal
Nominal
Ordinal
Data SummariesData Summaries
It is ALWAYS a good idea to summarise It is ALWAYS a good idea to summarise your datayour data
You become familiar with the data and the You become familiar with the data and the characteristics of the people that you are characteristics of the people that you are studyingstudying
You can also identify problems or errors with You can also identify problems or errors with the data (data management issues)the data (data management issues)
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
DefinitionsDefinitions
The arithmetic The arithmetic meanmean is what is commonly called is what is commonly called the average The mean is the sum of all the the average The mean is the sum of all the scores divided by the number of scoresscores divided by the number of scores
The The medianmedian is the middle of a distribution half is the middle of a distribution half the scores are above the median and half are the scores are above the median and half are below the median below the median
The The modemode is the most frequently occurring score is the most frequently occurring score in a distribution in a distribution
ldquoldquoIt has been said that a fellow with one It has been said that a fellow with one leg frozen in ice and the other leg in leg frozen in ice and the other leg in boiling water is comfortablehellipboiling water is comfortablehellip
hellip hellipon averagerdquoon averagerdquo
JM YancyJM Yancy
Sample Mean X
The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)
1048793 ExampleSystolic blood pressures (mmHg)
X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5
micro is pronounced lsquomursquo and denotes the mean of all values in a population
Notation
x is pronounced lsquox-barrsquo and denotes the mean of a set of
Sample values
(sigma) denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
x =
Definitions
MeanMean
the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores
n x
Sample
Nmicro = xPopulation
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values - One data point could make a great change in
sample mean
Why is it called the sample meanndash To distinguish it from population mean
Population Versus Sample
Population - The entire group you want information about
ndash For example The blood pressure of all 20-year-old male university students in South Africa
Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
ndash For example Sample of blood pressures (n=50) of 20-year-old male university students in South Africa
The sample mean X is not the population mean micro
Population Versus Sample
We donrsquot know the population mean micro but would like to know itWe draw a sample from the populationWe calculate the sample mean XHow close is X to microStatistical theory will tell us how close X is to microStatistical inference is the process of trying to draw conclusions about the population from the sample
Weighted Mean
x =w
(w bull x)
Your grade in many courses are weighted means (averages) In other words some things count (are weighted) more than others
Geometric MeansGeometric Means
These are histograms rotated 90ordm and box plots
Note how the log transformation gives a symmetric distribution
bull 1 1 3 3 4 5 5 5 5 5 no exact middle -- shared by two numbers
MEDIAN is 45
4 + 5
2= 45
bull 5 5 5 3 1 5 1 4 3 5 2 bull 1 1 2 3 3 4 5 5 5 5 5 (in order)
exact middle MEDIAN is 4
Mode Mode
The score that occurs most frequentlyThe score that occurs most frequently
BimodalMultimodalNo Mode
The only measure of central tendency that can be used The only measure of central tendency that can be used with with nominal datadata
Examples
bull Mode is 5Mode is 5
bull Bimodal ndash 2 amp 6Bimodal ndash 2 amp 6
bull No ModeNo Mode
a 5 5 5 3 1 5 1 4 3 5
b 2 2 2 3 4 5 6 6 6 7 9
c 2 3 6 7 8 9 10
d 2 2 3 3 3 4
e 2 2 3 3 4 4 5 5
bull Mode is 3
bull No Mode
Shapes of the Distribution
Shapes of the Distribution
Distribution Characteristics
Shapes of the Distribution
Example Height of students in the class
Shapes of the Distribution
Example Serum cholesterol level
Shapes of the Distribution
Example Birth weight of newborn babies
Shapes of the Distribution
Some visual ways to summarize data
Tables
  Frequency table
Graphs
  Histograms
  Bar graphs
  Box plots
  Line plots
  Scatter graphs
Charts
  Bar chart
  Pie diagram
Frequency Tables
Summarizes a variable with counts and percentages
The variable is categorical
Note that you can take a continuous variable and create categories with it
How do you create categories for a continuous variable?
Choose cutoffs that are biologically meaningful
Natural breaks in the data
Example of frequency table
When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data.
When the data are divided into groups (classes), they are called grouped data.
The classes have to be decided according to the range of the data and the size of the class.
The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table.
The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
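A minimal sketch of an ungrouped frequency table, with percentages and cumulative frequencies, might look like this in Python. The scores are an example data set reused for illustration; the function name and column layout are assumptions.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, frequency, percentage, cumulative frequency)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts):
        # Cumulative frequency: this class plus all classes before it
        cumulative += counts[value]
        rows.append((value, counts[value], 100 * counts[value] / total, cumulative))
    return rows

scores = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5]
table = frequency_table(scores)

print("value  freq  percent  cumulative")
for value, freq, pct, cum in table:
    print(f"{value:>5} {freq:>5} {pct:>8.1f} {cum:>11}")
```

The last cumulative frequency always equals the number of observations, which is a quick check that nothing was dropped.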
Graphical Summaries
Histograms: Continuous or ordinal data on horizontal axis
Bar Graphs: Nominal data; no order to horizontal axis
Box Plots: Continuous data
Histogram
A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.
Bar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of several mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width, and they should not touch the other bars.
Difference between bar chart and histogram
Bar charts for categories that are separate
Histograms if you got categories by dividing up continuous data
Bars do not touch; histogram rectangles do touch
Line graph
If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, it is better done with line graphs rather than histograms.
Scatter plot
Also called a scattergram. This is a method of displaying the distribution of two variables in relation to one another. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each plotted point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on a straight line (the regression line).
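The slide does not name a measure of correlation. As an assumption, this sketch uses Pearson's correlation coefficient r, which is 1 or -1 when all points lie on a straight line and moves towards 0 as the scatter widens. All data are invented.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
on_a_line = [2, 4, 6, 8, 10]   # every point lies on y = 2x
scattered = [3, 1, 4, 1, 5]    # widely scattered points

r_line = pearson_r(x, on_a_line)
r_scattered = pearson_r(x, scattered)
print(round(r_line, 3), round(r_scattered, 3))
```

The first pair gives r of about 1 and the second a much weaker r of about 0.35.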
Survival curve
Pie chart
This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.
Pictures of Data: Continuous Variables
Histograms
Means and medians do not tell the whole story
Differences in spread (variability)
Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
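The steps above can be sketched in Python without any plotting library. The ages are hypothetical, and placing the maximum value in the last bin is a design choice of this sketch, not something the slide specifies.

```python
def histogram_counts(values, n_bins):
    """Split the data range into n_bins equal-width intervals and count each."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are identical; cannot form intervals")
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value goes into the last bin rather than a new one
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts

ages = [20, 22, 25, 31, 34, 35, 38, 41, 47, 60]  # hypothetical ages in years
edges, counts = histogram_counts(ages, 4)

# Draw a text histogram with labelled intervals
for i, count in enumerate(counts):
    print(f"{edges[i]:5.1f} to {edges[i + 1]:5.1f} | {'#' * count}")
```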
Pictures of Data: Histograms
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing the distribution of continuous data in multiple groups – can be plotted side by side
Box plot
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution.
The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
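The numbers behind a box plot can be sketched with Python's standard library. Software packages compute percentiles in slightly different ways; this sketch assumes the "inclusive" method of `statistics.quantiles`. The lengths of stay are invented.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower hinge, median, upper hinge, maximum)."""
    ordered = sorted(values)
    # n=4 gives the three cut points that split the data into quarters
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

stay_days = [1, 2, 2, 3, 4, 5, 6, 8, 12]  # hypothetical lengths of stay (days)
summary = five_number_summary(stay_days)
print(summary)  # (1, 2.0, 4.0, 6.0, 12)
```

The box would run from 2 to 6 days with the median line at 4 days. The long stretch up to 12 days hints at a right-skewed distribution.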
Hospital Length of Stay
Box plot: Length of Stay
Misuse of graphics
“It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled.” - M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.
Common tricks used to mislead
The problem of scaling
The Advertiser’s Graph
The transformed graph
The chart with too much data
Which graph to use
Statistical methods depend on the “form” of a set of data, which can be assessed with some common useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category
Example of MCQ 1
The arithmetic mean of a set of values:
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram:
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Syllabus for the Course
SESSION 1: Summarizing your data 1
Types of data (quantitative and categorical variables)
Describing data - average (mean, median and mode)
Displaying data graphically (box plots, histograms, bar charts, pie diagrams)
Frequency distributions
SESSION 2: Summarizing your data 2
The normal distribution
Describing data – spread (range, variance, standard deviation, z score)
Quartiles, percentiles
Standard error of the mean
Confidence intervals
SESSION 3: Sampling principles
Study population
The sample
Random sampling
Non-random sampling
Sampling bias
Sample size and power
SESSION 4: Statistical tests and the concept of significance
Hypothesis testing
p value
Statistical versus clinical significance
Parametric versus non-parametric methods
Free textbook on-line
Statistics at Square One
http://bmj.bmjjournals.com/collections/statsbk/index.shtml
http://www.medstatsaag.com/mcqs.asp
Relevant topics
Handling data: 1, 4, 5, 6, 7
Sampling: 10, 11
Hypothesis testing: 17, 18
Today’s Lecture
What types of data are there? (numerical vs categorical variables)
Describing data - measures of central tendency (mean, median and mode)
Summarising data graphically (histograms, box plots, bar charts, pie diagrams)
Types of data
Variable
  Categorical: Nominal, Ordinal
  Numerical: Discrete, Continuous
Types of Data
Numerical data: Discrete
Examples: No. of children, No. of asthma attacks in a week, No. of rooms in home
Types of Data
Numerical data: Continuous
Any value on the continuum is possible (even fractions or decimals)
Examples: Weight, Age, Temperature, Heart rate
Types of Data
Categorical data: Nominal
Mutually exclusive unordered categories
Examples:
Sex (male, female)
Eye colour (brown, grey, green, blue)
Are you happy? (Yes, No)
Diarrhoea (Present, absent)
Can summarize in:
Tables – using counts and percentages
Bar Chart
Types of Data
Categorical data: Ordinal (ordered categories)
Examples:
Degree of agreement (Strongly agree, Agree, Disagree, Strongly disagree)
Severity of injury: Severe, Moderate, Mild
Income level: High, medium, low
PRACTICE
Discrete or Continuous?
mg of tar in cigarettes                      Continuous
number of people in a car                    Discrete
high to low temperature in any day           Continuous
weight                                       Continuous
time                                         Continuous
number of children in the average family     Discrete
Nominal or Ordinal?
Average, above avg, below average            Ordinal
Colours of Smarties                          Nominal
Grades (A B C D F)                           Ordinal
Data Summaries
It is ALWAYS a good idea to summarise your data
You become familiar with the data and the characteristics of the people that you are studying
You can also identify problems or errors with the data (data management issues)
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
Definitions
The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.
The median is the middle of a distribution: half the scores are above the median and half are below the median.
The mode is the most frequently occurring score in a distribution.
“It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable… on average.”
JM Yancy
Sample Mean x̄
The Average or Arithmetic Mean
Add up data, then divide by sample size (n)
The sample size n is the number of observations (pieces of data)
Example: Systolic blood pressures (mmHg)
X1 = 120
X2 = 80
X3 = 90
X4 = 110
X5 = 95
n = 5
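Working the example through: add up the five pressures and divide by n. A short Python sketch of that arithmetic (variable names are illustrative):

```python
pressures = [120, 80, 90, 110, 95]  # systolic blood pressures (mmHg) from the example
n = len(pressures)                  # sample size

total = sum(pressures)              # 120 + 80 + 90 + 110 + 95
sample_mean = total / n
print(total, n, sample_mean)  # 495 5 99.0
```

The sample mean is therefore 99 mmHg.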
Notation
Σ (sigma) denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
x̄ is pronounced ‘x-bar’ and denotes the mean of a set of sample values
µ is pronounced ‘mu’ and denotes the mean of all values in a population
Definitions
Mean
the value obtained by adding the scores and dividing the total by the number of scores
Sample: $\bar{x} = \dfrac{\sum x}{n}$
Population: $\mu = \dfrac{\sum x}{N}$
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values - one data point could make a great change in the sample mean
Why is it called the sample mean?
– To distinguish it from the population mean
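To illustrate the sensitivity to extreme values, this sketch adds one hypothetical extreme reading (220 mmHg, not from the lecture) to the five blood pressures and compares the change in the mean with the change in the median.

```python
import statistics

pressures = [120, 80, 90, 110, 95]   # values from the lecture example
with_outlier = pressures + [220]     # one hypothetical extreme reading added

mean_before = statistics.mean(pressures)
median_before = statistics.median(pressures)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before)           # 99 and 95
print(round(mean_after, 1), median_after)   # 119.2 and 102.5
```

One extra reading moves the mean by about 20 mmHg but the median by only 7.5 mmHg.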
Population Versus Sample
Population - The entire group you want information about
Syllabus for the Course
1048793 SESSION 1 Summarizing your data 1Types of data (quantitative and categorical variables) Describing data- average (mean median and mode)Displaying data graphically (box plots histograms bar charts pie diagrams) Frequency distributions
SESSION 2 Summarizing your data 2The normal distributionDescribing data ndash spread (range variance standard deviation z score)Quartiles percentilesStandard error of the mean Confidence intervals
SESSION 3SESSION 3 Sampling principles Sampling principles Study PopulationStudy PopulationThe sampleThe sampleRandom samplingRandom samplingNon random sampling Non random sampling Sampling biasSampling biasSample size and powerSample size and power
SESSION 4SESSION 4 Statistical tests and Statistical tests and the concept of significancethe concept of significance Hypothesis testingHypothesis testingp valuep value Statistical versus clinical Statistical versus clinical significancesignificanceParametric versus non-parametric Parametric versus non-parametric methodsmethods
Free textbook on-lineFree textbook on-line
Statistics at Square One
httpbmjbmjjournalscomcollectionsstatsbkindexshtml
httpwwwmedstatsaagcommcqsasp
Relevant topics
Handling data1 4 5 6 7
Sampling10 11
Hypothesis testing17 18
Todayrsquos LectureTodayrsquos Lecture
What types of data are thereWhat types of data are there
(numerical vs categorical variables) (numerical vs categorical variables)
Describing data - measures of central tendency Describing data - measures of central tendency (m(mean median and mode)ean median and mode)
Summarising data graphicallySummarising data graphically (histograms box (histograms box plots bar charts pie diagrams) plots bar charts pie diagrams)
Types of dataTypes of data
Variable
Categorical Numerical
Nominal Ordinal Discrete Continuous
Types of Data
Numerical dataDiscrete
Examples No of children No asthma attacks in a week No of rooms in home
Types of Data
Numerical dataContinuous
Any value on the continuum is possible (even fractions or Any value on the continuum is possible (even fractions or decimals)decimals)
Examples Weight Age Temperature Heart rate
Types of Data
Categorical dataNominal
Mutually exclusive unordered categoriesMutually exclusive unordered categories
ExamplesExamples Sex (male female)Sex (male female) Eye colour (brown grey green blue)Eye colour (brown grey green blue) Are you happy (Yes No)Are you happy (Yes No) Diarrhoea (Present absent)Diarrhoea (Present absent)
Can summarize inCan summarize in Tables ndash using counts and percentagesTables ndash using counts and percentages Bar ChartBar Chart
Types of Data
Categorical dataOrdinal (ordered categories)
Examples Degree of agreement
(Strongly Agree Agree Disagree Strongly disagree)
Severity of injury Severe Moderate Mild
Income level High medium low
PRACTICEPRACTICE
mg of tar in cigarettesmg of tar in cigarettes
number of people in a carnumber of people in a car
high to low temperature inhigh to low temperature in
any dayany day
weightweight
timetime
number of children in thenumber of children in the
average familyaverage family
Average above avg Average above avg below averagebelow average
Colours of SmartiesColours of Smarties
Grades (A B C D F)Grades (A B C D F)
Discrete or Continuous
Continuous
Discrete
Continuous
Continuous
Continuous
Discrete
Nominal or Ordinal
Ordinal
Nominal
Ordinal
Data SummariesData Summaries
It is ALWAYS a good idea to summarise It is ALWAYS a good idea to summarise your datayour data
You become familiar with the data and the You become familiar with the data and the characteristics of the people that you are characteristics of the people that you are studyingstudying
You can also identify problems or errors with You can also identify problems or errors with the data (data management issues)the data (data management issues)
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
DefinitionsDefinitions
The arithmetic The arithmetic meanmean is what is commonly called is what is commonly called the average The mean is the sum of all the the average The mean is the sum of all the scores divided by the number of scoresscores divided by the number of scores
The The medianmedian is the middle of a distribution half is the middle of a distribution half the scores are above the median and half are the scores are above the median and half are below the median below the median
The The modemode is the most frequently occurring score is the most frequently occurring score in a distribution in a distribution
ldquoldquoIt has been said that a fellow with one It has been said that a fellow with one leg frozen in ice and the other leg in leg frozen in ice and the other leg in boiling water is comfortablehellipboiling water is comfortablehellip
hellip hellipon averagerdquoon averagerdquo
JM YancyJM Yancy
Sample Mean X
The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)
1048793 ExampleSystolic blood pressures (mmHg)
X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5
micro is pronounced lsquomursquo and denotes the mean of all values in a population
Notation
x is pronounced lsquox-barrsquo and denotes the mean of a set of
Sample values
(sigma) denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
x =
Definitions
MeanMean
the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores
n x
Sample
Nmicro = xPopulation
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values - One data point could make a great change in
sample mean
Why is it called the sample meanndash To distinguish it from population mean
Population Versus Sample
Population - The entire group you want information about
ndash For example The blood pressure of all 20-year-old male university students in South Africa
Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
ndash For example Sample of blood pressures (n=50) of 20-year-old male university students in South Africa
The sample mean X is not the population mean micro
Population Versus Sample
We donrsquot know the population mean micro but would like to know itWe draw a sample from the populationWe calculate the sample mean XHow close is X to microStatistical theory will tell us how close X is to microStatistical inference is the process of trying to draw conclusions about the population from the sample
Weighted Mean
x =w
(w bull x)
Your grade in many courses are weighted means (averages) In other words some things count (are weighted) more than others
Geometric MeansGeometric Means
These are histograms rotated 90ordm and box plots
Note how the log transformation gives a symmetric distribution
bull 1 1 3 3 4 5 5 5 5 5 no exact middle -- shared by two numbers
MEDIAN is 45
4 + 5
2= 45
bull 5 5 5 3 1 5 1 4 3 5 2 bull 1 1 2 3 3 4 5 5 5 5 5 (in order)
exact middle MEDIAN is 4
Mode
The score that occurs most frequently
Bimodal, multimodal, no mode
The only measure of central tendency that can be used with nominal data
Examples
a. 5 5 5 3 1 5 1 4 3 5 • Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9 • Bimodal: 2 & 6
c. 2 3 6 7 8 9 10 • No mode
d. 2 2 3 3 3 4 • Mode is 3
e. 2 2 3 3 4 4 5 5 • No mode
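The examples a to e can be checked with a short sketch. It assumes the convention used on the slide, that a data set has no mode when every distinct score occurs equally often; the function name `modes` is hypothetical.

```python
from collections import Counter

def modes(scores):
    """Return the most frequent score(s), or [] when all scores tie (no mode)."""
    counts = Counter(scores)
    highest = max(counts.values())
    most_frequent = sorted(v for v, c in counts.items() if c == highest)
    # Every distinct value equally frequent (and more than one value): no mode.
    if len(counts) > 1 and len(most_frequent) == len(counts):
        return []
    return most_frequent

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}
for label, scores in examples.items():
    print(label, modes(scores) or "no mode")
```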
Shapes of the Distribution
Distribution Characteristics
Example: Height of students in the class
Example: Serum cholesterol level
Example: Birth weight of newborn babies
Some visual ways to summarize data
Tables
- Frequency table
Graphs
- Histograms
- Bar graphs
- Box plots
- Line plots
- Scatter graphs
Charts
- Bar chart
- Pie diagram
Frequency Tables
Summarizes a variable with counts and percentages
The variable is categorical
Note that you can take a continuous variable and create categories with it
How do you create categories for a continuous variable?
- Choose cutoffs that are biologically meaningful
- Natural breaks in the data
Example of frequency table
When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data.
When the data are divided into groups (classes), they are called grouped data.
The classes have to be decided according to the range of the data and the size of the class.
The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table.
The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
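A grouped frequency table with cumulative frequencies can be sketched as below. The ages and the class width of 10 years are hypothetical choices; in practice the cutoffs should be biologically meaningful.

```python
from collections import Counter

# Hypothetical ages (years); the class width of 10 is an arbitrary choice.
ages = [23, 35, 41, 28, 52, 37, 45, 31, 29, 58, 44, 36]
class_width = 10

# Key each observation by the lower limit of its class, e.g. 23 -> 20.
frequency = Counter((age // class_width) * class_width for age in ages)

rows = []
cumulative = 0
for lower in sorted(frequency):
    count = frequency[lower]
    cumulative += count                      # this class plus all earlier classes
    percent = 100 * count / len(ages)
    label = f"{lower}-{lower + class_width - 1}"
    rows.append((label, count, percent, cumulative))
    print(f"{label:>7} {count:>5} {percent:>7.1f}% {cumulative:>5}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check that no value was dropped.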
Graphical Summaries
Histograms
- Continuous or ordinal data on horizontal axis
Bar Graphs
- Nominal data
- No order to horizontal axis
Box Plots
- Continuous data
Histogram
A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.
Bar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of a set of mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width, and they should not touch the other bars.
Difference between bar chart and histogram
Bar charts for categories that are separate
Histograms if you got categories by dividing up continuous data
Bars do not touch; histogram rectangles do touch
Line graph
If the mid-points of the tops of the bars of a histogram are connected by a line, and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, it is better done with line graphs than with histograms.
Scatter plot
Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each plotted point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on a straight line (the regression line).
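How tightly the points follow a straight line can be put into a number. The sketch below uses Pearson's correlation coefficient r, which is one common choice and is an assumption here since the slide does not name a measure; all data values are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

heights = [150, 160, 170, 180, 190]           # hypothetical heights (cm)
on_a_line = [2 * h - 250 for h in heights]    # exact straight-line relationship
scattered = [62, 48, 75, 55, 70]              # hypothetical, widely scattered values

r_line = pearson_r(heights, on_a_line)
r_scattered = pearson_r(heights, scattered)
print(f"points on a line: r = {r_line:.2f}")
print(f"wide scatter:     r = {r_scattered:.2f}")
```

A value of r close to 1 (or -1) corresponds to points lying near a straight line; values nearer 0 correspond to a wide scatter.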
Survival curve
Pie chart
This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100%, and it represents the total population that is being shown.
Pictures of Data: Continuous Variables
Histograms
- Means and medians do not tell the whole story
- Differences in spread (variability)
- Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
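The steps above can be sketched as follows. The birth weights, the choice of four bins, and the function name `histogram_counts` are assumptions for the example; plotting libraries perform the same counting internally.

```python
def histogram_counts(data, bins):
    """Split the data range into equal-width bins and count observations per bin."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must contain at least two different values")
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in data:
        # The maximum value sits on the top edge, so it goes in the last bin.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

# Hypothetical birth weights in grams.
weights_g = [2100, 2800, 3000, 3200, 3300, 3400, 3500, 3600, 3900, 4100]
edges, counts = histogram_counts(weights_g, bins=4)

# Draw a text histogram with labelled intervals.
for i, count in enumerate(counts):
    print(f"{edges[i]:>6.0f}-{edges[i + 1]:<6.0f} {'#' * count}")
```

Changing the number of bins changes the apparent shape, so it is worth trying more than one bin width before interpreting a histogram.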
Pictures of Data: Histograms
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing the distribution of continuous data in multiple groups (can be plotted side by side)
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution.
The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
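The numbers behind a box plot can be computed directly. This is a minimal sketch using hypothetical lengths of stay; it assumes the "inclusive" percentile convention of Python's `statistics.quantiles`, and other conventions can give slightly different hinges.

```python
from statistics import quantiles

# Hypothetical hospital lengths of stay (days); illustrative values only.
stay_days = [1, 2, 2, 3, 4, 5, 6, 8, 12]

# n=4 splits the data into quarters and returns the three cut points.
lower_hinge, middle, upper_hinge = quantiles(stay_days, n=4, method="inclusive")

print(f"lower hinge (25th percentile) = {lower_hinge}")
print(f"median                        = {middle}")
print(f"upper hinge (75th percentile) = {upper_hinge}")
print(f"box height (interquartile range) = {upper_hinge - lower_hinge}")
```

The longest stay (12 days) lies well above the upper hinge, which is the kind of outlying value a box plot makes visible.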
Hospital Length of Stay
Box plot: Length of Stay
Misuse of graphics
"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.
Common tricks used to mislead
- The problem of scaling
- The Advertiser's Graph
- The transformed graph
- The chart with too much data
Which graph to use
Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics
Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category
Example of MCQ 1
The arithmetic mean of a set of values:
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram:
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Free textbook on-line
Statistics at Square One
http://bmj.bmjjournals.com/collections/statsbk/index.shtml
http://www.medstatsaag.com/mcqs.asp
Relevant topics
Handling data: 1, 4, 5, 6, 7
Sampling: 10, 11
Hypothesis testing: 17, 18
Today's Lecture
What types of data are there? (numerical vs categorical variables)
Describing data - measures of central tendency (mean, median and mode)
Summarising data graphically (histograms, box plots, bar charts, pie diagrams)
Types of data
Variable
- Categorical: Nominal, Ordinal
- Numerical: Discrete, Continuous
Numerical data: Discrete
Examples: No. of children, No. of asthma attacks in a week, No. of rooms in home
Numerical data: Continuous
Any value on the continuum is possible (even fractions or decimals)
Examples: Weight, Age, Temperature, Heart rate
Categorical data: Nominal
Mutually exclusive, unordered categories
Examples
- Sex (male, female)
- Eye colour (brown, grey, green, blue)
- Are you happy? (Yes, No)
- Diarrhoea (Present, absent)
Can summarize in
- Tables, using counts and percentages
- Bar chart
Categorical data: Ordinal (ordered categories)
Examples
- Degree of agreement (Strongly agree, Agree, Disagree, Strongly disagree)
- Severity of injury: Severe, Moderate, Mild
- Income level: High, medium, low
PRACTICE
Discrete or Continuous?
- mg of tar in cigarettes: Continuous
- number of people in a car: Discrete
- high to low temperature in any day: Continuous
- weight: Continuous
- time: Continuous
- number of children in the average family: Discrete
Nominal or Ordinal?
- Average, above avg, below average: Ordinal
- Colours of Smarties: Nominal
- Grades (A B C D F): Ordinal
Data Summaries
It is ALWAYS a good idea to summarise your data
You become familiar with the data and the characteristics of the people that you are studying
You can also identify problems or errors with the data (data management issues)
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
Definitions
The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.
The median is the middle of a distribution: half the scores are above the median and half are below the median.
The mode is the most frequently occurring score in a distribution.
"It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable... on average" - J M Yancy
Sample Mean $\bar{x}$
The Average or Arithmetic Mean
Add up data, then divide by sample size (n)
The sample size n is the number of observations (pieces of data)
Example: Systolic blood pressures (mmHg)
$x_1 = 120$, $x_2 = 80$, $x_3 = 90$, $x_4 = 110$, $x_5 = 95$; $n = 5$
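Worked through for these five readings, the calculation is:

```latex
\bar{x} = \frac{120 + 80 + 90 + 110 + 95}{5} = \frac{495}{5} = 99 \text{ mmHg}
```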
Notation
$\mu$ is pronounced 'mu' and denotes the mean of all values in a population
$\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set of sample values
$\Sigma$ (sigma) denotes the summation of a set of values
$x$ is the variable usually used to represent the individual data values
$n$ represents the number of data values in a sample
$N$ represents the number of data values in a population
$\bar{x} = \frac{\sum x}{n}$
the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores
n x
Sample
Nmicro = xPopulation
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values - One data point could make a great change in
sample mean
Why is it called the sample meanndash To distinguish it from population mean
Population Versus Sample
Population - The entire group you want information about
ndash For example The blood pressure of all 20-year-old male university students in South Africa
Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
ndash For example Sample of blood pressures (n=50) of 20-year-old male university students in South Africa
The sample mean X is not the population mean micro
Population Versus Sample
We donrsquot know the population mean micro but would like to know itWe draw a sample from the populationWe calculate the sample mean XHow close is X to microStatistical theory will tell us how close X is to microStatistical inference is the process of trying to draw conclusions about the population from the sample
Weighted Mean
x =w
(w bull x)
Your grade in many courses are weighted means (averages) In other words some things count (are weighted) more than others
Geometric MeansGeometric Means
These are histograms rotated 90ordm and box plots
Note how the log transformation gives a symmetric distribution
bull 1 1 3 3 4 5 5 5 5 5 no exact middle -- shared by two numbers
MEDIAN is 45
4 + 5
2= 45
bull 5 5 5 3 1 5 1 4 3 5 2 bull 1 1 2 3 3 4 5 5 5 5 5 (in order)
exact middle MEDIAN is 4
Mode Mode
The score that occurs most frequentlyThe score that occurs most frequently
BimodalMultimodalNo Mode
The only measure of central tendency that can be used The only measure of central tendency that can be used with with nominal datadata
Examples
bull Mode is 5Mode is 5
bull Bimodal ndash 2 amp 6Bimodal ndash 2 amp 6
bull No ModeNo Mode
a 5 5 5 3 1 5 1 4 3 5
b 2 2 2 3 4 5 6 6 6 7 9
c 2 3 6 7 8 9 10
d 2 2 3 3 3 4
e 2 2 3 3 4 4 5 5
bull Mode is 3
bull No Mode
Shapes of the Distribution
Shapes of the Distribution
Distribution Characteristics
Shapes of the Distribution
Example Height of students in the class
Shapes of the Distribution
Example Serum cholesterol level
Shapes of the Distribution
Example Birth weight of newborn babies
Shapes of the Distribution
Some visual ways to summarize Some visual ways to summarize datadata
TablesTablesFrequency tableFrequency table
GraphsGraphsHistogramsHistograms
Bar graphsBar graphs
Box plotsBox plots
Line plotsLine plots
Scatter graphsScatter graphs
ChartsChartsBar chartBar chart
Pie diagramPie diagram
Frequency TablesFrequency Tables
Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages
The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable
and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable
Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data
Example of frequency tableExample of frequency table
When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class
Graphical Summaries
Histograms: Continuous or ordinal data on horizontal axis
Bar Graphs: Nominal data; no order to horizontal axis
Box Plots: Continuous data
Histogram
A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.
Bar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of several mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and they should not touch the other bars.
Difference between bar chart and histogram
Bar charts for categories that are separate
Histograms if you got categories by dividing up continuous data
Bars do not touch; histogram rectangles do touch
Line graph
If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, it is better done with line graphs rather than histograms.
Scatter plot
Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on a straight line (the regression line).
Survival curve
Pie chart
This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The area of each sector (slice of the pie) is proportional to the amount for that category. The total area of the circle is 100% and it represents the total population that is being shown.
Pictures of Data: Continuous Variables
Histograms
Means and medians do not tell the whole story
Differences in spread (variability)
Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
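The steps above can be sketched in a few lines. The birth weights, the choice of four bins and the function name `histogram_counts` are made up for illustration; the sketch assumes the data are not all identical (otherwise the bin width would be zero) and places the largest value in the last bin.

```python
def histogram_counts(values, n_bins):
    """Count observations in n_bins equal-width intervals covering the data range."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a bin of its own.
        index = min(int((v - lowest) / width), n_bins - 1)
        counts[index] += 1
    edges = [lowest + i * width for i in range(n_bins + 1)]
    return edges, counts

# Made-up birth weights (grams), for illustration only.
weights = [2000, 2500, 2900, 3100, 3300, 3400, 3500, 3700, 3900, 4400]

edges, counts = histogram_counts(weights, 4)
for i, count in enumerate(counts):
    # A text "bar" whose length is the frequency of the interval.
    print(f"{edges[i]:.0f} to {edges[i + 1]:.0f} g | {'#' * count}")
```

Changing the number of bins changes the picture, so it is worth trying more than one value before settling on a histogram.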
Pictures of Data: Histograms
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing the distribution of continuous data in multiple groups (can be plotted side by side)
Box plot
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box and 1/4 of the distribution is between this line and the bottom of the box.
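As a rough illustration of the hinges and the median, the sketch below uses Python's standard `statistics.quantiles`. The length-of-stay values are made up, and the `method="inclusive"` setting is one of several percentile conventions; other conventions can give slightly different hinges.

```python
import statistics

# Made-up hospital lengths of stay (days), for illustration only.
stay_days = [1, 2, 2, 3, 4, 5, 7, 10, 21]

# quantiles(..., n=4) returns the three cut points that split the data into quarters:
# the lower hinge (25th percentile), the median, and the upper hinge (75th percentile).
q1, median, q3 = statistics.quantiles(stay_days, n=4, method="inclusive")

# The values lying between the hinges are the ones covered by the box.
inside_box = [d for d in stay_days if q1 <= d <= q3]

print(f"lower hinge = {q1}, median = {median}, upper hinge = {q3}")
print(f"values inside the box: {inside_box}")
```

Here the stay of 21 days lies far above the upper hinge, which is the kind of outlying value a box plot makes easy to spot.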
Example: Hospital Length of Stay
Box plot: Length of Stay
Misuse of graphics
"It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled." - M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.
Common tricks used to mislead:
The problem of scaling
The Advertiser's Graph
The transformed graph
The chart with too much data
Which graph to use
Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics
Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category
Example of MCQ 1
The arithmetic mean of a set of values:
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram:
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Today's Lecture
What types of data are there? (numerical vs categorical variables)
Describing data - measures of central tendency (mean, median and mode)
Summarising data graphically (histograms, box plots, bar charts, pie diagrams)
Types of data
Variable
Categorical: Nominal, Ordinal
Numerical: Discrete, Continuous
Types of Data: Numerical data (Discrete)
Examples: No. of children, no. of asthma attacks in a week, no. of rooms in home
Types of Data: Numerical data (Continuous)
Any value on the continuum is possible (even fractions or decimals)
Examples: Weight, age, temperature, heart rate
Types of Data: Categorical data (Nominal)
Mutually exclusive unordered categories
Examples:
Sex (male, female)
Eye colour (brown, grey, green, blue)
Are you happy? (Yes, No)
Diarrhoea (present, absent)
Can summarize in:
Tables - using counts and percentages
Bar Chart
Types of Data: Categorical data (Ordinal: ordered categories)
Examples:
Degree of agreement (Strongly agree, Agree, Disagree, Strongly disagree)
Severity of injury (Severe, Moderate, Mild)
Income level (High, Medium, Low)
PRACTICE
Discrete or Continuous?
mg of tar in cigarettes: Continuous
number of people in a car: Discrete
high to low temperature in any day: Continuous
weight: Continuous
time: Continuous
number of children in the average family: Discrete
Nominal or Ordinal?
Average, above average, below average: Ordinal
Colours of Smarties: Nominal
Grades (A, B, C, D, F): Ordinal
Data Summaries
It is ALWAYS a good idea to summarise your data
You become familiar with the data and the characteristics of the people that you are studying
You can also identify problems or errors with the data (data management issues)
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
Definitions
The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.
The median is the middle of a distribution: half the scores are above the median and half are below the median.
The mode is the most frequently occurring score in a distribution.
"It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable... on average."
JM Yancy
Sample Mean $\bar{X}$
The Average or Arithmetic Mean
Add up data, then divide by sample size (n)
The sample size n is the number of observations (pieces of data)
Example: Systolic blood pressures (mmHg)
X1 = 120, X2 = 80, X3 = 90, X4 = 110, X5 = 95; n = 5
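Working the example through, the five pressures add up to 495, so the sample mean is 495 / 5 = 99 mmHg. A minimal sketch of the same arithmetic (the variable names are illustrative only):

```python
# Systolic blood pressures X1..X5 from the example above, in mmHg.
pressures = [120, 80, 90, 110, 95]

n = len(pressures)              # sample size
total = sum(pressures)          # add up the data
sample_mean = total / n         # divide by the sample size

print(f"n = {n}, sum = {total}, sample mean = {sample_mean} mmHg")
```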
Notation
$\Sigma$ (sigma) denotes the summation of a set of values
$x$ is the variable usually used to represent the individual data values
$n$ represents the number of data values in a sample
$N$ represents the number of data values in a population
$\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set of sample values
$\mu$ is pronounced 'mu' and denotes the mean of all values in a population
Definitions
Mean: the value obtained by adding the scores and dividing the total by the number of scores
Sample: $\bar{x} = \frac{\sum x}{n}$
Population: $\mu = \frac{\sum x}{N}$
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values - one data point could make a great change in the sample mean
Why is it called the sample mean? To distinguish it from the population mean
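The sensitivity to extreme values can be seen with the five blood pressures from the earlier example. In the sketch below, the second list is a hypothetical scenario in which 120 was mistyped as 220; it is not data from the lecture.

```python
import statistics

original = [120, 80, 90, 110, 95]     # pressures from the sample mean example (mmHg)
with_error = [220, 80, 90, 110, 95]   # hypothetical: 120 mistyped as 220

mean_before = sum(original) / len(original)
mean_after = sum(with_error) / len(with_error)

# The median (middle value after sorting) is shown for comparison.
median_before = statistics.median(original)
median_after = statistics.median(with_error)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

One changed value moves the mean by 20 mmHg while the median stays where it was, which is one reason to look at both summaries.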
Population Versus Sample
Population - The entire group you want information about
- For example: The blood pressure of all 20-year-old male university students in South Africa
Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
- For example: Sample of blood pressures (n=50) of 20-year-old male university students in South Africa
The sample mean $\bar{X}$ is not the population mean $\mu$
Population Versus Sample
We don't know the population mean $\mu$ but would like to know it
We draw a sample from the population
We calculate the sample mean $\bar{X}$
How close is $\bar{X}$ to $\mu$?
Statistical theory will tell us how close $\bar{X}$ is to $\mu$
Statistical inference is the process of trying to draw conclusions about the population from the sample
Weighted Mean
$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$
Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
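A small sketch of the weighted mean as applied to course grades. The marks, the weights and the function name `weighted_mean` are hypothetical and chosen only to show the calculation.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical course: assignment, test and exam marks (percent).
marks = [60, 70, 80]
# Hypothetical weights: the exam counts twice as much as the other two.
weights = [1, 1, 2]

course_grade = weighted_mean(marks, weights)
plain_average = sum(marks) / len(marks)

print(f"weighted mean = {course_grade}, unweighted mean = {plain_average}")
```

Because the exam carries more weight and has the highest mark, the weighted mean (72.5) is pulled above the plain average (70).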
Geometric Means
These are histograms rotated 90° and box plots
Note how the log transformation gives a symmetric distribution
• 1 1 3 3 4 5 5 5 5 5 (in order)
no exact middle -- shared by two numbers
MEDIAN is $\frac{4 + 5}{2} = 4.5$
• 5 5 5 3 1 5 1 4 3 5 2
• 1 1 2 3 3 4 5 5 5 5 5 (in order)
exact middle: MEDIAN is 4
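The two cases above (an even and an odd number of values) can be written as one short function. This is a minimal sketch; the function name `median` is illustrative, and the two lists are the ones used in the examples.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: there is an exact middle.
        return ordered[middle]
    # Even number of values: the middle is shared by two numbers.
    return (ordered[middle - 1] + ordered[middle]) / 2

even_case = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5]       # 10 values
odd_case = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]     # 11 values

print(median(even_case))
print(median(odd_case))
```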
Mode
The score that occurs most frequently
Bimodal, Multimodal, No Mode
The only measure of central tendency that can be used with nominal data
Examples
a. 5 5 5 3 1 5 1 4 3 5: Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9: Bimodal - 2 & 6
c. 2 3 6 7 8 9 10: No Mode
d. 2 2 3 3 3 4: Mode is 3
e. 2 2 3 3 4 4 5 5: No Mode
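The five examples can be checked with a short sketch. The function name `modes` is illustrative, and the rule that a data set has no mode when every value occurs equally often is the convention these examples follow; some textbooks and software use a different one.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all occur equally often, no value stands out.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

for label, data in examples.items():
    result = modes(data)
    print(label, result if result else "no mode")
```

The same function works for nominal data such as eye colours, since it only counts how often each value occurs.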
Types of dataTypes of data
Variable
Categorical Numerical
Nominal Ordinal Discrete Continuous
Types of Data
Numerical dataDiscrete
Examples No of children No asthma attacks in a week No of rooms in home
Types of Data
Numerical dataContinuous
Any value on the continuum is possible (even fractions or Any value on the continuum is possible (even fractions or decimals)decimals)
Examples Weight Age Temperature Heart rate
Types of Data
Categorical dataNominal
Mutually exclusive unordered categoriesMutually exclusive unordered categories
ExamplesExamples Sex (male female)Sex (male female) Eye colour (brown grey green blue)Eye colour (brown grey green blue) Are you happy (Yes No)Are you happy (Yes No) Diarrhoea (Present absent)Diarrhoea (Present absent)
Can summarize inCan summarize in Tables ndash using counts and percentagesTables ndash using counts and percentages Bar ChartBar Chart
Types of Data
Categorical dataOrdinal (ordered categories)
Examples Degree of agreement
(Strongly Agree Agree Disagree Strongly disagree)
Severity of injury Severe Moderate Mild
Income level High medium low
PRACTICEPRACTICE
mg of tar in cigarettesmg of tar in cigarettes
number of people in a carnumber of people in a car
high to low temperature inhigh to low temperature in
any dayany day
weightweight
timetime
number of children in thenumber of children in the
average familyaverage family
Average above avg Average above avg below averagebelow average
Colours of SmartiesColours of Smarties
Grades (A B C D F)Grades (A B C D F)
Discrete or Continuous
Continuous
Discrete
Continuous
Continuous
Continuous
Discrete
Nominal or Ordinal
Ordinal
Nominal
Ordinal
Data SummariesData Summaries
It is ALWAYS a good idea to summarise It is ALWAYS a good idea to summarise your datayour data
You become familiar with the data and the You become familiar with the data and the characteristics of the people that you are characteristics of the people that you are studyingstudying
You can also identify problems or errors with You can also identify problems or errors with the data (data management issues)the data (data management issues)
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
DefinitionsDefinitions
The arithmetic The arithmetic meanmean is what is commonly called is what is commonly called the average The mean is the sum of all the the average The mean is the sum of all the scores divided by the number of scoresscores divided by the number of scores
The The medianmedian is the middle of a distribution half is the middle of a distribution half the scores are above the median and half are the scores are above the median and half are below the median below the median
The The modemode is the most frequently occurring score is the most frequently occurring score in a distribution in a distribution
ldquoldquoIt has been said that a fellow with one It has been said that a fellow with one leg frozen in ice and the other leg in leg frozen in ice and the other leg in boiling water is comfortablehellipboiling water is comfortablehellip
hellip hellipon averagerdquoon averagerdquo
JM YancyJM Yancy
Sample Mean X
The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)
1048793 ExampleSystolic blood pressures (mmHg)
X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5
micro is pronounced lsquomursquo and denotes the mean of all values in a population
Notation
x is pronounced lsquox-barrsquo and denotes the mean of a set of
Sample values
(sigma) denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
x =
Definitions
MeanMean
the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores
n x
Sample
Nmicro = xPopulation
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values - One data point could make a great change in
sample mean
Why is it called the sample meanndash To distinguish it from population mean
Population Versus Sample
Population - The entire group you want information about
ndash For example The blood pressure of all 20-year-old male university students in South Africa
Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
ndash For example Sample of blood pressures (n=50) of 20-year-old male university students in South Africa
The sample mean X is not the population mean micro
Population Versus Sample
We donrsquot know the population mean micro but would like to know itWe draw a sample from the populationWe calculate the sample mean XHow close is X to microStatistical theory will tell us how close X is to microStatistical inference is the process of trying to draw conclusions about the population from the sample
Weighted Mean
x =w
(w bull x)
Your grade in many courses are weighted means (averages) In other words some things count (are weighted) more than others
Geometric MeansGeometric Means
These are histograms rotated 90ordm and box plots
Note how the log transformation gives a symmetric distribution
bull 1 1 3 3 4 5 5 5 5 5 no exact middle -- shared by two numbers
MEDIAN is 45
4 + 5
2= 45
bull 5 5 5 3 1 5 1 4 3 5 2 bull 1 1 2 3 3 4 5 5 5 5 5 (in order)
exact middle MEDIAN is 4
Mode Mode
The score that occurs most frequentlyThe score that occurs most frequently
BimodalMultimodalNo Mode
The only measure of central tendency that can be used The only measure of central tendency that can be used with with nominal datadata
Examples
bull Mode is 5Mode is 5
bull Bimodal ndash 2 amp 6Bimodal ndash 2 amp 6
bull No ModeNo Mode
a 5 5 5 3 1 5 1 4 3 5
b 2 2 2 3 4 5 6 6 6 7 9
c 2 3 6 7 8 9 10
d 2 2 3 3 3 4
e 2 2 3 3 4 4 5 5
bull Mode is 3
bull No Mode
Shapes of the Distribution
Shapes of the Distribution
Distribution Characteristics
Shapes of the Distribution
Example Height of students in the class
Shapes of the Distribution
Example Serum cholesterol level
Shapes of the Distribution
Example Birth weight of newborn babies
Shapes of the Distribution
Some visual ways to summarize Some visual ways to summarize datadata
TablesTablesFrequency tableFrequency table
GraphsGraphsHistogramsHistograms
Bar graphsBar graphs
Box plotsBox plots
Line plotsLine plots
Scatter graphsScatter graphs
ChartsChartsBar chartBar chart
Pie diagramPie diagram
Frequency TablesFrequency Tables
Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages
The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable
and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable
Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data
Example of frequency tableExample of frequency table
When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class
Graphical SummariesGraphical Summaries
HistogramsHistograms Continuous or ordinal data on horizontal axisContinuous or ordinal data on horizontal axis
Bar GraphsBar Graphs Nominal dataNominal data
No order to horizontal axisNo order to horizontal axis
Box PlotsBox Plots Continuous dataContinuous data
HistogramHistogram
A histogram is a graphic representation of the frequency distribution of a variable Vertical rectangles (bars) are drawn in such a way that their bases lie
on a linear scale representing different intervals and their heights are
proportional to the frequencies of the values within each of the intervals
Bar ChartBar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of mutually exclusive categories The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis The heights of the bars correspond to the frequencies The bars should be of equal width and they should not be touching me other bars
Difference between bar chart and Difference between bar chart and histogramhistogram
Bar charts for categories that are separateBar charts for categories that are separate
Histograms if you got categories by Histograms if you got categories by dividing up continuous datadividing up continuous data
Bars do not touch histogram rectangles Bars do not touch histogram rectangles do touch do touch
Line graphLine graph
If the mid-points of the top of the bars of a histogram are connected together by a line and if the bars were omitted from the display the resultant graph will be a line
graph (also called a frequency polygon) Line graphs are good at showing trends over a period of time When trends of rates (eg death rate Infant Mortality Rate etc) are to be displayed it is better done with
line graphs rather than histograms
Scatter plotScatter plot
Also called a scattergram This a method of displaying the distribution of two variables in relation to each other another The value of one variables is measured on the X axis and the values of the other on the Y axis The variables have to be on a continuous scale Each plot thus has two values (coordinates) from the Y and X axis scales A wide scatter of the plots denotes poor correlation between the two variables If the two variables are perfectly correlated then all the plots will fall on the diagonal (regression line)
Survival curveSurvival curve
Pie chartPie chart
This is a circular diagram (can be shown as 2-D or 3-D) divided into segments each representing a category or subset of data (part of the whole) The amount for each category is proportional to the area of the sector (slice of the pie) The total area of the circle is 100 and it represents the total population that is being shown
Pictures of DataContinuous Variables
Histograms Means and medians do not tell whole story Differences in spread (variability) Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
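The four steps above can be sketched in a few lines. The ages, the bin width of 10 and the starting edge of 20 are assumptions chosen for illustration, not values from the lecture.

```python
# Hypothetical data: ages (years) of 12 patients
ages = [23, 25, 31, 34, 35, 38, 41, 42, 47, 52, 58, 64]

# Step 1: divide the range into intervals (bins) of equal width
bin_width = 10   # assumed width
start = 20       # assumed lower edge of the first interval
num_bins = 5     # intervals 20-29, 30-39, ..., 60-69

# Step 2: count the number of observations in each interval
counts = [0] * num_bins
for age in ages:
    index = (age - start) // bin_width
    counts[index] += 1

# Steps 3 and 4: draw a text histogram with labelled intervals
for i, count in enumerate(counts):
    low = start + i * bin_width
    print(f"{low}-{low + bin_width - 1}: {'#' * count} ({count})")
```

Every observation lands in exactly one interval, so the counts add up to the number of observations.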
Pictures of Data: Histograms
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing the distribution of continuous data in multiple groups: the plots can be placed side by side
Box plot
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
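The three values that define the box can be computed directly. This is a minimal sketch using Python's statistics module; the lengths of stay are invented, and other software may use a slightly different quartile convention and so give slightly different hinges.

```python
import statistics

# Hypothetical lengths of stay (days) for 11 patients
length_of_stay = [1, 2, 2, 3, 3, 4, 5, 6, 8, 12, 30]

# statistics.quantiles with n=4 returns the three quartile cut points:
# 25th percentile (lower hinge), median, 75th percentile (upper hinge)
q1, median, q3 = statistics.quantiles(length_of_stay, n=4)
iqr = q3 - q1  # height of the box: the middle half of the data

print(f"lower hinge = {q1}, median = {median}, upper hinge = {q3}, IQR = {iqr}")
```

The single long stay of 30 days lies far above the upper hinge, which is the kind of outlying value a box plot makes visible.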
Hospital Length of Stay
Box plot: Length of Stay
Misuse of graphics
"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people so that such misadventures can be avoided.
Common tricks used to mislead
The problem of scaling
The Advertiser's Graph
The transformed graph
The chart with too much data
Which graph to use
Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics
Graph Name     Y-axis          X-axis
Histogram      Count           Category
Scatterplot    Continuous      Continuous
Dot Plot       Continuous      Category
Box Plot       Percentiles     Category
Line Plot      Mean or value   Category
Example of MCQ 1
The arithmetic mean of a set of values
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Types of Data
Numerical data: Discrete
Examples: number of children, number of asthma attacks in a week, number of rooms in a home
Types of Data
Numerical data: Continuous
Any value on the continuum is possible (even fractions or decimals)
Examples: weight, age, temperature, heart rate
Types of Data
Categorical data: Nominal
Mutually exclusive unordered categories
Examples
Sex (male, female)
Eye colour (brown, grey, green, blue)
Are you happy? (Yes, No)
Diarrhoea (present, absent)
Can summarize in
Tables, using counts and percentages
Bar chart
Types of Data
Categorical data: Ordinal (ordered categories)
Examples
Degree of agreement (Strongly agree, Agree, Disagree, Strongly disagree)
Severity of injury (Severe, Moderate, Mild)
Income level (High, Medium, Low)
PRACTICE
Discrete or Continuous?
mg of tar in cigarettes                      Continuous
number of people in a car                    Discrete
high to low temperature in any day           Continuous
weight                                       Continuous
time                                         Continuous
number of children in the average family     Discrete
Nominal or Ordinal?
Average, above average, below average        Ordinal
Colours of Smarties                          Nominal
Grades (A, B, C, D, F)                       Ordinal
Data Summaries
It is ALWAYS a good idea to summarise your data
You become familiar with the data and the characteristics of the people that you are studying
You can also identify problems or errors with the data (data management issues)
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
Definitions
The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.
The median is the middle of a distribution: half the scores are above the median and half are below the median.
The mode is the most frequently occurring score in a distribution.
"It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable... on average." - JM Yancy
Sample Mean ($\bar{X}$)
The average or arithmetic mean
Add up the data, then divide by the sample size (n)
The sample size n is the number of observations (pieces of data)
Example: systolic blood pressures (mmHg)
X1 = 120, X2 = 80, X3 = 90, X4 = 110, X5 = 95; n = 5
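Working the blood pressure example through, as a minimal sketch (the variable names are illustrative):

```python
# The five systolic blood pressures from the example (mmHg)
blood_pressures = [120, 80, 90, 110, 95]

n = len(blood_pressures)                 # sample size
sample_mean = sum(blood_pressures) / n   # add up the data, divide by n

print(f"n = {n}, sample mean = {sample_mean} mmHg")
```

The sum is 495, so the sample mean is 495 / 5 = 99 mmHg.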
Notation
$\Sigma$ (sigma) denotes the summation of a set of values
$x$ is the variable usually used to represent the individual data values
$n$ represents the number of data values in a sample
$N$ represents the number of data values in a population
$\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set of sample values
$\mu$ is pronounced 'mu' and denotes the mean of all values in a population
Definitions
Mean
The value obtained by adding the scores and dividing the total by the number of scores
Sample: $\bar{x} = \frac{\sum x}{n}$
Population: $\mu = \frac{\sum x}{N}$
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values: one data point could make a great change in the sample mean
Why is it called the sample mean? To distinguish it from the population mean
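The sensitivity to extreme values can be seen by adding one unusual reading to the blood pressure example. The extra value of 300 mmHg is an invented outlier used only for illustration.

```python
import statistics

# Systolic blood pressures from the earlier example (mmHg)
blood_pressures = [120, 80, 90, 110, 95]
# Same data with one hypothetical extreme value added
with_outlier = blood_pressures + [300]

mean_before = statistics.mean(blood_pressures)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(blood_pressures)
median_after = statistics.median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

One extra data point moves the mean from 99 to 132.5, while the median moves only from 95 to 102.5.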
Population Versus Sample
Population - The entire group you want information about
- For example: the blood pressure of all 20-year-old male university students in South Africa
Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
- For example: a sample of blood pressures (n = 50) of 20-year-old male university students in South Africa
The sample mean $\bar{X}$ is not the population mean $\mu$
Population Versus Sample
We don't know the population mean $\mu$ but would like to know it
We draw a sample from the population
We calculate the sample mean $\bar{X}$
How close is $\bar{X}$ to $\mu$?
Statistical theory will tell us how close $\bar{X}$ is to $\mu$
Statistical inference is the process of trying to draw conclusions about the population from the sample
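A small simulation can illustrate the idea. In real studies the population mean is unknown; here a population is invented (10,000 values with an assumed mean of 120 mmHg and standard deviation of 12 mmHg) so that the sample mean can be compared with it.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: systolic blood pressures (mmHg) of 10,000 students
population = [random.gauss(120, 12) for _ in range(10_000)]
mu = statistics.mean(population)     # population mean (normally unknown)

# Draw a sample of n = 50 and calculate the sample mean
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"population mean = {mu:.1f}, sample mean = {x_bar:.1f}, "
      f"difference = {x_bar - mu:.1f}")
```

The sample mean is close to, but not exactly equal to, the population mean, and a different sample would give a slightly different value.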
Weighted Mean
$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$
Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
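A course grade makes a convenient worked example of the weighted mean. The marks and weights below are invented for illustration.

```python
# Hypothetical course: marks (x) and weights (w) for three assessments
marks = [60, 70, 80]      # assignment, test, exam
weights = [2, 3, 5]       # the exam counts most

# Weighted mean: sum of (w * x) divided by the sum of the weights
weighted_mean = sum(w * x for w, x in zip(weights, marks)) / sum(weights)
unweighted_mean = sum(marks) / len(marks)

print(f"weighted mean = {weighted_mean}, unweighted mean = {unweighted_mean}")
```

Because the highest mark carries the largest weight, the weighted mean (73) is higher than the simple mean (70).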
Geometric Means
(Figure: histograms rotated 90° and box plots)
Note how the log transformation gives a symmetric distribution
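The connection between the log transformation and the geometric mean can be sketched as follows: the geometric mean is the antilog of the arithmetic mean of the logged values. The data are invented, right-skewed values chosen for illustration.

```python
import math
import statistics

# Hypothetical right-skewed data
values = [1, 10, 100, 1000, 10]

# Geometric mean = antilog of the arithmetic mean of the logs
log_values = [math.log(v) for v in values]
geometric_mean = math.exp(statistics.mean(log_values))

arithmetic_mean = statistics.mean(values)

print(f"geometric mean = {geometric_mean:.2f}, arithmetic mean = {arithmetic_mean}")
```

For skewed data like these, the geometric mean is much smaller than the arithmetic mean, which is pulled upwards by the largest value.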
Median
1 1 3 3 4 5 5 5 5 5: no exact middle, shared by two numbers
MEDIAN is $\frac{4 + 5}{2} = 4.5$
5 5 5 3 1 5 1 4 3 5 2, in order: 1 1 2 3 3 4 5 5 5 5 5
exact middle: MEDIAN is 4
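The two median examples can be checked with Python's statistics module, which sorts the values and averages the two middle ones when the count is even. The variable names are illustrative.

```python
import statistics

# The two data sets from the examples
even_count = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]      # 10 values: no exact middle
odd_count = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]    # 11 values: exact middle

median_even = statistics.median(even_count)   # mean of the 5th and 6th values
median_odd = statistics.median(odd_count)     # the 6th value after sorting

print(median_even, median_odd)
```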
Mode
The score that occurs most frequently
Bimodal
Multimodal
No mode
The only measure of central tendency that can be used with nominal data
Examples
a. 5 5 5 3 1 5 1 4 3 5: Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9: Bimodal - 2 & 6
c. 2 3 6 7 8 9 10: No mode
d. 2 2 3 3 3 4: Mode is 3
e. 2 2 3 3 4 4 5 5: No mode
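The five examples can be reproduced with a short helper. The function name describe_mode is made up for this sketch, and it follows the slides' convention that a data set has no mode when every value occurs equally often.

```python
import statistics

def describe_mode(scores):
    """Return the list of modes, or an empty list when every value ties."""
    modes = statistics.multimode(scores)   # all values with the highest count
    if len(modes) > 1 and len(modes) == len(set(scores)):
        return []                          # every value ties: no mode
    return modes

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}
results = {name: describe_mode(data) for name, data in examples.items()}
for name, modes in results.items():
    print(name, modes if modes else "no mode")
```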
Shapes of the Distribution
Distribution Characteristics
Example: Height of students in the class
Example: Serum cholesterol level
Example: Birth weight of newborn babies
Some visual ways to summarize data
Tables
  Frequency table
Graphs
  Histograms
  Bar graphs
  Box plots
  Line plots
  Scatter graphs
Charts
  Bar chart
  Pie diagram
Frequency Tables
Summarizes a variable with counts and percentages
The variable is categorical
Note that you can take a continuous variable and create categories with it
How do you create categories for a continuous variable?
Choose cutoffs that are biologically meaningful
Natural breaks in the data
Example of frequency table
When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data. When the data are divided into groups (classes), they are called grouped data. The classes have to be decided according to the range of the data and the size of the class. The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table. The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
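A frequency table for ungrouped data, with percentages and cumulative frequencies, can be built as in the sketch below. It reuses the ten scores from the mode example; the variable names are illustrative.

```python
from collections import Counter

# Ten scores (ungrouped data)
scores = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5]
frequency = Counter(scores)   # number of observations for each value
total = len(scores)

# Build rows of (value, frequency, percentage, cumulative frequency)
table = []
cumulative = 0
for value in sorted(frequency):
    count = frequency[value]
    cumulative += count       # this class plus all classes before it
    table.append((value, count, 100 * count / total, cumulative))

print("value  freq  percent  cumulative")
for value, count, percent, cum in table:
    print(f"{value:5d} {count:5d} {percent:8.1f} {cum:11d}")
```

The cumulative frequency of the last class always equals the total number of observations.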
Graphical Summaries
Histograms: continuous or ordinal data on the horizontal axis
Bar graphs: nominal data; no order to the horizontal axis
Box plots: continuous data
Histogram
A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.
Types of Data
Categorical data: Nominal
Mutually exclusive, unordered categories
Examples:
• Sex (male, female)
• Eye colour (brown, grey, green, blue)
• Are you happy? (Yes, No)
• Diarrhoea (present, absent)
Can summarize in:
• Tables – using counts and percentages
• Bar chart
Types of Data
Categorical data: Ordinal (ordered categories)
Examples:
• Degree of agreement (Strongly agree, Agree, Disagree, Strongly disagree)
• Severity of injury: Severe, Moderate, Mild
• Income level: High, medium, low
PRACTICE
Discrete or Continuous?
• mg of tar in cigarettes: Continuous
• number of people in a car: Discrete
• high to low temperature in any day: Continuous
• weight: Continuous
• time: Continuous
• number of children in the average family: Discrete
Nominal or Ordinal?
• Average, above average, below average: Ordinal
• Colours of Smarties: Nominal
• Grades (A, B, C, D, F): Ordinal
Data Summaries
It is ALWAYS a good idea to summarise your data
• You become familiar with the data and the characteristics of the people that you are studying
• You can also identify problems or errors with the data (data management issues)
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
Definitions
The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.
The median is the middle of a distribution: half the scores are above the median and half are below the median.
The mode is the most frequently occurring score in a distribution.
"It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable… on average."
– JM Yancy
Sample Mean $\bar{X}$
The Average or Arithmetic Mean
Add up data, then divide by sample size (n)
The sample size n is the number of observations (pieces of data)
• Example: Systolic blood pressures (mmHg)
$X_1 = 120$, $X_2 = 80$, $X_3 = 90$, $X_4 = 110$, $X_5 = 95$; $n = 5$
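The blood pressure example above can be worked through in a few lines. This is only a minimal sketch of the "add up, then divide by n" rule; the variable names are illustrative and not part of the lecture.

```python
# Systolic blood pressures (mmHg) from the example above
pressures = [120, 80, 90, 110, 95]

n = len(pressures)          # sample size
total = sum(pressures)      # add up the data
sample_mean = total / n     # divide by the sample size

print(f"n = {n}, sum = {total}, mean = {sample_mean} mmHg")
```

For these five readings the sum is 495, so the sample mean is 99 mmHg.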
Notation
$\Sigma$ (sigma) denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
$\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set of sample values
$\mu$ is pronounced 'mu' and denotes the mean of all values in a population
Definitions
Mean: the value obtained by adding the scores and dividing the total by the number of scores
Sample: $\bar{x} = \frac{\sum x}{n}$
Population: $\mu = \frac{\sum x}{N}$
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values – one data point could make a great change in the sample mean
Why is it called the sample mean? – To distinguish it from the population mean
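The sensitivity of the mean to extreme values can be seen in a small sketch. It reuses the five blood pressures from the earlier example and adds one extreme reading of 250 mmHg, which is a made-up value for illustration and not from the lecture.

```python
from statistics import mean, median

pressures = [120, 80, 90, 110, 95]      # readings from the lecture example
with_outlier = pressures + [250]        # hypothetical extreme reading (assumption)

mean_before = mean(pressures)
mean_after = mean(with_outlier)
median_before = median(pressures)
median_after = median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```

One extra data point moves the mean by about 25 mmHg, while the median moves by much less.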
Population Versus Sample
Population: the entire group you want information about
– For example: the blood pressure of all 20-year-old male university students in South Africa
Sample: a part of the population from which we actually collect information and draw conclusions about the whole population
– For example: a sample of blood pressures (n = 50) of 20-year-old male university students in South Africa
The sample mean $\bar{X}$ is not the population mean $\mu$
Population Versus Sample
• We don't know the population mean $\mu$ but would like to know it
• We draw a sample from the population
• We calculate the sample mean $\bar{X}$
• How close is $\bar{X}$ to $\mu$?
• Statistical theory will tell us how close $\bar{X}$ is to $\mu$
• Statistical inference is the process of trying to draw conclusions about the population from the sample
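A small simulation may make the distinction concrete. Everything in this sketch is an assumption for illustration: the population is simulated (10,000 values from a normal distribution with mean 120 and standard deviation 15), whereas in real studies the population is usually not available. Only the sample size n = 50 comes from the lecture example.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of systolic blood pressures (mmHg)
population = [random.gauss(120, 15) for _ in range(10_000)]
mu = mean(population)                    # population mean (normally unknown)

sample = random.sample(population, 50)   # draw a sample of n = 50
x_bar = mean(sample)                     # sample mean

print(f"population mean = {mu:.1f}, sample mean = {x_bar:.1f}")
```

The two means are close but not identical, and a different sample would give a different sample mean.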
Weighted Mean
$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$
Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
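As a sketch of the weighted mean formula, consider a course mark built from three components. The marks and weights below are hypothetical and chosen only to keep the arithmetic simple.

```python
# Hypothetical course components: (mark out of 100, weight)
components = [(70, 20), (60, 30), (80, 50)]

weighted_sum = sum(w * x for x, w in components)   # sum of (w * x)
total_weight = sum(w for _, w in components)       # sum of w
weighted_mean = weighted_sum / total_weight

# Plain mean for comparison: every component counts equally
unweighted_mean = sum(x for x, _ in components) / len(components)

print(f"weighted mean = {weighted_mean}, unweighted mean = {unweighted_mean}")
```

Here the weighted mean (72) is higher than the plain mean (70) because the best mark carries the largest weight.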
Geometric Means
These are histograms rotated 90° and box plots
Note how the log transformation gives a symmetric distribution
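The slide does not spell out the calculation, so the following is a sketch under the usual definition: the geometric mean is the mean of the logged values transformed back to the original scale. The three data values are hypothetical.

```python
import math
from statistics import mean, geometric_mean

values = [1, 10, 100]                      # hypothetical right-skewed data
logs = [math.log10(v) for v in values]     # 0, 1, 2: symmetric on the log scale

gm_via_logs = 10 ** mean(logs)             # back-transform the mean of the logs
gm_library = geometric_mean(values)        # standard library equivalent
arithmetic = mean(values)

print(f"geometric mean = {gm_via_logs}, arithmetic mean = {arithmetic}")
```

On the log scale the values are evenly spaced, and the geometric mean (10) sits at their centre, whereas the arithmetic mean (37) is pulled towards the largest value.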
• 1 1 3 3 4 5 5 5 5 5 (in order): no exact middle, shared by two numbers
  MEDIAN is (4 + 5) / 2 = 4.5
• 5 5 5 3 1 5 1 4 3 5 2
• 1 1 2 3 3 4 5 5 5 5 5 (in order): exact middle
  MEDIAN is 4
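The two cases above (even and odd number of values) can be sketched as a short function. The function name `median_of` is illustrative; Python's `statistics.median` gives the same results.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # put the values in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the exact middle value
    # even n: the middle is shared by two numbers, so average them
    return (ordered[mid - 1] + ordered[mid]) / 2

even_list = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]
odd_list = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]

print(median_of(even_list))
print(median_of(odd_list))
```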
Mode
The score that occurs most frequently
• Bimodal
• Multimodal
• No Mode
The only measure of central tendency that can be used with nominal data
Examples
a. 5 5 5 3 1 5 1 4 3 5: Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9: Bimodal – 2 and 6
c. 2 3 6 7 8 9 10: No Mode
d. 2 2 3 3 3 4: Mode is 3
e. 2 2 3 3 4 4 5 5: No Mode
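The five examples above can be checked with a short sketch. It follows the convention used on the slide, where a data set in which every value occurs equally often has no mode. The function name `modes` is illustrative, and other tools follow a different convention for that case.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or [] when every value is equally frequent."""
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []                     # all values tie: no mode
    return sorted(v for v, c in counts.items() if c == highest)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

for label, values in examples.items():
    result = modes(values)
    print(label, result if result else "no mode")
```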
Shapes of the Distribution
Distribution Characteristics
Example: Height of students in the class
Example: Serum cholesterol level
Example: Birth weight of newborn babies
Some visual ways to summarize data
Tables
• Frequency table
Graphs
• Histograms
• Bar graphs
• Box plots
• Line plots
• Scatter graphs
Charts
• Bar chart
• Pie diagram
Frequency Tables
Summarizes a variable with counts and percentages
The variable is categorical
Note that you can take a continuous variable and create categories with it
How do you create categories for a continuous variable?
• Choose cutoffs that are biologically meaningful
• Natural breaks in the data
Example of frequency table
When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data. When the data are divided into groups (classes), they are called grouped data. The classes have to be decided according to the range of the data and the size of the class. The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table. The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
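A grouped frequency table with cumulative frequencies, as described above, might be built like this. The ten blood pressure readings and the class boundaries are hypothetical values chosen for illustration.

```python
from itertools import accumulate

# Hypothetical systolic blood pressures (mmHg); not data from the lecture
data = [95, 102, 110, 118, 121, 125, 130, 134, 142, 155]

# Classes as (lower bound inclusive, upper bound exclusive)
classes = [(90, 110), (110, 130), (130, 150), (150, 170)]

# Frequency: number of observations lying in each class
frequencies = [sum(lo <= x < hi for x in data) for lo, hi in classes]

# Cumulative frequency: this class plus all classes prior to it
cumulative = list(accumulate(frequencies))

print("Class     Freq  Percent  Cumulative")
for (lo, hi), f, c in zip(classes, frequencies, cumulative):
    print(f"{lo}-{hi - 1}  {f:>4}  {f / len(data):>7.0%}  {c:>10}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check that no value fell outside the classes.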
Graphical Summaries
Histograms
• Continuous or ordinal data on horizontal axis
Bar Graphs
• Nominal data
• No order to horizontal axis
Box Plots
• Continuous data
Histogram
A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.
Bar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of several mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and they should not touch the other bars.
Difference between bar chart and histogram
• Bar charts for categories that are separate
• Histograms if you got categories by dividing up continuous data
• Bars do not touch; histogram rectangles do touch
Line graph
If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, this is better done with line graphs than with histograms.
Scatter plot
Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each plotted point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on a straight line (the regression line).
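The link between the scatter of the points and the strength of correlation can be illustrated with Pearson's correlation coefficient. The lecture does not name a specific coefficient here, so using Pearson's r is an assumption of this sketch, and both data sets are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_line = [2, 4, 6, 8, 10]      # every point lies on one straight line
y_scattered = [2, 9, 1, 8, 3]  # points widely scattered

r_line = pearson_r(x, y_line)
r_scattered = pearson_r(x, y_scattered)

print(f"on a line: r = {r_line:.2f}, scattered: r = {r_scattered:.2f}")
```

Points lying exactly on a line give r = 1, while the widely scattered points give a value close to 0.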
Survival curve
Pie chart
This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.
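To show how each category's share translates into a slice of the pie, the sketch below converts counts into percentages and sector angles. The eye colour counts are hypothetical.

```python
# Hypothetical eye colour counts in a sample of 100 people
counts = {"brown": 50, "blue": 25, "green": 15, "grey": 10}

total = sum(counts.values())

# Share of the whole (percent) and the matching sector angle (degrees)
percentages = {colour: 100 * c / total for colour, c in counts.items()}
angles = {colour: 360 * c / total for colour, c in counts.items()}

for colour in counts:
    print(f"{colour:<6} {percentages[colour]:>5.1f}%  {angles[colour]:>6.1f} degrees")
```

Because the area of a sector is proportional to its angle, the angles add up to 360 degrees just as the percentages add up to 100%.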
Pictures of Data: Continuous Variables
Histograms
• Means and medians do not tell the whole story
• Differences in spread (variability)
• Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
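The steps above can be sketched without any plotting library by drawing the bars as text. The lengths of stay (in days) and the bin width of 3 are hypothetical choices for illustration.

```python
# Hypothetical hospital lengths of stay in days; not data from the lecture
data = [1, 2, 2, 3, 3, 3, 4, 5, 7, 12]

# Step 1: divide the range of the data into bins of equal width
bin_width = 3
start = 0
n_bins = (max(data) - start) // bin_width + 1

# Step 2: count the number of observations in each bin
counts = [0] * n_bins
for x in data:
    counts[(x - start) // bin_width] += 1

# Steps 3 and 4: draw the histogram with labelled bins
print("Days    Count")
for i, count in enumerate(counts):
    lo = start + i * bin_width
    hi = lo + bin_width - 1
    print(f"{lo:>2}-{hi:<2}  {'#' * count} ({count})")
```

The empty bin for 9 to 11 days is kept in the display, since dropping it would hide the gap between the bulk of the data and the long stay of 12 days.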
Pictures of Data: Histograms
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing the distribution of continuous data in multiple groups – can be plotted side by side
Box plot
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
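The numbers behind a box plot can be computed directly. This sketch uses `statistics.quantiles` with the "inclusive" method, which is one of several percentile rules; other software may give slightly different hinges for the same data. The nine values are hypothetical.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # hypothetical values

# 25th, 50th and 75th percentiles: lower hinge, median, upper hinge
q1, med, q3 = quantiles(data, n=4, method="inclusive")

# Values inside the box (between the hinges)
in_box = [x for x in data if q1 <= x <= q3]

print(f"lower hinge = {q1}, median = {med}, upper hinge = {q3}")
print(f"{len(in_box)} of {len(data)} values lie in the box")
```

With only nine values the box holds five of them, so "the middle half" is approximate in small samples.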
Hospital Length of Stay
Box plot: Length of Stay
Misuse of graphics
"It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled." – M J Moroney
PRACTICEPRACTICE
mg of tar in cigarettesmg of tar in cigarettes
number of people in a carnumber of people in a car
high to low temperature inhigh to low temperature in
any dayany day
weightweight
timetime
number of children in thenumber of children in the
average familyaverage family
Average above avg Average above avg below averagebelow average
Colours of SmartiesColours of Smarties
Grades (A B C D F)Grades (A B C D F)
Discrete or Continuous
Continuous
Discrete
Continuous
Continuous
Continuous
Discrete
Nominal or Ordinal
Ordinal
Nominal
Ordinal
Data SummariesData Summaries
It is ALWAYS a good idea to summarise It is ALWAYS a good idea to summarise your datayour data
You become familiar with the data and the You become familiar with the data and the characteristics of the people that you are characteristics of the people that you are studyingstudying
You can also identify problems or errors with You can also identify problems or errors with the data (data management issues)the data (data management issues)
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
DefinitionsDefinitions
The arithmetic The arithmetic meanmean is what is commonly called is what is commonly called the average The mean is the sum of all the the average The mean is the sum of all the scores divided by the number of scoresscores divided by the number of scores
The The medianmedian is the middle of a distribution half is the middle of a distribution half the scores are above the median and half are the scores are above the median and half are below the median below the median
The The modemode is the most frequently occurring score is the most frequently occurring score in a distribution in a distribution
ldquoldquoIt has been said that a fellow with one It has been said that a fellow with one leg frozen in ice and the other leg in leg frozen in ice and the other leg in boiling water is comfortablehellipboiling water is comfortablehellip
hellip hellipon averagerdquoon averagerdquo
JM YancyJM Yancy
Sample Mean X
The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)
1048793 ExampleSystolic blood pressures (mmHg)
X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5
micro is pronounced lsquomursquo and denotes the mean of all values in a population
Notation
x is pronounced lsquox-barrsquo and denotes the mean of a set of
Sample values
(sigma) denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
x =
Definitions
MeanMean
the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores
n x
Sample
Nmicro = xPopulation
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values - One data point could make a great change in
sample mean
Why is it called the sample meanndash To distinguish it from population mean
Population Versus Sample
Population - The entire group you want information about
ndash For example The blood pressure of all 20-year-old male university students in South Africa
Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
ndash For example Sample of blood pressures (n=50) of 20-year-old male university students in South Africa
The sample mean X is not the population mean micro
Population Versus Sample
We donrsquot know the population mean micro but would like to know itWe draw a sample from the populationWe calculate the sample mean XHow close is X to microStatistical theory will tell us how close X is to microStatistical inference is the process of trying to draw conclusions about the population from the sample
Weighted Mean
x =w
(w bull x)
Your grade in many courses are weighted means (averages) In other words some things count (are weighted) more than others
Geometric MeansGeometric Means
These are histograms rotated 90ordm and box plots
Note how the log transformation gives a symmetric distribution
• 1 1 3 3 4 5 5 5 5 5
  no exact middle -- shared by two numbers
  MEDIAN is 4.5: $\dfrac{4 + 5}{2} = 4.5$
• 5 5 5 3 1 5 1 4 3 5 2
• 1 1 2 3 3 4 5 5 5 5 5 (in order)
  exact middle: MEDIAN is 4
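The two median examples above can be checked with Python's standard-library statistics.median, which sorts the data and averages the two middle values when the count is even. The variable names are illustrative.

```python
from statistics import median

even_count = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]    # 10 values: no exact middle
odd_count = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]  # 11 values: exact middle

print(sorted(odd_count))   # [1, 1, 2, 3, 3, 4, 5, 5, 5, 5, 5]
print(median(even_count))  # 4.5, the average of 4 and 5
print(median(odd_count))   # 4, the 6th of the 11 ordered values
```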
Mode
The score that occurs most frequently
Bimodal
Multimodal
No Mode
The only measure of central tendency that can be used with nominal data
Examples
a. 5 5 5 3 1 5 1 4 3 5        • Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9      • Bimodal – 2 & 6
c. 2 3 6 7 8 9 10             • No Mode
d. 2 2 3 3 3 4                • Mode is 3
e. 2 2 3 3 4 4 5 5            • No Mode
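A small Python sketch that reproduces examples a to e. The helper name modes is hypothetical, and the rule "no mode when every distinct value occurs equally often" is an assumption inferred from examples c and e rather than a definition stated in the slides.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means 'no mode'."""
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: if every distinct value occurs equally often,
    # no value stands out, so there is no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

for label, data in examples.items():
    print(label, modes(data) or "no mode")
```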
Shapes of the Distribution
Distribution Characteristics
[Figures not reproduced: shapes of the distribution]
Example: Height of students in the class
Example: Serum cholesterol level
Example: Birth weight of newborn babies
Some visual ways to summarize data
Tables
  Frequency table
Graphs
  Histograms
  Bar graphs
  Box plots
  Line plots
  Scatter graphs
Charts
  Bar chart
  Pie diagram
Frequency Tables
Summarizes a variable with counts and percentages
The variable is categorical
Note that you can take a continuous variable and create categories with it
How do you create categories for a continuous variable?
  Choose cutoffs that are biologically meaningful
  Natural breaks in the data
Example of frequency table
When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data.
When the data are divided into groups (classes), they are called grouped data.
The classes have to be decided according to the range of data and size of class.
The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table.
The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
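The definitions above can be sketched in Python as a grouped frequency table with percentages and cumulative frequencies. The ages and the class width of 10 years are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical ages (years), for illustration only
ages = [23, 35, 41, 28, 52, 37, 45, 31, 29, 48]
class_width = 10  # assumed class size: 20-29, 30-39, ...

# Frequency: number of observations lying in each class,
# keyed by the lower limit of the class
frequency = Counter((age // class_width) * class_width for age in ages)
classes = sorted(frequency)
counts = [frequency[c] for c in classes]

# Cumulative frequency: this class plus all classes prior to it
cumulative = list(accumulate(counts))

for lower, f, cf in zip(classes, counts, cumulative):
    pct = 100 * f / len(ages)
    print(f"{lower}-{lower + class_width - 1}: frequency {f} ({pct:.0f}%), cumulative {cf}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check that no value was left out.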
Graphical Summaries
Histograms
  Continuous or ordinal data on horizontal axis
Bar Graphs
  Nominal data
  No order to horizontal axis
Box Plots
  Continuous data
Histogram
A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.
Bar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of a set of mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch the other bars.
Difference between bar chart and histogram
Bar charts for categories that are separate
Histograms if you got categories by dividing up continuous data
Bars do not touch; histogram rectangles do touch
Line graph
If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, infant mortality rate) are to be displayed, this is better done with line graphs than with histograms.
Scatter plot
Also called a scattergram. This is a method of displaying the distribution of two variables in relation to one another. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each plotted point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on the diagonal (regression line).
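The slides do not name a measure of correlation, so this sketch assumes the Pearson correlation coefficient r, the usual choice for two continuous variables. The function name pearson_r and all data values are hypothetical. A value of r = 1 corresponds to all points lying on a rising straight line, and values near 0 to a wide scatter.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired measurements, for illustration only
heights = [150, 160, 170, 180, 190]   # X axis
perfect = [50, 60, 70, 80, 90]        # all points fall on a straight line
scattered = [70, 50, 90, 60, 80]      # widely scattered points

print(pearson_r(heights, perfect))    # perfect correlation, r = 1
print(pearson_r(heights, scattered))  # weak correlation, r is about 0.3
```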
Survival curve
Pie chart
This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.
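A brief Python sketch of how category counts translate into pie chart sectors. The blood group counts are hypothetical. Because the area of a sector is proportional to its angle, each category's share of the 360° circle equals its share of the total.

```python
# Hypothetical counts per blood group, for illustration only
counts = {"A": 45, "B": 40, "AB": 5, "O": 110}
total = sum(counts.values())  # the whole pie: 100% of the population shown

# For each category: (percentage of the total, sector angle in degrees)
sectors = {
    group: (100 * count / total, 360 * count / total)
    for group, count in counts.items()
}

for group, (pct, angle) in sectors.items():
    print(f"{group}: {pct:.1f}% of the pie, {angle:.0f} degree sector")
```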
Pictures of Data: Continuous Variables
Histograms
Means and medians do not tell the whole story
  Differences in spread (variability)
  Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
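The steps above can be sketched in Python without any plotting library. The birth weights, the starting point of 2000 g and the bin width of 500 g are hypothetical choices, and the text "bars" simply stand in for the drawn rectangles.

```python
# Hypothetical birth weights (grams), for illustration only
weights = [2100, 2800, 3000, 3200, 3300, 3400, 3500, 3600, 3900, 4400]

# Step 1: divide the range of the data into intervals (bins) of equal width
low, bin_width, n_bins = 2000, 500, 5   # assumed bins: 2000-2499, ..., 4000-4499

# Step 2: count the number of observations in each class
counts = [0] * n_bins
for w in weights:
    counts[(w - low) // bin_width] += 1

# Steps 3 and 4: draw the histogram and label the scale
for i, count in enumerate(counts):
    start = low + i * bin_width
    print(f"{start}-{start + bin_width - 1} g | {'#' * count} ({count})")
```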
Pictures of Data: Histograms
[Figures not reproduced]
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing distributions of continuous data in multiple groups – can be plotted side by side
Box plot
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution.
The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box and 1/4 of the distribution is between this line and the bottom of the box.
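The numbers behind a box plot can be computed with Python's standard-library statistics.quantiles (Python 3.8+). The lengths of stay are hypothetical. Statistical packages use slightly different rules for percentiles, so the hinges may differ a little from software to software; this sketch uses the library's default method.

```python
from statistics import median, quantiles

# Hypothetical hospital lengths of stay (days), for illustration only
stay = [1, 2, 2, 3, 3, 4, 5, 6, 8, 12, 30]

# Cut points dividing the data into four equal parts:
# lower hinge (25th percentile), median, upper hinge (75th percentile)
q1, q2, q3 = quantiles(stay, n=4)

iqr = q3 - q1  # height of the box: spans the middle half of the data

print(f"lower hinge = {q1}, median = {q2}, upper hinge = {q3}, box height = {iqr}")
print(f"minimum = {min(stay)}, maximum = {max(stay)}")
```

The long stay of 30 days lies far above the upper hinge, which is the kind of outlying value a box plot makes visible.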
Hospital Length of Stay
Box plot: Length of Stay
[Figures not reproduced]
Misuse of graphics
"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people so that such misadventures can be avoided.
Common tricks used to mislead
  The problem of scaling
  The advertiser's graph
  The transformed graph
  The chart with too much data
Which graph to use
Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category
Example of MCQ 1
The arithmetic mean of a set of values:
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram:
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Data Summaries
It is ALWAYS a good idea to summarise your data
You become familiar with the data and the characteristics of the people that you are studying
You can also identify problems or errors with the data (data management issues)
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
Definitions
The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.
The median is the middle of a distribution: half the scores are above the median and half are below the median.
The mode is the most frequently occurring score in a distribution.
"It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable… on average"
JM Yancy
Sample Mean X̄
The Average or Arithmetic Mean
Add up data, then divide by sample size (n)
The sample size n is the number of observations (pieces of data)
Example: Systolic blood pressures (mmHg)
X1 = 120
X2 = 80
X3 = 90
X4 = 110
X5 = 95
n = 5
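The example can be worked through in a few lines of Python, using the five blood pressures listed above. The variable names are illustrative.

```python
# Systolic blood pressures (mmHg) X1..X5 from the example
pressures = [120, 80, 90, 110, 95]

n = len(pressures)      # sample size, n = 5
total = sum(pressures)  # add up the data: 495
x_bar = total / n       # divide by the sample size

print(x_bar)  # 99.0 mmHg
```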
Notation
μ is pronounced 'mu' and denotes the mean of all values in a population
Summarising and Describing Continuous Data
Measures of the centre of data (central tendency)
Mean
Median
Mode
DefinitionsDefinitions
The arithmetic The arithmetic meanmean is what is commonly called is what is commonly called the average The mean is the sum of all the the average The mean is the sum of all the scores divided by the number of scoresscores divided by the number of scores
The The medianmedian is the middle of a distribution half is the middle of a distribution half the scores are above the median and half are the scores are above the median and half are below the median below the median
The The modemode is the most frequently occurring score is the most frequently occurring score in a distribution in a distribution
ldquoldquoIt has been said that a fellow with one It has been said that a fellow with one leg frozen in ice and the other leg in leg frozen in ice and the other leg in boiling water is comfortablehellipboiling water is comfortablehellip
hellip hellipon averagerdquoon averagerdquo
JM YancyJM Yancy
Sample Mean X
The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)
1048793 ExampleSystolic blood pressures (mmHg)
X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5
micro is pronounced lsquomursquo and denotes the mean of all values in a population
Notation
x is pronounced lsquox-barrsquo and denotes the mean of a set of
Sample values
(sigma) denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
x =
Definitions
MeanMean
the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores
n x
Sample
Nmicro = xPopulation
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values - One data point could make a great change in
sample mean
Why is it called the sample meanndash To distinguish it from population mean
Population Versus Sample
Population - The entire group you want information about
ndash For example The blood pressure of all 20-year-old male university students in South Africa
Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
ndash For example Sample of blood pressures (n=50) of 20-year-old male university students in South Africa
The sample mean X is not the population mean micro
Population Versus Sample
We donrsquot know the population mean micro but would like to know itWe draw a sample from the populationWe calculate the sample mean XHow close is X to microStatistical theory will tell us how close X is to microStatistical inference is the process of trying to draw conclusions about the population from the sample
Weighted Mean
x =w
(w bull x)
Your grade in many courses are weighted means (averages) In other words some things count (are weighted) more than others
Geometric MeansGeometric Means
These are histograms rotated 90ordm and box plots
Note how the log transformation gives a symmetric distribution
bull 1 1 3 3 4 5 5 5 5 5 no exact middle -- shared by two numbers
MEDIAN is 45
4 + 5
2= 45
bull 5 5 5 3 1 5 1 4 3 5 2 bull 1 1 2 3 3 4 5 5 5 5 5 (in order)
exact middle MEDIAN is 4
Mode Mode
The score that occurs most frequentlyThe score that occurs most frequently
BimodalMultimodalNo Mode
The only measure of central tendency that can be used The only measure of central tendency that can be used with with nominal datadata
Examples
bull Mode is 5Mode is 5
bull Bimodal ndash 2 amp 6Bimodal ndash 2 amp 6
bull No ModeNo Mode
a 5 5 5 3 1 5 1 4 3 5
b 2 2 2 3 4 5 6 6 6 7 9
c 2 3 6 7 8 9 10
d 2 2 3 3 3 4
e 2 2 3 3 4 4 5 5
bull Mode is 3
bull No Mode
Shapes of the Distribution
Shapes of the Distribution
Distribution Characteristics
Shapes of the Distribution
Example Height of students in the class
Shapes of the Distribution
Example Serum cholesterol level
Shapes of the Distribution
Example Birth weight of newborn babies
Shapes of the Distribution
Some visual ways to summarize Some visual ways to summarize datadata
TablesTablesFrequency tableFrequency table
GraphsGraphsHistogramsHistograms
Bar graphsBar graphs
Box plotsBox plots
Line plotsLine plots
Scatter graphsScatter graphs
ChartsChartsBar chartBar chart
Pie diagramPie diagram
Frequency TablesFrequency Tables
Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages
The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable
and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable
Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data
Example of frequency tableExample of frequency table
When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class
Graphical SummariesGraphical Summaries
HistogramsHistograms Continuous or ordinal data on horizontal axisContinuous or ordinal data on horizontal axis
Bar GraphsBar Graphs Nominal dataNominal data
No order to horizontal axisNo order to horizontal axis
Box PlotsBox Plots Continuous dataContinuous data
HistogramHistogram
A histogram is a graphic representation of the frequency distribution of a variable Vertical rectangles (bars) are drawn in such a way that their bases lie
on a linear scale representing different intervals and their heights are
proportional to the frequencies of the values within each of the intervals
Bar ChartBar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of mutually exclusive categories The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis The heights of the bars correspond to the frequencies The bars should be of equal width and they should not be touching me other bars
Difference between bar chart and Difference between bar chart and histogramhistogram
Bar charts for categories that are separateBar charts for categories that are separate
Histograms if you got categories by Histograms if you got categories by dividing up continuous datadividing up continuous data
Bars do not touch histogram rectangles Bars do not touch histogram rectangles do touch do touch
Line graphLine graph
If the mid-points of the top of the bars of a histogram are connected together by a line and if the bars were omitted from the display the resultant graph will be a line
graph (also called a frequency polygon) Line graphs are good at showing trends over a period of time When trends of rates (eg death rate Infant Mortality Rate etc) are to be displayed it is better done with
line graphs rather than histograms
Scatter plotScatter plot
Also called a scattergram This a method of displaying the distribution of two variables in relation to each other another The value of one variables is measured on the X axis and the values of the other on the Y axis The variables have to be on a continuous scale Each plot thus has two values (coordinates) from the Y and X axis scales A wide scatter of the plots denotes poor correlation between the two variables If the two variables are perfectly correlated then all the plots will fall on the diagonal (regression line)
Survival curveSurvival curve
Pie chartPie chart
This is a circular diagram (can be shown as 2-D or 3-D) divided into segments each representing a category or subset of data (part of the whole) The amount for each category is proportional to the area of the sector (slice of the pie) The total area of the circle is 100 and it represents the total population that is being shown
Pictures of DataContinuous Variables
Histograms Means and medians do not tell whole story Differences in spread (variability) Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
Pictures of Data Histograms
Pictures of Data Histograms
Pictures of Data Histograms
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing distribution of continuous data in multiple groups ndash can be plotted side by side
Box plotBox plotA box plot provides an excellent visual summary of many important aspects of a distribution The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distributionThe median is shown as a line across the box Therefore 14 of the distribution is between this line and the top of the box and 14 of the distribution is between this line and the bottom of the box
Hospital Length of Stay
Box plot Length of Stay
Box plot Length of Stay
Misuse of graphics
It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled. - M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.
Common tricks used to mislead
• The problem of scaling
• The Advertiser's Graph
• The transformed graph
• The chart with too much data
Which graph to use
Statistical methods depend on the “form” of a set of data, which can be assessed with some common useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category
Example of MCQ 1
The arithmetic mean of a set of values:
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram:
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Definitions
The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.
The median is the middle of a distribution: half the scores are above the median and half are below the median.
The mode is the most frequently occurring score in a distribution.
“It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable… on average.”
J M Yancy
Sample Mean X̄
The Average or Arithmetic Mean
Add up data, then divide by sample size (n)
The sample size n is the number of observations (pieces of data)
• Example: Systolic blood pressures (mmHg)
X1 = 120, X2 = 80, X3 = 90, X4 = 110, X5 = 95; n = 5
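Working the blood pressure example through: the five readings sum to 495, and 495 / 5 = 99 mmHg. The same calculation as a small Python sketch (the variable names are illustrative):

```python
# The five systolic blood pressures (mmHg) from the example.
pressures = [120, 80, 90, 110, 95]

n = len(pressures)      # sample size
total = sum(pressures)  # add up the data
x_bar = total / n       # divide by the sample size

print(f"n = {n}, sum = {total}, sample mean = {x_bar} mmHg")
```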
Notation
Σ (sigma) denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
x̄ is pronounced ‘x-bar’ and denotes the mean of a set of sample values
µ is pronounced ‘mu’ and denotes the mean of all values in a population
Definitions
Mean
the value obtained by adding the scores and dividing the total by the number of scores
Sample: $\bar{x} = \frac{\sum x}{n}$
Population: $\mu = \frac{\sum x}{N}$
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values - one data point could make a great change in the sample mean
Why is it called the sample mean?
- To distinguish it from the population mean
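To see how sensitive the mean is to a single extreme value, the sketch below adds one hypothetical reading of 300 mmHg to the five blood pressures used in the sample mean example and compares the mean with the median before and after. The extra reading is invented purely for illustration.

```python
import statistics

pressures = [120, 80, 90, 110, 95]  # readings from the sample mean example (mmHg)
with_extreme = pressures + [300]    # hypothetical extreme reading added

for label, data in [("original", pressures), ("with extreme value", with_extreme)]:
    print(f"{label}: mean = {statistics.mean(data)}, median = {statistics.median(data)}")
```

The single extra point moves the mean from 99 to 132.5, while the median only shifts from 95 to 102.5.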
Population Versus Sample
Population - The entire group you want information about
- For example: the blood pressure of all 20-year-old male university students in South Africa
Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
- For example: a sample of blood pressures (n = 50) of 20-year-old male university students in South Africa
The sample mean X̄ is not the population mean µ
Population Versus Sample
We don’t know the population mean µ but would like to know it
We draw a sample from the population
We calculate the sample mean X̄
How close is X̄ to µ?
Statistical theory will tell us how close X̄ is to µ
Statistical inference is the process of trying to draw conclusions about the population from the sample
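The idea can be tried out on simulated data. In the sketch below the "population" is 10,000 simulated systolic pressures (an assumption for illustration; in practice the population mean is unknown), and a random sample of 50 is drawn from it, matching the n = 50 used in the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical population: simulated values, not real measurements.
population = [random.gauss(120, 12) for _ in range(10_000)]
mu = statistics.mean(population)  # population mean (normally unknown)

sample = random.sample(population, 50)  # draw a sample of n = 50
x_bar = statistics.mean(sample)         # sample mean

print(f"population mean = {mu:.1f}, sample mean = {x_bar:.1f}")
print(f"difference = {x_bar - mu:.1f}")
```

Changing the seed gives a different sample and a slightly different sample mean, which is exactly the variability that statistical theory quantifies.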
Weighted Mean
$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$
Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
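A small sketch of a weighted course grade follows. The marks and the 30/20/50 weighting are hypothetical, and `weighted_mean` is simply a name chosen here for the sum of weight times value divided by the sum of the weights.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical course: test, assignment and exam marks with their weights.
marks = [70, 80, 60]
weights = [30, 20, 50]

print(f"weighted mean   = {weighted_mean(marks, weights)}")
print(f"unweighted mean = {sum(marks) / len(marks)}")
```

Because the exam carries half the weight and has the lowest mark, the weighted mean (67) is lower than the simple average (70).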
Geometric Means
These are histograms rotated 90° and box plots
Note how the log transformation gives a symmetric distribution
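The slide itself does not spell out the calculation. As usually defined, the geometric mean is obtained by averaging the logarithms of the values and transforming the result back. The sketch below does this for a hypothetical right-skewed set of values and checks it against Python's built-in `statistics.geometric_mean`.

```python
import math
import statistics

# Hypothetical right-skewed values (illustrative only).
values = [1, 10, 100, 1000]

# Log-transform, take the ordinary mean, then transform back.
log_values = [math.log10(v) for v in values]
geo_mean = 10 ** statistics.mean(log_values)

print(f"logs: {log_values}")  # evenly spaced, i.e. symmetric on the log scale
print(f"geometric mean  = {geo_mean:.2f}")
print(f"arithmetic mean = {statistics.mean(values)}")
```

On the log scale the values are symmetric, and the geometric mean is far less influenced by the largest value than the arithmetic mean is.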
• 1 1 3 3 4 5 5 5 5 5
no exact middle -- shared by two numbers
(4 + 5) / 2 = 4.5
MEDIAN is 4.5
• 5 5 5 3 1 5 1 4 3 5 2
• 1 1 2 3 3 4 5 5 5 5 5 (in order)
exact middle: MEDIAN is 4
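Both cases can be handled by one short function: sort the values, then take the middle one (odd count) or the average of the two middle ones (even count). The function name `median` is chosen here for illustration; Python's `statistics.median` does the same job.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # middle shared by two numbers


print(median([5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]))  # 11 values: exact middle
print(median([1, 1, 3, 3, 4, 5, 5, 5, 5, 5]))     # 10 values: no exact middle
```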
Mode
The score that occurs most frequently
Bimodal
Multimodal
No Mode
The only measure of central tendency that can be used with nominal data
Examples
a. 5 5 5 3 1 5 1 4 3 5
• Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9
• Bimodal - 2 & 6
c. 2 3 6 7 8 9 10
• No Mode
d. 2 2 3 3 3 4
• Mode is 3
e. 2 2 3 3 4 4 5 5
• No Mode
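The five examples can be reproduced with a short counting routine. The sketch assumes the convention the examples appear to follow: when every distinct value occurs equally often there is no mode, otherwise every value tied for the highest count is a mode. The function name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or an empty list when there is no mode."""
    counts = Counter(values)
    top = max(counts.values())
    # Assumed convention: if all distinct values are equally frequent, there is no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

for name, data in examples.items():
    result = modes(data)
    print(name, result if result else "no mode")
```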
Shapes of the Distribution
Distribution Characteristics
Example: Height of students in the class
Example: Serum cholesterol level
Example: Birth weight of newborn babies
Some visual ways to summarize data
Tables
• Frequency table
Graphs
• Histograms
• Bar graphs
• Box plots
• Line plots
• Scatter graphs
Charts
• Bar chart
• Pie diagram
Frequency Tables
Summarizes a variable with counts and percentages
The variable is categorical
Note that you can take a continuous variable and create categories with it
How do you create categories for a continuous variable?
• Choose cutoffs that are biologically meaningful
• Natural breaks in the data
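Turning a continuous variable into categories amounts to comparing each value with a list of cutoffs. In the sketch below the cutoffs, labels and readings are hypothetical and are not clinical thresholds; they only show the mechanics.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cutoffs for systolic blood pressure (mmHg); illustrative only.
cutoffs = [90, 120, 140]
labels = ["<90", "90-119", "120-139", ">=140"]


def categorize(value):
    """Return the label of the interval that contains value."""
    return labels[bisect_right(cutoffs, value)]


# Hypothetical readings.
readings = [85, 90, 104, 119, 120, 133, 140, 162]
table = Counter(categorize(r) for r in readings)

for label in labels:
    print(f"{label:>8}: {table[label]}")
```

A value equal to a cutoff is placed in the higher category here; whichever rule is chosen should be stated alongside the table.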
Example of frequency table
When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data. When the data are divided into groups (classes), they are called grouped data. The classes have to be decided according to the range of the data and the size of each class. The number of observations lying in a particular class is called its frequency, and the table showing classes with their frequencies is called a frequency table. The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
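A frequency table for ungrouped data, with percentages and cumulative frequencies, can be built as in the sketch below. It uses the eleven scores that the lecture uses for the median (5 5 5 3 1 5 1 4 3 5 2); the column layout is just one possible presentation.

```python
from collections import Counter
from itertools import accumulate

scores = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]

counts = Counter(scores)
values = sorted(counts)                  # distinct values, in order
freqs = [counts[v] for v in values]      # frequency of each value
cum_freqs = list(accumulate(freqs))      # running total of the frequencies
n = len(scores)

print(f"{'Value':>5} {'Freq':>5} {'%':>6} {'Cum':>5}")
for v, f, c in zip(values, freqs, cum_freqs):
    print(f"{v:>5} {f:>5} {100 * f / n:>6.1f} {c:>5}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check that no value has been missed.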
Graphical Summaries
Histograms
• Continuous or ordinal data on horizontal axis
Bar Graphs
• Nominal data
• No order to horizontal axis
Box Plots
• Continuous data
Histogram
A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.
Bar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation falls into one of several mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch one another.
Difference between bar chart and histogram
Bar charts for categories that are separate
Histograms if you got categories by dividing up continuous data
Bars do not touch; histogram rectangles do touch
Line graph
If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends in rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, line graphs are better than histograms.
Scatter plot
Also called a scattergram. This is a method of displaying the distribution of two variables in relation to one another. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on a straight line (the regression line).
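How tightly the points cluster around a straight line can be put into a single number. The sketch below uses Pearson's correlation coefficient r, which is an assumption here since the lecture does not name a specific measure. The two data sets are hypothetical: one lies exactly on a line, the other is widely scattered.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration.
x = [1, 2, 3, 4, 5]
on_line = [2 * v + 1 for v in x]  # every point lies exactly on a straight line
scattered = [3, 11, 2, 9, 5]      # no clear pattern

print(f"points on a line: r = {pearson_r(x, on_line):.2f}")
print(f"wide scatter:     r = {pearson_r(x, scattered):.2f}")
```

Values of r close to +1 or -1 correspond to points lying near a straight line, and values near 0 to a wide scatter.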
Survival curve
Sample Mean X
The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)
1048793 ExampleSystolic blood pressures (mmHg)
X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5
micro is pronounced lsquomursquo and denotes the mean of all values in a population
Notation
x is pronounced lsquox-barrsquo and denotes the mean of a set of
Sample values
(sigma) denotes the summation of a set of values
x is the variable usually used to represent the individual data values
n represents the number of data values in a sample
N represents the number of data values in a population
x =
Definitions
MeanMean
the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores
n x
Sample
Nmicro = xPopulation
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values - One data point could make a great change in
sample mean
Why is it called the sample meanndash To distinguish it from population mean
Population Versus Sample
Population - The entire group you want information about
ndash For example The blood pressure of all 20-year-old male university students in South Africa
Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
ndash For example Sample of blood pressures (n=50) of 20-year-old male university students in South Africa
The sample mean X is not the population mean micro
Population Versus Sample
We donrsquot know the population mean micro but would like to know itWe draw a sample from the populationWe calculate the sample mean XHow close is X to microStatistical theory will tell us how close X is to microStatistical inference is the process of trying to draw conclusions about the population from the sample
Weighted Mean
x =w
(w bull x)
Your grade in many courses are weighted means (averages) In other words some things count (are weighted) more than others
Geometric MeansGeometric Means
These are histograms rotated 90ordm and box plots
Note how the log transformation gives a symmetric distribution
bull 1 1 3 3 4 5 5 5 5 5 no exact middle -- shared by two numbers
MEDIAN is 45
4 + 5
2= 45
bull 5 5 5 3 1 5 1 4 3 5 2 bull 1 1 2 3 3 4 5 5 5 5 5 (in order)
exact middle MEDIAN is 4
Mode Mode
The score that occurs most frequentlyThe score that occurs most frequently
BimodalMultimodalNo Mode
The only measure of central tendency that can be used The only measure of central tendency that can be used with with nominal datadata
Examples
bull Mode is 5Mode is 5
bull Bimodal ndash 2 amp 6Bimodal ndash 2 amp 6
bull No ModeNo Mode
a 5 5 5 3 1 5 1 4 3 5
b 2 2 2 3 4 5 6 6 6 7 9
c 2 3 6 7 8 9 10
d 2 2 3 3 3 4
e 2 2 3 3 4 4 5 5
bull Mode is 3
bull No Mode
Shapes of the Distribution
Shapes of the Distribution
Distribution Characteristics
Shapes of the Distribution
Example Height of students in the class
Shapes of the Distribution
Example Serum cholesterol level
Shapes of the Distribution
Example Birth weight of newborn babies
Shapes of the Distribution
Some visual ways to summarize Some visual ways to summarize datadata
TablesTablesFrequency tableFrequency table
GraphsGraphsHistogramsHistograms
Bar graphsBar graphs
Box plotsBox plots
Line plotsLine plots
Scatter graphsScatter graphs
ChartsChartsBar chartBar chart
Pie diagramPie diagram
Frequency TablesFrequency Tables
Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages
The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable
and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable
Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data
Example of frequency tableExample of frequency table
When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class
Graphical SummariesGraphical Summaries
HistogramsHistograms Continuous or ordinal data on horizontal axisContinuous or ordinal data on horizontal axis
Bar GraphsBar Graphs Nominal dataNominal data
No order to horizontal axisNo order to horizontal axis
Box PlotsBox Plots Continuous dataContinuous data
HistogramHistogram
A histogram is a graphic representation of the frequency distribution of a variable Vertical rectangles (bars) are drawn in such a way that their bases lie
on a linear scale representing different intervals and their heights are
proportional to the frequencies of the values within each of the intervals
Bar ChartBar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of mutually exclusive categories The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis The heights of the bars correspond to the frequencies The bars should be of equal width and they should not be touching me other bars
Difference between bar chart and Difference between bar chart and histogramhistogram
Bar charts for categories that are separateBar charts for categories that are separate
Histograms if you got categories by Histograms if you got categories by dividing up continuous datadividing up continuous data
Bars do not touch histogram rectangles Bars do not touch histogram rectangles do touch do touch
Line graph
If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, this is better done with line graphs than with histograms.
Scatter plot
Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on a straight line (the regression line).
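The link between the tightness of the scatter and the strength of correlation can be sketched numerically. The example below uses Pearson's correlation coefficient, which is an assumption here since the slide does not name a specific measure; the data and the function name `pearson_r` are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical heights in cm (illustrative only).
heights = [150, 160, 170, 180, 190]

# Every point lies exactly on one straight line: perfect correlation.
on_a_line = [2 * h - 250 for h in heights]

# Widely scattered points: poor correlation.
scattered = [62, 48, 75, 51, 66]

r_line = pearson_r(heights, on_a_line)
r_scattered = pearson_r(heights, scattered)
print(f"points on a line: r = {r_line:.2f}")
print(f"scattered points: r = {r_scattered:.2f}")
```

A coefficient near 1 (or -1) corresponds to points hugging a straight line; a coefficient near 0 corresponds to a wide scatter.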
Survival curve
Pie chart
This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.
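To make the proportionality concrete, this short Python sketch converts category counts into percentages and sector angles. The categories and counts are hypothetical.

```python
# Hypothetical category counts (illustrative only).
counts = {"Breastfed": 45, "Formula": 30, "Mixed": 25}

total = sum(counts.values())

# Each slice's share of the whole, as a percentage and as an angle in degrees.
percentages = {name: 100 * n / total for name, n in counts.items()}
angles = {name: 360 * n / total for name, n in counts.items()}

for name in counts:
    print(f"{name:<10} {percentages[name]:5.1f}%  {angles[name]:6.1f} degrees")
```

The percentages sum to 100 and the angles to 360, so the full circle represents the whole group.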
Pictures of Data: Continuous Variables
Histograms
Means and medians do not tell the whole story
Differences in spread (variability)
Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
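The steps above can be sketched in Python without any plotting library. The birth weights, the bin width of 500 g, and the helper name `histogram_counts` are assumptions chosen for illustration.

```python
def histogram_counts(data, bin_width, start):
    """Count observations in equal-width bins:
    [start, start + w), [start + w, start + 2w), and so on."""
    n_bins = (max(data) - start) // bin_width + 1
    counts = [0] * n_bins
    for x in data:
        counts[(x - start) // bin_width] += 1
    return counts

# Hypothetical birth weights in grams (illustrative only).
weights = [2100, 2800, 3000, 3200, 3300, 3400, 3500, 3600, 3900, 4400]

bin_width = 500
start = 2000
counts = histogram_counts(weights, bin_width, start)

# Draw a text histogram with labelled intervals.
for i, c in enumerate(counts):
    low = start + i * bin_width
    high = low + bin_width - 1
    print(f"{low}-{high} g | {'*' * c} ({c})")
```

Integer weights are used so that values on a bin boundary (such as 3000 g) fall unambiguously into the bin that starts there.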
Pictures of Data: Histograms [figures]
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing the distribution of continuous data in multiple groups; the plots can be placed side by side
Box plot
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
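The three values that define the box can be computed with Python's standard library. This is a sketch on hypothetical length-of-stay data; note that several conventions for computing quartiles exist, so other software may report slightly different hinges for the same data.

```python
from statistics import quantiles

# Hypothetical hospital lengths of stay in days (illustrative only).
stays = [1, 2, 2, 3, 4, 5, 6, 8, 9, 12, 30]

# quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
q1, med, q3 = quantiles(stays, n=4)

# The box spans Q1 to Q3; its height is the interquartile range.
iqr = q3 - q1

print(f"lower hinge (25th percentile): {q1}")
print(f"median:                        {med}")
print(f"upper hinge (75th percentile): {q3}")
print(f"interquartile range:           {iqr}")
```

The single stay of 30 days sits far above the upper hinge, which is the kind of outlying value a box plot makes visible.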
Hospital Length of Stay [figure]
Box plot: Length of Stay [figures]
Misuse of graphics
"It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled." - M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people so that such misadventures can be avoided.
Common tricks used to mislead
The problem of scaling
The Advertiser's Graph
The transformed graph
The chart with too much data
Which graph to use
Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category
Example of MCQ 1
The arithmetic mean of a set of values:
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram:
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Notation
$\Sigma$ (sigma) denotes the summation of a set of values
$x$ is the variable usually used to represent the individual data values
$n$ represents the number of data values in a sample
$N$ represents the number of data values in a population
$\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set of sample values
$\mu$ is pronounced 'mu' and denotes the mean of all values in a population
Definitions
Mean
The value obtained by adding the scores and dividing the total by the number of scores
Sample: $\bar{x} = \frac{\sum x}{n}$
Population: $\mu = \frac{\sum x}{N}$
Notes on Sample Mean
Also called sample average or arithmetic mean
Sensitive to extreme values: one data point could make a great change in the sample mean
Why is it called the sample mean? To distinguish it from the population mean
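The sensitivity to extreme values is easy to see with a small, hypothetical data set. In this Python sketch, one unusually long hospital stay is added to five ordinary ones.

```python
from statistics import mean

# Hypothetical lengths of stay in days (illustrative only).
stays = [2, 3, 3, 4, 5]
mean_without = mean(stays)

# Add a single extreme value and recompute the sample mean.
stays_with_outlier = stays + [60]
mean_with = mean(stays_with_outlier)

print(f"mean without the extreme value: {mean_without:.1f}")
print(f"mean with the extreme value:    {mean_with:.1f}")
```

A single observation moves the mean from about 3 days to about 13 days, even though five of the six patients stayed 5 days or less.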
Population Versus Sample
Population - The entire group you want information about
- For example: the blood pressure of all 20-year-old male university students in South Africa
Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
- For example: a sample of blood pressures (n=50) of 20-year-old male university students in South Africa
The sample mean $\bar{x}$ is not the population mean $\mu$
We don't know the population mean $\mu$ but would like to know it
We draw a sample from the population
We calculate the sample mean $\bar{x}$
How close is $\bar{x}$ to $\mu$?
Statistical theory will tell us how close $\bar{x}$ is to $\mu$
Statistical inference is the process of trying to draw conclusions about the population from the sample
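The difference between the two means can be illustrated by simulation. In this Python sketch the "population" is simulated, which is an assumption made purely for illustration: in practice the population mean is unknown. The centre of 120 mmHg and spread of 12 mmHg are hypothetical values, not figures from the lecture.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch gives the same result on each run

# Simulated population of 10,000 systolic blood pressures in mmHg
# (hypothetical centre 120 and spread 12).
population = [random.gauss(120, 12) for _ in range(10_000)]
mu = mean(population)

# Draw a sample of n = 50 and compute the sample mean.
sample = random.sample(population, 50)
x_bar = mean(sample)

print(f"population mean: {mu:.1f}")
print(f"sample mean:     {x_bar:.1f}")
print(f"difference:      {x_bar - mu:+.1f}")
```

Changing the seed draws a different sample and gives a different sample mean, while the population mean stays the same; statistical theory describes how large that difference is likely to be.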
Weighted Mean
$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$
Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
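A course grade makes a convenient worked example of the weighted mean. The marks and weights in this Python sketch are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical marks (%) for an assignment, a test and an exam,
# with the exam counting most (illustrative only).
marks = [70, 80, 90]
weights = [20, 30, 50]

final_mark = weighted_mean(marks, weights)
simple_average = sum(marks) / len(marks)

print(f"weighted mean:   {final_mark}")
print(f"unweighted mean: {simple_average}")
```

Because the highest mark carries the largest weight, the weighted mean (83) is higher than the simple average (80).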
Geometric Means
[Figure: histograms rotated 90° and box plots]
Note how the log transformation gives a symmetric distribution
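The slide does not spell out the calculation, so the following Python sketch relies on the standard definition: the geometric mean is the exponential of the arithmetic mean of the logged values. The data and the function name are hypothetical.

```python
from math import exp, log

def geometric_mean(values):
    """Exponential of the mean of the natural logs (values must be positive)."""
    return exp(sum(log(v) for v in values) / len(values))

# Hypothetical right-skewed measurements (illustrative only).
titres = [1, 10, 100]

arithmetic = sum(titres) / len(titres)
geometric = geometric_mean(titres)

print(f"arithmetic mean: {arithmetic:.1f}")
print(f"geometric mean:  {geometric:.1f}")
```

On the log scale the three values are evenly spaced, so the geometric mean lands on the middle value (10), whereas the arithmetic mean (37) is pulled towards the largest value.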
Median
• 1 1 3 3 4 5 5 5 5 5 (no exact middle; it is shared by two numbers)
(4 + 5) / 2 = 4.5
MEDIAN is 4.5
• 5 5 5 3 1 5 1 4 3 5 2
• 1 1 2 3 3 4 5 5 5 5 5 (in order)
Exact middle: MEDIAN is 4
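The two cases above (an even and an odd number of values) can be reproduced with a short Python function. This is a minimal sketch; the function name is illustrative.

```python
def median(values):
    """Middle value of the sorted data; for an even count,
    the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

even_case = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]      # 10 values
odd_case = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]    # 11 values, unsorted

print(median(even_case))  # 4.5
print(median(odd_case))   # 4
```

The data must be put in order first, which is why the second example is sorted before the middle value is read off.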
Mode
The score that occurs most frequently
Bimodal, Multimodal, No Mode
The only measure of central tendency that can be used with nominal data
Examples
a. 5 5 5 3 1 5 1 4 3 5        • Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9      • Bimodal: 2 & 6
c. 2 3 6 7 8 9 10             • No Mode
d. 2 2 3 3 3 4                • Mode is 3
e. 2 2 3 3 4 4 5 5            • No Mode
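The five examples can be checked with a small Python function. The rule that equally frequent values mean "no mode" follows the slide's examples c and e; it is a convention, and some software (for instance Python's `statistics.multimode`) would instead return every value in those cases. The function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s) in sorted order, or an empty list
    when every distinct value occurs equally often (no mode)."""
    counts = Counter(values)
    highest = max(counts.values())
    top = sorted(v for v, c in counts.items() if c == highest)
    if len(top) == len(counts) and len(counts) > 1:
        return []
    return top

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

results = {label: modes(data) for label, data in examples.items()}
for label, found in results.items():
    print(label, found if found else "No mode")
```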
Shapes of the Distribution
Distribution Characteristics
Example: Height of students in the class
Example: Serum cholesterol level
Example: Birth weight of newborn babies
Some visual ways to summarize data
Tables
  Frequency table
Graphs
  Histograms
  Bar graphs
  Box plots
  Line plots
  Scatter graphs
Charts
  Bar chart
  Pie diagram
Frequency Tables
Summarizes a variable with counts and percentages
The variable is categorical
Note that you can take a continuous variable and create categories with it
How do you create categories for a continuous variable?
Choose cutoffs that are biologically meaningful
Natural breaks in the data
Example of frequency table
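As one possible illustration, the Python sketch below turns a continuous variable (birth weight) into categories and tabulates counts and percentages. The cutoffs of 2500 g and 4000 g, the category labels, and the data are assumptions chosen for the example, not values from the lecture.

```python
from bisect import bisect_right
from collections import Counter

# Assumed cutoffs in grams (illustrative only).
cutoffs = [2500, 4000]
labels = ["Low (<2500 g)", "Normal (2500-3999 g)", "High (>=4000 g)"]

def categorize(weight):
    """Map a weight to its category; a value equal to a cutoff
    goes into the higher category."""
    return labels[bisect_right(cutoffs, weight)]

# Hypothetical birth weights in grams.
weights = [2100, 2800, 3000, 3200, 3300, 3400, 3500, 3600, 3900, 4400]

counts = Counter(categorize(w) for w in weights)
total = len(weights)
percentages = {label: 100 * counts[label] / total for label in labels}

print(f"{'Category':<22} {'Count':>5} {'Percent':>8}")
for label in labels:
    print(f"{label:<22} {counts[label]:>5} {percentages[label]:>7.0f}%")
```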
When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class
Graphical SummariesGraphical Summaries
HistogramsHistograms Continuous or ordinal data on horizontal axisContinuous or ordinal data on horizontal axis
Bar GraphsBar Graphs Nominal dataNominal data
No order to horizontal axisNo order to horizontal axis
Box PlotsBox Plots Continuous dataContinuous data
HistogramHistogram
A histogram is a graphic representation of the frequency distribution of a variable Vertical rectangles (bars) are drawn in such a way that their bases lie
on a linear scale representing different intervals and their heights are
proportional to the frequencies of the values within each of the intervals
Bar ChartBar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of mutually exclusive categories The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis The heights of the bars correspond to the frequencies The bars should be of equal width and they should not be touching me other bars
Difference between bar chart and Difference between bar chart and histogramhistogram
Bar charts for categories that are separateBar charts for categories that are separate
Histograms if you got categories by Histograms if you got categories by dividing up continuous datadividing up continuous data
Bars do not touch histogram rectangles Bars do not touch histogram rectangles do touch do touch
Line graphLine graph
If the mid-points of the top of the bars of a histogram are connected together by a line and if the bars were omitted from the display the resultant graph will be a line
graph (also called a frequency polygon) Line graphs are good at showing trends over a period of time When trends of rates (eg death rate Infant Mortality Rate etc) are to be displayed it is better done with
line graphs rather than histograms
Scatter plot
Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, all the points fall on a straight line (the regression line).
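One way to put a number on "wide scatter" versus "points on a line" is Pearson's correlation coefficient. The sketch below is only an illustration: the function name pearson_r and both data sets are hypothetical, chosen so that one set lies exactly on a straight line and the other is widely scattered.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data (not from the lecture).
ages = [1, 2, 3, 4, 5]
on_a_line = [2 * a + 10 for a in ages]   # every point lies on one straight line
scattered = [14, 9, 17, 8, 15]           # no clear pattern

r_line = pearson_r(ages, on_a_line)
r_scattered = pearson_r(ages, scattered)
print(f"on a line: r = {r_line:.2f}")
print(f"scattered: r = {r_scattered:.2f}")
```

A coefficient close to 1 (or to -1) corresponds to points hugging a line; a coefficient close to 0 corresponds to a wide scatter.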
Survival curve
Pie chart
This is a circular diagram (it can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of the data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100%, and it represents the total population that is being shown.
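Because each slice is proportional to its share of the whole, the percentage and the angle of every sector follow directly from the category counts. A minimal sketch, using hypothetical feeding-method counts that are not from the lecture:

```python
# Hypothetical category counts (illustrative only).
counts = {"Breastfed": 50, "Formula": 30, "Mixed": 20}
total = sum(counts.values())

# Share of the whole as a percentage, and the matching sector angle in degrees.
percentages = {name: 100 * n / total for name, n in counts.items()}
angles = {name: 360 * n / total for name, n in counts.items()}

for name in counts:
    print(f"{name:<10} {percentages[name]:5.1f}%  {angles[name]:6.1f} degrees")
```

The percentages add up to 100 and the angles to 360, matching the statement that the full circle represents the total population.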
Pictures of Data: Continuous Variables
Histograms
Means and medians do not tell the whole story
Differences in spread (variability)
Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
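The steps above can be sketched in a few lines. The birth weights, the starting value, and the bin width below are all hypothetical choices made for illustration; the "drawing" is a simple text histogram rather than a plotted figure.

```python
# Hypothetical birth weights in grams (not from the lecture).
weights = [2450, 2900, 3100, 3250, 3300, 3400, 3550, 3600, 3800, 4100]

# Step 1: divide the range into intervals (bins) of equal width.
start = 2000       # lower edge of the first bin (assumed)
bin_width = 500    # assumed width
n_bins = 5         # covers 2000 g up to, but not including, 4500 g

# Step 2: count the number of observations in each interval.
counts = [0] * n_bins
for w in weights:
    counts[(w - start) // bin_width] += 1

# Steps 3 and 4: draw the histogram and label the scale.
print("Birth weight (g) | count")
for i, c in enumerate(counts):
    low = start + i * bin_width
    high = low + bin_width - 1
    print(f"{low}-{high}        | {'#' * c} ({c})")
```

Changing bin_width changes the apparent shape of the distribution, so it is worth trying more than one width before settling on a display.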
Pictures of Data: Histograms
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing the distribution of continuous data in multiple groups; can be plotted side by side
Box plot
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
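The hinges and the median that define the box can be computed directly. This is a sketch under stated assumptions: the length-of-stay values are hypothetical, and the "inclusive" percentile method of Python's statistics.quantiles is only one of several conventions, so other software may report slightly different hinges for the same data.

```python
import statistics

# Hypothetical hospital lengths of stay in days (not the lecture's data).
stay_days = [1, 2, 2, 3, 3, 4, 5, 6, 8, 15, 30]

# Three cut points that split the data into quarters:
# lower hinge (25th percentile), median, upper hinge (75th percentile).
lower_hinge, median, upper_hinge = statistics.quantiles(
    stay_days, n=4, method="inclusive"
)

print("lower hinge:", lower_hinge)
print("median:     ", median)
print("upper hinge:", upper_hinge)
print("box height (interquartile range):", upper_hinge - lower_hinge)
```

In this made-up sample the two long stays stretch the upper part of the distribution, which is the kind of skewness a box plot makes visible at a glance.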
Hospital Length of Stay
Box plot: Length of Stay
Misuse of graphics
"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people so that such misadventures can be avoided.
Common tricks used to mislead
The problem of scaling
The Advertiser's Graph
The transformed graph
The chart with too much data
Which graph to use
Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category
Example of MCQ 1
The arithmetic mean of a set of values:
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram:
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Population Versus Sample
Population - The entire group you want information about
- For example: the blood pressure of all 20-year-old male university students in South Africa
Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
- For example: a sample of blood pressures (n=50) of 20-year-old male university students in South Africa
The sample mean X̄ is not the population mean μ
Population Versus Sample
We don't know the population mean μ but would like to know it
We draw a sample from the population
We calculate the sample mean X̄
How close is X̄ to μ?
Statistical theory will tell us how close X̄ is to μ
Statistical inference is the process of trying to draw conclusions about the population from the sample
Weighted Mean
$$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$$
Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
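As a worked illustration of the weighted mean, consider a course mark built from three components. The marks and weights below are hypothetical, as is the function name weighted_mean.

```python
def weighted_mean(values, weights):
    """Sum of (weight x value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical course components: test, assignment, exam.
marks = [60, 70, 80]      # marks in percent
weights = [20, 30, 50]    # how much each component counts

course_mark = weighted_mean(marks, weights)
plain_mean = sum(marks) / len(marks)

print("weighted mean:", course_mark)
print("unweighted mean:", plain_mean)
```

The exam carries the most weight, so the weighted mean is pulled towards the exam mark and ends up above the simple average of the three marks.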
Geometric Means
These are histograms rotated 90° and box plots
Note how the log transformation gives a symmetric distribution
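The link between the log transformation and the geometric mean can be shown numerically: the geometric mean is the antilog of the arithmetic mean of the logged values. The sketch below uses three hypothetical values spread over two orders of magnitude; the function name is illustrative.

```python
from math import exp, log

def geometric_mean(values):
    """Antilog of the mean of the natural logs (values must be positive)."""
    return exp(sum(log(v) for v in values) / len(values))

# Hypothetical right-skewed measurements (not from the lecture).
titres = [1, 10, 100]

gm = geometric_mean(titres)
am = sum(titres) / len(titres)

print(f"geometric mean:  {gm:.1f}")
print(f"arithmetic mean: {am:.1f}")
```

On the log scale these three values are evenly spaced, which is why the geometric mean sits at the middle value while the arithmetic mean is dragged upwards by the largest one.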
• 1 1 3 3 4 5 5 5 5 5 (no exact middle; shared by two numbers)
MEDIAN is (4 + 5) / 2 = 4.5
• 5 5 5 3 1 5 1 4 3 5 2
• 1 1 2 3 3 4 5 5 5 5 5 (in order)
Exact middle: MEDIAN is 4
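The two cases above (an even and an odd number of observations) can be reproduced with a short function. The function name is illustrative; the data are the two sets shown on the slide.

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # middle shared by two numbers

even_set = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]
odd_set = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]

print("median of even set:", median(even_set))
print("median of odd set: ", median(odd_set))
```

Sorting first matters: the second set is given unordered, and taking the middle of the unsorted list would give the wrong answer.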
Mode
The score that occurs most frequently
Bimodal, multimodal, no mode
The only measure of central tendency that can be used with nominal data
Examples
a. 5 5 5 3 1 5 1 4 3 5
• Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9
• Bimodal: 2 & 6
c. 2 3 6 7 8 9 10
• No mode
d. 2 2 3 3 3 4
• Mode is 3
e. 2 2 3 3 4 4 5 5
• No mode
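The five examples can be checked with a small function. It follows the convention used in the examples, namely that a data set in which every value occurs equally often has no mode; that convention, and the function name modes, are assumptions of this sketch rather than a universal definition.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all occur equally often, report no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

for label, data in examples.items():
    result = modes(data)
    print(label, result if result else "no mode")
```

Because the function only counts occurrences, it works equally well on nominal data such as blood groups, where a mean or median would be meaningless.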
Shapes of the Distribution
Distribution Characteristics
Example: Height of students in the class
Example: Serum cholesterol level
Example: Birth weight of newborn babies
Some visual ways to summarize data
Tables: Frequency table
Graphs: Histograms, Bar graphs, Box plots, Line plots, Scatter graphs
Charts: Bar chart, Pie diagram
Frequency Tables
Summarizes a variable with counts and percentages
The variable is categorical
Note that you can take a continuous variable and create categories with it
How do you create categories for a continuous variable?
Choose cutoffs that are biologically meaningful
Natural breaks in the data
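Turning a continuous variable into categories and then tabulating counts and percentages might look like the sketch below. The birth weights are hypothetical, and the cutoffs (below 2500 g, and 4000 g or more) are used purely as an example of biologically motivated cutoffs; they are assumptions of this sketch, not values given in the lecture.

```python
from collections import Counter

# Hypothetical birth weights in grams (not from the lecture).
birth_weights = [1900, 2400, 2600, 3100, 3300, 3500, 3700, 3900, 4100, 4300]

def category(weight_g):
    """Assign a weight to a category using illustrative cutoffs."""
    if weight_g < 2500:
        return "Low (<2500 g)"
    if weight_g < 4000:
        return "Normal (2500-3999 g)"
    return "High (>=4000 g)"

counts = Counter(category(w) for w in birth_weights)
total = len(birth_weights)
percentages = {name: 100 * n / total for name, n in counts.items()}

print(f"{'Category':<22} {'Count':>5} {'Percent':>8}")
for name, n in counts.items():
    print(f"{name:<22} {n:>5} {percentages[name]:>7.1f}%")
```

The categories must be mutually exclusive and cover every possible value, which is why each cutoff is tested in order and the last category catches everything that remains.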
Example of frequency table
When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class
Graphical SummariesGraphical Summaries
HistogramsHistograms Continuous or ordinal data on horizontal axisContinuous or ordinal data on horizontal axis
Bar GraphsBar Graphs Nominal dataNominal data
No order to horizontal axisNo order to horizontal axis
Box PlotsBox Plots Continuous dataContinuous data
HistogramHistogram
A histogram is a graphic representation of the frequency distribution of a variable Vertical rectangles (bars) are drawn in such a way that their bases lie
on a linear scale representing different intervals and their heights are
proportional to the frequencies of the values within each of the intervals
Bar ChartBar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of mutually exclusive categories The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis The heights of the bars correspond to the frequencies The bars should be of equal width and they should not be touching me other bars
Difference between bar chart and Difference between bar chart and histogramhistogram
Bar charts for categories that are separateBar charts for categories that are separate
Histograms if you got categories by Histograms if you got categories by dividing up continuous datadividing up continuous data
Bars do not touch histogram rectangles Bars do not touch histogram rectangles do touch do touch
Line graphLine graph
If the mid-points of the top of the bars of a histogram are connected together by a line and if the bars were omitted from the display the resultant graph will be a line
graph (also called a frequency polygon) Line graphs are good at showing trends over a period of time When trends of rates (eg death rate Infant Mortality Rate etc) are to be displayed it is better done with
line graphs rather than histograms
Scatter plotScatter plot
Also called a scattergram This a method of displaying the distribution of two variables in relation to each other another The value of one variables is measured on the X axis and the values of the other on the Y axis The variables have to be on a continuous scale Each plot thus has two values (coordinates) from the Y and X axis scales A wide scatter of the plots denotes poor correlation between the two variables If the two variables are perfectly correlated then all the plots will fall on the diagonal (regression line)
Survival curveSurvival curve
Pie chartPie chart
This is a circular diagram (can be shown as 2-D or 3-D) divided into segments each representing a category or subset of data (part of the whole) The amount for each category is proportional to the area of the sector (slice of the pie) The total area of the circle is 100 and it represents the total population that is being shown
Pictures of DataContinuous Variables
Histograms Means and medians do not tell whole story Differences in spread (variability) Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
Pictures of Data Histograms
Pictures of Data Histograms
Pictures of Data Histograms
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing distribution of continuous data in multiple groups ndash can be plotted side by side
Box plotBox plotA box plot provides an excellent visual summary of many important aspects of a distribution The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distributionThe median is shown as a line across the box Therefore 14 of the distribution is between this line and the top of the box and 14 of the distribution is between this line and the bottom of the box
Hospital Length of Stay
Box plot Length of Stay
Box plot Length of Stay
Misuse of graphics
"It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled." - M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people so that such misadventures can be avoided.
Common tricks used to mislead
• The problem of scaling
• The Advertiser's Graph
• The transformed graph
• The chart with too much data
Which graph to use
Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics
Graph Name | Y-axis | X-axis
Histogram | Count | Category
Scatterplot | Continuous | Continuous
Dot Plot | Continuous | Category
Box Plot | Percentiles | Category
Line Plot | Mean or value | Category
Example of MCQ 1
The arithmetic mean of a set of values
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Geometric Means
These are histograms rotated 90° and box plots
Note how the log transformation gives a symmetric distribution
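To make the link between the log transformation and the geometric mean concrete, here is a small sketch with hypothetical right-skewed values. The geometric mean is the antilog of the mean of the logged values; `statistics.geometric_mean` is used only as a cross-check.

```python
import math
import statistics

# Hypothetical right-skewed measurements (for example, antibody titres)
values = [1, 10, 100, 1000, 10000]

# After a log10 transformation the values are evenly spaced (symmetric)
logs = [math.log10(v) for v in values]

arithmetic_mean = statistics.mean(values)          # pulled up by the large values
geometric_mean = 10 ** statistics.mean(logs)       # antilog of the mean of the logs
check = statistics.geometric_mean(values)          # library cross-check

print("logged values:", logs)
print("arithmetic mean:", arithmetic_mean)
print("geometric mean:", geometric_mean)
print("median:", statistics.median(values))
```

In this example the geometric mean sits at the median of the raw data, while the arithmetic mean is dragged towards the long right tail.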
• 1 1 3 3 4 5 5 5 5 5
no exact middle, shared by two numbers (4 and 5)
MEDIAN is (4 + 5) / 2 = 4.5
• 5 5 5 3 1 5 1 4 3 5 2
• 1 1 2 3 3 4 5 5 5 5 5 (in order)
exact middle: MEDIAN is 4
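The two worked examples above can be checked with a short sketch. The helper name `median` is illustrative; it sorts the data and then takes the middle value, or the average of the two middle values when the count is even.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # shared by two numbers

even_set = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]
odd_set = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]

print(median(even_set))  # 4.5
print(median(odd_set))   # 4
```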
Mode
The score that occurs most frequently
Bimodal, Multimodal, No Mode
The only measure of central tendency that can be used with nominal data
Examples
a. 5 5 5 3 1 5 1 4 3 5: Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9: Bimodal (2 and 6)
c. 2 3 6 7 8 9 10: No Mode
d. 2 2 3 3 3 4: Mode is 3
e. 2 2 3 3 4 4 5 5: No Mode
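The five examples above can be reproduced with a small sketch. The helper `modes` is hypothetical; it assumes a non-empty data set and follows the convention used on the slide, where a data set in which every distinct value occurs equally often is reported as having no mode.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means 'no mode'."""
    counts = Counter(values)
    top = max(counts.values())
    winners = sorted(v for v, c in counts.items() if c == top)
    # Every distinct value equally frequent (and more than one value): no mode
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}
results = {label: modes(data) for label, data in examples.items()}

for label, found in results.items():
    print(label, found if found else "no mode")
```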
Shapes of the Distribution
Distribution Characteristics
Example: Height of students in the class
Example: Serum cholesterol level
Example: Birth weight of newborn babies
Some visual ways to summarize data
Tables
• Frequency table
Graphs
• Histograms
• Bar graphs
• Box plots
• Line plots
• Scatter graphs
Charts
• Bar chart
• Pie diagram
Frequency Tables
Summarizes a variable with counts and percentages
The variable is categorical
Note that you can take a continuous variable and create categories with it
How do you create categories for a continuous variable?
• Choose cutoffs that are biologically meaningful
• Natural breaks in the data
Example of frequency table
When raw data are arranged with frequencies they are said to form a frequency table for ungrouped data. When the data are divided into groups (classes) they are called grouped data. The classes have to be decided according to the range of data and size of class. The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table. The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
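As a sketch of the definitions above, the following groups a continuous variable into classes and reports the frequency, percentage and cumulative frequency of each class. The function name `frequency_table` and the birth weights are hypothetical; the 2500 g cutoff is used as an example of a biologically meaningful boundary (low birth weight).

```python
def frequency_table(values, cutoffs):
    """Rows of (class label, frequency, percent, cumulative frequency).

    Each class includes its lower cutoff and excludes its upper cutoff.
    """
    rows = []
    cumulative = 0
    for lo, hi in zip(cutoffs, cutoffs[1:]):
        freq = sum(1 for v in values if lo <= v < hi)
        cumulative += freq   # this class plus all classes before it
        rows.append((f"{lo}-{hi}", freq, 100 * freq / len(values), cumulative))
    return rows

# Hypothetical birth weights in grams
weights = [1800, 2400, 2600, 2900, 3100, 3200, 3400, 3600, 3900, 4200]
rows = frequency_table(weights, [1500, 2500, 4000, 4500])

print("Class (g) | Frequency | Percent | Cumulative")
for label, freq, percent, cumulative in rows:
    print(f"{label} | {freq} | {percent:.0f}% | {cumulative}")
```

The cumulative frequency of the last class equals the total number of observations, which is a quick check that no value fell outside the chosen cutoffs.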
Graphical Summaries
Histograms
• Continuous or ordinal data on horizontal axis
Bar Graphs
• Nominal data
• No order to horizontal axis
Box Plots
• Continuous data
Histogram
A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals and their heights are proportional to the frequencies of the values within each of the intervals.
Bar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of several mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and they should not touch the other bars.
Difference between bar chart and histogram
Bar charts for categories that are separate
Histograms if you got categories by dividing up continuous data
Bars do not touch; histogram rectangles do touch
Line graph
If the mid-points of the top of the bars of a histogram are connected together by a line and if the bars were omitted from the display, the resultant graph will be a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, it is better done with line graphs rather than histograms.
Scatter plot
Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates) from the Y and X axis scales. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated then all the points will fall on the diagonal (regression line).
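The link between scatter and correlation can be illustrated with the Pearson correlation coefficient, a standard measure that is not named on the slide and is used here as an assumption. The helper `pearson_r` and both data sets are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired continuous values."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
on_a_line = [3, 5, 7, 9, 11]   # every point lies on y = 2x + 1
scattered = [4, 1, 5, 2, 3]    # no clear pattern

r_line = pearson_r(xs, on_a_line)
r_scatter = pearson_r(xs, scattered)

print(f"points on a straight line: r = {r_line:.2f}")
print(f"widely scattered points:   r = {r_scatter:.2f}")
```

A value of r close to 1 (or -1) corresponds to points lying close to a straight line, and a value near 0 corresponds to a wide scatter.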
Survival curve
Examples
bull Mode is 5Mode is 5
bull Bimodal ndash 2 amp 6Bimodal ndash 2 amp 6
bull No ModeNo Mode
a 5 5 5 3 1 5 1 4 3 5
b 2 2 2 3 4 5 6 6 6 7 9
c 2 3 6 7 8 9 10
d 2 2 3 3 3 4
e 2 2 3 3 4 4 5 5
bull Mode is 3
bull No Mode
Shapes of the Distribution
Shapes of the Distribution
Distribution Characteristics
Shapes of the Distribution
Example Height of students in the class
Shapes of the Distribution
Example Serum cholesterol level
Shapes of the Distribution
Example Birth weight of newborn babies
Shapes of the Distribution
Some visual ways to summarize Some visual ways to summarize datadata
TablesTablesFrequency tableFrequency table
GraphsGraphsHistogramsHistograms
Bar graphsBar graphs
Box plotsBox plots
Line plotsLine plots
Scatter graphsScatter graphs
ChartsChartsBar chartBar chart
Pie diagramPie diagram
Frequency TablesFrequency Tables
Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages
The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable
and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable
Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data
Example of frequency tableExample of frequency table
When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class
Graphical SummariesGraphical Summaries
HistogramsHistograms Continuous or ordinal data on horizontal axisContinuous or ordinal data on horizontal axis
Bar GraphsBar Graphs Nominal dataNominal data
No order to horizontal axisNo order to horizontal axis
Box PlotsBox Plots Continuous dataContinuous data
HistogramHistogram
A histogram is a graphic representation of the frequency distribution of a variable Vertical rectangles (bars) are drawn in such a way that their bases lie
on a linear scale representing different intervals and their heights are
proportional to the frequencies of the values within each of the intervals
Bar ChartBar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of mutually exclusive categories The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis The heights of the bars correspond to the frequencies The bars should be of equal width and they should not be touching me other bars
Difference between bar chart and histogram
Bar charts for categories that are separate
Histograms if you got categories by dividing up continuous data
Bars do not touch; histogram rectangles do touch
Line graph
If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends in rates (e.g. death rate, Infant Mortality Rate) are to be displayed, this is better done with line graphs than with histograms.
Scatter plot
Also called a scattergram. This is a method of displaying the distribution of two variables in relation to one another. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, all the points fall on a straight line (the regression line).
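The link between the amount of scatter and the strength of correlation can be checked numerically. The sketch below computes the Pearson correlation coefficient by hand; the data are made up and the function name `pearson_r` is an assumption for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data: age in years against two hypothetical weight series (kg).
ages = [1, 2, 3, 4, 5]
weights_on_line = [10, 12, 14, 16, 18]    # every point lies on one straight line
weights_scattered = [14, 10, 18, 12, 16]  # same values, widely scattered

r_line = pearson_r(ages, weights_on_line)
r_scattered = pearson_r(ages, weights_scattered)
print(round(r_line, 3), round(r_scattered, 3))
```

The first series gives a coefficient of 1, and the scattered series gives a much smaller one.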
Survival curve
Pie chart
This is a circular diagram (it can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of the data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.
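Because the area of a sector is proportional to its angle, each slice can be sized by converting the category's share of the total into degrees out of 360. A minimal sketch with made-up counts (the category names and the function name `pie_sectors` are assumptions for this example):

```python
def pie_sectors(counts):
    """Map each category to (percentage of total, sector angle in degrees)."""
    total = sum(counts.values())
    return {
        name: (100 * n / total, 360 * n / total)
        for name, n in counts.items()
    }

# Made-up data: feeding method recorded for 60 infants.
feeding = {"Breastfed": 30, "Formula": 15, "Mixed": 15}
sectors = pie_sectors(feeding)

for name, (percent, angle) in sectors.items():
    print(f"{name:<10} {percent:5.1f}%  {angle:6.1f} degrees")
```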
Pictures of Data: Continuous Variables
Histograms
Means and medians do not tell whole story
Differences in spread (variability)
Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
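The steps above can be sketched as follows. This is only an illustration: the data are made up, the number of bins is chosen arbitrarily, and `histogram_counts` is a name introduced for this example.

```python
def histogram_counts(values, n_bins):
    """Count observations in n_bins equal-width intervals spanning the data range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal; cannot form intervals")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last interval, not a new one.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# Made-up data: heights of ten students in cm.
heights = [150, 152, 155, 158, 160, 161, 163, 165, 168, 170]
edges, counts = histogram_counts(heights, n_bins=4)

# Crude text histogram: one star per observation, intervals labelled in cm.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.0f}-{right:<5.0f} {'*' * count}")
```

Changing the number of bins can change the apparent shape of the distribution, so it is worth trying more than one choice.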
Pictures of Data: Histograms
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing the distribution of continuous data in multiple groups; the plots can be drawn side by side
Box plot
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
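The numbers behind a box plot can be computed directly. The sketch below uses Python's `statistics.quantiles` with the "inclusive" method; other quartile definitions give slightly different hinges. The data are made up, and flagging values more than 1.5 times the box length beyond the box as outlying is one common convention, assumed here rather than taken from the lecture.

```python
from statistics import quantiles

# Made-up data: hospital length of stay in days.
stay_days = [1, 2, 2, 3, 4, 5, 6, 8, 15]

# Lower hinge (25th percentile), median, upper hinge (75th percentile).
q1, median, q3 = quantiles(stay_days, n=4, method="inclusive")
iqr = q3 - q1  # length of the box (interquartile range)

# Assumed convention: points beyond 1.5 * IQR from the box are outlying.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [d for d in stay_days if d < lower_fence or d > upper_fence]

print(q1, median, q3, iqr, outliers)
```

Here the median sits in the middle of the box, but the single long stay lies well above it, which is the kind of skewness a box plot makes visible.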
Hospital Length of Stay
Box plot: Length of Stay
Misuse of graphics
"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.
Common tricks used to mislead
The problem of scaling
The Advertisers Graph
The transformed graph
The chart with too much data
Which graph to use
Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name    Y-axis           X-axis
Histogram     Count            Category
Scatterplot   Continuous       Continuous
Dot Plot      Continuous       Category
Box Plot      Percentiles      Category
Line Plot     Mean or value    Category
Example of MCQ 1
The arithmetic mean of a set of values:
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram:
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Shapes of the Distribution
Distribution Characteristics
Example: Height of students in the class
Example: Serum cholesterol level
Example: Birth weight of newborn babies
Some visual ways to summarize data
Tables: Frequency table
Graphs: Histograms, Bar graphs, Box plots, Line plots, Scatter graphs
Charts: Bar chart, Pie diagram
Frequency Tables
Summarizes a variable with counts and percentages
Shapes of the Distribution
Example Height of students in the class
Shapes of the Distribution
Example Serum cholesterol level
Shapes of the Distribution
Example Birth weight of newborn babies
Shapes of the Distribution
Some visual ways to summarize Some visual ways to summarize datadata
TablesTablesFrequency tableFrequency table
GraphsGraphsHistogramsHistograms
Bar graphsBar graphs
Box plotsBox plots
Line plotsLine plots
Scatter graphsScatter graphs
ChartsChartsBar chartBar chart
Pie diagramPie diagram
Frequency TablesFrequency Tables
Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages
The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable
and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable
Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data
Example of frequency tableExample of frequency table
When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class
Graphical Summaries
Histograms
  Continuous or ordinal data on horizontal axis
Bar Graphs
  Nominal data
  No order to horizontal axis
Box Plots
  Continuous data
Histogram
A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.
Bar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of several mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch one another.
Difference between bar chart and histogram
Bar charts are for categories that are separate
Histograms are for categories obtained by dividing up continuous data
Bars do not touch; histogram rectangles do touch
Line graph
If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resulting graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, line graphs are a better choice than histograms.
Scatter plot
Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points fall on a straight line (the regression line).
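To make the link between scatter and correlation concrete, here is a small sketch that computes a correlation coefficient for two sets of points. It assumes the Pearson coefficient, which the slide does not name explicitly, and both data sets are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired observations (assumed measure)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: age in years against two made-up height series (cm).
ages = [1, 2, 3, 4, 5]
on_a_line = [2 * a + 70 for a in ages]   # every point lies exactly on a line
scattered = [75, 71, 80, 73, 78]         # points widely scattered

r_line = pearson_r(ages, on_a_line)
r_scattered = pearson_r(ages, scattered)
print(f"points on a line: r = {r_line:.2f}")
print(f"scattered points: r = {r_scattered:.2f}")
```

Points lying exactly on a rising straight line give a coefficient of 1, while the widely scattered points give a value much closer to 0.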
Survival curve
Pie chart
This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of the data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and represents the total population being shown.
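The proportionality between a category's count and its slice can be sketched as follows. The blood group counts are hypothetical, and the conversion to degrees simply uses the fact that a full circle is 360 degrees.

```python
# Hypothetical counts of blood groups in a sample of 100 people.
counts = {"O": 45, "A": 40, "B": 10, "AB": 5}
total = sum(counts.values())

# Each slice's share of the circle is proportional to its count.
percentages = {group: 100 * n / total for group, n in counts.items()}
angles = {group: 360 * n / total for group, n in counts.items()}

for group in counts:
    print(f"{group:<3}{percentages[group]:>6.1f}%{angles[group]:>8.1f} degrees")
```

The percentages add up to 100 and the angles to 360, matching the statement that the whole circle represents the total population.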
Pictures of Data: Continuous Variables
Histograms
  Means and medians do not tell the whole story
  Differences in spread (variability)
  Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
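The steps above can be sketched in Python without any plotting library. The heights and the choice of three bins are hypothetical, and placing the maximum value in the last bin is one common convention rather than something the slide specifies.

```python
# Hypothetical heights of students in cm (illustrative values only).
heights = [150, 152, 155, 158, 160, 161, 163, 165,
           166, 168, 170, 172, 175, 178, 180]


def histogram_counts(data, n_bins):
    """Divide the range of data into n_bins equal-width bins and count each."""
    low, high = min(data), max(data)
    if low == high:
        raise ValueError("all values are equal, so the range has zero width")
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for x in data:
        # Assumed convention: the maximum value goes into the last bin.
        index = min(int((x - low) / width), n_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts


edges, counts = histogram_counts(heights, n_bins=3)

# Draw a text histogram with labelled intervals, one star per observation.
print("Height (cm)    Count")
for i, count in enumerate(counts):
    label = f"{edges[i]:.0f}-{edges[i + 1]:.0f}"
    print(f"{label:<15}{'*' * count} ({count})")
```

Changing `n_bins` changes the apparent shape of the distribution, so it is worth trying a few values before settling on one.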
Pictures of Data: Histograms
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing the distribution of continuous data in multiple groups; the box plots can be plotted side by side
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
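The quantities a box plot displays can be computed with Python's standard `statistics` module, as in the sketch below. The lengths of stay are hypothetical, the "inclusive" percentile method is only one of several conventions, and the 1.5 x IQR rule for flagging outlying values is a common convention that the slides do not state.

```python
from statistics import quantiles

# Hypothetical hospital lengths of stay in days (illustrative values only).
length_of_stay = [2, 3, 3, 4, 5, 6, 7, 9, 30]

# Lower hinge (25th percentile), median, and upper hinge (75th percentile).
q1, median, q3 = quantiles(length_of_stay, n=4, method="inclusive")
iqr = q3 - q1   # height of the box, containing the middle half of the data

# Assumed convention: values beyond 1.5 * IQR from the box are outlying.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in length_of_stay if x < lower_fence or x > upper_fence]

print(f"lower hinge = {q1}, median = {median}, upper hinge = {q3}")
print(f"IQR = {iqr}, outlying values = {outliers}")
```

Here the single stay of 30 days lies far above the box, which is the kind of skewness and outlying value a box plot makes visible at a glance.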
Hospital Length of Stay
Box plot: Length of Stay
Misuse of graphics
"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M. J. Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.
Common tricks used to mislead
  The problem of scaling
  The Advertiser's Graph
  The transformed graph
  The chart with too much data
Which graph to use
Statistical methods depend on the "form" of a set of data, which can be assessed with some common, useful graphics.
Graph Name     Y-axis          X-axis
Histogram      Count           Category
Scatterplot    Continuous      Continuous
Dot Plot       Continuous      Category
Box Plot       Percentiles     Category
Line Plot      Mean or value   Category
Example of MCQ 1
The arithmetic mean of a set of values:
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram:
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Some visual ways to summarize Some visual ways to summarize datadata
TablesTablesFrequency tableFrequency table
GraphsGraphsHistogramsHistograms
Bar graphsBar graphs
Box plotsBox plots
Line plotsLine plots
Scatter graphsScatter graphs
ChartsChartsBar chartBar chart
Pie diagramPie diagram
Frequency TablesFrequency Tables
Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages
The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable
and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable
Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data
Example of frequency table
When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data. When the data are divided into groups (classes), they are called grouped data. The classes have to be decided according to the range of the data and the size of class. The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table. The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
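The definitions above can be illustrated with a short Python sketch that builds a grouped frequency table with cumulative frequencies. The ages, the class width of 10 and the function name `frequency_table` are assumptions chosen for illustration only.

```python
def frequency_table(values, start, width, n_classes):
    """Return rows of (lower, upper, frequency, cumulative frequency).

    Each class covers lower <= value < upper, so the classes do not overlap.
    """
    rows = []
    cumulative = 0
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        freq = sum(1 for v in values if lower <= v < upper)
        cumulative += freq  # this class plus all classes before it
        rows.append((lower, upper, freq, cumulative))
    return rows


# Made-up ages (in years) of 12 patients.
ages = [3, 7, 12, 15, 18, 21, 22, 25, 29, 34, 38, 41]
table = frequency_table(ages, start=0, width=10, n_classes=5)

print("Class   Freq  Cum")
for lower, upper, freq, cum in table:
    print(f"{lower:>2}-{upper - 1:<3} {freq:>5} {cum:>4}")
```

The cumulative frequency of the last class equals the total number of observations, which is a quick check that no value fell outside the classes.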
Graphical Summaries
Histograms: continuous or ordinal data on horizontal axis
Bar graphs: nominal data; no order to horizontal axis
Box plots: continuous data
Histogram
A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.
Bar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of a set of mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch each other.
Difference between bar chart and histogram
Bar charts for categories that are separate
Histograms if you got categories by dividing up continuous data
Bars do not touch; histogram rectangles do touch
Line graph
If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, infant mortality rate) are to be displayed, it is better done with line graphs rather than histograms.
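A small sketch of the frequency polygon construction described above: each point sits at the mid-point of a class interval, at the height of that class frequency. The class boundaries, the frequencies and the function name `frequency_polygon` are made up for illustration.

```python
def frequency_polygon(edges, counts):
    """Return (mid-point, frequency) pairs, one per histogram bar.

    edges holds the class boundaries, so it has one more entry than counts.
    """
    return [((lo + hi) / 2, c) for lo, hi, c in zip(edges, edges[1:], counts)]


# Made-up histogram: four classes of width 10 and their frequencies.
edges = [0, 10, 20, 30, 40]
counts = [2, 5, 4, 1]
points = frequency_polygon(edges, counts)

for mid, freq in points:
    print(f"mid-point {mid:5.1f}  frequency {freq}")
```

Joining these points in order with straight lines gives the line graph; the bars themselves are no longer needed.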
Scatter plot
Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points fall on a straight line (the regression line).
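To make the link between scatter and correlation concrete, the sketch below computes Pearson's correlation coefficient r for two made-up data sets. Pearson's r is one common measure of linear correlation and is an assumption here, since the slide does not name a specific coefficient; the function name `pearson_r` and the numbers are illustrative only.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
on_a_line = [2 * v + 1 for v in x]  # every point lies exactly on y = 2x + 1
scattered = [4, 1, 5, 2, 3]         # made-up values with no clear pattern

print(round(pearson_r(x, on_a_line), 3))  # points on a straight line: r = 1.0
print(round(pearson_r(x, scattered), 3))  # wide scatter: r close to 0 (here -0.1)
```

Values of r near +1 or -1 correspond to points hugging a straight line, and values near 0 to a wide scatter.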
Survival curve
Pie chart
This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.
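As a worked illustration of the proportionality described above, this sketch converts category counts into percentages and sector angles (the full circle being 100% and 360 degrees). The blood group counts and the function name `pie_sectors` are made up for the example.

```python
def pie_sectors(counts):
    """Map each category to (percentage of total, sector angle in degrees)."""
    total = sum(counts.values())
    return {cat: (100 * n / total, 360 * n / total) for cat, n in counts.items()}


# Made-up counts for 100 blood donors.
blood_groups = {"O": 45, "A": 40, "B": 10, "AB": 5}
sectors = pie_sectors(blood_groups)

for cat, (percent, angle) in sectors.items():
    print(f"{cat:<2} {percent:5.1f}%  {angle:6.1f} degrees")
```

Because area is proportional to the angle of the sector, a category with twice the count gets twice the angle.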
Pictures of Data: Continuous Variables
Histograms
Means and medians do not tell the whole story
Differences in spread (variability)
Differences in shape of the distribution
How to Make a Histogram
Divide the range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
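The steps above can be sketched in Python using only the standard library. The example divides the range into equal-width bins, counts the observations in each bin and draws a labelled text histogram. The length-of-stay values, the choice of four bins and the function name `histogram_counts` are assumptions for illustration.

```python
def histogram_counts(values, n_bins):
    """Return (bin edges, counts) for n_bins equal-width bins over the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value would land one past the end, so put it in the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Made-up hospital lengths of stay, in days.
stay_days = [1, 2, 2, 3, 3, 3, 4, 5, 6, 8, 9, 13]
edges, counts = histogram_counts(stay_days, n_bins=4)

# Draw the histogram sideways and label the scales.
print("Length of stay (days) | Number of patients")
for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"{lo:5.1f} to {hi:5.1f}       | {'#' * c} ({c})")
```

With these data the long tail to the right shows up as progressively shorter bars, which a mean or median alone would not reveal.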
Pictures of Data: Histograms
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing the distribution of continuous data in multiple groups - can be plotted side by side
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
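The quantities that define the box can be computed with Python's `statistics` module, as in the sketch below. The data are made up, and the "inclusive" percentile method is an assumption: statistical packages use slightly different conventions for percentiles, so hinges from other software may differ a little on small samples.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the five values that a basic box plot is drawn from."""
    # n=4 splits the data into quarters: 25th, 50th and 75th percentiles.
    lower_hinge, median, upper_hinge = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "lower_hinge": lower_hinge,
        "median": median,
        "upper_hinge": upper_hinge,
        "max": max(values),
    }


# Made-up hospital lengths of stay, in days.
stay_days = [2, 3, 3, 4, 5, 6, 8, 9, 13]
summary = box_plot_summary(stay_days)

for name, value in summary.items():
    print(f"{name:<12} {value}")
```

Here the median sits closer to the lower hinge than to the upper hinge, which is how right skewness appears in a box plot.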
Hospital Length of Stay
Box plot: Length of Stay
Misuse of graphics
"It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled." - M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.
Common tricks used to mislead:
The problem of scaling
The Advertiser's Graph
The transformed graph
The chart with too much data
Which graph to use
Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph name    Y-axis          X-axis
Histogram     Count           Category
Scatterplot   Continuous      Continuous
Dot plot      Continuous      Category
Box plot      Percentiles     Category
Line plot     Mean or value   Category
HistogramHistogram
A histogram is a graphic representation of the frequency distribution of a variable Vertical rectangles (bars) are drawn in such a way that their bases lie
on a linear scale representing different intervals and their heights are
proportional to the frequencies of the values within each of the intervals
Bar ChartBar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of mutually exclusive categories The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis The heights of the bars correspond to the frequencies The bars should be of equal width and they should not be touching me other bars
Difference between bar chart and Difference between bar chart and histogramhistogram
Bar charts for categories that are separateBar charts for categories that are separate
Histograms if you got categories by Histograms if you got categories by dividing up continuous datadividing up continuous data
Bars do not touch histogram rectangles Bars do not touch histogram rectangles do touch do touch
Line graphLine graph
If the mid-points of the top of the bars of a histogram are connected together by a line and if the bars were omitted from the display the resultant graph will be a line
graph (also called a frequency polygon) Line graphs are good at showing trends over a period of time When trends of rates (eg death rate Infant Mortality Rate etc) are to be displayed it is better done with
line graphs rather than histograms
Scatter plotScatter plot
Also called a scattergram This a method of displaying the distribution of two variables in relation to each other another The value of one variables is measured on the X axis and the values of the other on the Y axis The variables have to be on a continuous scale Each plot thus has two values (coordinates) from the Y and X axis scales A wide scatter of the plots denotes poor correlation between the two variables If the two variables are perfectly correlated then all the plots will fall on the diagonal (regression line)
Survival curveSurvival curve
Pie chartPie chart
This is a circular diagram (can be shown as 2-D or 3-D) divided into segments each representing a category or subset of data (part of the whole) The amount for each category is proportional to the area of the sector (slice of the pie) The total area of the circle is 100 and it represents the total population that is being shown
Pictures of DataContinuous Variables
Histograms Means and medians do not tell whole story Differences in spread (variability) Differences in shape of the distribution
How to Make a Histogram
Divide range of data into intervals (bins) of equal width
Count the number of observations in each class
Draw the histogram
Label scales
Pictures of Data Histograms
Pictures of Data Histograms
Pictures of Data Histograms
Box plot
Another common visual display tool is the box plot
Gives good insight into distribution shape in terms of skewness and outlying values
Very nice tool for easily comparing distribution of continuous data in multiple groups ndash can be plotted side by side
Box plot
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution lies between this line and the top of the box, and 1/4 lies between this line and the bottom of the box.
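The numbers behind a box plot can be computed with Python's standard `statistics` module. In the sketch below, the function name `box_plot_summary` and the length-of-stay values are made up. The 1.5 x IQR rule for flagging outlying values is a common convention assumed here, not something stated in the lecture, and different software may compute the hinges slightly differently.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, interquartile range and flagged outliers for a box plot."""
    # method="inclusive" treats the data as the whole population of interest.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Made-up hospital length of stay in days (illustrative only).
stay_days = [1, 2, 2, 3, 4, 5, 7, 9, 30]
summary = box_plot_summary(stay_days)
print(summary)
```

Here the box would run from 2 to 7 days with the median line at 4 days, and the 30-day stay would be drawn as an outlying value.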
Hospital Length of Stay
Box plot Length of Stay
Misuse of graphics
"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.
Common tricks used to mislead
The problem of scaling
The Advertiser's Graph
The transformed graph
The chart with too much data
Which graph to use
Statistical methods depend on the "form" of a set of data, which can be assessed with some common, useful graphics
Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category
Example of MCQ 1
The arithmetic mean of a set of values
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
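One way to reason about options like these is to try them on small data sets. The sketch below compares the mean and median for a symmetrical set, a right-skewed set, a left-skewed set, and a set with mixed signs. All values are invented for illustration.

```python
import statistics

# Made-up data sets (illustrative only).
data_sets = {
    "symmetrical": [2, 4, 6, 8, 10],
    "right-skewed": [1, 2, 2, 3, 4, 5, 7, 9, 30],
    "left-skewed": [1, 20, 22, 23, 24],
    "mixed signs": [-3, -1, 0, 2, 7],
}

results = {}
for name, values in data_sets.items():
    results[name] = (statistics.mean(values), statistics.median(values))
    mean, median = results[name]
    print(f"{name}: mean = {mean}, median = {median}")
```

In the right-skewed set the single large value pulls the mean well above the median, in the left-skewed set the mean falls below the median, and the mean of the mixed-sign set is computed without difficulty.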
Example of MCQ 2
A histogram
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Bar Chart
A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of a set of mutually exclusive categories. The frequencies (or percentages) are shown along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch one another.
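The frequencies and percentages that a bar chart displays can be tallied with `collections.Counter`. In the sketch below, the function name `category_frequencies` and the blood-group observations are hypothetical.

```python
from collections import Counter


def category_frequencies(observations):
    """Map each category to (frequency, percentage of all observations)."""
    counts = Counter(observations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations given")
    return {cat: (n, 100 * n / total) for cat, n in sorted(counts.items())}


# Made-up blood groups: each observation falls into exactly one category.
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "O"]
table = category_frequencies(blood_groups)

# Crude text bar chart: separate bars, one '#' per observation.
for category, (n, percent) in table.items():
    print(f"{category:<3} {'#' * n} {n} ({percent:.0f}%)")
```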
Difference between bar chart and histogram
Bar charts for categories that are separate
Histograms if you got categories by dividing up continuous data
Bars do not touch; histogram rectangles do touch
Pictures of Data: Continuous Variables
Histograms
Means and medians do not tell the whole story
Differences in spread (variability)
Differences in shape of the distribution
How to Make a Histogram
Divide the range of the data into intervals (bins) of equal width
Count the number of observations in each interval
Draw the histogram
Label the scales
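The steps above can be sketched in a few lines of Python. This is only an illustration: the function name `histogram_counts`, the choice of four bins, and the length-of-stay values are hypothetical and do not come from the lecture.

```python
def histogram_counts(values, n_bins):
    """Count observations in n_bins equal-width intervals spanning the data range."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are equal, so the range cannot be divided")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value is placed in the last bin rather than a bin of its own.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical lengths of stay (days) for 12 patients.
stays = [1, 2, 2, 3, 3, 3, 4, 5, 6, 8, 9, 13]
edges, counts = histogram_counts(stays, n_bins=4)

# Draw a text histogram with a labelled scale: one star per observation.
print("Length of stay (days) | Number of patients")
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}        | {'*' * count}")
```

Each interval is closed on the left and open on the right, except the last one, which also includes the maximum. Other conventions exist, and the choice of bin width can change the apparent shape of the distribution.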
Pictures of Data: Histograms
Box plot
Another common visual display tool is the box plot
It gives good insight into the shape of a distribution in terms of skewness and outlying values
It is a convenient tool for comparing the distribution of continuous data across multiple groups, since the plots can be drawn side by side
Box plot
A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
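As a rough illustration of the hinges and the median, the sketch below computes them with Python's standard `statistics` module. The length-of-stay values are hypothetical, and the `inclusive` interpolation method is just one of several quartile conventions; other software may give slightly different hinges for the same data.

```python
import statistics

# Hypothetical lengths of stay (days) for 12 patients.
stays = [1, 2, 2, 3, 3, 3, 4, 5, 6, 8, 9, 13]

# n=4 splits the data into quarters and returns the three cut points:
# lower hinge (25th percentile), median, upper hinge (75th percentile).
lower_hinge, median, upper_hinge = statistics.quantiles(
    stays, n=4, method="inclusive"
)

# Observations that fall inside the box, i.e. between the two hinges.
inside_box = [v for v in stays if lower_hinge <= v <= upper_hinge]

print("Lower hinge:", lower_hinge)
print("Median:     ", median)
print("Upper hinge:", upper_hinge)
print("Share of observations inside the box:", len(inside_box) / len(stays))
```

With small samples and tied values the share inside the box is only approximately one half; it approaches one half as the sample grows.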
Hospital Length of Stay
Box plot: Length of Stay
Misuse of graphics
"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M. J. Moroney
Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.
Common tricks used to mislead
The problem of scaling
The Advertiser's Graph
The transformed graph
The chart with too much data
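The slide does not spell out what "the problem of scaling" covers. One common form, assumed here, is a vertical axis that does not start at zero, which exaggerates small differences. The sketch below uses hypothetical values and a hypothetical helper, `drawn_height_ratio`, to show the effect in numbers.

```python
def drawn_height_ratio(smaller, larger, axis_start):
    """Ratio of the two bar heights as drawn when the axis begins at axis_start."""
    if axis_start >= smaller:
        raise ValueError("axis_start must lie below both values")
    return (larger - axis_start) / (smaller - axis_start)


# Hypothetical cure rates (%) for two treatments.
treatment_a, treatment_b = 50, 55

# Axis starting at zero: the bars differ by the true ratio of the values.
honest = drawn_height_ratio(treatment_a, treatment_b, axis_start=0)

# Axis starting at 45: the same data, but the second bar is drawn
# twice as tall as the first.
truncated = drawn_height_ratio(treatment_a, treatment_b, axis_start=45)

print("Height ratio with axis from 0: ", honest)
print("Height ratio with axis from 45:", truncated)
```

A reader who checks where the axis starts before comparing bar heights is much harder to fool.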
Which graph to use
Statistical methods depend on the "form" of a set of data, which can be assessed with some common, useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Continuous (in bins)
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category
Example of MCQ 1
The arithmetic mean of a set of values
a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
Any questions?
Box plot Length of Stay
Misuse of graphicsMisuse of graphics
It pays to be wide awake in studying any graph The It pays to be wide awake in studying any graph The thing looks so simple so frank and so appealing that thing looks so simple so frank and so appealing that the careless are easily fooled - M J Moroney the careless are easily fooled - M J Moroney
Graphs and charts are often misused The honest Graphs and charts are often misused The honest researcher must have a good handle on how graphs can researcher must have a good handle on how graphs can be used to deliberately mislead people so that such be used to deliberately mislead people so that such misadventures can be avoided misadventures can be avoided
Common tricks used to mislead Common tricks used to mislead The problem of scalingThe problem of scaling The Advertisers Graph The Advertisers Graph The transformed graph The transformed graph The chart with too much dataThe chart with too much data
Which graph to useWhich graph to use
Statistical methods depend on the ldquoformrdquo of a set of data which can be assessed with some common useful graphics
Graph Name Y-axis X-axis
Histogram Count Category
Scatterplot Continuous Continuous
Dot Plot Continuous Category
Box Plot Percentiles Category
Line Plot Mean or value Category
Example of MCQ 1Example of MCQ 1
The arithmetic mean of a set of values
a) Is a particular type of averageb) Is a useful summary measure of location if the data are skewed to the rightc) Coincides with the median if the distribution of the data is symmetricald) Is always greater than the mediane) Cannot be calculated if the data set contains both positive and negative values
Example of MCQ 2
A histogram:
a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables
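To make the terms "frequency" and "relative frequency" concrete, here is a small sketch that groups a continuous variable into contiguous intervals and counts the observations in each. The length-of-stay values and the class width of 5 days are assumptions chosen purely for illustration.

```python
from collections import Counter

# Hypothetical lengths of stay in days (illustrative values only).
stays = [1, 2, 2, 3, 4, 4, 5, 7, 8, 12]
bin_width = 5  # assumed class width: intervals 0-4, 5-9, 10-14 days

# Frequency: number of observations falling in each contiguous interval,
# keyed by the lower bound of the interval.
counts = Counter((s // bin_width) * bin_width for s in stays)
frequency = {start: counts.get(start, 0)
             for start in range(0, max(stays) + 1, bin_width)}

# Relative frequency: each count divided by the total number of observations.
relative = {start: n / len(stays) for start, n in frequency.items()}

# Text histogram: bar length is proportional to the frequency in the interval.
for start, n in frequency.items():
    label = f"{start}-{start + bin_width - 1}"
    print(f"{label:<6} {'#' * n:<8} n = {n}  relative = {relative[start]:.1f}")
```

The same bars serve for either distribution: switching from frequency to relative frequency only rescales the vertical axis, and the relative frequencies sum to 1.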
Any questions?