Introduction to Biostatistics

Date post: 02-Jan-2016

Prof Haroon Saloojee

Division of Community Paediatrics

Introduction to Biostatistics
Lecture 1

Summarising your data 1

The evidence-based clinician’s motto

In God we trust
All others must bring data

Challenges

Statistical ideas can be difficult and intimidating

Thus:
Statistical results are often “skipped over” when reading scientific literature
Data is often misinterpreted

Misinterpretation of Data

“Celebrating birthdays is healthy”

Statistics show that those who celebrate the most birthdays live the longest

You may think that:

A Bar Chart is a map of the locations of the nearest taverns

A p-value is the result of a urinalysis

A t-test is a taste test between rooibos tea and Five Roses tea

Course Structure

“BIO-SADISTICS”

Four 45-minute lectures

PowerPoint presentations on student web site

Some text (content) also on web page

Plus additional internet “links”

Syllabus for the Course

SESSION 1: Summarizing your data 1
Types of data (quantitative and categorical variables)
Describing data: average (mean, median and mode)
Displaying data graphically (box plots, histograms, bar charts, pie diagrams)
Frequency distributions

SESSION 2: Summarizing your data 2
The normal distribution
Describing data: spread (range, variance, standard deviation, z score)
Quartiles, percentiles
Standard error of the mean
Confidence intervals

SESSION 3: Sampling principles
Study population
The sample
Random sampling
Non-random sampling
Sampling bias
Sample size and power

SESSION 4: Statistical tests and the concept of significance
Hypothesis testing
p value
Statistical versus clinical significance
Parametric versus non-parametric methods

Free textbook on-line

Statistics at Square One

http://bmj.bmjjournals.com/collections/statsbk/index.shtml

http://www.medstatsaag.com/mcqs.asp

Relevant topics

Handling data: 1, 4, 5, 6, 7

Sampling: 10, 11

Hypothesis testing: 17, 18

Today’s Lecture

What types of data are there? (numerical vs categorical variables)

Describing data: measures of central tendency (mean, median and mode)

Summarising data graphically (histograms, box plots, bar charts, pie diagrams)

Types of data

Variable
Categorical: Nominal, Ordinal
Numerical: Discrete, Continuous

Types of Data

Numerical data: Discrete

Examples: No. of children; No. of asthma attacks in a week; No. of rooms in home

Types of Data

Numerical data: Continuous

Any value on the continuum is possible (even fractions or decimals)

Examples: Weight, Age, Temperature, Heart rate

Types of Data

Categorical data: Nominal

Mutually exclusive, unordered categories

Examples:
Sex (male, female)
Eye colour (brown, grey, green, blue)
Are you happy? (Yes, No)
Diarrhoea (Present, absent)

Can summarize in:
Tables, using counts and percentages
Bar chart

Types of Data

Categorical data: Ordinal (ordered categories)

Examples:
Degree of agreement (Strongly agree, Agree, Disagree, Strongly disagree)
Severity of injury (Severe, Moderate, Mild)
Income level (High, medium, low)

PRACTICE

Discrete or Continuous?

mg of tar in cigarettes: Continuous
number of people in a car: Discrete
high to low temperature in any day: Continuous
weight: Continuous
time: Continuous
number of children in the average family: Discrete

Nominal or Ordinal?

Average, above average, below average: Ordinal
Colours of Smarties: Nominal
Grades (A, B, C, D, F): Ordinal

Data Summaries

It is ALWAYS a good idea to summarise your data

You become familiar with the data and the characteristics of the people that you are studying

You can also identify problems or errors with the data (data management issues)

Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)

Mean

Median

Mode

Definitions

The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.

The median is the middle of a distribution: half the scores are above the median and half are below the median.

The mode is the most frequently occurring score in a distribution.

“It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable… on average”

JM Yancy

Sample Mean X̄

The Average or Arithmetic Mean
Add up data, then divide by sample size (n)
The sample size n is the number of observations (pieces of data)

Example: Systolic blood pressures (mmHg)

X1 = 120, X2 = 80, X3 = 90, X4 = 110, X5 = 95; n = 5
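
A minimal sketch of the calculation for the five readings above, in Python. The variable names are illustrative only.

```python
# Systolic blood pressures (mmHg) from the example above
pressures = [120, 80, 90, 110, 95]

n = len(pressures)                # sample size
sample_mean = sum(pressures) / n  # add up the data, then divide by n

print(n, sample_mean)  # 5 99.0
```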

Notation

Σ (sigma) denotes the summation of a set of values

x is the variable usually used to represent the individual data values

n represents the number of data values in a sample

N represents the number of data values in a population

x̄ is pronounced ‘x-bar’ and denotes the mean of a set of sample values

μ is pronounced ‘mu’ and denotes the mean of all values in a population

Definitions

Mean

The value obtained by adding the scores and dividing the total by the number of scores

Sample: $\bar{x} = \frac{\sum x}{n}$

Population: $\mu = \frac{\sum x}{N}$

Notes on Sample Mean

Also called sample average or arithmetic mean

Sensitive to extreme values: one data point could make a great change in the sample mean

Why is it called the sample mean? To distinguish it from the population mean
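
To illustrate the sensitivity to extreme values, here is a small sketch using the blood pressure readings from the earlier example. The second list is hypothetical: it assumes one reading (95) was mistyped as 950.

```python
from statistics import fmean, median

pressures = [120, 80, 90, 110, 95]   # readings from the earlier example
with_typo = [120, 80, 90, 110, 950]  # hypothetical: 95 mistyped as 950

print(fmean(pressures), median(pressures))  # 99.0 95
print(fmean(with_typo), median(with_typo))  # 270.0 110
```

One wrong value moves the mean from 99 to 270, while the median moves only from 95 to 110.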

Population Versus Sample

Population: The entire group you want information about

For example: The blood pressure of all 20-year-old male university students in South Africa

Sample: A part of the population from which we actually collect information and draw conclusions about the whole population

For example: Sample of blood pressures (n=50) of 20-year-old male university students in South Africa

The sample mean X̄ is not the population mean μ

Population Versus Sample

We don’t know the population mean μ but would like to know it
We draw a sample from the population
We calculate the sample mean X̄
How close is X̄ to μ?
Statistical theory will tell us how close X̄ is to μ
Statistical inference is the process of trying to draw conclusions about the population from the sample

Weighted Mean

$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$

Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
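
A minimal sketch of a weighted mean for a course grade. The components, marks and weights below are hypothetical and chosen only to show that the heavily weighted exam pulls the result below the plain average.

```python
# Hypothetical course components: (mark out of 100, weight)
components = [
    (70, 20),  # test
    (80, 30),  # assignment
    (60, 50),  # final exam counts most
]

total_weight = sum(w for _, w in components)
weighted_mean = sum(w * x for x, w in components) / total_weight
plain_mean = sum(x for x, _ in components) / len(components)

print(weighted_mean)  # 68.0
print(plain_mean)     # 70.0
```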

Geometric Means

These are histograms rotated 90° and box plots

Note how the log transformation gives a symmetric distribution
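
The geometric mean is the back-transformed mean of the logged values, which ties it to the log transformation noted above. A minimal Python sketch with made-up, right-skewed values (not data from the lecture):

```python
import math

values = [1, 10, 100, 1000]  # made-up right-skewed data

logs = [math.log10(v) for v in values]  # log transformation
mean_of_logs = sum(logs) / len(logs)    # mean on the log scale
geometric_mean = 10 ** mean_of_logs     # back-transform to the original scale
arithmetic_mean = sum(values) / len(values)

print(round(geometric_mean, 1))  # about 31.6
print(arithmetic_mean)           # 277.75
```

The arithmetic mean is dragged upward by the largest value; the geometric mean sits nearer the middle of the data on the log scale.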

Median

1 1 3 3 4 5 5 5 5 5
no exact middle: shared by two numbers
(4 + 5) / 2 = 4.5
MEDIAN is 4.5

5 5 5 3 1 5 1 4 3 5 2
1 1 2 3 3 4 5 5 5 5 5 (in order)
exact middle: MEDIAN is 4
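
The two median examples above can be checked with a short function. This is a sketch of the usual textbook rule (sort, then take the middle value, or the mean of the two middle values when n is even); the function name is illustrative.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]))  # 4
print(median([1, 1, 3, 3, 4, 5, 5, 5, 5, 5]))     # 4.5
```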

Mode

The score that occurs most frequently

Bimodal, Multimodal, No Mode

The only measure of central tendency that can be used with nominal data

Examples

a. 5 5 5 3 1 5 1 4 3 5: Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9: Bimodal, 2 and 6
c. 2 3 6 7 8 9 10: No mode
d. 2 2 3 3 3 4: Mode is 3
e. 2 2 3 3 4 4 5 5: No mode
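
A sketch that reproduces the five mode examples. It assumes the convention used in those examples, namely that a data set in which every distinct value occurs equally often has no mode; the function name is illustrative and the input is assumed to be non-empty.

```python
from collections import Counter

def modes(values):
    """Return the list of modes ([] for 'no mode'); values must be non-empty."""
    counts = Counter(values)
    top = max(counts.values())
    # Assumed convention: all distinct values equally frequent means no mode
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([5, 5, 5, 3, 1, 5, 1, 4, 3, 5]))     # [5]
print(modes([2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9]))  # [2, 6]
print(modes([2, 3, 6, 7, 8, 9, 10]))             # []
print(modes([2, 2, 3, 3, 3, 4]))                 # [3]
print(modes([2, 2, 3, 3, 4, 4, 5, 5]))           # []
```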

Shapes of the Distribution

Distribution Characteristics

[Figures: distribution shapes. Examples: height of students in the class; serum cholesterol level; birth weight of newborn babies]

Some visual ways to summarize data

Tables: Frequency table

Graphs: Histograms, Bar graphs, Box plots, Line plots, Scatter graphs

Charts: Bar chart, Pie diagram

Frequency Tables

Summarizes a variable with counts and percentages

The variable is categorical
Note that you can take a continuous variable and create categories with it

How do you create categories for a continuous variable?
Choose cutoffs that are biologically meaningful
Natural breaks in the data

Example of frequency table

When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data.
When the data are divided into groups (classes), they are called grouped data.
The classes have to be decided according to the range of data and size of class.
The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table.
The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
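
A minimal sketch of a frequency table for ungrouped data, with percentages and cumulative frequencies as defined above. The ten scores are sample values used only for illustration.

```python
from collections import Counter

scores = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5]  # sample values for illustration

counts = Counter(scores)
total = len(scores)

table = []      # rows of (value, frequency, percentage, cumulative frequency)
cumulative = 0
for value in sorted(counts):
    cumulative += counts[value]
    table.append((value, counts[value], 100 * counts[value] / total, cumulative))

for row in table:
    print(row)
# (1, 2, 20.0, 2)
# (3, 2, 20.0, 4)
# (4, 1, 10.0, 5)
# (5, 5, 50.0, 10)
```

The last cumulative frequency always equals the total number of observations, which is a quick check that no value was dropped.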

Graphical Summaries

Histograms: Continuous or ordinal data on horizontal axis

Bar Graphs: Nominal data; no order to horizontal axis

Box Plots: Continuous data

Histogram

A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.

Bar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of a set of mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch the other bars.

Difference between bar chart and histogram

Bar charts for categories that are separate

Histograms if you got categories by dividing up continuous data

Bars do not touch; histogram rectangles do touch

Line graph

If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, infant mortality rate, etc.) are to be displayed, it is better done with line graphs rather than histograms.

Scatter plot

Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each plotted point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on a straight line (the regression line).
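
One common way to put a number on how tightly the points cluster around a line is Pearson's correlation coefficient r. The slide does not name a specific measure, so treating r as the measure is an assumption here, and the data are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
on_a_line = [2, 4, 6, 8, 10]  # every point lies exactly on y = 2x
scattered = [2, 1, 4, 3, 5]   # same upward trend, more scatter

r_line = pearson_r(x, on_a_line)       # 1.0 up to rounding
r_scattered = pearson_r(x, scattered)  # about 0.8
print(round(r_line, 3), round(r_scattered, 3))
```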

Survival curve

Pie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.
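
Because the area of a sector is proportional to its angle, each category's share of the total converts directly to degrees out of 360. A minimal sketch with hypothetical eye-colour counts:

```python
# Hypothetical eye-colour counts for 20 people
counts = {"brown": 12, "blue": 4, "green": 3, "grey": 1}
total = sum(counts.values())

percents = {c: 100 * k / total for c, k in counts.items()}
angles = {c: 360 * k / total for c, k in counts.items()}  # sector angle in degrees

for category in counts:
    print(category, percents[category], angles[category])
# brown 60.0 216.0
# blue 20.0 72.0
# green 15.0 54.0
# grey 5.0 18.0
```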

Pictures of Data: Continuous Variables

Histograms
Means and medians do not tell the whole story
Differences in spread (variability)
Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
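
The steps above can be sketched in plain Python. The data are hypothetical lengths of stay, the function name is illustrative, and the sketch assumes the values are not all identical (otherwise the bin width would be zero).

```python
def histogram_counts(values, bins):
    """Split the data range into equal-width bins and count observations per bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        i = min(int((v - lo) / width), bins - 1)  # the maximum goes in the last bin
        counts[i] += 1
    return lo, width, counts

stays = [2, 3, 5, 7, 8, 8, 9, 12, 13, 15, 18, 20]  # hypothetical lengths of stay (days)
lo, width, counts = histogram_counts(stays, bins=3)  # counts is [4, 5, 3]

# Draw a text histogram with labelled intervals
for i, c in enumerate(counts):
    left = lo + i * width
    print(f"{left:4.0f} to {left + width:4.0f} | {'#' * c}")
```

Each interval includes its left edge and excludes its right edge, except the last, which includes both; other conventions are equally valid as long as every observation lands in exactly one bin.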

Pictures of Data: Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing distribution of continuous data in multiple groups: can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
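
The three values that define the box can be computed with Python's standard library. This is a sketch with hypothetical lengths of stay; note that several conventions exist for computing percentiles, and `method="inclusive"` is just one of them, so other software may give slightly different hinges.

```python
from statistics import quantiles

stays = [1, 2, 2, 3, 4, 5, 7, 9, 30]  # hypothetical lengths of stay (days)

# quantiles(..., n=4) returns the three cut points that split the data into quarters
lower_hinge, median, upper_hinge = quantiles(stays, n=4, method="inclusive")
box_height = upper_hinge - lower_hinge  # the interquartile range

print(lower_hinge, median, upper_hinge)  # 2.0 4.0 7.0
print(box_height)                        # 5.0
```

The median (4) sits closer to the lower hinge than to the upper hinge, which is how a box plot shows right skew; the stay of 30 days lies far above the box.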

Hospital Length of Stay

[Figures: box plots of length of stay]

Misuse of graphics

“It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled.” - M J Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.

Common tricks used to mislead:
The problem of scaling
The Advertiser's Graph
The transformed graph
The chart with too much data

Which graph to use

Statistical methods depend on the “form” of a set of data, which can be assessed with some common useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category

Example of MCQ 1

The arithmetic mean of a set of values:

a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram:

a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables

Any questions?

Page 2: Introduction to Biostatistics

Introduction to BiostatisticsIntroduction to BiostatisticsLecture 1Lecture 1

Summarising your data 1Summarising your data 1

In God we trust In God we trust

All others must bring dataAll others must bring data

The evidence-based clinicianrsquos The evidence-based clinicianrsquos mottomotto

ChallengesChallenges

Statistical ideas can be difficult and Statistical ideas can be difficult and intimidatingintimidating

ThusThus Statistical results are often ldquoskipped-overrdquo Statistical results are often ldquoskipped-overrdquo

when reading scientific literaturewhen reading scientific literature Data is often misinterpretedData is often misinterpreted

Misinterpretation of DataMisinterpretation of Data

ldquoldquoCelebrating birthdays is healthyrdquoCelebrating birthdays is healthyrdquo

Statistics show that those that celebrate the most Statistics show that those that celebrate the most birthdays live the longestbirthdays live the longest

You may think thatYou may think that

A Bar Chart is a map of the locations of A Bar Chart is a map of the locations of the nearest tavernsthe nearest taverns

A p-value is the result of a urinalysisA p-value is the result of a urinalysis

A t-test is a taste test between rooibos tea A t-test is a taste test between rooibos tea and Five Roses teaand Five Roses tea

Course StructureCourse Structure

ldquoBIO-SADISTICSrdquo

Four 45-minute lectures

PowerPoint presentations on student web site

Some text (content) also on web page

Plus additional internet ldquolinksrdquo

Syllabus for the Course

1048793 SESSION 1 Summarizing your data 1Types of data (quantitative and categorical variables) Describing data- average (mean median and mode)Displaying data graphically (box plots histograms bar charts pie diagrams) Frequency distributions

SESSION 2 Summarizing your data 2The normal distributionDescribing data ndash spread (range variance standard deviation z score)Quartiles percentilesStandard error of the mean Confidence intervals

SESSION 3SESSION 3 Sampling principles Sampling principles Study PopulationStudy PopulationThe sampleThe sampleRandom samplingRandom samplingNon random sampling Non random sampling Sampling biasSampling biasSample size and powerSample size and power

SESSION 4SESSION 4 Statistical tests and Statistical tests and the concept of significancethe concept of significance Hypothesis testingHypothesis testingp valuep value Statistical versus clinical Statistical versus clinical significancesignificanceParametric versus non-parametric Parametric versus non-parametric methodsmethods

Free textbook on-lineFree textbook on-line

Statistics at Square One

httpbmjbmjjournalscomcollectionsstatsbkindexshtml

httpwwwmedstatsaagcommcqsasp

Relevant topics

Handling data1 4 5 6 7

Sampling10 11

Hypothesis testing17 18

Todayrsquos LectureTodayrsquos Lecture

What types of data are thereWhat types of data are there

(numerical vs categorical variables) (numerical vs categorical variables)

Describing data - measures of central tendency Describing data - measures of central tendency (m(mean median and mode)ean median and mode)

Summarising data graphicallySummarising data graphically (histograms box (histograms box plots bar charts pie diagrams) plots bar charts pie diagrams)

Types of dataTypes of data

Variable

Categorical Numerical

Nominal Ordinal Discrete Continuous

Types of Data

Numerical dataDiscrete

Examples No of children No asthma attacks in a week No of rooms in home

Types of Data

Numerical dataContinuous

Any value on the continuum is possible (even fractions or Any value on the continuum is possible (even fractions or decimals)decimals)

Examples Weight Age Temperature Heart rate

Types of Data

Categorical dataNominal

Mutually exclusive unordered categoriesMutually exclusive unordered categories

ExamplesExamples Sex (male female)Sex (male female) Eye colour (brown grey green blue)Eye colour (brown grey green blue) Are you happy (Yes No)Are you happy (Yes No) Diarrhoea (Present absent)Diarrhoea (Present absent)

Can summarize inCan summarize in Tables ndash using counts and percentagesTables ndash using counts and percentages Bar ChartBar Chart

Types of Data

Categorical dataOrdinal (ordered categories)

Examples Degree of agreement

(Strongly Agree Agree Disagree Strongly disagree)

Severity of injury Severe Moderate Mild

Income level High medium low

PRACTICEPRACTICE

mg of tar in cigarettesmg of tar in cigarettes

number of people in a carnumber of people in a car

high to low temperature inhigh to low temperature in

any dayany day

weightweight

timetime

number of children in thenumber of children in the

average familyaverage family

Average above avg Average above avg below averagebelow average

Colours of SmartiesColours of Smarties

Grades (A B C D F)Grades (A B C D F)

Discrete or Continuous

Continuous

Discrete

Continuous

Continuous

Continuous

Discrete

Nominal or Ordinal

Ordinal

Nominal

Ordinal

Data SummariesData Summaries

It is ALWAYS a good idea to summarise It is ALWAYS a good idea to summarise your datayour data

You become familiar with the data and the You become familiar with the data and the characteristics of the people that you are characteristics of the people that you are studyingstudying

You can also identify problems or errors with You can also identify problems or errors with the data (data management issues)the data (data management issues)

Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)

Mean

Median

Mode

DefinitionsDefinitions

The arithmetic The arithmetic meanmean is what is commonly called is what is commonly called the average The mean is the sum of all the the average The mean is the sum of all the scores divided by the number of scoresscores divided by the number of scores

The The medianmedian is the middle of a distribution half is the middle of a distribution half the scores are above the median and half are the scores are above the median and half are below the median below the median

The The modemode is the most frequently occurring score is the most frequently occurring score in a distribution in a distribution

ldquoldquoIt has been said that a fellow with one It has been said that a fellow with one leg frozen in ice and the other leg in leg frozen in ice and the other leg in boiling water is comfortablehellipboiling water is comfortablehellip

hellip hellipon averagerdquoon averagerdquo

JM YancyJM Yancy

Sample Mean X

The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)

1048793 ExampleSystolic blood pressures (mmHg)

X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5

micro is pronounced lsquomursquo and denotes the mean of all values in a population

Notation

x is pronounced lsquox-barrsquo and denotes the mean of a set of

Sample values

(sigma) denotes the summation of a set of values

x is the variable usually used to represent the individual data values

n represents the number of data values in a sample

N represents the number of data values in a population

x =

Definitions

MeanMean

the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores

n x

Sample

Nmicro = xPopulation

Notes on Sample Mean

Also called sample average or arithmetic mean

Sensitive to extreme values - One data point could make a great change in

sample mean

Why is it called the sample meanndash To distinguish it from population mean

Population Versus Sample

Population - The entire group you want information about

ndash For example The blood pressure of all 20-year-old male university students in South Africa

Sample - A part of the population from which we actually collect information and draw conclusions about the whole population

ndash For example Sample of blood pressures (n=50) of 20-year-old male university students in South Africa

The sample mean X is not the population mean micro

Population Versus Sample

We donrsquot know the population mean micro but would like to know itWe draw a sample from the populationWe calculate the sample mean XHow close is X to microStatistical theory will tell us how close X is to microStatistical inference is the process of trying to draw conclusions about the population from the sample

Weighted Mean

x =w

(w bull x)

Your grade in many courses are weighted means (averages) In other words some things count (are weighted) more than others

Geometric MeansGeometric Means

These are histograms rotated 90ordm and box plots

Note how the log transformation gives a symmetric distribution

bull 1 1 3 3 4 5 5 5 5 5 no exact middle -- shared by two numbers

MEDIAN is 45

4 + 5

2= 45

bull 5 5 5 3 1 5 1 4 3 5 2 bull 1 1 2 3 3 4 5 5 5 5 5 (in order)

exact middle MEDIAN is 4

Mode Mode

The score that occurs most frequentlyThe score that occurs most frequently

BimodalMultimodalNo Mode

The only measure of central tendency that can be used The only measure of central tendency that can be used with with nominal datadata

Examples

bull Mode is 5Mode is 5

bull Bimodal ndash 2 amp 6Bimodal ndash 2 amp 6

bull No ModeNo Mode

a 5 5 5 3 1 5 1 4 3 5

b 2 2 2 3 4 5 6 6 6 7 9

c 2 3 6 7 8 9 10

d 2 2 3 3 3 4

e 2 2 3 3 4 4 5 5

bull Mode is 3

bull No Mode

Shapes of the Distribution

Shapes of the Distribution

Distribution Characteristics

Shapes of the Distribution

Example Height of students in the class

Shapes of the Distribution

Example Serum cholesterol level

Shapes of the Distribution

Example Birth weight of newborn babies

Shapes of the Distribution

Some visual ways to summarize Some visual ways to summarize datadata

TablesTablesFrequency tableFrequency table

GraphsGraphsHistogramsHistograms

Bar graphsBar graphs

Box plotsBox plots

Line plotsLine plots

Scatter graphsScatter graphs

ChartsChartsBar chartBar chart

Pie diagramPie diagram

Frequency TablesFrequency Tables

Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages

The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable

and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable

Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data

Example of frequency tableExample of frequency table

When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class

Graphical SummariesGraphical Summaries

HistogramsHistograms Continuous or ordinal data on horizontal axisContinuous or ordinal data on horizontal axis

Bar GraphsBar Graphs Nominal dataNominal data

No order to horizontal axisNo order to horizontal axis

Box PlotsBox Plots Continuous dataContinuous data

HistogramHistogram

A histogram is a graphic representation of the frequency distribution of a variable Vertical rectangles (bars) are drawn in such a way that their bases lie

on a linear scale representing different intervals and their heights are

proportional to the frequencies of the values within each of the intervals

Bar ChartBar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of mutually exclusive categories The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis The heights of the bars correspond to the frequencies The bars should be of equal width and they should not be touching me other bars

Difference between bar chart and Difference between bar chart and histogramhistogram

Bar charts for categories that are separateBar charts for categories that are separate

Histograms if you got categories by Histograms if you got categories by dividing up continuous datadividing up continuous data

Bars do not touch histogram rectangles Bars do not touch histogram rectangles do touch do touch

Line graphLine graph

If the mid-points of the top of the bars of a histogram are connected together by a line and if the bars were omitted from the display the resultant graph will be a line

graph (also called a frequency polygon) Line graphs are good at showing trends over a period of time When trends of rates (eg death rate Infant Mortality Rate etc) are to be displayed it is better done with

line graphs rather than histograms

Scatter plotScatter plot

Also called a scattergram This a method of displaying the distribution of two variables in relation to each other another The value of one variables is measured on the X axis and the values of the other on the Y axis The variables have to be on a continuous scale Each plot thus has two values (coordinates) from the Y and X axis scales A wide scatter of the plots denotes poor correlation between the two variables If the two variables are perfectly correlated then all the plots will fall on the diagonal (regression line)

Survival curve

Pie chart

This is a circular diagram (it can be shown in 2-D or 3-D) divided into segments, each representing a category or subset of the data (part of the whole). The amount for each category is proportional to the area of its sector (slice of the pie). The total area of the circle is 100% and represents the total population being shown.
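As a rough illustration of how slice sizes follow from the data, the sketch below converts category counts into percentages and sector angles. The eye-colour counts are invented for the example.

```python
# Hypothetical category counts (invented for illustration only).
counts = {"Brown": 12, "Blue": 5, "Green": 2, "Grey": 1}
total = sum(counts.values())

# Each slice's share of the whole, as a percentage and as a sector angle.
percentages = {k: 100 * v / total for k, v in counts.items()}
angles = {k: 360 * v / total for k, v in counts.items()}

for category in counts:
    print(f"{category}: {percentages[category]:.0f}% ({angles[category]:.0f} degrees)")
```

The percentages add up to 100 and the angles to 360, which is what "the whole circle represents the total" means in practice.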

Pictures of Data: Continuous Variables

Histograms

Means and medians do not tell the whole story

Differences in spread (variability)

Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
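The steps above can be sketched in a few lines of code. The data values and the bin width of 10 are assumptions chosen for illustration, and the "drawing" is a simple text histogram rather than a real plot.

```python
from collections import Counter

# Hypothetical systolic blood pressures in mmHg (for illustration only).
data = [80, 90, 95, 110, 120, 102, 98, 115, 87, 105]

bin_width = 10                                   # equal-width intervals
start = (min(data) // bin_width) * bin_width     # lower edge of the first bin

# Count the observations falling in each interval [lower, lower + width).
counts = Counter(start + ((x - start) // bin_width) * bin_width for x in data)

# Draw a text histogram with a labelled scale (labels assume whole-number data).
for lower in range(start, max(data) + 1, bin_width):
    print(f"{lower:3d}-{lower + bin_width - 1:3d} | {'#' * counts[lower]}")
```

Changing bin_width changes the apparent shape of the distribution, so it is worth trying more than one width.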

Pictures of Data: Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing the distribution of continuous data in multiple groups: box plots can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box and 1/4 of the distribution is between this line and the bottom of the box.
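The numbers behind a box plot can be computed directly. This is a minimal sketch using Python's statistics module on invented length-of-stay data. Several conventions exist for computing percentiles; the "inclusive" method used here is only one of them, so other software may give slightly different hinges.

```python
from statistics import median, quantiles

# Hypothetical hospital lengths of stay in days (for illustration only).
length_of_stay = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Quartiles: 25th percentile, median, 75th percentile.
lower_hinge, mid, upper_hinge = quantiles(length_of_stay, n=4, method="inclusive")

box_height = upper_hinge - lower_hinge   # the box spans the middle half of the data
print(lower_hinge, mid, upper_hinge, box_height)
```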

Hospital Length of Stay

Box plot: Length of Stay

Misuse of graphics

“It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled.” (M J Moroney)

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people so that such misadventures can be avoided.

Common tricks used to mislead

The problem of scaling

The Advertiser’s Graph

The transformed graph

The chart with too much data

Which graph to use

Statistical methods depend on the “form” of a set of data, which can be assessed with some common useful graphics

Graph Name | Y-axis | X-axis
Histogram | Count | Category
Scatterplot | Continuous | Continuous
Dot Plot | Continuous | Category
Box Plot | Percentiles | Category
Line Plot | Mean or value | Category

Example of MCQ 1

The arithmetic mean of a set of values

a) Is a particular type of average

b) Is a useful summary measure of location if the data are skewed to the right

c) Coincides with the median if the distribution of the data is symmetrical

d) Is always greater than the median

e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram

a) Can be used instead of a pie chart to display categorical data

b) Is similar to a bar chart but there are no gaps between the bars

c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar

d) Can be used to display either a frequency or a relative frequency distribution

e) Is used to show the relationship between two variables

Any questions?


You may think that...

A Bar Chart is a map of the locations of the nearest taverns

A p-value is the result of a urinalysis

A t-test is a taste test between rooibos tea and Five Roses tea

Course Structure

“BIO-SADISTICS”

Four 45-minute lectures

PowerPoint presentations on student web site

Some text (content) also on web page

Plus additional internet “links”

Syllabus for the Course

SESSION 1: Summarizing your data 1

Types of data (quantitative and categorical variables)

Describing data: average (mean, median and mode)

Displaying data graphically (box plots, histograms, bar charts, pie diagrams)

Frequency distributions

SESSION 2: Summarizing your data 2

The normal distribution

Describing data: spread (range, variance, standard deviation, z score)

Quartiles, percentiles

Standard error of the mean

Confidence intervals

SESSION 3: Sampling principles

Study population

The sample

Random sampling

Non-random sampling

Sampling bias

Sample size and power

SESSION 4: Statistical tests and the concept of significance

Hypothesis testing

p value

Statistical versus clinical significance

Parametric versus non-parametric methods

Free textbook on-line

Statistics at Square One

http://bmj.bmjjournals.com/collections/statsbk/index.shtml

http://www.medstatsaag.com/mcqs.asp

Relevant topics

Handling data: 1, 4, 5, 6, 7

Sampling: 10, 11

Hypothesis testing: 17, 18

Today’s Lecture

What types of data are there? (numerical vs categorical variables)

Describing data: measures of central tendency (mean, median and mode)

Summarising data graphically (histograms, box plots, bar charts, pie diagrams)

Types of data

Variable

Categorical: Nominal, Ordinal

Numerical: Discrete, Continuous

Types of Data

Numerical data: Discrete

Examples: No. of children, no. of asthma attacks in a week, no. of rooms in home

Types of Data

Numerical data: Continuous

Any value on the continuum is possible (even fractions or decimals)

Examples: Weight, age, temperature, heart rate

Types of Data

Categorical data: Nominal

Mutually exclusive, unordered categories

Examples

Sex (male, female)

Eye colour (brown, grey, green, blue)

Are you happy? (Yes, No)

Diarrhoea (present, absent)

Can summarize in

Tables, using counts and percentages

Bar chart

Types of Data

Categorical data: Ordinal (ordered categories)

Examples

Degree of agreement (Strongly agree, Agree, Disagree, Strongly disagree)

Severity of injury (Severe, Moderate, Mild)

Income level (High, Medium, Low)

PRACTICE

Discrete or Continuous?

mg of tar in cigarettes: Continuous

number of people in a car: Discrete

high to low temperature in any day: Continuous

weight: Continuous

time: Continuous

number of children in the average family: Discrete

Nominal or Ordinal?

Average, above average, below average: Ordinal

Colours of Smarties: Nominal

Grades (A, B, C, D, F): Ordinal

Data Summaries

It is ALWAYS a good idea to summarise your data

You become familiar with the data and the characteristics of the people that you are studying

You can also identify problems or errors with the data (data management issues)

Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)

Mean

Median

Mode

Definitions

The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.

The median is the middle of a distribution: half the scores are above the median and half are below the median.

The mode is the most frequently occurring score in a distribution.

“It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable… on average.”

JM Yancy

Sample Mean $\bar{X}$

The Average or Arithmetic Mean

Add up the data, then divide by the sample size ($n$)

The sample size $n$ is the number of observations (pieces of data)

Example: Systolic blood pressures (mmHg)

$X_1 = 120$, $X_2 = 80$, $X_3 = 90$, $X_4 = 110$, $X_5 = 95$, $n = 5$
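The arithmetic for this example can be checked with a short sketch. The variable names are chosen here for illustration; the five values are the systolic blood pressures listed in the example.

```python
from statistics import mean

# Systolic blood pressures (mmHg) from the example.
pressures = [120, 80, 90, 110, 95]

n = len(pressures)                # sample size
sample_mean = sum(pressures) / n  # add up the data, then divide by n

print(n, sample_mean)
```

The sum of the five values is 495, so the sample mean is 495 / 5 = 99 mmHg; statistics.mean(pressures) gives the same result.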

Notation

$\Sigma$ (sigma) denotes the summation of a set of values

$x$ is the variable usually used to represent the individual data values

$n$ represents the number of data values in a sample

$N$ represents the number of data values in a population

$\bar{x}$ is pronounced ‘x-bar’ and denotes the mean of a set of sample values

$\mu$ is pronounced ‘mu’ and denotes the mean of all values in a population

Definitions

Mean

The value obtained by adding the scores and dividing the total by the number of scores

Sample: $\bar{x} = \frac{\sum x}{n}$

Population: $\mu = \frac{\sum x}{N}$

Notes on Sample Mean

Also called sample average or arithmetic mean

Sensitive to extreme values: one data point could make a great change in the sample mean

Why is it called the sample mean? To distinguish it from the population mean
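The sensitivity of the mean to extreme values is easy to demonstrate. The sketch below uses invented lengths of stay and compares the mean and median before and after adding one extreme value.

```python
from statistics import mean, median

# Hypothetical lengths of stay in days; the second list adds one extreme value.
typical = [2, 3, 3, 4, 5]
with_outlier = typical + [60]

mean_typical, median_typical = mean(typical), median(typical)
mean_outlier, median_outlier = mean(with_outlier), median(with_outlier)

print(mean_typical, median_typical)   # mean and median of the original data
print(mean_outlier, median_outlier)   # the mean jumps, the median barely moves
```

One added observation moves the mean from 3.4 to about 12.8 days, while the median only shifts from 3 to 3.5.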

Population Versus Sample

Population: The entire group you want information about

For example: The blood pressure of all 20-year-old male university students in South Africa

Sample: A part of the population from which we actually collect information and draw conclusions about the whole population

For example: A sample of blood pressures (n=50) of 20-year-old male university students in South Africa

The sample mean $\bar{X}$ is not the population mean $\mu$

Population Versus Sample

We don’t know the population mean $\mu$ but would like to know it

We draw a sample from the population

We calculate the sample mean $\bar{X}$

How close is $\bar{X}$ to $\mu$?

Statistical theory will tell us how close $\bar{X}$ is to $\mu$

Statistical inference is the process of trying to draw conclusions about the population from the sample
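A small simulation can make the idea concrete. Everything in this sketch is invented for illustration: the "population" is 10 000 simulated blood pressures, and, unlike in real research, its mean can be computed directly so that the sample mean can be compared with it.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10 000 simulated systolic pressures (mmHg).
population = [rng.gauss(120, 15) for _ in range(10_000)]
mu = mean(population)             # population mean (normally unknown)

sample = rng.sample(population, 50)
x_bar = mean(sample)              # sample mean, our estimate of mu

print(f"mu = {mu:.1f}, x_bar = {x_bar:.1f}, difference = {x_bar - mu:+.1f}")
```

Re-running with a different seed gives a different sample mean each time; how much it typically varies around the population mean is what statistical theory quantifies.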

Weighted Mean

$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$

Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
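As a worked illustration of a weighted mean, the sketch below combines three assessment marks with different weights. The marks and weights are invented for the example.

```python
# Hypothetical course: (mark out of 100, weight) for each assessment.
assessments = [(70, 2), (80, 3), (60, 5)]

weighted_sum = sum(w * x for x, w in assessments)   # sum of w * x
total_weight = sum(w for _, w in assessments)       # sum of w
weighted_mean = weighted_sum / total_weight

# For comparison: the ordinary mean treats every mark equally.
unweighted_mean = sum(x for x, _ in assessments) / len(assessments)

print(weighted_mean, unweighted_mean)
```

Here the weighted mean is 680 / 10 = 68, lower than the unweighted mean of 70, because the lowest mark carries the largest weight.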

Geometric Means

These are histograms rotated 90° and box plots

Note how the log transformation gives a symmetric distribution
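The connection between the geometric mean and the log transformation can be shown numerically: the geometric mean equals the mean of the log-transformed values, transformed back to the original scale. The data in this sketch are invented, and statistics.geometric_mean requires Python 3.8 or later.

```python
from math import exp, log
from statistics import geometric_mean, mean

# Hypothetical right-skewed measurements (for illustration only).
values = [1, 10, 100]

# Mean of the log-transformed values, transformed back to the original scale.
back_transformed = exp(mean([log(v) for v in values]))

print(mean(values), geometric_mean(values), back_transformed)
```

The arithmetic mean (37) is pulled towards the largest value, whereas the geometric mean (about 10) sits at the centre of the data on the log scale.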

• 1 1 3 3 4 5 5 5 5 5 (no exact middle: shared by two numbers)

MEDIAN is 4.5: $\frac{4 + 5}{2} = 4.5$

• 5 5 5 3 1 5 1 4 3 5 2

• 1 1 2 3 3 4 5 5 5 5 5 (in order)

Exact middle: MEDIAN is 4
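Both median examples can be reproduced with Python's statistics.median, which sorts the data and averages the two middle values when the count is even. The variable names are chosen here for illustration.

```python
from statistics import median

even_count = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]      # 10 values: no single middle
odd_count = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]    # 11 values: one exact middle

median_even = median(even_count)   # mean of the two middle values, 4 and 5
median_odd = median(odd_count)     # median() sorts the data internally

print(median_even, median_odd)
```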

Mode

The score that occurs most frequently

Bimodal

Multimodal

No mode

The only measure of central tendency that can be used with nominal data

Examples

a. 5 5 5 3 1 5 1 4 3 5: Mode is 5

b. 2 2 2 3 4 5 6 6 6 7 9: Bimodal (2 and 6)

c. 2 3 6 7 8 9 10: No mode

d. 2 2 3 3 3 4: Mode is 3

e. 2 2 3 3 4 4 5 5: No mode
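The five examples can be checked with a small helper. The function name modes and the convention of returning an empty list for "no mode" (when every distinct value occurs equally often) are assumptions made for this sketch; library functions such as statistics.multimode would instead return all tied values.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    counts = Counter(values)
    highest = max(counts.values())
    # If every distinct value occurs equally often, no value stands out.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}
results = {name: modes(data) for name, data in examples.items()}
print(results)
```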

Shapes of the Distribution

Distribution Characteristics

Example: Height of students in the class

Example: Serum cholesterol level

Example: Birth weight of newborn babies

Some visual ways to summarize data

Tables: Frequency table

Graphs: Histograms, bar graphs, box plots, line plots, scatter graphs

Charts: Bar chart, pie diagram

Frequency Tables

Summarizes a variable with counts and percentages

The variable is categorical

Note that you can take a continuous variable and create categories with it

How do you create categories for a continuous variable?

Choose cutoffs that are biologically meaningful

Natural breaks in the data

Example of frequency table

When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data. When the data are divided into groups (classes), they are called grouped data. The classes have to be decided according to the range of the data and the size of the class. The number of observations lying in a particular class is called its frequency, and the table showing classes with their frequencies is called a frequency table. The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
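A grouped frequency table with cumulative frequencies can be built in a few lines. The ages and the class width of 10 years are invented for this sketch.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical ages in years (for illustration only).
ages = [3, 7, 12, 15, 18, 21, 24, 25, 29, 34]
class_width = 10

# Grouped data: count how many observations fall in each class.
frequency = Counter((age // class_width) * class_width for age in ages)
classes = sorted(frequency)

# Cumulative frequency: this class plus all classes before it.
cumulative = list(accumulate(frequency[c] for c in classes))

print("Class  Freq  Cum")
for lower, cum in zip(classes, cumulative):
    print(f"{lower:2d}-{lower + class_width - 1:2d}  {frequency[lower]:4d}  {cum:3d}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check that no value was left out of the table.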

Graphical Summaries

Histograms: Continuous or ordinal data on horizontal axis

Bar Graphs: Nominal data; no order to horizontal axis

Box Plots: Continuous data


Page 4: Introduction to Biostatistics

ChallengesChallenges

Statistical ideas can be difficult and Statistical ideas can be difficult and intimidatingintimidating

ThusThus Statistical results are often ldquoskipped-overrdquo Statistical results are often ldquoskipped-overrdquo

when reading scientific literaturewhen reading scientific literature Data is often misinterpretedData is often misinterpreted

Misinterpretation of DataMisinterpretation of Data

ldquoldquoCelebrating birthdays is healthyrdquoCelebrating birthdays is healthyrdquo

Statistics show that those that celebrate the most Statistics show that those that celebrate the most birthdays live the longestbirthdays live the longest

You may think thatYou may think that

A Bar Chart is a map of the locations of A Bar Chart is a map of the locations of the nearest tavernsthe nearest taverns

A p-value is the result of a urinalysisA p-value is the result of a urinalysis

A t-test is a taste test between rooibos tea A t-test is a taste test between rooibos tea and Five Roses teaand Five Roses tea

Course StructureCourse Structure

ldquoBIO-SADISTICSrdquo

Four 45-minute lectures

PowerPoint presentations on student web site

Some text (content) also on web page

Plus additional internet ldquolinksrdquo

Syllabus for the Course

1048793 SESSION 1 Summarizing your data 1Types of data (quantitative and categorical variables) Describing data- average (mean median and mode)Displaying data graphically (box plots histograms bar charts pie diagrams) Frequency distributions

SESSION 2 Summarizing your data 2The normal distributionDescribing data ndash spread (range variance standard deviation z score)Quartiles percentilesStandard error of the mean Confidence intervals

SESSION 3SESSION 3 Sampling principles Sampling principles Study PopulationStudy PopulationThe sampleThe sampleRandom samplingRandom samplingNon random sampling Non random sampling Sampling biasSampling biasSample size and powerSample size and power

SESSION 4SESSION 4 Statistical tests and Statistical tests and the concept of significancethe concept of significance Hypothesis testingHypothesis testingp valuep value Statistical versus clinical Statistical versus clinical significancesignificanceParametric versus non-parametric Parametric versus non-parametric methodsmethods

Free textbook on-lineFree textbook on-line

Statistics at Square One

httpbmjbmjjournalscomcollectionsstatsbkindexshtml

httpwwwmedstatsaagcommcqsasp

Relevant topics

Handling data1 4 5 6 7

Sampling10 11

Hypothesis testing17 18

Todayrsquos LectureTodayrsquos Lecture

What types of data are thereWhat types of data are there

(numerical vs categorical variables) (numerical vs categorical variables)

Describing data - measures of central tendency Describing data - measures of central tendency (m(mean median and mode)ean median and mode)

Summarising data graphicallySummarising data graphically (histograms box (histograms box plots bar charts pie diagrams) plots bar charts pie diagrams)

Types of dataTypes of data

Variable

Categorical Numerical

Nominal Ordinal Discrete Continuous

Types of Data

Numerical dataDiscrete

Examples No of children No asthma attacks in a week No of rooms in home

Types of Data

Numerical dataContinuous

Any value on the continuum is possible (even fractions or Any value on the continuum is possible (even fractions or decimals)decimals)

Examples Weight Age Temperature Heart rate

Types of Data

Categorical dataNominal

Mutually exclusive unordered categoriesMutually exclusive unordered categories

ExamplesExamples Sex (male female)Sex (male female) Eye colour (brown grey green blue)Eye colour (brown grey green blue) Are you happy (Yes No)Are you happy (Yes No) Diarrhoea (Present absent)Diarrhoea (Present absent)

Can summarize inCan summarize in Tables ndash using counts and percentagesTables ndash using counts and percentages Bar ChartBar Chart

Types of Data

Categorical dataOrdinal (ordered categories)

Examples Degree of agreement

(Strongly Agree Agree Disagree Strongly disagree)

Severity of injury Severe Moderate Mild

Income level High medium low

PRACTICEPRACTICE

mg of tar in cigarettesmg of tar in cigarettes

number of people in a carnumber of people in a car

high to low temperature inhigh to low temperature in

any dayany day

weightweight

timetime

number of children in thenumber of children in the

average familyaverage family

Average above avg Average above avg below averagebelow average

Colours of SmartiesColours of Smarties

Grades (A B C D F)Grades (A B C D F)

Discrete or Continuous

Continuous

Discrete

Continuous

Continuous

Continuous

Discrete

Nominal or Ordinal

Ordinal

Nominal

Ordinal

Data SummariesData Summaries

It is ALWAYS a good idea to summarise It is ALWAYS a good idea to summarise your datayour data

You become familiar with the data and the You become familiar with the data and the characteristics of the people that you are characteristics of the people that you are studyingstudying

You can also identify problems or errors with You can also identify problems or errors with the data (data management issues)the data (data management issues)

Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)

Mean

Median

Mode

DefinitionsDefinitions

The arithmetic The arithmetic meanmean is what is commonly called is what is commonly called the average The mean is the sum of all the the average The mean is the sum of all the scores divided by the number of scoresscores divided by the number of scores

The The medianmedian is the middle of a distribution half is the middle of a distribution half the scores are above the median and half are the scores are above the median and half are below the median below the median

The The modemode is the most frequently occurring score is the most frequently occurring score in a distribution in a distribution

ldquoldquoIt has been said that a fellow with one It has been said that a fellow with one leg frozen in ice and the other leg in leg frozen in ice and the other leg in boiling water is comfortablehellipboiling water is comfortablehellip

hellip hellipon averagerdquoon averagerdquo

JM YancyJM Yancy

Sample Mean X

The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)

1048793 ExampleSystolic blood pressures (mmHg)

X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5

micro is pronounced lsquomursquo and denotes the mean of all values in a population

Notation

x is pronounced lsquox-barrsquo and denotes the mean of a set of

Sample values

(sigma) denotes the summation of a set of values

x is the variable usually used to represent the individual data values

n represents the number of data values in a sample

N represents the number of data values in a population

x =

Definitions

MeanMean

the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores

n x

Sample

Nmicro = xPopulation

Notes on Sample Mean

Also called sample average or arithmetic mean

Sensitive to extreme values - One data point could make a great change in

sample mean

Why is it called the sample meanndash To distinguish it from population mean

Population Versus Sample

Population - The entire group you want information about

ndash For example The blood pressure of all 20-year-old male university students in South Africa

Sample - A part of the population from which we actually collect information and draw conclusions about the whole population

ndash For example Sample of blood pressures (n=50) of 20-year-old male university students in South Africa

The sample mean X is not the population mean micro

Population Versus Sample

We donrsquot know the population mean micro but would like to know itWe draw a sample from the populationWe calculate the sample mean XHow close is X to microStatistical theory will tell us how close X is to microStatistical inference is the process of trying to draw conclusions about the population from the sample

Weighted Mean

x =w

(w bull x)

Your grade in many courses are weighted means (averages) In other words some things count (are weighted) more than others

Geometric MeansGeometric Means

These are histograms rotated 90ordm and box plots

Note how the log transformation gives a symmetric distribution

bull 1 1 3 3 4 5 5 5 5 5 no exact middle -- shared by two numbers

MEDIAN is 45

4 + 5

2= 45

bull 5 5 5 3 1 5 1 4 3 5 2 bull 1 1 2 3 3 4 5 5 5 5 5 (in order)

exact middle MEDIAN is 4

Mode Mode

The score that occurs most frequentlyThe score that occurs most frequently

BimodalMultimodalNo Mode

The only measure of central tendency that can be used The only measure of central tendency that can be used with with nominal datadata

Examples

bull Mode is 5Mode is 5

bull Bimodal ndash 2 amp 6Bimodal ndash 2 amp 6

bull No ModeNo Mode

a 5 5 5 3 1 5 1 4 3 5

b 2 2 2 3 4 5 6 6 6 7 9

c 2 3 6 7 8 9 10

d 2 2 3 3 3 4

e 2 2 3 3 4 4 5 5

bull Mode is 3

bull No Mode

Shapes of the Distribution

Shapes of the Distribution

Distribution Characteristics

Shapes of the Distribution

Example Height of students in the class

Shapes of the Distribution

Example Serum cholesterol level

Shapes of the Distribution

Example Birth weight of newborn babies

Shapes of the Distribution

Some visual ways to summarize Some visual ways to summarize datadata

TablesTablesFrequency tableFrequency table

GraphsGraphsHistogramsHistograms

Bar graphsBar graphs

Box plotsBox plots

Line plotsLine plots

Scatter graphsScatter graphs

ChartsChartsBar chartBar chart

Pie diagramPie diagram

Frequency Tables

Summarizes a variable with counts and percentages

The variable is categorical

Note that you can take a continuous variable and create categories with it

How do you create categories for a continuous variable?

– Choose cutoffs that are biologically meaningful

– Natural breaks in the data
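Turning a continuous variable into categories and tabulating counts and percentages might look like the sketch below. The birth weights and the cutoffs of 2500 g and 4000 g are assumptions chosen for illustration; they are not data from the lecture.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical birth weights in grams.
birth_weights = [1800, 2450, 2500, 2900, 3100, 3300, 3500, 3650, 4000, 4200]

# Assumed cutoffs: under 2500 g, 2500 to 3999 g, 4000 g or more.
cutoffs = [2500, 4000]
labels = ["< 2500 g", "2500-3999 g", ">= 4000 g"]

def categorise(value):
    """Map a continuous value to the label of the interval it falls in."""
    return labels[bisect_right(cutoffs, value)]

counts = Counter(categorise(w) for w in birth_weights)
total = len(birth_weights)

# Frequency table: category, count, percentage.
for label in labels:
    n = counts[label]
    print(f"{label:<12} {n:>3} {100 * n / total:5.1f}%")
```

A value equal to a cutoff goes into the higher category here; whichever rule is chosen should be stated alongside the table.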

Example of frequency table

When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data. When the data are divided into groups (classes), they are called grouped data. The classes have to be decided according to the range of the data and the size of the class. The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table. The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
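The cumulative frequency described here is a running total, which can be sketched in a few lines of Python. The class intervals and frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical grouped data: class intervals and their frequencies.
classes = ["0-9", "10-19", "20-29", "30-39"]
frequencies = [4, 7, 6, 3]

# Each entry is the frequency of that class plus all classes before it.
cumulative = list(accumulate(frequencies))

for cls, f, cf in zip(classes, frequencies, cumulative):
    print(f"{cls:<6} {f:>3} {cf:>3}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the table.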

Graphical Summaries

Histograms

– Continuous or ordinal data on horizontal axis

Bar Graphs

– Nominal data

– No order to horizontal axis

Box Plots

– Continuous data

Histogram

A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.

Bar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of a set of mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and they should not touch the other bars.

Difference between bar chart and histogram

Bar charts for categories that are separate

Histograms if you got categories by dividing up continuous data

Bars do not touch; histogram rectangles do touch

Line graph

If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, it is better done with line graphs rather than histograms.

Scatter plot

Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on a straight line (the regression line).
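To put numbers on "perfectly correlated" versus "widely scattered", the sketch below computes a correlation coefficient by hand. The slide does not name a particular coefficient, so using Pearson's r is an assumption, and both data sets are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_line = [2 * v + 1 for v in x]  # every point lies on one straight line
y_scattered = [5, 1, 4, 2, 3]    # points scattered with little pattern

r_line = pearson_r(x, y_line)
r_scattered = pearson_r(x, y_scattered)
print(r_line, r_scattered)
```

The first coefficient comes out at 1 because the points sit exactly on a line, while the second is much closer to 0.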

Survival curve

Pie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.

Pictures of Data: Continuous Variables

Histograms

Means and medians do not tell the whole story

– Differences in spread (variability)

– Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
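The steps above can be sketched in plain Python, with a row of asterisks standing in for each bar. The ages, the choice of four bins, and the helper name `histogram_counts` are assumptions made for the illustration.

```python
def histogram_counts(data, n_bins):
    """Count observations in n_bins equal-width intervals spanning the data."""
    low, high = min(data), max(data)
    width = (high - low) / n_bins
    if width == 0:
        raise ValueError("all values are identical; cannot form intervals")
    counts = [0] * n_bins
    for value in data:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((value - low) / width), n_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical ages in years.
ages = [21, 22, 22, 23, 25, 27, 28, 30, 31, 31, 33, 36, 38, 40, 41]

edges, counts = histogram_counts(ages, n_bins=4)

# Draw the histogram sideways and label each interval.
for i, count in enumerate(counts):
    print(f"{edges[i]:5.1f} to {edges[i + 1]:5.1f} | {'*' * count}")
```

Changing `n_bins` changes the picture, so it is worth trying a few values before settling on one.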

Pictures of Data: Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing the distribution of continuous data in multiple groups – can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution.

The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box and 1/4 of the distribution is between this line and the bottom of the box.
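The three values that define the box can be computed with the standard library, as in the sketch below. The lengths of stay are hypothetical, and software packages use slightly different percentile conventions, so the hinges reported elsewhere may differ a little from these.

```python
from statistics import quantiles

# Hypothetical hospital lengths of stay in days.
length_of_stay = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for quartiles; "inclusive" treats the data as the whole group.
lower_hinge, median_value, upper_hinge = quantiles(
    length_of_stay, n=4, method="inclusive"
)

# Observations inside the box (between the hinges, ends included).
in_box = [d for d in length_of_stay if lower_hinge <= d <= upper_hinge]

print(lower_hinge, median_value, upper_hinge)
print(in_box)
```

With so few observations the box holds slightly more than half the values because the hinges coincide with data points; in larger samples the share settles near one half.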

Hospital Length of Stay

Box plot: Length of Stay

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.

Common tricks used to mislead

– The problem of scaling

– The Advertiser's Graph

– The transformed graph

– The chart with too much data

Which graph to use

Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category

Example of MCQ 1

The arithmetic mean of a set of values

a) Is a particular type of average

b) Is a useful summary measure of location if the data are skewed to the right

c) Coincides with the median if the distribution of the data is symmetrical

d) Is always greater than the median

e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram

a) Can be used instead of a pie chart to display categorical data

b) Is similar to a bar chart but there are no gaps between the bars

c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar

d) Can be used to display either a frequency or a relative frequency distribution

e) Is used to show the relationship between two variables

Any questions?

Page 5: Introduction to Biostatistics

Misinterpretation of Data

"Celebrating birthdays is healthy"

Statistics show that those that celebrate the most birthdays live the longest

You may think that

A Bar Chart is a map of the locations of the nearest taverns

A p-value is the result of a urinalysis

A t-test is a taste test between rooibos tea and Five Roses tea

Course Structure

"BIO-SADISTICS"

Four 45-minute lectures

PowerPoint presentations on student web site

Some text (content) also on web page

Plus additional internet "links"

Syllabus for the Course

SESSION 1: Summarizing your data 1

– Types of data (quantitative and categorical variables)

– Describing data – average (mean, median and mode)

– Displaying data graphically (box plots, histograms, bar charts, pie diagrams)

– Frequency distributions

SESSION 2: Summarizing your data 2

– The normal distribution

– Describing data – spread (range, variance, standard deviation, z score)

– Quartiles, percentiles

– Standard error of the mean

– Confidence intervals

SESSION 3: Sampling principles

– Study population

– The sample

– Random sampling

– Non-random sampling

– Sampling bias

– Sample size and power

SESSION 4: Statistical tests and the concept of significance

– Hypothesis testing

– p value

– Statistical versus clinical significance

– Parametric versus non-parametric methods

Free textbook on-line

Statistics at Square One

http://bmj.bmjjournals.com/collections/statsbk/index.shtml

http://www.medstatsaag.com/mcqs.asp

Relevant topics

Handling data: 1, 4, 5, 6, 7

Sampling: 10, 11

Hypothesis testing: 17, 18

Today's Lecture

What types of data are there? (numerical vs categorical variables)

Describing data – measures of central tendency (mean, median and mode)

Summarising data graphically (histograms, box plots, bar charts, pie diagrams)

Types of data

Variable

– Categorical: Nominal, Ordinal

– Numerical: Discrete, Continuous

Types of Data

Numerical data: Discrete

Examples

– No. of children

– No. of asthma attacks in a week

– No. of rooms in home

Types of Data

Numerical data: Continuous

Any value on the continuum is possible (even fractions or decimals)

Examples

– Weight

– Age

– Temperature

– Heart rate

Types of Data

Categorical data: Nominal

Mutually exclusive unordered categories

Examples

– Sex (male, female)

– Eye colour (brown, grey, green, blue)

– Are you happy? (Yes, No)

– Diarrhoea (Present, absent)

Can summarize in

– Tables – using counts and percentages

– Bar Chart

Types of Data

Categorical data: Ordinal (ordered categories)

Examples

– Degree of agreement (Strongly agree, Agree, Disagree, Strongly disagree)

– Severity of injury (Severe, Moderate, Mild)

– Income level (High, medium, low)

PRACTICE

Discrete or Continuous?

– mg of tar in cigarettes: Continuous

– number of people in a car: Discrete

– high to low temperature in any day: Continuous

– weight: Continuous

– time: Continuous

– number of children in the average family: Discrete

Nominal or Ordinal?

– Average, above average, below average: Ordinal

– Colours of Smarties: Nominal

– Grades (A, B, C, D, F): Ordinal

Data Summaries

It is ALWAYS a good idea to summarise your data

– You become familiar with the data and the characteristics of the people that you are studying

– You can also identify problems or errors with the data (data management issues)

Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)

Mean

Median

Mode

Definitions

The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.

The median is the middle of a distribution: half the scores are above the median and half are below the median.

The mode is the most frequently occurring score in a distribution.

"It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable…

…on average"

JM Yancy

Sample Mean X̄

The Average or Arithmetic Mean

Add up data, then divide by sample size (n)

The sample size n is the number of observations (pieces of data)

Example: Systolic blood pressures (mmHg)

X1 = 120

X2 = 80

X3 = 90

X4 = 110

X5 = 95

n = 5
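Applying the rule "add up the data, then divide by n" to the five readings gives the following worked calculation:

```latex
\bar{X} = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{n}
        = \frac{120 + 80 + 90 + 110 + 95}{5}
        = \frac{495}{5}
        = 99 \text{ mmHg}
```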

Notation

Σ (sigma) denotes the summation of a set of values

x is the variable usually used to represent the individual data values

n represents the number of data values in a sample

N represents the number of data values in a population

x̄ is pronounced 'x-bar' and denotes the mean of a set of sample values

μ is pronounced 'mu' and denotes the mean of all values in a population

Definitions

Mean

the value obtained by adding the scores and dividing the total by the number of scores

Sample: $\bar{x} = \frac{\sum x}{n}$

Population: $\mu = \frac{\sum x}{N}$

Notes on Sample Mean

Also called sample average or arithmetic mean

Sensitive to extreme values - One data point could make a great change in the sample mean

Why is it called the sample mean? – To distinguish it from the population mean
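How much one data point can move the mean is easy to see with a small sketch. It reuses the five systolic pressures from the earlier example; the second list, where 120 is mistyped as 1200, is a hypothetical data-entry error added for illustration.

```python
from statistics import mean, median

readings = [120, 80, 90, 110, 95]    # systolic pressures from the example
with_typo = [1200, 80, 90, 110, 95]  # hypothetical: 120 entered as 1200

mean_clean, median_clean = mean(readings), median(readings)
mean_typo, median_typo = mean(with_typo), median(with_typo)

print(mean_clean, median_clean)  # 99 95
print(mean_typo, median_typo)    # 315 95
```

The single extreme value more than triples the mean while the median does not move, which is why a quick summary of the data helps to spot such errors.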

Population Versus Sample

Population - The entire group you want information about

– For example: The blood pressure of all 20-year-old male university students in South Africa

Sample - A part of the population from which we actually collect information and draw conclusions about the whole population

– For example: Sample of blood pressures (n=50) of 20-year-old male university students in South Africa

The sample mean X̄ is not the population mean μ


Page 6: Introduction to Biostatistics

You may think thatYou may think that

A Bar Chart is a map of the locations of A Bar Chart is a map of the locations of the nearest tavernsthe nearest taverns

A p-value is the result of a urinalysisA p-value is the result of a urinalysis

A t-test is a taste test between rooibos tea A t-test is a taste test between rooibos tea and Five Roses teaand Five Roses tea

Course StructureCourse Structure

ldquoBIO-SADISTICSrdquo

Four 45-minute lectures

PowerPoint presentations on student web site

Some text (content) also on web page

Plus additional internet ldquolinksrdquo

Syllabus for the Course

1048793 SESSION 1 Summarizing your data 1Types of data (quantitative and categorical variables) Describing data- average (mean median and mode)Displaying data graphically (box plots histograms bar charts pie diagrams) Frequency distributions

SESSION 2 Summarizing your data 2The normal distributionDescribing data ndash spread (range variance standard deviation z score)Quartiles percentilesStandard error of the mean Confidence intervals

SESSION 3SESSION 3 Sampling principles Sampling principles Study PopulationStudy PopulationThe sampleThe sampleRandom samplingRandom samplingNon random sampling Non random sampling Sampling biasSampling biasSample size and powerSample size and power

SESSION 4SESSION 4 Statistical tests and Statistical tests and the concept of significancethe concept of significance Hypothesis testingHypothesis testingp valuep value Statistical versus clinical Statistical versus clinical significancesignificanceParametric versus non-parametric Parametric versus non-parametric methodsmethods

Free textbook on-lineFree textbook on-line

Statistics at Square One

httpbmjbmjjournalscomcollectionsstatsbkindexshtml

httpwwwmedstatsaagcommcqsasp

Relevant topics

Handling data1 4 5 6 7

Sampling10 11

Hypothesis testing17 18

Todayrsquos LectureTodayrsquos Lecture

What types of data are thereWhat types of data are there

(numerical vs categorical variables) (numerical vs categorical variables)

Describing data - measures of central tendency Describing data - measures of central tendency (m(mean median and mode)ean median and mode)

Summarising data graphicallySummarising data graphically (histograms box (histograms box plots bar charts pie diagrams) plots bar charts pie diagrams)

Types of dataTypes of data

Variable

Categorical Numerical

Nominal Ordinal Discrete Continuous

Types of Data

Numerical dataDiscrete

Examples No of children No asthma attacks in a week No of rooms in home

Types of Data

Numerical dataContinuous

Any value on the continuum is possible (even fractions or Any value on the continuum is possible (even fractions or decimals)decimals)

Examples Weight Age Temperature Heart rate

Types of Data

Categorical dataNominal

Mutually exclusive unordered categoriesMutually exclusive unordered categories

ExamplesExamples Sex (male female)Sex (male female) Eye colour (brown grey green blue)Eye colour (brown grey green blue) Are you happy (Yes No)Are you happy (Yes No) Diarrhoea (Present absent)Diarrhoea (Present absent)

Can summarize inCan summarize in Tables ndash using counts and percentagesTables ndash using counts and percentages Bar ChartBar Chart

Types of Data

Categorical dataOrdinal (ordered categories)

Examples Degree of agreement

(Strongly Agree Agree Disagree Strongly disagree)

Severity of injury Severe Moderate Mild

Income level High medium low

PRACTICEPRACTICE

mg of tar in cigarettesmg of tar in cigarettes

number of people in a carnumber of people in a car

high to low temperature inhigh to low temperature in

any dayany day

weightweight

timetime

number of children in thenumber of children in the

average familyaverage family

Average above avg Average above avg below averagebelow average

Colours of SmartiesColours of Smarties

Grades (A B C D F)Grades (A B C D F)

Discrete or Continuous

Continuous

Discrete

Continuous

Continuous

Continuous

Discrete

Nominal or Ordinal

Ordinal

Nominal

Ordinal

Data SummariesData Summaries

It is ALWAYS a good idea to summarise It is ALWAYS a good idea to summarise your datayour data

You become familiar with the data and the You become familiar with the data and the characteristics of the people that you are characteristics of the people that you are studyingstudying

You can also identify problems or errors with You can also identify problems or errors with the data (data management issues)the data (data management issues)

Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)

Mean

Median

Mode

DefinitionsDefinitions

The arithmetic The arithmetic meanmean is what is commonly called is what is commonly called the average The mean is the sum of all the the average The mean is the sum of all the scores divided by the number of scoresscores divided by the number of scores

The The medianmedian is the middle of a distribution half is the middle of a distribution half the scores are above the median and half are the scores are above the median and half are below the median below the median

The The modemode is the most frequently occurring score is the most frequently occurring score in a distribution in a distribution

ldquoldquoIt has been said that a fellow with one It has been said that a fellow with one leg frozen in ice and the other leg in leg frozen in ice and the other leg in boiling water is comfortablehellipboiling water is comfortablehellip

hellip hellipon averagerdquoon averagerdquo

JM YancyJM Yancy

Sample Mean X

The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)

1048793 ExampleSystolic blood pressures (mmHg)

X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5

micro is pronounced lsquomursquo and denotes the mean of all values in a population

Notation

x is pronounced lsquox-barrsquo and denotes the mean of a set of

Sample values

(sigma) denotes the summation of a set of values

x is the variable usually used to represent the individual data values

n represents the number of data values in a sample

N represents the number of data values in a population

x =

Definitions

MeanMean

the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores

n x

Sample

Nmicro = xPopulation

Notes on Sample Mean

Also called sample average or arithmetic mean

Sensitive to extreme values - One data point could make a great change in

sample mean

Why is it called the sample meanndash To distinguish it from population mean

Population Versus Sample

Population - The entire group you want information about

ndash For example The blood pressure of all 20-year-old male university students in South Africa

Sample - A part of the population from which we actually collect information and draw conclusions about the whole population

ndash For example Sample of blood pressures (n=50) of 20-year-old male university students in South Africa

The sample mean X is not the population mean micro

Population Versus Sample

We don't know the population mean µ but would like to know it
We draw a sample from the population
We calculate the sample mean x̄
How close is x̄ to µ?
Statistical theory will tell us how close x̄ is to µ
Statistical inference is the process of trying to draw conclusions about the population from the sample

Weighted Mean

$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$

Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
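A minimal sketch of a weighted mean for a course mark. The marks and weights below are hypothetical (a test, an assignment and a final exam) and `weighted_mean` is an illustrative helper, not something from the lecture.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Hypothetical marks (%) and weights: test, assignment, final exam.
marks = [70, 80, 60]
weights = [20, 30, 50]

course_mark = weighted_mean(marks, weights)
print(course_mark)
```

The unweighted mean of these marks is 70, but the weighted mean is 68, because the exam carries the most weight and has the lowest mark.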

Geometric Means

These are histograms rotated 90° and box plots

Note how the log transformation gives a symmetric distribution
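The slide does not give a formula, so the following is only a sketch of the usual definition: the geometric mean is the antilog of the arithmetic mean of the logged values. The data are hypothetical right-skewed values (for example, antibody titres) and `geometric_mean` is an illustrative helper.

```python
import math

def geometric_mean(values):
    """exp of the arithmetic mean of the natural logs; values must be > 0."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical right-skewed data.
titres = [10, 20, 40, 80, 1280]

print(round(geometric_mean(titres), 1))  # geometric mean
print(sum(titres) / len(titres))         # arithmetic mean, pulled up by 1280
```

For skewed data like these, the geometric mean sits much closer to the bulk of the values than the arithmetic mean does.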

• 1 1 3 3 4 5 5 5 5 5 (no exact middle, shared by two numbers)

MEDIAN is (4 + 5) / 2 = 4.5

• 5 5 5 3 1 5 1 4 3 5 2
• 1 1 2 3 3 4 5 5 5 5 5 (in order)

Exact middle: MEDIAN is 4
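The two cases above (even and odd number of values) can be sketched as a small function. The helper name `median` is illustrative; Python's standard library offers `statistics.median`, which does the same job.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # middle shared by two numbers

even_set = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]     # 10 values
odd_set = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]   # 11 values, unsorted

print(median(even_set), median(odd_set))
```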

Mode

The score that occurs most frequently

Bimodal
Multimodal
No Mode

The only measure of central tendency that can be used with nominal data

Examples

a. 5 5 5 3 1 5 1 4 3 5 • Mode is 5

b. 2 2 2 3 4 5 6 6 6 7 9 • Bimodal - 2 & 6

c. 2 3 6 7 8 9 10 • No Mode

d. 2 2 3 3 3 4 • Mode is 3

e. 2 2 3 3 4 4 5 5 • No Mode
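The five examples can be checked with a short sketch. It follows the convention used on the slide, where a data set in which every value occurs equally often has no mode; the helper name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return the list of modes, or [] when every value occurs equally often."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # slide convention: no value stands out, so no mode
    return sorted(v for v, c in counts.items() if c == highest)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

for label, data in examples.items():
    print(label, modes(data) or "no mode")
```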

Shapes of the Distribution

Distribution Characteristics

Example: Height of students in the class

Example: Serum cholesterol level

Example: Birth weight of newborn babies

Some visual ways to summarize data

Tables
  Frequency table

Graphs
  Histograms
  Bar graphs
  Box plots
  Line plots
  Scatter graphs

Charts
  Bar chart
  Pie diagram

Frequency Tables

Summarizes a variable with counts and percentages

The variable is categorical
Note that you can take a continuous variable and create categories with it

How do you create categories for a continuous variable?
  Choose cutoffs that are biologically meaningful
  Natural breaks in the data

Example of frequency table

When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data.
When the data are divided into groups (classes), they are called grouped data.
The classes have to be decided according to the range of the data and the size of the class.
The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table.
The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
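A minimal sketch of a frequency table for ungrouped data, with percentages and cumulative frequencies. The data (number of children in ten families) are hypothetical and `frequency_table` is an illustrative helper.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values):
    """Rows of (value, frequency, percentage, cumulative frequency)."""
    counts = Counter(values)
    ordered = sorted(counts)
    freqs = [counts[v] for v in ordered]
    total = sum(freqs)
    cumulative = list(accumulate(freqs))  # running total of frequencies
    return [(v, f, 100 * f / total, c)
            for v, f, c in zip(ordered, freqs, cumulative)]

# Hypothetical data: number of children in each of ten families.
children = [0, 1, 1, 2, 2, 2, 3, 3, 4, 2]

print("Value  Frequency  Percent  Cumulative")
for value, freq, pct, cum in frequency_table(children):
    print(f"{value:>5}  {freq:>9}  {pct:>6.1f}%  {cum:>10}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check that no value was missed.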

Graphical Summaries

Histograms
  Continuous or ordinal data on horizontal axis

Bar Graphs
  Nominal data
  No order to horizontal axis

Box Plots
  Continuous data

Histogram

A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.

Bar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation falls into one of several mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch the other bars.

Difference between bar chart and histogram

Bar charts for categories that are separate

Histograms if you got categories by dividing up continuous data

Bars do not touch; histogram rectangles do touch

Line graph

If the mid-points of the tops of the bars of a histogram are connected by a line, and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, it is better done with line graphs than with histograms.

Scatter plot

Also called a scattergram. This is a method of displaying the distribution of two variables in relation to one another. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on the diagonal (regression line).
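The slide does not name a measure of correlation. As an assumption, the sketch below uses Pearson's correlation coefficient r, which is 1 when all points fall exactly on a rising straight line and moves toward 0 as the scatter widens. Both data sets are hypothetical and `pearson_r` is an illustrative helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: points exactly on a line, then points with some scatter.
on_line = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])
scattered = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])

print(round(on_line, 3), round(scattered, 3))
```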

Survival curve

Pie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.

Pictures of Data: Continuous Variables

Histograms
Means and medians do not tell the whole story
Differences in spread (variability)
Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
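The steps above can be sketched without any plotting library by counting observations per bin and printing a text histogram. The first five values are the blood pressure readings from the earlier example; the remaining five are hypothetical. `histogram_counts` is an illustrative helper.

```python
def histogram_counts(values, n_bins):
    """Count observations in n_bins equal-width intervals spanning the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("all values are identical; cannot form bins")
    counts = [0] * n_bins
    for v in values:
        # The maximum value goes into the last bin rather than a new one.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# Systolic blood pressures (mmHg); the last five readings are hypothetical.
pressures = [80, 90, 95, 110, 120, 100, 105, 130, 98, 102]
edges, counts = histogram_counts(pressures, n_bins=5)

# Draw the histogram sideways, one row per bin, with labelled intervals.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.0f}-{right:<5.0f} | {'#' * count}")
```

Changing `n_bins` changes the apparent shape, so it is worth trying more than one bin width before settling on a picture.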

Pictures of Data: Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing the distribution of continuous data in multiple groups - can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
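A sketch of the numbers behind a box plot, using Python's `statistics.quantiles`. The length-of-stay data are hypothetical. Two assumptions are worth flagging: percentile definitions differ slightly between software packages (the "inclusive" method is used here), and the 1.5 x IQR rule for flagging outlying values is a common convention that the lecture itself does not state.

```python
from statistics import quantiles

# Hypothetical hospital lengths of stay (days).
stay_days = [1, 2, 2, 3, 4, 5, 7, 9, 30]

# Lower hinge (25th percentile), median, upper hinge (75th percentile).
q1, q2, q3 = quantiles(stay_days, n=4, method="inclusive")
iqr = q3 - q1  # height of the box: the middle half of the data

# Assumed convention: values beyond 1.5 * IQR from the box are outlying.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [d for d in stay_days if d < lower_fence or d > upper_fence]

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}, IQR = {iqr}")
print("outlying values:", outliers)
```

The long upper tail and the single 30-day stay are exactly the kind of skewness and outlying values a box plot makes visible.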

Hospital Length of Stay

Box plot: Length of Stay

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled." - M J Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.

Common tricks used to mislead
  The problem of scaling
  The Advertiser's Graph
  The transformed graph
  The chart with too much data

Which graph to use

Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name    Y-axis          X-axis
Histogram     Count           Category
Scatterplot   Continuous      Continuous
Dot Plot      Continuous      Category
Box Plot      Percentiles     Category
Line Plot     Mean or value   Category

Example of MCQ 1

The arithmetic mean of a set of values

a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram

a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables

Any questions?


Course Structure

"BIO-SADISTICS"

Four 45-minute lectures

PowerPoint presentations on student web site

Some text (content) also on web page

Plus additional internet "links"

Syllabus for the Course

SESSION 1: Summarizing your data 1
  Types of data (quantitative and categorical variables)
  Describing data - average (mean, median and mode)
  Displaying data graphically (box plots, histograms, bar charts, pie diagrams)
  Frequency distributions

SESSION 2: Summarizing your data 2
  The normal distribution
  Describing data - spread (range, variance, standard deviation, z score)
  Quartiles, percentiles
  Standard error of the mean
  Confidence intervals

SESSION 3: Sampling principles
  Study population
  The sample
  Random sampling
  Non-random sampling
  Sampling bias
  Sample size and power

SESSION 4: Statistical tests and the concept of significance
  Hypothesis testing
  p value
  Statistical versus clinical significance
  Parametric versus non-parametric methods

Free textbook on-line

Statistics at Square One

http://bmj.bmjjournals.com/collections/statsbk/index.shtml

http://www.medstatsaag.com/mcqs.asp

Relevant topics

Handling data: 1, 4, 5, 6, 7

Sampling: 10, 11

Hypothesis testing: 17, 18

Today's Lecture

What types of data are there? (numerical vs categorical variables)

Describing data - measures of central tendency (mean, median and mode)

Summarising data graphically (histograms, box plots, bar charts, pie diagrams)

Types of data

Variable
  Categorical: Nominal, Ordinal
  Numerical: Discrete, Continuous

Types of Data

Numerical data: Discrete

Examples
  No. of children
  No. of asthma attacks in a week
  No. of rooms in home

Types of Data

Numerical data: Continuous

Any value on the continuum is possible (even fractions or decimals)

Examples
  Weight
  Age
  Temperature
  Heart rate

Types of Data

Categorical data: Nominal

Mutually exclusive, unordered categories

Examples
  Sex (male, female)
  Eye colour (brown, grey, green, blue)
  Are you happy? (Yes, No)
  Diarrhoea (Present, absent)

Can summarize in
  Tables - using counts and percentages
  Bar Chart

Types of Data

Categorical data: Ordinal (ordered categories)

Examples
  Degree of agreement (Strongly agree, Agree, Disagree, Strongly disagree)
  Severity of injury (Severe, Moderate, Mild)
  Income level (High, medium, low)

PRACTICE

Discrete or Continuous?

mg of tar in cigarettes - Continuous
number of people in a car - Discrete
high to low temperature in any day - Continuous
weight - Continuous
time - Continuous
number of children in the average family - Discrete

Nominal or Ordinal?

Average, above average, below average - Ordinal
Colours of Smarties - Nominal
Grades (A, B, C, D, F) - Ordinal

Data Summaries

It is ALWAYS a good idea to summarise your data

You become familiar with the data and the characteristics of the people that you are studying

You can also identify problems or errors with the data (data management issues)

Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)

Mean

Median

Mode

Definitions

The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.

The median is the middle of a distribution: half the scores are above the median and half are below the median.

The mode is the most frequently occurring score in a distribution.

Page 8: Introduction to Biostatistics

Syllabus for the Course

1048793 SESSION 1 Summarizing your data 1Types of data (quantitative and categorical variables) Describing data- average (mean median and mode)Displaying data graphically (box plots histograms bar charts pie diagrams) Frequency distributions

SESSION 2 Summarizing your data 2The normal distributionDescribing data ndash spread (range variance standard deviation z score)Quartiles percentilesStandard error of the mean Confidence intervals

SESSION 3SESSION 3 Sampling principles Sampling principles Study PopulationStudy PopulationThe sampleThe sampleRandom samplingRandom samplingNon random sampling Non random sampling Sampling biasSampling biasSample size and powerSample size and power

SESSION 4SESSION 4 Statistical tests and Statistical tests and the concept of significancethe concept of significance Hypothesis testingHypothesis testingp valuep value Statistical versus clinical Statistical versus clinical significancesignificanceParametric versus non-parametric Parametric versus non-parametric methodsmethods

Free textbook on-lineFree textbook on-line

Statistics at Square One

httpbmjbmjjournalscomcollectionsstatsbkindexshtml

httpwwwmedstatsaagcommcqsasp

Relevant topics

Handling data1 4 5 6 7

Sampling10 11

Hypothesis testing17 18

Todayrsquos LectureTodayrsquos Lecture

What types of data are thereWhat types of data are there

(numerical vs categorical variables) (numerical vs categorical variables)

Describing data - measures of central tendency Describing data - measures of central tendency (m(mean median and mode)ean median and mode)

Summarising data graphicallySummarising data graphically (histograms box (histograms box plots bar charts pie diagrams) plots bar charts pie diagrams)

Types of dataTypes of data

Variable

Categorical Numerical

Nominal Ordinal Discrete Continuous

Types of Data

Numerical dataDiscrete

Examples No of children No asthma attacks in a week No of rooms in home

Types of Data

Numerical dataContinuous

Any value on the continuum is possible (even fractions or Any value on the continuum is possible (even fractions or decimals)decimals)

Examples Weight Age Temperature Heart rate

Types of Data

Categorical dataNominal

Mutually exclusive unordered categoriesMutually exclusive unordered categories

ExamplesExamples Sex (male female)Sex (male female) Eye colour (brown grey green blue)Eye colour (brown grey green blue) Are you happy (Yes No)Are you happy (Yes No) Diarrhoea (Present absent)Diarrhoea (Present absent)

Can summarize inCan summarize in Tables ndash using counts and percentagesTables ndash using counts and percentages Bar ChartBar Chart

Types of Data

Categorical dataOrdinal (ordered categories)

Examples Degree of agreement

(Strongly Agree Agree Disagree Strongly disagree)

Severity of injury Severe Moderate Mild

Income level High medium low

PRACTICEPRACTICE

mg of tar in cigarettesmg of tar in cigarettes

number of people in a carnumber of people in a car

high to low temperature inhigh to low temperature in

any dayany day

weightweight

timetime

number of children in thenumber of children in the

average familyaverage family

Average above avg Average above avg below averagebelow average

Colours of SmartiesColours of Smarties

Grades (A B C D F)Grades (A B C D F)

Discrete or Continuous

Continuous

Discrete

Continuous

Continuous

Continuous

Discrete

Nominal or Ordinal

Ordinal

Nominal

Ordinal

Data SummariesData Summaries

It is ALWAYS a good idea to summarise It is ALWAYS a good idea to summarise your datayour data

You become familiar with the data and the You become familiar with the data and the characteristics of the people that you are characteristics of the people that you are studyingstudying

You can also identify problems or errors with You can also identify problems or errors with the data (data management issues)the data (data management issues)

Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)

Mean

Median

Mode

DefinitionsDefinitions

The arithmetic The arithmetic meanmean is what is commonly called is what is commonly called the average The mean is the sum of all the the average The mean is the sum of all the scores divided by the number of scoresscores divided by the number of scores

The The medianmedian is the middle of a distribution half is the middle of a distribution half the scores are above the median and half are the scores are above the median and half are below the median below the median

The The modemode is the most frequently occurring score is the most frequently occurring score in a distribution in a distribution

ldquoldquoIt has been said that a fellow with one It has been said that a fellow with one leg frozen in ice and the other leg in leg frozen in ice and the other leg in boiling water is comfortablehellipboiling water is comfortablehellip

hellip hellipon averagerdquoon averagerdquo

JM YancyJM Yancy

Sample Mean X

The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)

1048793 ExampleSystolic blood pressures (mmHg)

X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5

Notation

• $\sum$ (sigma) denotes the summation of a set of values
• $x$ is the variable usually used to represent the individual data values
• $n$ represents the number of data values in a sample
• $N$ represents the number of data values in a population
• $\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set of sample values
• $\mu$ is pronounced 'mu' and denotes the mean of all values in a population

Definitions

Mean: the value obtained by adding the scores and dividing the total by the number of scores

Sample: $\bar{x} = \frac{\sum x}{n}$

Population: $\mu = \frac{\sum x}{N}$

Notes on Sample Mean

Also called sample average or arithmetic mean

Sensitive to extreme values: one data point could make a great change in the sample mean

Why is it called the sample mean? To distinguish it from the population mean
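The sensitivity to extreme values can be seen with a small sketch. It reuses the five example blood pressures and appends one extreme value (300 mmHg) that is invented purely for illustration, then compares how far the mean and the median move.

```python
from statistics import mean, median

pressures = [120, 80, 90, 110, 95]     # example values (mmHg)
with_outlier = pressures + [300]       # hypothetical extreme value added

mean_before = mean(pressures)
mean_after = mean(with_outlier)
median_before = median(pressures)
median_after = median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

With the single extra point the mean moves from 99 to 132.5, while the median moves only from 95 to 102.5.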

Population Versus Sample

Population - The entire group you want information about

- For example: the blood pressure of all 20-year-old male university students in South Africa

Sample - A part of the population from which we actually collect information and draw conclusions about the whole population

- For example: a sample of blood pressures (n=50) of 20-year-old male university students in South Africa

The sample mean $\bar{X}$ is not the population mean $\mu$

Population Versus Sample

• We don't know the population mean $\mu$ but would like to know it
• We draw a sample from the population
• We calculate the sample mean $\bar{X}$
• How close is $\bar{X}$ to $\mu$?
• Statistical theory will tell us how close $\bar{X}$ is to $\mu$
• Statistical inference is the process of trying to draw conclusions about the population from the sample
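The gap between a sample mean and the population mean can be illustrated with a simulation. The sketch below builds a made-up "population" of 10,000 blood pressures (the centre of 120 mmHg and spread of 12 mmHg are assumptions, not study data), draws a sample of n = 50, and compares the two means. In real studies the population mean is not available; it is computed here only because the population is simulated.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated systolic pressures (mmHg).
population = [random.gauss(120, 12) for _ in range(10_000)]
mu = sum(population) / len(population)      # population mean (normally unknown)

sample = random.sample(population, 50)      # draw a sample of n = 50
x_bar = sum(sample) / len(sample)           # sample mean

print(f"population mean mu = {mu:.1f}")
print(f"sample mean x-bar  = {x_bar:.1f}")
print(f"difference         = {x_bar - mu:+.1f}")
```

Changing the seed gives a different sample and a slightly different sample mean, which is the variation that statistical inference has to account for.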

Weighted Mean

$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$

Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
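A short sketch of the course-grade idea follows. The components, scores and weights are hypothetical values chosen for illustration. Each score is multiplied by its weight, the products are summed, and the result is divided by the sum of the weights.

```python
# Hypothetical course components: (score out of 100, weight in percent)
components = {
    "tests":       (70, 30),
    "assignments": (80, 20),
    "final exam":  (60, 50),
}

weighted_sum = sum(score * weight for score, weight in components.values())
total_weight = sum(weight for _, weight in components.values())
weighted_mean = weighted_sum / total_weight

# Unweighted mean of the same scores, for comparison
plain_mean = sum(score for score, _ in components.values()) / len(components)

print(f"weighted mean = {weighted_mean}, unweighted mean = {plain_mean}")
```

Because the final exam carries half the weight and has the lowest score, the weighted mean (67) is lower than the unweighted mean (70).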

Geometric Means

These are histograms rotated 90° and box plots

Note how the log transformation gives a symmetric distribution
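The link between the log transformation and the geometric mean can be sketched in code. The slide does not spell out a formula; the sketch uses the standard definition (the exponential of the arithmetic mean of the logged values). The three data values are invented for illustration.

```python
import math
from statistics import geometric_mean

# Hypothetical right-skewed values
values = [1, 10, 100]

logs = [math.log(v) for v in values]      # log-transform each value
mean_of_logs = sum(logs) / len(logs)      # arithmetic mean on the log scale
geo_mean = math.exp(mean_of_logs)         # back-transform to the original scale

arith_mean = sum(values) / len(values)

print(f"arithmetic mean = {arith_mean:.1f}")
print(f"geometric mean  = {geo_mean:.1f}")
print(f"library check   = {geometric_mean(values):.1f}")
```

On the log scale the three values are evenly spaced, so the geometric mean lands on the middle value (10), whereas the arithmetic mean (37) is pulled towards the largest value.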

• 1 1 3 3 4 5 5 5 5 5

no exact middle: shared by two numbers

(4 + 5) / 2 = 4.5

MEDIAN is 4.5

• 5 5 5 3 1 5 1 4 3 5 2
• 1 1 2 3 3 4 5 5 5 5 5 (in order)

exact middle: MEDIAN is 4
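The two median examples can be checked with Python's standard `statistics.median`, which sorts the data and averages the two middle values when the count is even. The list names below are illustrative.

```python
from statistics import median

even_count = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]       # 10 values: no single middle
odd_count = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]     # 11 values, unsorted on purpose

print(sorted(odd_count))      # putting the values in order is the first step
print(median(even_count))     # mean of the two middle values, 4 and 5
print(median(odd_count))      # the 6th of the 11 sorted values
```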

Mode

The score that occurs most frequently

• Bimodal
• Multimodal
• No Mode

The only measure of central tendency that can be used with nominal data

Examples

a. 5 5 5 3 1 5 1 4 3 5 (Mode is 5)

b. 2 2 2 3 4 5 6 6 6 7 9 (Bimodal: 2 and 6)

c. 2 3 6 7 8 9 10 (No Mode)

d. 2 2 3 3 3 4 (Mode is 3)

e. 2 2 3 3 4 4 5 5 (No Mode)
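The five mode examples can be reproduced with a small helper. The function `modes` is a hypothetical name, and it encodes one assumption taken from the examples: when every distinct value occurs equally often, the data set is reported as having no mode.

```python
from collections import Counter

def modes(scores):
    """Return the most frequent value(s); an empty list means 'no mode'.

    Convention assumed here: if every distinct value occurs equally
    often (and there is more than one distinct value), there is no mode.
    """
    if not scores:
        return []
    counts = Counter(scores)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

for label, scores in examples.items():
    print(label, modes(scores) or "no mode")
```

The same helper works for nominal data, for example `modes(["brown", "blue", "brown", "green"])`, which is why the mode is usable where the mean and median are not.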

Shapes of the Distribution

Distribution Characteristics

Example: Height of students in the class

Example: Serum cholesterol level

Example: Birth weight of newborn babies

Some visual ways to summarize data

Tables
• Frequency table

Graphs
• Histograms
• Bar graphs
• Box plots
• Line plots
• Scatter graphs

Charts
• Bar chart
• Pie diagram

Frequency Tables

Summarizes a variable with counts and percentages

The variable is categorical

Note that you can take a continuous variable and create categories with it

How do you create categories for a continuous variable?
• Choose cutoffs that are biologically meaningful
• Natural breaks in the data

Example of frequency table

When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data. When the data are divided into groups (classes), they are called grouped data. The classes have to be decided according to the range of the data and the size of the class. The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table. The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
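A grouped frequency table with cumulative frequencies can be built as in the sketch below. The ten blood pressure values and the two cutoffs (120 and 140 mmHg) are hypothetical and chosen only to show the mechanics. Each value is assigned to a class, the class counts are tallied, and a running total gives the cumulative frequency.

```python
from bisect import bisect_right

# Hypothetical systolic pressures (mmHg) and hypothetical class cutoffs
values = [95, 102, 110, 118, 121, 125, 133, 138, 142, 155]
cutoffs = [120, 140]
labels = ["<120", "120-139", ">=140"]

# bisect_right gives the index of the class each value falls into
frequencies = [0] * len(labels)
for v in values:
    frequencies[bisect_right(cutoffs, v)] += 1

table = []
cumulative = 0
for label, freq in zip(labels, frequencies):
    cumulative += freq                       # this class plus all prior classes
    percent = 100 * freq / len(values)
    table.append((label, freq, percent, cumulative))

print(f"{'Class':<10}{'Freq':>6}{'%':>8}{'Cum. freq':>11}")
for label, freq, percent, cum in table:
    print(f"{label:<10}{freq:>6}{percent:>8.1f}{cum:>11}")
```

The cumulative frequency of the last class always equals the total number of observations, which is a handy check that no value was dropped.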

Graphical Summaries

Histograms
• Continuous or ordinal data on horizontal axis

Bar Graphs
• Nominal data
• No order to horizontal axis

Box Plots
• Continuous data

Histogram

A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.

Bar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of several mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width, and they should not touch the other bars.

Difference between bar chart and histogram

• Bar charts: for categories that are separate
• Histograms: if you got categories by dividing up continuous data
• Bars do not touch; histogram rectangles do touch

Line graph

If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resulting graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, this is better done with line graphs than with histograms.

Scatter plot

Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each plotted point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on a straight line (the regression line).
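The contrast between a perfect line and a wide scatter can be put into numbers with a correlation coefficient. The slide does not name a specific measure; the sketch below assumes Pearson's r, and both data sets are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_line = [2, 4, 6, 8, 10]        # made-up values lying exactly on a line
y_scattered = [5, 1, 4, 2, 3]    # made-up values with wide scatter

print(pearson_r(x, y_line))
print(pearson_r(x, y_scattered))
```

Points that lie exactly on a rising straight line give r = 1, while the scattered set gives a value much closer to 0 (about -0.3 here).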

Survival curve

Pie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100%, and it represents the total population that is being shown.

Pictures of Data: Continuous Variables

Histograms

• Means and medians do not tell the whole story
• Differences in spread (variability)
• Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
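The four steps can be sketched without any plotting library by printing a text histogram. The data values, the starting point (80 mmHg) and the bin width (20 mmHg) are hypothetical choices made for illustration.

```python
# Hypothetical systolic pressures (mmHg); all values are assumed to lie
# inside the range covered by the bins below.
values = [95, 102, 110, 118, 121, 125, 133, 138, 142, 155]

# Step 1: divide the range into bins of equal width
start, width, n_bins = 80, 20, 4     # bins: 80-99, 100-119, 120-139, 140-159

# Step 2: count the observations in each bin
counts = [0] * n_bins
for v in values:
    index = (v - start) // width
    counts[index] += 1

# Steps 3 and 4: draw the bars and label each one with its interval
for i, count in enumerate(counts):
    low = start + i * width
    high = low + width - 1
    print(f"{low:>3}-{high:<3} | {'#' * count} ({count})")
```

Here the bars run horizontally, but the idea is the same: the length of each bar is proportional to the number of observations in its interval.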

Pictures of Data: Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing the distribution of continuous data in multiple groups: can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
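The numbers behind a box plot can be computed as in the sketch below. The lengths of stay are hypothetical. Software packages define percentiles in slightly different ways; the "inclusive" method of Python's `statistics.quantiles` is just one choice. The 1.5 x IQR rule for flagging outlying values is a common convention and is an assumption here, not something stated in the lecture.

```python
from statistics import quantiles

# Hypothetical hospital lengths of stay (days)
stays = [1, 2, 2, 3, 4, 5, 6, 9, 30]

# Lower hinge (25th percentile), median, upper hinge (75th percentile)
q1, med, q3 = quantiles(stays, n=4, method="inclusive")
iqr = q3 - q1                      # height of the box

# Assumed convention: values more than 1.5 x IQR beyond the box
# are flagged as potential outliers.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [s for s in stays if s < low_fence or s > high_fence]

print(f"lower hinge = {q1}, median = {med}, upper hinge = {q3}")
print(f"IQR = {iqr}, potential outliers = {outliers}")
```

For these values the box runs from 2 to 6 days with the median at 4, and the 30-day stay is flagged as outlying.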

Hospital Length of Stay

Box plot: Length of Stay

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.

Common tricks used to mislead
• The problem of scaling
• The Advertiser's Graph
• The transformed graph
• The chart with too much data

Which graph to use

Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category

Example of MCQ 1

The arithmetic mean of a set of values:

a) Is a particular type of average

b) Is a useful summary measure of location if the data are skewed to the right

c) Coincides with the median if the distribution of the data is symmetrical

d) Is always greater than the median

e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram:

a) Can be used instead of a pie chart to display categorical data

b) Is similar to a bar chart but there are no gaps between the bars

c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar

d) Can be used to display either a frequency or a relative frequency distribution

e) Is used to show the relationship between two variables

Any questions?


Free textbook on-line

Statistics at Square One

http://bmj.bmjjournals.com/collections/statsbk/index.shtml

http://www.medstatsaag.com/mcqs.asp

Relevant topics

• Handling data: 1, 4, 5, 6, 7
• Sampling: 10, 11
• Hypothesis testing: 17, 18

Today's Lecture

• What types of data are there? (numerical vs categorical variables)
• Describing data: measures of central tendency (mean, median and mode)
• Summarising data graphically (histograms, box plots, bar charts, pie diagrams)

Types of data

Variable
• Categorical: Nominal, Ordinal
• Numerical: Discrete, Continuous

Types of Data

Numerical data: Discrete

Examples: No. of children, No. of asthma attacks in a week, No. of rooms in home

Types of Data

Numerical data: Continuous

Any value on the continuum is possible (even fractions or decimals)

Examples: Weight, Age, Temperature, Heart rate

Types of Data

Categorical data: Nominal

Mutually exclusive unordered categories

Examples
• Sex (male, female)
• Eye colour (brown, grey, green, blue)
• Are you happy? (Yes, No)
• Diarrhoea (Present, absent)

Can summarize in
• Tables: using counts and percentages
• Bar Chart

Types of Data

Categorical data: Ordinal (ordered categories)

Examples
• Degree of agreement (Strongly Agree, Agree, Disagree, Strongly disagree)
• Severity of injury: Severe, Moderate, Mild
• Income level: High, medium, low

PRACTICE

Discrete or Continuous?

• mg of tar in cigarettes: Continuous
• number of people in a car: Discrete
• high to low temperature in any day: Continuous
• weight: Continuous
• time: Continuous
• number of children in the average family: Discrete

Nominal or Ordinal?

• Average, above avg, below average: Ordinal
• Colours of Smarties: Nominal
• Grades (A B C D F): Ordinal


Page 10: Introduction to Biostatistics

httpwwwmedstatsaagcommcqsasp

Relevant topics

Handling data1 4 5 6 7

Sampling10 11

Hypothesis testing17 18

Todayrsquos LectureTodayrsquos Lecture

What types of data are thereWhat types of data are there

(numerical vs categorical variables) (numerical vs categorical variables)

Describing data - measures of central tendency Describing data - measures of central tendency (m(mean median and mode)ean median and mode)

Summarising data graphicallySummarising data graphically (histograms box (histograms box plots bar charts pie diagrams) plots bar charts pie diagrams)

Types of dataTypes of data

Variable

Categorical Numerical

Nominal Ordinal Discrete Continuous

Types of Data

Numerical dataDiscrete

Examples No of children No asthma attacks in a week No of rooms in home

Types of Data

Numerical dataContinuous

Any value on the continuum is possible (even fractions or Any value on the continuum is possible (even fractions or decimals)decimals)

Examples Weight Age Temperature Heart rate

Types of Data

Categorical dataNominal

Mutually exclusive unordered categoriesMutually exclusive unordered categories

ExamplesExamples Sex (male female)Sex (male female) Eye colour (brown grey green blue)Eye colour (brown grey green blue) Are you happy (Yes No)Are you happy (Yes No) Diarrhoea (Present absent)Diarrhoea (Present absent)

Can summarize inCan summarize in Tables ndash using counts and percentagesTables ndash using counts and percentages Bar ChartBar Chart

Types of Data

Categorical dataOrdinal (ordered categories)

Examples Degree of agreement

(Strongly Agree Agree Disagree Strongly disagree)

Severity of injury Severe Moderate Mild

Income level High medium low

PRACTICEPRACTICE

mg of tar in cigarettesmg of tar in cigarettes

number of people in a carnumber of people in a car

high to low temperature inhigh to low temperature in

any dayany day

weightweight

timetime

number of children in thenumber of children in the

average familyaverage family

Average above avg Average above avg below averagebelow average

Colours of SmartiesColours of Smarties

Grades (A B C D F)Grades (A B C D F)

Discrete or Continuous

Continuous

Discrete

Continuous

Continuous

Continuous

Discrete

Nominal or Ordinal

Ordinal

Nominal

Ordinal

Data SummariesData Summaries

It is ALWAYS a good idea to summarise It is ALWAYS a good idea to summarise your datayour data

You become familiar with the data and the You become familiar with the data and the characteristics of the people that you are characteristics of the people that you are studyingstudying

You can also identify problems or errors with You can also identify problems or errors with the data (data management issues)the data (data management issues)

Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)

Mean

Median

Mode

DefinitionsDefinitions

The arithmetic The arithmetic meanmean is what is commonly called is what is commonly called the average The mean is the sum of all the the average The mean is the sum of all the scores divided by the number of scoresscores divided by the number of scores

The The medianmedian is the middle of a distribution half is the middle of a distribution half the scores are above the median and half are the scores are above the median and half are below the median below the median

The The modemode is the most frequently occurring score is the most frequently occurring score in a distribution in a distribution

ldquoldquoIt has been said that a fellow with one It has been said that a fellow with one leg frozen in ice and the other leg in leg frozen in ice and the other leg in boiling water is comfortablehellipboiling water is comfortablehellip

hellip hellipon averagerdquoon averagerdquo

JM YancyJM Yancy

Sample Mean X

The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)

1048793 ExampleSystolic blood pressures (mmHg)

X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5

micro is pronounced lsquomursquo and denotes the mean of all values in a population

Notation

x is pronounced lsquox-barrsquo and denotes the mean of a set of

Sample values

(sigma) denotes the summation of a set of values

x is the variable usually used to represent the individual data values

n represents the number of data values in a sample

N represents the number of data values in a population

x =

Definitions

MeanMean

the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores

n x

Sample

Nmicro = xPopulation

Notes on Sample Mean

Also called sample average or arithmetic mean

Sensitive to extreme values - One data point could make a great change in

sample mean

Why is it called the sample meanndash To distinguish it from population mean

Population Versus Sample

Population - The entire group you want information about

ndash For example The blood pressure of all 20-year-old male university students in South Africa

Sample - A part of the population from which we actually collect information and draw conclusions about the whole population

ndash For example Sample of blood pressures (n=50) of 20-year-old male university students in South Africa

The sample mean X is not the population mean micro

Population Versus Sample

We donrsquot know the population mean micro but would like to know itWe draw a sample from the populationWe calculate the sample mean XHow close is X to microStatistical theory will tell us how close X is to microStatistical inference is the process of trying to draw conclusions about the population from the sample

Weighted Mean

x =w

(w bull x)

Your grade in many courses are weighted means (averages) In other words some things count (are weighted) more than others

Geometric Means

[Figure: histograms rotated 90º and box plots]

Note how the log transformation gives a symmetric distribution
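The slide shows the geometric mean only through figures. As a hedged sketch, the usual definition is to take the mean of the log-transformed values and then back-transform it. The data below are made up for illustration and are not the data behind the figures.

```python
import math
from statistics import mean

# Hypothetical right-skewed measurements (assumption, not from the slides)
values = [1, 10, 100, 1000]

logs = [math.log10(v) for v in values]   # log-transform each value
geometric_mean = 10 ** mean(logs)        # back-transform the mean of the logs
arithmetic_mean = mean(values)

print(f"arithmetic mean = {arithmetic_mean}, geometric mean = {geometric_mean:.2f}")
```

On the log scale the values are evenly spaced (0, 1, 2, 3), so the geometric mean of about 31.6 is far less influenced by the largest value than the arithmetic mean of 277.75.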

• 1 1 3 3 4 5 5 5 5 5 (in order)
  no exact middle -- shared by two numbers
  (4 + 5) / 2 = 4.5
  MEDIAN is 4.5

• 5 5 5 3 1 5 1 4 3 5 2
• 1 1 2 3 3 4 5 5 5 5 5 (in order)
  exact middle
  MEDIAN is 4
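The two median examples can be checked with Python's standard `statistics.median`, which sorts the data and averages the two middle values when the count is even. The list names are illustrative.

```python
from statistics import median

even_count = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5]       # 10 values: no exact middle
odd_count = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]     # 11 values: exact middle

median_even = median(even_count)   # average of the 5th and 6th sorted values
median_odd = median(odd_count)     # the 6th sorted value

print(sorted(even_count), median_even)
print(sorted(odd_count), median_odd)
```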

Mode

The score that occurs most frequently

Bimodal
Multimodal
No Mode

The only measure of central tendency that can be used with nominal data

Examples

a. 5 5 5 3 1 5 1 4 3 5         • Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9       • Bimodal: 2 & 6
c. 2 3 6 7 8 9 10              • No Mode
d. 2 2 3 3 3 4                 • Mode is 3
e. 2 2 3 3 4 4 5 5             • No Mode
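The five mode examples can be reproduced with a short helper. The function `modes` is a hypothetical name, and the rule "every value equally frequent means no mode" is taken from examples c and e above; other textbooks and libraries may use a different convention.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    counts = Counter(data)
    highest = max(counts.values())
    # If every distinct value occurs equally often, report no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

for label, data in examples.items():
    print(label, modes(data))
```

A result with two values corresponds to a bimodal data set, and an empty list corresponds to "No Mode".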

Shapes of the Distribution

Distribution Characteristics

Example: Height of students in the class

Example: Serum cholesterol level

Example: Birth weight of newborn babies

Some visual ways to summarize data

Tables
  Frequency table

Graphs
  Histograms
  Bar graphs
  Box plots
  Line plots
  Scatter graphs

Charts
  Bar chart
  Pie diagram

Frequency Tables

Summarizes a variable with counts and percentages

The variable is categorical
Note that you can take a continuous variable and create categories with it
How do you create categories for a continuous variable?
  Choose cutoffs that are biologically meaningful
  Natural breaks in the data

Example of frequency table

When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data. When the data are divided into groups (classes), they are called grouped data. The classes have to be decided according to the range of the data and the size of the class. The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table. The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
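A grouped frequency table with cumulative frequencies can be built as follows. This is a sketch under stated assumptions: the ages and the class width of 10 years are invented for illustration.

```python
from collections import Counter

# Hypothetical ages in years (made up for illustration)
ages = [3, 7, 12, 15, 18, 21, 24, 25, 29, 33, 38, 41]
class_width = 10

# Assign each observation to the lower limit of its class: 0, 10, 20, ...
counts = Counter((age // class_width) * class_width for age in ages)

cumulative = 0
table = []
for lower in sorted(counts):
    frequency = counts[lower]
    cumulative += frequency   # this class plus all classes before it
    table.append((f"{lower}-{lower + class_width - 1}", frequency, cumulative))

print("Class  Frequency  Cumulative")
for label, frequency, cum in table:
    print(f"{label:<6} {frequency:>9}  {cum:>10}")
```

The last cumulative frequency always equals the total number of observations (12 here), which is a quick check that no value was dropped.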

Graphical Summaries

Histograms
  Continuous or ordinal data on horizontal axis

Bar Graphs
  Nominal data
  No order to horizontal axis

Box Plots
  Continuous data

Histogram

A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.

Bar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation falls into one of several mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch the other bars.

Difference between bar chart and histogram

Bar charts for categories that are separate

Histograms if you got categories by dividing up continuous data

Bars do not touch; histogram rectangles do touch

Line graph

If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, it is better done with line graphs rather than histograms.

Scatter plot

Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each plotted point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on the diagonal (regression line).
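The slide does not name a specific correlation measure. As an illustrative assumption, the sketch below uses Pearson's correlation coefficient r to contrast points that lie exactly on a line with points that are widely scattered. The data are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
on_a_line = [2, 4, 6, 8, 10]   # every point falls on one straight line
scattered = [5, 1, 4, 2, 3]    # no clear pattern

r_line = pearson_r(xs, on_a_line)
r_scattered = pearson_r(xs, scattered)
print(r_line, r_scattered)
```

Points on a straight line give r = 1, while the scattered points give an r much closer to 0.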

Survival curve

Pie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The area of each sector (slice of the pie) is proportional to the amount in that category. The total area of the circle is 100% and represents the total population that is being shown.

Pictures of Data: Continuous Variables

Histograms
  Means and medians do not tell the whole story
  Differences in spread (variability)
  Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
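The four steps above can be sketched without any plotting library by counting observations per bin and printing a text histogram. The blood pressure values, the starting point of 80 and the bin width of 10 are assumptions chosen for illustration.

```python
# Hypothetical systolic pressures (mmHg), made up for illustration
pressures = [80, 90, 95, 104, 110, 112, 118, 120, 125, 138]

bin_width = 10   # equal-width intervals
start = 80       # lower edge of the first bin
n_bins = 6       # bins cover 80-139

# Count the number of observations in each bin
counts = [0] * n_bins
for value in pressures:
    counts[(value - start) // bin_width] += 1

# Draw the histogram as rows of '#', with the bin limits as labels
print("mmHg    | frequency")
for i, count in enumerate(counts):
    lower = start + i * bin_width
    print(f"{lower:>3}-{lower + bin_width - 1:<3} | {'#' * count}")
```

Changing `bin_width` changes the apparent shape of the distribution, so it is worth trying more than one width on real data.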

Pictures of Data: Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing the distribution of continuous data in multiple groups; can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
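The three values that define the box can be computed with `statistics.quantiles` from the Python standard library. This is a sketch: the lengths of stay are hypothetical, and percentile conventions differ between textbooks and software, so other tools may give slightly different hinges for the same data.

```python
from statistics import quantiles

# Hypothetical hospital lengths of stay in days (made up for illustration)
stays = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Cut points that split the data into four parts: 25th, 50th, 75th percentiles
lower_hinge, med, upper_hinge = quantiles(stays, n=4, method="inclusive")

# Observations that fall inside the box
inside_box = [d for d in stays if lower_hinge <= d <= upper_hinge]

print(f"lower hinge = {lower_hinge}, median = {med}, upper hinge = {upper_hinge}")
print(f"values inside the box: {inside_box}")
```

With only nine observations the box holds five of them, which is as close to "the middle half" as this small data set allows.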

Hospital Length of Stay

Box plot: Length of Stay

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.

Common tricks used to mislead
  The problem of scaling
  The Advertiser's Graph
  The transformed graph
  The chart with too much data

Which graph to use

Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category

Example of MCQ 1

The arithmetic mean of a set of values

a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram

a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables

Any questions?


Today's Lecture

What types of data are there? (numerical vs categorical variables)

Describing data - measures of central tendency (mean, median and mode)

Summarising data graphically (histograms, box plots, bar charts, pie diagrams)

Types of data

Variable
  Categorical: Nominal, Ordinal
  Numerical: Discrete, Continuous

Types of Data

Numerical data: Discrete
Examples: No. of children; No. of asthma attacks in a week; No. of rooms in home

Types of Data

Numerical data: Continuous
Any value on the continuum is possible (even fractions or decimals)
Examples: Weight, Age, Temperature, Heart rate

Types of Data

Categorical data: Nominal
Mutually exclusive unordered categories
Examples:
  Sex (male, female)
  Eye colour (brown, grey, green, blue)
  Are you happy? (Yes, No)
  Diarrhoea (Present, absent)
Can summarize in:
  Tables - using counts and percentages
  Bar Chart

Types of Data

Categorical data: Ordinal (ordered categories)
Examples:
  Degree of agreement (Strongly Agree, Agree, Disagree, Strongly disagree)
  Severity of injury (Severe, Moderate, Mild)
  Income level (High, medium, low)

PRACTICE

Discrete or Continuous?
  mg of tar in cigarettes: Continuous
  number of people in a car: Discrete
  high to low temperature in any day: Continuous
  weight: Continuous
  time: Continuous
  number of children in the average family: Discrete

Nominal or Ordinal?
  Average, above avg, below average: Ordinal
  Colours of Smarties: Nominal
  Grades (A B C D F): Ordinal

Data Summaries

It is ALWAYS a good idea to summarise your data

You become familiar with the data and the characteristics of the people that you are studying

You can also identify problems or errors with the data (data management issues)

Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)

Mean

Median

Mode

Definitions

The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.

The median is the middle of a distribution: half the scores are above the median and half are below the median.

The mode is the most frequently occurring score in a distribution.

"It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable… on average" - JM Yancy


Page 12: Introduction to Biostatistics

Types of dataTypes of data

Variable

Categorical Numerical

Nominal Ordinal Discrete Continuous

Types of Data

Numerical dataDiscrete

Examples No of children No asthma attacks in a week No of rooms in home

Types of Data

Numerical dataContinuous

Any value on the continuum is possible (even fractions or Any value on the continuum is possible (even fractions or decimals)decimals)

Examples Weight Age Temperature Heart rate

Types of Data

Categorical dataNominal

Mutually exclusive unordered categoriesMutually exclusive unordered categories

ExamplesExamples Sex (male female)Sex (male female) Eye colour (brown grey green blue)Eye colour (brown grey green blue) Are you happy (Yes No)Are you happy (Yes No) Diarrhoea (Present absent)Diarrhoea (Present absent)

Can summarize inCan summarize in Tables ndash using counts and percentagesTables ndash using counts and percentages Bar ChartBar Chart

Types of Data

Categorical dataOrdinal (ordered categories)

Examples Degree of agreement

(Strongly Agree Agree Disagree Strongly disagree)

Severity of injury Severe Moderate Mild

Income level High medium low

PRACTICEPRACTICE

mg of tar in cigarettesmg of tar in cigarettes

number of people in a carnumber of people in a car

high to low temperature inhigh to low temperature in

any dayany day

weightweight

timetime

number of children in thenumber of children in the

average familyaverage family

Average above avg Average above avg below averagebelow average

Colours of SmartiesColours of Smarties

Grades (A B C D F)Grades (A B C D F)

Discrete or Continuous

Continuous

Discrete

Continuous

Continuous

Continuous

Discrete

Nominal or Ordinal

Ordinal

Nominal

Ordinal

Data SummariesData Summaries

It is ALWAYS a good idea to summarise It is ALWAYS a good idea to summarise your datayour data

You become familiar with the data and the You become familiar with the data and the characteristics of the people that you are characteristics of the people that you are studyingstudying

You can also identify problems or errors with You can also identify problems or errors with the data (data management issues)the data (data management issues)

Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)

Mean

Median

Mode

DefinitionsDefinitions

The arithmetic The arithmetic meanmean is what is commonly called is what is commonly called the average The mean is the sum of all the the average The mean is the sum of all the scores divided by the number of scoresscores divided by the number of scores

The The medianmedian is the middle of a distribution half is the middle of a distribution half the scores are above the median and half are the scores are above the median and half are below the median below the median

The The modemode is the most frequently occurring score is the most frequently occurring score in a distribution in a distribution

ldquoldquoIt has been said that a fellow with one It has been said that a fellow with one leg frozen in ice and the other leg in leg frozen in ice and the other leg in boiling water is comfortablehellipboiling water is comfortablehellip

hellip hellipon averagerdquoon averagerdquo

JM YancyJM Yancy

Sample Mean X

The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)

1048793 ExampleSystolic blood pressures (mmHg)

X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5

micro is pronounced lsquomursquo and denotes the mean of all values in a population

Notation

x is pronounced lsquox-barrsquo and denotes the mean of a set of

Sample values

(sigma) denotes the summation of a set of values

x is the variable usually used to represent the individual data values

n represents the number of data values in a sample

N represents the number of data values in a population

x =

Definitions

MeanMean

the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores

n x

Sample

Nmicro = xPopulation

Notes on Sample Mean

Also called sample average or arithmetic mean

Sensitive to extreme values - One data point could make a great change in

sample mean

Why is it called the sample meanndash To distinguish it from population mean

Population Versus Sample

Population - The entire group you want information about

ndash For example The blood pressure of all 20-year-old male university students in South Africa

Sample - A part of the population from which we actually collect information and draw conclusions about the whole population

ndash For example Sample of blood pressures (n=50) of 20-year-old male university students in South Africa

The sample mean X is not the population mean micro

Population Versus Sample

We donrsquot know the population mean micro but would like to know itWe draw a sample from the populationWe calculate the sample mean XHow close is X to microStatistical theory will tell us how close X is to microStatistical inference is the process of trying to draw conclusions about the population from the sample

Weighted Mean

x =w

(w bull x)

Your grade in many courses are weighted means (averages) In other words some things count (are weighted) more than others

Geometric MeansGeometric Means

These are histograms rotated 90ordm and box plots

Note how the log transformation gives a symmetric distribution

bull 1 1 3 3 4 5 5 5 5 5 no exact middle -- shared by two numbers

MEDIAN is 45

4 + 5

2= 45

bull 5 5 5 3 1 5 1 4 3 5 2 bull 1 1 2 3 3 4 5 5 5 5 5 (in order)

exact middle MEDIAN is 4

Mode Mode

The score that occurs most frequentlyThe score that occurs most frequently

BimodalMultimodalNo Mode

The only measure of central tendency that can be used The only measure of central tendency that can be used with with nominal datadata

Examples

bull Mode is 5Mode is 5

bull Bimodal ndash 2 amp 6Bimodal ndash 2 amp 6

bull No ModeNo Mode

a 5 5 5 3 1 5 1 4 3 5

b 2 2 2 3 4 5 6 6 6 7 9

c 2 3 6 7 8 9 10

d 2 2 3 3 3 4

e 2 2 3 3 4 4 5 5

bull Mode is 3

bull No Mode

Shapes of the Distribution

Shapes of the Distribution

Distribution Characteristics

Shapes of the Distribution

Example Height of students in the class

Shapes of the Distribution

Example Serum cholesterol level

Shapes of the Distribution

Example Birth weight of newborn babies

Shapes of the Distribution

Some visual ways to summarize Some visual ways to summarize datadata

TablesTablesFrequency tableFrequency table

GraphsGraphsHistogramsHistograms

Bar graphsBar graphs

Box plotsBox plots

Line plotsLine plots

Scatter graphsScatter graphs

ChartsChartsBar chartBar chart

Pie diagramPie diagram

Frequency TablesFrequency Tables

Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages

The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable

and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable

Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data

Example of frequency tableExample of frequency table

When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class

Graphical Summaries

Histograms: Continuous or ordinal data on horizontal axis

Bar Graphs: Nominal data; no order to horizontal axis

Box Plots: Continuous data

Histogram

A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.

Bar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of several mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch one another.

Difference between bar chart and histogram

Bar charts for categories that are separate

Histograms if you got categories by dividing up continuous data

Bars do not touch; histogram rectangles do touch

Line graph

If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, it is better done with line graphs rather than histograms.

Scatter plot

Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on a straight line (the regression line).
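The idea of "poor" versus "perfect" correlation can be made concrete with a correlation coefficient. The slide does not name one, so the sketch below assumes Pearson's r, and both data sets are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights (cm) paired with two invented sets of Y values.
heights = [150, 160, 170, 180, 190]
on_a_line = [2 * h - 200 for h in heights]  # every point lies exactly on one line
scattered = [62, 50, 75, 58, 70]            # points spread widely around any line

r_line = pearson_r(heights, on_a_line)
r_scattered = pearson_r(heights, scattered)
print(round(r_line, 3), round(r_scattered, 3))
```

Points that all sit on one rising straight line give r = 1, while a wide scatter gives a value much closer to 0.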

Survival curve

Pie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.
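Because the area of a sector is proportional to its angle, the share of each category can be converted into degrees out of 360. The sketch below shows this under assumed data: the eye-colour counts and the function name `sector_angles` are hypothetical.

```python
def sector_angles(counts):
    """Convert category counts into pie-chart sector angles in degrees."""
    total = sum(counts.values())
    # Each category receives the same fraction of 360 degrees as its share of the total.
    return {category: 360 * count / total for category, count in counts.items()}

# Hypothetical eye-colour counts for 100 people, invented for illustration.
eye_colour = {"brown": 50, "blue": 25, "green": 15, "grey": 10}
angles = sector_angles(eye_colour)
for category, angle in angles.items():
    print(f"{category:>6}: {angle:.0f} degrees")
```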

Pictures of Data: Continuous Variables

Histograms

• Means and medians do not tell the whole story

• Differences in spread (variability)

• Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
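The steps above can be sketched in a few lines of Python. This is an illustration under assumptions: the birth weights are invented, the choice of four bins is arbitrary, and `histogram_counts` is a hypothetical helper rather than a library function.

```python
def histogram_counts(values, n_bins):
    """Divide the data range into n_bins equal-width bins and count observations.

    Assumes the values are not all identical, so the bin width is non-zero.
    """
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than in a new one.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical birth weights (kg), invented for illustration.
weights = [2.0, 2.4, 2.9, 3.1, 3.2, 3.3, 3.4, 3.6, 3.8, 4.0]
edges, counts = histogram_counts(weights, n_bins=4)

# "Draw" the histogram as text, one labelled row per bin.
for i, count in enumerate(counts):
    print(f"{edges[i]:.1f}-{edges[i + 1]:.1f} kg | " + "*" * count)
```

Changing `n_bins` changes the apparent shape of the distribution, so it is worth trying more than one value before settling on a display.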

Pictures of Data: Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing the distribution of continuous data in multiple groups: box plots can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box and 1/4 of the distribution is between this line and the bottom of the box.
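The three values that define the box can be computed with Python's standard `statistics` module. The lengths of stay below are invented, and the `"inclusive"` percentile method is one of several conventions, so other software may report slightly different hinges for the same data.

```python
import statistics

# Hypothetical hospital lengths of stay (days), invented for illustration.
stays = [1, 2, 2, 3, 3, 4, 5, 6, 8, 12, 30]

# quantiles(..., n=4) returns the three cut points that split the data into quarters.
lower_hinge, median, upper_hinge = statistics.quantiles(stays, n=4, method="inclusive")

print("Lower hinge (25th percentile):", lower_hinge)
print("Median:", median)
print("Upper hinge (75th percentile):", upper_hinge)
```

In this made-up sample the long stay of 30 days lies far above the upper hinge, which is the kind of outlying value a box plot makes visible.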

Hospital Length of Stay

Box plot: Length of Stay

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled." - M J Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people so that such misadventures can be avoided.

Common tricks used to mislead

• The problem of scaling

• The Advertiser's Graph

• The transformed graph

• The chart with too much data

Which graph to use

Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category

Example of MCQ 1

The arithmetic mean of a set of values

a) Is a particular type of average

b) Is a useful summary measure of location if the data are skewed to the right

c) Coincides with the median if the distribution of the data is symmetrical

d) Is always greater than the median

e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram

a) Can be used instead of a pie chart to display categorical data

b) Is similar to a bar chart but there are no gaps between the bars

c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar

d) Can be used to display either a frequency or a relative frequency distribution

e) Is used to show the relationship between two variables

Any questions?

Page 13: Introduction to Biostatistics

Types of Data

Numerical data: Discrete

Examples: No. of children; No. of asthma attacks in a week; No. of rooms in home

Numerical data: Continuous

Any value on the continuum is possible (even fractions or decimals)

Examples: Weight; Age; Temperature; Heart rate

Categorical data: Nominal

Mutually exclusive unordered categories

Examples: Sex (male, female); Eye colour (brown, grey, green, blue); Are you happy? (Yes, No); Diarrhoea (Present, absent)

Can summarize in: Tables, using counts and percentages; Bar Chart

Categorical data: Ordinal (ordered categories)

Examples: Degree of agreement (Strongly Agree, Agree, Disagree, Strongly disagree); Severity of injury (Severe, Moderate, Mild); Income level (High, medium, low)

PRACTICE

Discrete or Continuous?

• mg of tar in cigarettes: Continuous

• number of people in a car: Discrete

• high to low temperature in any day: Continuous

• weight: Continuous

• time: Continuous

• number of children in the average family: Discrete

Nominal or Ordinal?

• Average, above average, below average: Ordinal

• Colours of Smarties: Nominal

• Grades (A, B, C, D, F): Ordinal

Data Summaries

It is ALWAYS a good idea to summarise your data

You become familiar with the data and the characteristics of the people that you are studying

You can also identify problems or errors with the data (data management issues)

Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)

Mean

Median

Mode

Definitions

The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores

The median is the middle of a distribution: half the scores are above the median and half are below the median

The mode is the most frequently occurring score in a distribution

"It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable…

…on average"

JM Yancy

Sample Mean $\bar{X}$

The Average or Arithmetic Mean

Add up data, then divide by sample size (n)

The sample size n is the number of observations (pieces of data)

Example: Systolic blood pressures (mmHg)

$X_1 = 120$, $X_2 = 80$, $X_3 = 90$, $X_4 = 110$, $X_5 = 95$; $n = 5$
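Working the example through: add the five pressures and divide by n. The variable names in this short sketch are illustrative.

```python
# Systolic blood pressures (mmHg) from the example above.
pressures = [120, 80, 90, 110, 95]

n = len(pressures)       # sample size
total = sum(pressures)   # 120 + 80 + 90 + 110 + 95 = 495
sample_mean = total / n  # 495 / 5

print(f"n = {n}, sum = {total}, sample mean = {sample_mean} mmHg")
```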

Notation

$\Sigma$ (sigma) denotes the summation of a set of values

$x$ is the variable usually used to represent the individual data values

$n$ represents the number of data values in a sample

$N$ represents the number of data values in a population

$\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set of sample values

$\mu$ is pronounced 'mu' and denotes the mean of all values in a population

Definitions

Mean: the value obtained by adding the scores and dividing the total by the number of scores

Sample: $\bar{x} = \frac{\sum x}{n}$

Population: $\mu = \frac{\sum x}{N}$

Notes on Sample Mean

Also called sample average or arithmetic mean

Sensitive to extreme values: one data point could make a great change in the sample mean

Why is it called the sample mean? To distinguish it from the population mean
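The sensitivity to extreme values is easy to demonstrate. The sketch below reuses the five blood pressures from the earlier example and assumes a hypothetical data-entry error in which 95 is typed as 950. The median is shown alongside for comparison.

```python
import statistics

# Blood pressures (mmHg) from the earlier example.
correct = [120, 80, 90, 110, 95]
# Hypothetical data-entry error: 95 typed as 950.
with_error = [120, 80, 90, 110, 950]

mean_correct = statistics.mean(correct)
mean_error = statistics.mean(with_error)
median_correct = statistics.median(correct)
median_error = statistics.median(with_error)

print("Mean:  ", mean_correct, "->", mean_error)
print("Median:", median_correct, "->", median_error)
```

One wrong value moves the mean from 99 to 270, while the median shifts only from 95 to 110.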

Population Versus Sample

Population - The entire group you want information about

For example: the blood pressure of all 20-year-old male university students in South Africa

Sample - A part of the population from which we actually collect information and draw conclusions about the whole population

For example: a sample of blood pressures (n=50) of 20-year-old male university students in South Africa

The sample mean $\bar{X}$ is not the population mean $\mu$

Population Versus Sample

We don't know the population mean $\mu$ but would like to know it

We draw a sample from the population

We calculate the sample mean $\bar{X}$

How close is $\bar{X}$ to $\mu$?

Statistical theory will tell us how close $\bar{X}$ is to $\mu$

Statistical inference is the process of trying to draw conclusions about the population from the sample
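A small simulation can illustrate the gap between $\bar{X}$ and $\mu$. Everything here is an assumption made for illustration: the population is simulated rather than real, and its mean of 120 mmHg, standard deviation of 10 and size of 10,000 are arbitrary choices.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Simulated "population" of systolic blood pressures (mmHg); not real data.
population = [rng.gauss(120, 10) for _ in range(10_000)]
population_mean = sum(population) / len(population)

# Draw a sample of n = 50 without replacement, as in the example above.
sample = rng.sample(population, 50)
sample_mean = sum(sample) / len(sample)

print(f"Population mean: {population_mean:.1f}")
print(f"Sample mean:     {sample_mean:.1f}")
print(f"Difference:      {sample_mean - population_mean:+.1f}")
```

Re-running with a different seed gives a different sample mean each time, and that variation from sample to sample is what statistical theory quantifies.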

Weighted Mean

$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$

Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others
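A minimal sketch of the weighted mean applied to course grades. The marks and weights are hypothetical: an assignment counting 20, a test counting 30 and an exam counting 50.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical marks (%) and their weights, invented for illustration.
marks = [70, 80, 90]
weights = [20, 30, 50]

final_grade = weighted_mean(marks, weights)
simple_average = sum(marks) / len(marks)
print(f"Weighted mean: {final_grade}, unweighted mean: {simple_average}")
```

The weighted mean (83) is higher than the unweighted mean (80) because the best mark carries the most weight.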

Geometric Means

These are histograms rotated 90° and box plots

Note how the log transformation gives a symmetric distribution
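The slide does not give a formula, so the sketch below uses the standard definition: the geometric mean is the antilog of the mean of the logged values, which is why it pairs naturally with a log transformation. The data are invented and deliberately skewed, and the values must all be positive.

```python
import math

def geometric_mean(values):
    """Antilog of the arithmetic mean of the natural logs (positive values only)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical right-skewed measurements, invented for illustration.
values = [1, 10, 100, 1000]

arithmetic = sum(values) / len(values)
geometric = geometric_mean(values)
logged = [math.log10(v) for v in values]  # evenly spaced, i.e. symmetric

print(f"Arithmetic mean: {arithmetic}")
print(f"Geometric mean:  {geometric:.2f}")
print("log10 values:   ", logged)
```

On the original scale the largest value dominates the arithmetic mean, whereas on the log scale the values are evenly spread and the geometric mean sits at their centre.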

• 1 1 3 3 4 5 5 5 5 5 (in order): no exact middle, shared by two numbers

$\frac{4 + 5}{2} = 4.5$, so the MEDIAN is 4.5

• 5 5 5 3 1 5 1 4 3 5 2

• 1 1 2 3 3 4 5 5 5 5 5 (in order): exact middle, so the MEDIAN is 4
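Both cases above follow the same rule: sort the data, then take the middle value, or the average of the two middle values when the count is even. A short sketch, with `median` written out by hand to show the logic (Python's `statistics.median` does the same job):

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # exact middle
    return (ordered[middle - 1] + ordered[middle]) / 2  # shared by two numbers

even_case = median([5, 5, 5, 3, 1, 5, 1, 4, 3, 5])    # 10 values
odd_case = median([5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2])  # 11 values
print(even_case, odd_case)
```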

Mode

The score that occurs most frequently

Bimodal; Multimodal; No Mode

The only measure of central tendency that can be used with nominal data

Examples

• a. 5 5 5 3 1 5 1 4 3 5: Mode is 5

• b. 2 2 2 3 4 5 6 6 6 7 9: Bimodal (2 and 6)

• c. 2 3 6 7 8 9 10: No Mode

• d. 2 2 3 3 3 4: Mode is 3

• e. 2 2 3 3 4 4 5 5: No Mode
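The five examples can be checked with a short function. The sketch assumes the convention used on the slide, namely that a data set in which every distinct value occurs equally often has no mode. The function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or an empty list if there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    top = sorted(v for v, c in counts.items() if c == highest)
    # If several distinct values all tie for the highest count, there is no mode.
    if len(counts) > 1 and len(top) == len(counts):
        return []
    return top

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}
results = {name: modes(data) for name, data in examples.items()}
for name, result in results.items():
    print(name, result if result else "No Mode")
```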



Shapes of the Distribution

Shapes of the Distribution

Distribution Characteristics

Shapes of the Distribution

Example Height of students in the class

Shapes of the Distribution

Example Serum cholesterol level

Shapes of the Distribution

Example Birth weight of newborn babies

Shapes of the Distribution

Some visual ways to summarize Some visual ways to summarize datadata

TablesTablesFrequency tableFrequency table

GraphsGraphsHistogramsHistograms

Bar graphsBar graphs

Box plotsBox plots

Line plotsLine plots

Scatter graphsScatter graphs

ChartsChartsBar chartBar chart

Pie diagramPie diagram

Frequency TablesFrequency Tables

Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages

The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable

and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable

Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data

Example of frequency tableExample of frequency table

When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class

Graphical SummariesGraphical Summaries

HistogramsHistograms Continuous or ordinal data on horizontal axisContinuous or ordinal data on horizontal axis

Bar GraphsBar Graphs Nominal dataNominal data

No order to horizontal axisNo order to horizontal axis

Box PlotsBox Plots Continuous dataContinuous data

HistogramHistogram

A histogram is a graphic representation of the frequency distribution of a variable Vertical rectangles (bars) are drawn in such a way that their bases lie

on a linear scale representing different intervals and their heights are

proportional to the frequencies of the values within each of the intervals

Bar ChartBar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of mutually exclusive categories The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis The heights of the bars correspond to the frequencies The bars should be of equal width and they should not be touching me other bars

Difference between bar chart and Difference between bar chart and histogramhistogram

Bar charts for categories that are separateBar charts for categories that are separate

Histograms if you got categories by Histograms if you got categories by dividing up continuous datadividing up continuous data

Bars do not touch histogram rectangles Bars do not touch histogram rectangles do touch do touch

Line graphLine graph

If the mid-points of the top of the bars of a histogram are connected together by a line and if the bars were omitted from the display the resultant graph will be a line

graph (also called a frequency polygon) Line graphs are good at showing trends over a period of time When trends of rates (eg death rate Infant Mortality Rate etc) are to be displayed it is better done with

line graphs rather than histograms

Scatter plotScatter plot

Also called a scattergram This a method of displaying the distribution of two variables in relation to each other another The value of one variables is measured on the X axis and the values of the other on the Y axis The variables have to be on a continuous scale Each plot thus has two values (coordinates) from the Y and X axis scales A wide scatter of the plots denotes poor correlation between the two variables If the two variables are perfectly correlated then all the plots will fall on the diagonal (regression line)

Survival curveSurvival curve

Pie chartPie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments each representing a category or subset of data (part of the whole) The amount for each category is proportional to the area of the sector (slice of the pie) The total area of the circle is 100 and it represents the total population that is being shown

Pictures of DataContinuous Variables

Histograms Means and medians do not tell whole story Differences in spread (variability) Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
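The steps above can be sketched in a few lines. This is an illustration under stated assumptions: the lengths of stay are invented, the helper name `histogram_counts` is hypothetical, and each bin is taken to include its lower edge but not its upper edge (except the last bin, which includes both).

```python
def histogram_counts(values, low, width, n_bins):
    """Count observations in n_bins equal-width intervals starting at low."""
    upper = low + width * n_bins
    counts = [0] * n_bins
    for v in values:
        if v == upper:
            index = n_bins - 1          # top edge belongs to the last bin
        else:
            index = int((v - low) // width)
        if not 0 <= index < n_bins:
            raise ValueError(f"{v} lies outside the histogram range")
        counts[index] += 1
    return counts

# Hypothetical lengths of stay in days
stays = [1, 2, 2, 3, 4, 4, 5, 7, 8, 12]
counts = histogram_counts(stays, low=0, width=3, n_bins=4)

# Draw a crude text histogram with labelled intervals
for i, count in enumerate(counts):
    start = i * 3
    print(f"{start:>2} to {start + 3:<2} days | {'#' * count}")
```

Changing `width` changes the apparent shape of the distribution, so it is worth trying more than one bin width.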

Pictures of Data: Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing the distribution of continuous data in multiple groups; box plots can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box and 1/4 of the distribution is between this line and the bottom of the box.
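The numbers behind a box plot can be computed directly. The sketch below uses Python's `statistics.quantiles` with `method="inclusive"`; that choice is an assumption, since statistical packages use slightly different percentile conventions and may place the hinges a little differently. The data and the name `five_number_summary` are made up.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower hinge, median, upper hinge, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical lengths of stay in days
stays = [1, 2, 3, 4, 5, 6, 7, 8, 9]

low, q1, median, q3, high = five_number_summary(stays)
print(f"box from {q1} to {q3}, median line at {median}")
print(f"whiskers could extend to {low} and {high}")
```

Computing the summary separately for each group gives the values needed to draw box plots side by side.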

Hospital Length of Stay

Box plot: Length of Stay

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled." - M J Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people so that such misadventures can be avoided.

Common tricks used to mislead

The problem of scaling

The Advertiser's Graph

The transformed graph

The chart with too much data

Which graph to use

Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category

Example of MCQ 1

The arithmetic mean of a set of values:

a) Is a particular type of average

b) Is a useful summary measure of location if the data are skewed to the right

c) Coincides with the median if the distribution of the data is symmetrical

d) Is always greater than the median

e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram:

a) Can be used instead of a pie chart to display categorical data

b) Is similar to a bar chart but there are no gaps between the bars

c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar

d) Can be used to display either a frequency or a relative frequency distribution

e) Is used to show the relationship between two variables

Any questions?

Page 15: Introduction to Biostatistics

Types of Data

Categorical data: Nominal

Mutually exclusive, unordered categories

Examples

Sex (male, female)

Eye colour (brown, grey, green, blue)

Are you happy? (Yes, No)

Diarrhoea (present, absent)

Can summarize in

Tables, using counts and percentages

Bar chart

Types of Data

Categorical data: Ordinal (ordered categories)

Examples

Degree of agreement (Strongly agree, Agree, Disagree, Strongly disagree)

Severity of injury (Severe, Moderate, Mild)

Income level (High, Medium, Low)

PRACTICE

Discrete or Continuous?

mg of tar in cigarettes: Continuous
number of people in a car: Discrete
high to low temperature in any day: Continuous
weight: Continuous
time: Continuous
number of children in the average family: Discrete

Nominal or Ordinal?

Average, above average, below average: Ordinal
Colours of Smarties: Nominal
Grades (A, B, C, D, F): Ordinal

Data Summaries

It is ALWAYS a good idea to summarise your data

You become familiar with the data and the characteristics of the people that you are studying

You can also identify problems or errors with the data (data management issues)

Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)

Mean

Median

Mode

Definitions

The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.

The median is the middle of a distribution: half the scores are above the median and half are below the median.

The mode is the most frequently occurring score in a distribution.

"It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable... on average." - JM Yancy

Sample Mean $\bar{X}$

The Average or Arithmetic Mean

Add up data, then divide by sample size (n)

The sample size n is the number of observations (pieces of data)

Example: Systolic blood pressures (mmHg)

$X_1 = 120$, $X_2 = 80$, $X_3 = 90$, $X_4 = 110$, $X_5 = 95$; $n = 5$
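The calculation for these five readings can be sketched as follows; the function name `sample_mean` is chosen here for illustration.

```python
def sample_mean(values):
    """Add up the data, then divide by the sample size n."""
    return sum(values) / len(values)

# Systolic blood pressures (mmHg) from the example above
systolic_bp = [120, 80, 90, 110, 95]

print(sum(systolic_bp))          # 495
print(len(systolic_bp))          # 5
print(sample_mean(systolic_bp))  # 99.0
```

So the sample mean is 495 / 5 = 99 mmHg.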

Notation

$\Sigma$ (sigma) denotes the summation of a set of values

$x$ is the variable usually used to represent the individual data values

$n$ represents the number of data values in a sample

$N$ represents the number of data values in a population

$\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set of sample values

$\mu$ is pronounced 'mu' and denotes the mean of all values in a population

Definitions

Mean

The value obtained by adding the scores and dividing the total by the number of scores

Sample: $\bar{x} = \frac{\sum x}{n}$

Population: $\mu = \frac{\sum x}{N}$

Notes on Sample Mean

Also called sample average or arithmetic mean

Sensitive to extreme values: one data point could make a great change in the sample mean

Why is it called the sample mean? To distinguish it from the population mean
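How much a single extreme value can move the sample mean is easy to demonstrate. In this sketch the sixth reading of 250 mmHg is invented purely for illustration; the first five values are the blood pressures used earlier.

```python
import statistics

bp = [120, 80, 90, 110, 95]
bp_with_outlier = bp + [250]   # hypothetical extreme reading

mean_shift = statistics.mean(bp_with_outlier) - statistics.mean(bp)
median_shift = statistics.median(bp_with_outlier) - statistics.median(bp)

print(f"mean moves by {mean_shift:.1f} mmHg")
print(f"median moves by {median_shift:.1f} mmHg")
```

One added observation shifts the mean far more than it shifts the median, which is why the median is often preferred for skewed data.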

Population Versus Sample

Population - The entire group you want information about

For example: the blood pressure of all 20-year-old male university students in South Africa

Sample - A part of the population from which we actually collect information and draw conclusions about the whole population

For example: a sample of blood pressures (n = 50) of 20-year-old male university students in South Africa

The sample mean $\bar{X}$ is not the population mean $\mu$

Population Versus Sample

We don't know the population mean $\mu$ but would like to know it

We draw a sample from the population

We calculate the sample mean $\bar{X}$

How close is $\bar{X}$ to $\mu$?

Statistical theory will tell us how close $\bar{X}$ is to $\mu$

Statistical inference is the process of trying to draw conclusions about the population from the sample

Weighted Mean

$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$

Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
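A course grade makes a convenient worked example of the weighted mean. The marks and weights below are hypothetical, as is the function name `weighted_mean`.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical course: assignment, test, final exam
marks = [70, 80, 90]
weights = [20, 30, 50]   # the exam counts most

print(weighted_mean(marks, weights))    # 83.0
print(weighted_mean(marks, [1, 1, 1]))  # equal weights give the ordinary mean, 80.0
```

Here (20 x 70 + 30 x 80 + 50 x 90) / 100 = 83, which is higher than the unweighted mean of 80 because the best mark carries the largest weight.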

Geometric Means

These are histograms rotated 90° and box plots

Note how the log transformation gives a symmetric distribution
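The slides do not give a formula here. Assuming the standard definition, the geometric mean is obtained by taking logs, averaging them, and transforming back, which is why it suits data that become symmetric after a log transformation. The values below are invented and the sketch applies only to strictly positive data.

```python
import math

def geometric_mean(values):
    """Back-transformed mean of the natural logs (positive values only)."""
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))

# Hypothetical right-skewed measurements
measurements = [1, 10, 100]

arithmetic = sum(measurements) / len(measurements)
geometric = geometric_mean(measurements)
print(f"arithmetic mean: {arithmetic:.1f}")
print(f"geometric mean:  {geometric:.1f}")
```

For right-skewed data the geometric mean is smaller than the arithmetic mean, because the log scale reduces the pull of the large values.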

• 1 1 3 3 4 5 5 5 5 5 (no exact middle; it is shared by two numbers)

$\frac{4 + 5}{2} = 4.5$

MEDIAN is 4.5

• 5 5 5 3 1 5 1 4 3 5 2

• 1 1 2 3 3 4 5 5 5 5 5 (in order)

exact middle: MEDIAN is 4
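Both cases, an even and an odd number of observations, can be handled by one short function. A minimal sketch using the two data sets above; the name `median` is chosen here for illustration.

```python
def median(values):
    """Middle value of the ordered data; the mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

even_count = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]      # 10 values
odd_count = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]    # 11 values, not yet in order

print(median(even_count))  # 4.5
print(median(odd_count))   # 4
```

Note that the data must be put in order first; taking the middle of the unsorted list would give the wrong answer.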

Mode

The score that occurs most frequently

Bimodal

Multimodal

No Mode

The only measure of central tendency that can be used with nominal data

Examples

a. 5 5 5 3 1 5 1 4 3 5: Mode is 5

b. 2 2 2 3 4 5 6 6 6 7 9: Bimodal (2 and 6)

c. 2 3 6 7 8 9 10: No Mode

d. 2 2 3 3 3 4: Mode is 3

e. 2 2 3 3 4 4 5 5: No Mode
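The five examples can be checked with a small helper. This sketch adopts the convention used in the examples, namely that a data set in which every value occurs equally often has no mode; the function name `modes` is hypothetical and the input is assumed to be non-empty.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    # If several distinct values all occur equally often, there is no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

for label, data in examples.items():
    result = modes(data)
    print(label, result if result else "No mode")
```

Because it only counts occurrences, the same approach works for nominal data such as eye colour.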

Shapes of the Distribution

Distribution Characteristics

Example: Height of students in the class

Example: Serum cholesterol level

Example: Birth weight of newborn babies

Some visual ways to summarize data

Tables

Frequency table

Graphs

Histograms

Bar graphs

Box plots

Line plots

Scatter graphs

Charts

Bar chart

Pie diagram

Frequency Tables

Summarizes a variable with counts and percentages

The variable is categorical

Note that you can take a continuous variable and create categories with it

How do you create categories for a continuous variable?

Choose cutoffs that are biologically meaningful

Natural breaks in the data

Example of frequency table

When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data. When the data are divided into groups (classes), they are called grouped data. The classes have to be decided according to the range of the data and the size of the class. The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table. The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
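A grouped frequency table with cumulative frequencies can be built as sketched below. The ages are invented, the helper name `frequency_table` is hypothetical, and each class is assumed to include its lower limit but not its upper limit.

```python
def frequency_table(values, class_width):
    """Rows of (class start, class end, frequency, cumulative frequency)."""
    start = min(values) // class_width * class_width
    highest = max(values)
    rows = []
    cumulative = 0
    while start <= highest:
        end = start + class_width
        frequency = sum(1 for v in values if start <= v < end)
        cumulative += frequency
        rows.append((start, end, frequency, cumulative))
        start = end
    return rows

# Hypothetical ages in years
ages = [3, 7, 12, 15, 18, 21, 22, 29, 34, 38]
table = frequency_table(ages, class_width=10)

print("Class      Frequency  Percent  Cumulative")
for start, end, frequency, cumulative in table:
    percent = 100 * frequency / len(ages)
    print(f"{start:>2} to <{end:<3} {frequency:>9} {percent:>7.0f}% {cumulative:>10}")
```

The cumulative frequency of the last class always equals the total number of observations, which is a quick check that no value was missed.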

Graphical Summaries

Histograms

Continuous or ordinal data on horizontal axis

Bar Graphs

Nominal data

No order to horizontal axis

Box Plots

Continuous data

Histogram

A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals and their heights are proportional to the frequencies of the values within each of the intervals.

Page 17: Introduction to Biostatistics

PRACTICE

Discrete or Continuous?

Variable                                    Answer
mg of tar in cigarettes                     Continuous
number of people in a car                   Discrete
high to low temperature in any day          Continuous
weight                                      Continuous
time                                        Continuous
number of children in the average family    Discrete

Nominal or Ordinal?

Variable                                    Answer
Average, above average, below average       Ordinal
Colours of Smarties                         Nominal
Grades (A, B, C, D, F)                      Ordinal

Data Summaries

It is ALWAYS a good idea to summarise your data

- You become familiar with the data and the characteristics of the people that you are studying
- You can also identify problems or errors with the data (data management issues)

Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)

Mean

Median

Mode

Definitions

The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.

The median is the middle of a distribution: half the scores are above the median and half are below the median.

The mode is the most frequently occurring score in a distribution.

"It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable... on average"

- J. M. Yancy

Sample Mean (X̄)

- The average, or arithmetic mean
- Add up the data, then divide by the sample size (n)
- The sample size n is the number of observations (pieces of data)

Example: systolic blood pressures (mmHg)

X1 = 120, X2 = 80, X3 = 90, X4 = 110, X5 = 95, n = 5
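The arithmetic for this example can be checked with a few lines of Python. This is only a sketch of the "add up, then divide by n" rule; the variable names are illustrative and not part of the lecture.

```python
# Systolic blood pressures (mmHg) from the example above
pressures = [120, 80, 90, 110, 95]

n = len(pressures)        # sample size
total = sum(pressures)    # sum of all observations
sample_mean = total / n   # x-bar = (sum of x) / n

print(f"n = {n}, sum = {total}, mean = {sample_mean} mmHg")
```

The five readings sum to 495, so the sample mean is 495 / 5 = 99 mmHg.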

Notation

Σ (sigma) denotes the summation of a set of values

x is the variable usually used to represent the individual data values

n represents the number of data values in a sample

N represents the number of data values in a population

x̄ is pronounced "x-bar" and denotes the mean of a set of sample values

μ is pronounced "mu" and denotes the mean of all values in a population

Definitions

Mean: the value obtained by adding the scores and dividing the total by the number of scores

Sample: $\bar{x} = \frac{\sum x}{n}$

Population: $\mu = \frac{\sum x}{N}$

Notes on Sample Mean

- Also called the sample average or arithmetic mean
- Sensitive to extreme values: one data point could make a great change in the sample mean
- Why is it called the sample mean? To distinguish it from the population mean
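To illustrate how sensitive the mean is to extreme values, the sketch below adds one extreme reading to the blood pressure example and compares the mean with the median. The value 250 is a made-up reading chosen purely for illustration.

```python
from statistics import mean, median

pressures = [120, 80, 90, 110, 95]   # systolic blood pressures (mmHg)
with_outlier = pressures + [250]     # hypothetical extreme reading added

print("mean without outlier:  ", mean(pressures))
print("mean with outlier:     ", round(mean(with_outlier), 2))
print("median without outlier:", median(pressures))
print("median with outlier:   ", median(with_outlier))
```

One extra data point moves the mean from 99 to about 124 mmHg, while the median only moves from 95 to 102.5 mmHg.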

Population Versus Sample

Population: the entire group you want information about
- For example: the blood pressure of all 20-year-old male university students in South Africa

Sample: a part of the population from which we actually collect information and draw conclusions about the whole population
- For example: a sample of blood pressures (n = 50) of 20-year-old male university students in South Africa

The sample mean X̄ is not the population mean μ

Population Versus Sample

- We don't know the population mean μ but would like to know it
- We draw a sample from the population
- We calculate the sample mean X̄
- How close is X̄ to μ?
- Statistical theory will tell us how close X̄ is to μ
- Statistical inference is the process of trying to draw conclusions about the population from the sample

Weighted Mean

$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$

Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
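A small sketch of the weighted mean formula, using course marks as in the remark above. The marks and weights are hypothetical, and `weighted_mean` is an illustrative helper, not a function from the lecture.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Hypothetical course: assignment, test and exam marks with their weights
marks = [60, 70, 80]
weights = [20, 30, 50]

final_mark = weighted_mean(marks, weights)
plain_mean = sum(marks) / len(marks)

print("weighted mean:  ", final_mark)
print("unweighted mean:", plain_mean)
```

Because the exam carries the largest weight, the weighted mean (73) sits closer to the exam mark than the unweighted mean (70) does.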

Geometric Means

[Figure: histograms rotated 90° and box plots]

Note how the log transformation gives a symmetric distribution
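The slide does not spell out the calculation, so the following is a sketch based on the standard definition: the geometric mean is the antilog of the mean of the logged values, which is why it goes hand in hand with the log transformation. The data are made up and `geometric_mean` is an illustrative helper.

```python
import math

def geometric_mean(values):
    """Antilog of the mean of the natural logs; values must all be positive."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs at least one value, all positive")
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))

# Hypothetical right-skewed data
values = [1, 10, 100]

arithmetic = sum(values) / len(values)
geometric = geometric_mean(values)

print("arithmetic mean:", arithmetic)
print("geometric mean: ", round(geometric, 6))
```

On the log scale these three values are evenly spaced, so the geometric mean lands on the middle value (10), whereas the arithmetic mean (37) is pulled towards the largest value.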

Median

- 5 5 5 3 1 5 1 4 3 5 2
- 1 1 2 3 3 4 5 5 5 5 5 (in order)
  exact middle: MEDIAN is 4

- 1 1 3 3 4 5 5 5 5 5
  no exact middle; it is shared by two numbers
  (4 + 5) / 2 = 4.5
  MEDIAN is 4.5
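The two cases above (odd and even number of values) can be sketched as a short function. `median_of` is an illustrative name; Python's built-in `statistics.median` gives the same results.

```python
def median_of(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # middle shared by two numbers

odd_case = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]   # 11 values
even_case = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]     # 10 values

print("odd case median: ", median_of(odd_case))
print("even case median:", median_of(even_case))
```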

Mode

The score that occurs most frequently

- Bimodal
- Multimodal
- No mode

The only measure of central tendency that can be used with nominal data

Examples

a. 5 5 5 3 1 5 1 4 3 5           Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9         Bimodal: 2 and 6
c. 2 3 6 7 8 9 10                No mode
d. 2 2 3 3 3 4                   Mode is 3
e. 2 2 3 3 4 4 5 5               No mode
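The five examples can be reproduced with a short sketch. It assumes the convention implied by examples c and e, namely that there is no mode when several distinct values all occur equally often; `modes` is an illustrative helper name.

```python
from collections import Counter

def modes(values):
    """Return the mode(s) as a sorted list; an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: several distinct values, all equally frequent -> no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

for label, data in examples.items():
    result = modes(data)
    print(label, result if result else "no mode")
```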

Shapes of the Distribution

Distribution Characteristics

Example: Height of students in the class

Example: Serum cholesterol level

Example: Birth weight of newborn babies

Some visual ways to summarize data

Tables
- Frequency table

Graphs
- Histograms
- Bar graphs
- Box plots
- Line plots
- Scatter graphs

Charts
- Bar chart
- Pie diagram

Frequency Tables

Summarizes a variable with counts and percentages

The variable is categorical
- Note that you can take a continuous variable and create categories with it

How do you create categories for a continuous variable?
- Choose cutoffs that are biologically meaningful
- Natural breaks in the data

Example of frequency table

When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data.

When the data are divided into groups (classes), they are called grouped data.

The classes have to be decided according to the range of the data and the size of the class.

The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table.

The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
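As a sketch of these definitions, the code below groups a continuous variable into classes and tabulates the frequency, percentage and cumulative frequency of each class. The ages and the class width of 10 years are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import accumulate

ages = [23, 35, 41, 28, 52, 37, 45, 31, 29, 58]   # hypothetical ages in years
width = 10                                         # assumed class width

# Assign each age to the lower limit of its class (20, 30, 40, ...)
counts = Counter((age // width) * width for age in ages)

classes = sorted(counts)
frequencies = [counts[c] for c in classes]
percentages = [100 * f / len(ages) for f in frequencies]
cumulative = list(accumulate(frequencies))   # running total of frequencies

print("Class   Freq   %      Cum freq")
for lower, f, p, cf in zip(classes, frequencies, percentages, cumulative):
    print(f"{lower}-{lower + width - 1}   {f:>4}   {p:>5.1f}  {cf:>8}")
```

The cumulative frequency of the last class always equals the total number of observations, which is a quick check that no value was dropped.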

Graphical Summaries

Histograms
- Continuous or ordinal data on horizontal axis

Bar Graphs
- Nominal data
- No order to horizontal axis

Box Plots
- Continuous data

Histogram

A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.

Bar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of a set of mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch the other bars.

Difference between bar chart and histogram

- Bar charts are for categories that are separate
- Histograms are for categories obtained by dividing up continuous data
- Bars do not touch; histogram rectangles do touch

Line graph

If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, it is better done with line graphs than with histograms.

Scatter plot

Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each plotted point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on the diagonal (regression line).
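The slide describes correlation only visually. As a sketch, the strength of the linear relationship can be quantified with Pearson's correlation coefficient r, which is an addition here rather than something the slide defines. The data are hypothetical and `pearson_r` is an illustrative helper.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
on_a_line = [2, 4, 6, 8, 10]   # hypothetical: every point falls on a straight line
scattered = [5, 1, 4, 2, 3]    # hypothetical: widely scattered points

r_line = pearson_r(xs, on_a_line)
r_scattered = pearson_r(xs, scattered)

print("points on a line:", r_line)
print("wide scatter:    ", round(r_scattered, 2))
```

Points that fall exactly on a rising straight line give r = 1, while the scattered set gives a value much closer to 0.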

Survival curve

Pie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.

Pictures of Data: Continuous Variables

Histograms
- Means and medians do not tell the whole story
- Differences in spread (variability)
- Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
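The four steps above can be sketched in plain Python with a text-based histogram. The lengths of stay and the choice of four bins are hypothetical; `histogram_counts` is an illustrative helper, and the sketch places the largest value in the last bin.

```python
def histogram_counts(values, n_bins):
    """Split the data range into n_bins equal-width intervals and count each one."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are identical; cannot form intervals")
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        index = min(int((v - low) / width), n_bins - 1)  # maximum goes in last bin
        counts[index] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical hospital lengths of stay (days)
stays = [1, 2, 2, 3, 3, 3, 4, 5, 7, 13]
edges, counts = histogram_counts(stays, n_bins=4)

# Draw the histogram sideways and label the scales
print("Length of stay (days)   Count")
for i, count in enumerate(counts):
    label = f"{edges[i]:.0f} to {edges[i + 1]:.0f}"
    print(f"{label:<22}  {'#' * count} ({count})")
```

Most stays fall in the first interval and a single long stay sits alone in the last one, which is the right-skewed shape a histogram is meant to reveal.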

Pictures of Data: Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing the distribution of continuous data in multiple groups: they can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box and 1/4 of the distribution is between this line and the bottom of the box.
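The three values that define the box can be computed with Python's `statistics.quantiles`. This is a sketch on made-up data; note that statistical packages use slightly different rules for percentiles, so the hinges may differ a little between programs. The "inclusive" method used here is one such rule.

```python
from statistics import quantiles

# Hypothetical hospital lengths of stay (days)
stays = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 cuts the data into quarters: 25th percentile, median, 75th percentile
lower_hinge, median_value, upper_hinge = quantiles(stays, n=4, method="inclusive")
box_height = upper_hinge - lower_hinge   # interquartile range

print("lower hinge (25th percentile):", lower_hinge)
print("median:                       ", median_value)
print("upper hinge (75th percentile):", upper_hinge)
print("height of the box (IQR):      ", box_height)
```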

Hospital Length of Stay

[Figure: box plot of length of stay]

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled." - M. J. Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.

Common tricks used to mislead
- The problem of scaling
- The Advertiser's Graph
- The transformed graph
- The chart with too much data

Page 19: Introduction to Biostatistics

Summarising and Describing Continuous Data

Measures of the centre of data (central tendency)

Mean

Median

Mode

DefinitionsDefinitions

The arithmetic The arithmetic meanmean is what is commonly called is what is commonly called the average The mean is the sum of all the the average The mean is the sum of all the scores divided by the number of scoresscores divided by the number of scores

The The medianmedian is the middle of a distribution half is the middle of a distribution half the scores are above the median and half are the scores are above the median and half are below the median below the median

The The modemode is the most frequently occurring score is the most frequently occurring score in a distribution in a distribution

ldquoldquoIt has been said that a fellow with one It has been said that a fellow with one leg frozen in ice and the other leg in leg frozen in ice and the other leg in boiling water is comfortablehellipboiling water is comfortablehellip

hellip hellipon averagerdquoon averagerdquo

JM YancyJM Yancy

Sample Mean X

The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)

1048793 ExampleSystolic blood pressures (mmHg)

X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5

micro is pronounced lsquomursquo and denotes the mean of all values in a population

Notation

x is pronounced lsquox-barrsquo and denotes the mean of a set of

Sample values

(sigma) denotes the summation of a set of values

x is the variable usually used to represent the individual data values

n represents the number of data values in a sample

N represents the number of data values in a population

x =

Definitions

MeanMean

the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores

n x

Sample

Nmicro = xPopulation

Notes on Sample Mean

Also called sample average or arithmetic mean

Sensitive to extreme values - One data point could make a great change in

sample mean

Why is it called the sample meanndash To distinguish it from population mean

Population Versus Sample

Population - The entire group you want information about

ndash For example The blood pressure of all 20-year-old male university students in South Africa

Sample - A part of the population from which we actually collect information and draw conclusions about the whole population

ndash For example Sample of blood pressures (n=50) of 20-year-old male university students in South Africa

The sample mean X is not the population mean micro

Population Versus Sample

We donrsquot know the population mean micro but would like to know itWe draw a sample from the populationWe calculate the sample mean XHow close is X to microStatistical theory will tell us how close X is to microStatistical inference is the process of trying to draw conclusions about the population from the sample

Weighted Mean

x =w

(w bull x)

Your grade in many courses are weighted means (averages) In other words some things count (are weighted) more than others

Geometric MeansGeometric Means

These are histograms rotated 90ordm and box plots

Note how the log transformation gives a symmetric distribution

bull 1 1 3 3 4 5 5 5 5 5 no exact middle -- shared by two numbers

MEDIAN is 45

4 + 5

2= 45

bull 5 5 5 3 1 5 1 4 3 5 2 bull 1 1 2 3 3 4 5 5 5 5 5 (in order)

exact middle MEDIAN is 4

Mode Mode

The score that occurs most frequentlyThe score that occurs most frequently

BimodalMultimodalNo Mode

The only measure of central tendency that can be used The only measure of central tendency that can be used with with nominal datadata

Examples

bull Mode is 5Mode is 5

bull Bimodal ndash 2 amp 6Bimodal ndash 2 amp 6

bull No ModeNo Mode

a 5 5 5 3 1 5 1 4 3 5

b 2 2 2 3 4 5 6 6 6 7 9

c 2 3 6 7 8 9 10

d 2 2 3 3 3 4

e 2 2 3 3 4 4 5 5

bull Mode is 3

bull No Mode

Shapes of the Distribution

Shapes of the Distribution

Distribution Characteristics

Shapes of the Distribution

Example Height of students in the class

Shapes of the Distribution

Example Serum cholesterol level

Shapes of the Distribution

Example Birth weight of newborn babies

Shapes of the Distribution

Some visual ways to summarize Some visual ways to summarize datadata

TablesTablesFrequency tableFrequency table

GraphsGraphsHistogramsHistograms

Bar graphsBar graphs

Box plotsBox plots

Line plotsLine plots

Scatter graphsScatter graphs

ChartsChartsBar chartBar chart

Pie diagramPie diagram

Frequency TablesFrequency Tables

Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages

The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable

and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable

Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data

Example of frequency tableExample of frequency table

When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class

Graphical Summaries

• Histograms
  - Continuous or ordinal data on horizontal axis

• Bar Graphs
  - Nominal data
  - No order to horizontal axis

• Box Plots
  - Continuous data

Histogram

A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.

Bar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of a set of mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch one another.

Difference between bar chart and histogram

• Bar charts for categories that are separate

• Histograms if you got categories by dividing up continuous data

• Bars do not touch; histogram rectangles do touch

Line graph

If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resulting graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, this is better done with line graphs than with histograms.

Scatter plot

Also called a scattergram. This is a method of displaying the distribution of two variables in relation to one another. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each plotted point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, all the points fall on a straight line (the regression line).
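A small numerical sketch of the last two statements, using the Pearson correlation coefficient as the measure of correlation. The coefficient is not named on the slide, so treat it as an assumption; the function `pearson_r` and both data sets are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements (not from the lecture).
x = [1, 2, 3, 4, 5]
y_line = [3, 5, 7, 9, 11]      # exactly y = 2x + 1: every point lies on one line
y_scatter = [3, 9, 4, 10, 5]   # same x values, points widely scattered

r_line = pearson_r(x, y_line)
r_scatter = pearson_r(x, y_scatter)
print(round(r_line, 3), round(r_scatter, 3))
```

The points that lie exactly on a line give a coefficient of 1 (up to rounding), while the widely scattered points give a value much closer to 0.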

Survival curve

Pie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The area of each sector (slice of the pie) is proportional to the amount in its category. The total area of the circle is 100% and it represents the total population that is being shown.

Pictures of Data: Continuous Variables

• Histograms
  - Means and medians do not tell the whole story
  - Differences in spread (variability)
  - Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
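The four steps above can be sketched in plain Python. The function name `histogram_counts` and the height data are hypothetical, and the sketch assumes the data are not all equal (otherwise the bin width would be zero).

```python
def histogram_counts(data, n_bins):
    """Split the range of `data` into n_bins equal-width bins and count each bin."""
    low, high = min(data), max(data)
    width = (high - low) / n_bins            # step 1: intervals of equal width
    counts = [0] * n_bins
    for value in data:                       # step 2: count observations per interval
        # The maximum value would land one bin too far, so clamp it into the last bin.
        index = min(int((value - low) / width), n_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical heights in cm (not from the lecture).
heights = [150, 152, 155, 158, 160, 161, 163, 165, 168, 170, 172, 180]
edges, counts = histogram_counts(heights, n_bins=3)

# Steps 3 and 4: draw a text histogram with labelled intervals.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.0f}-{right:<5.0f} | {'#' * count}")
```

Each interval here includes its left edge and excludes its right edge, except the last one, which also includes the maximum. Other conventions exist, so the counts near bin edges can differ between software packages.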

Pictures of Data: Histograms

Box plot

• Another common visual display tool is the box plot

• Gives good insight into distribution shape in terms of skewness and outlying values

• Very nice tool for easily comparing the distribution of continuous data in multiple groups
  - Can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
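As a rough illustration, the three values that define the box can be computed with Python's standard `statistics` module. The length-of-stay figures are hypothetical, and the choice of `method="inclusive"` is an assumption; software packages use slightly different percentile rules, so hinges can differ a little between tools.

```python
import statistics

# Hypothetical hospital lengths of stay in days (not from the lecture).
stay_days = [1, 2, 2, 3, 4, 5, 6, 7, 9]

# n=4 asks for the three cut points that split the data into quarters.
q1, median, q3 = statistics.quantiles(stay_days, n=4, method="inclusive")
iqr = q3 - q1   # height of the box (interquartile range)

print(f"lower hinge (25th percentile): {q1}")
print(f"median:                        {median}")
print(f"upper hinge (75th percentile): {q3}")
print(f"box height (IQR):              {iqr}")
```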

Hospital Length of Stay

Box plot: Length of Stay

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled." - M. J. Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.

Common tricks used to mislead

• The problem of scaling
• The Advertiser's Graph
• The transformed graph
• The chart with too much data

Which graph to use

Statistical methods depend on the "form" of a set of data, which can be assessed with some common, useful graphics.

Graph Name    Y-axis          X-axis
Histogram     Count           Category
Scatterplot   Continuous      Continuous
Dot Plot      Continuous      Category
Box Plot      Percentiles     Category
Line Plot     Mean or value   Category

Example of MCQ 1

The arithmetic mean of a set of values

a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram

a) Can be used instead of a pie chart to display categorical data
b) Is similar to a bar chart but there are no gaps between the bars
c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar
d) Can be used to display either a frequency or a relative frequency distribution
e) Is used to show the relationship between two variables

Any questions?

Page 20: Introduction to Biostatistics

Definitions

The arithmetic mean is what is commonly called the average. The mean is the sum of all the scores divided by the number of scores.

The median is the middle of a distribution: half the scores are above the median and half are below the median.

The mode is the most frequently occurring score in a distribution.

"It has been said that a fellow with one leg frozen in ice and the other leg in boiling water is comfortable... on average."

J.M. Yancy

Sample Mean $\bar{X}$

• The Average or Arithmetic Mean
• Add up data, then divide by sample size (n)
• The sample size n is the number of observations (pieces of data)

Example: Systolic blood pressures (mmHg)

X1 = 120, X2 = 80, X3 = 90, X4 = 110, X5 = 95; n = 5
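Worked through in Python, the example gives a sample mean of 99 mmHg. The variable names are illustrative only; the five readings are the ones listed above.

```python
# Systolic blood pressures (mmHg) from the example.
pressures = [120, 80, 90, 110, 95]

n = len(pressures)        # sample size: 5
total = sum(pressures)    # 120 + 80 + 90 + 110 + 95 = 495
sample_mean = total / n   # 495 / 5 = 99.0

print(f"n = {n}, sum = {total}, mean = {sample_mean} mmHg")
```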

Notation

• $\Sigma$ (sigma) denotes the summation of a set of values
• $x$ is the variable usually used to represent the individual data values
• $n$ represents the number of data values in a sample
• $N$ represents the number of data values in a population
• $\bar{x}$ is pronounced 'x-bar' and denotes the mean of a set of sample values
• $\mu$ is pronounced 'mu' and denotes the mean of all values in a population

Definitions

Mean: the value obtained by adding the scores and dividing the total by the number of scores

Sample: $\bar{x} = \frac{\sum x}{n}$

Population: $\mu = \frac{\sum x}{N}$

Notes on Sample Mean

• Also called sample average or arithmetic mean

• Sensitive to extreme values
  - One data point could make a great change in the sample mean

• Why is it called the sample mean?
  - To distinguish it from the population mean
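A quick sketch of the "sensitive to extreme values" point, reusing the five blood pressure readings from the earlier example. The second list is hypothetical: it imagines that 120 was mistyped as 220. The median is shown next to the mean only for comparison.

```python
import statistics

recorded = [120, 80, 90, 110, 95]   # readings from the blood pressure example
mistyped = [220, 80, 90, 110, 95]   # hypothetical: 120 entered as 220

mean_before = statistics.mean(recorded)
mean_after = statistics.mean(mistyped)
median_before = statistics.median(recorded)
median_after = statistics.median(mistyped)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

One changed value moves the mean by 20 mmHg, while the median stays where it was.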

Population Versus Sample

• Population - The entire group you want information about
  - For example: the blood pressure of all 20-year-old male university students in South Africa

• Sample - A part of the population from which we actually collect information and draw conclusions about the whole population
  - For example: a sample of blood pressures (n=50) of 20-year-old male university students in South Africa

• The sample mean $\bar{X}$ is not the population mean $\mu$

Population Versus Sample

• We don't know the population mean $\mu$ but would like to know it
• We draw a sample from the population
• We calculate the sample mean $\bar{X}$
• How close is $\bar{X}$ to $\mu$?
• Statistical theory will tell us how close $\bar{X}$ is to $\mu$
• Statistical inference is the process of trying to draw conclusions about the population from the sample

Weighted Mean

$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$

Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
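A minimal sketch of the weighted mean for a course grade. The three components and their weights are hypothetical and are not from the lecture.

```python
# Hypothetical course components as (score out of 100, weight); not from the lecture.
components = [(70, 2), (80, 3), (90, 5)]   # e.g. assignment, test, final exam

weighted_sum = sum(w * x for x, w in components)   # sum of (w * x)
total_weight = sum(w for _, w in components)       # sum of w
weighted_mean = weighted_sum / total_weight

unweighted_mean = sum(x for x, _ in components) / len(components)

print(f"weighted mean:   {weighted_mean}")
print(f"unweighted mean: {unweighted_mean}")
```

Because the highest score carries the largest weight, the weighted mean (83) ends up above the plain average of the three scores (80).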

Geometric Means

These are histograms rotated 90° and box plots.

Note how the log transformation gives a symmetric distribution.
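The slide does not give a formula, so the following is a sketch under the usual definition: the geometric mean is the antilog of the mean of the logged values, and it only applies to positive data. The values are hypothetical and chosen so the effect of the log transformation is easy to see.

```python
import math

# Hypothetical right-skewed positive values (not from the lecture).
values = [1, 10, 100, 1000]

logs = [math.log10(v) for v in values]        # roughly 0, 1, 2, 3: evenly spaced after logging
mean_log = sum(logs) / len(logs)              # mean on the log scale
geometric_mean = 10 ** mean_log               # back-transform to the original scale
arithmetic_mean = sum(values) / len(values)   # pulled upward by the largest value

print(f"geometric mean:  {geometric_mean:.1f}")
print(f"arithmetic mean: {arithmetic_mean:.2f}")
```

On the original scale the values are strongly skewed, but their logs are symmetric around the mean, which is why the geometric mean sits in the "middle" of such data while the arithmetic mean does not.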

• 1 1 3 3 4 5 5 5 5 5
  no exact middle: shared by two numbers

  $\frac{4 + 5}{2} = 4.5$

  MEDIAN is 4.5

• 5 5 5 3 1 5 1 4 3 5 2
• 1 1 2 3 3 4 5 5 5 5 5 (in order)

  exact middle: MEDIAN is 4
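The two cases above (even and odd number of values) can be written as one small function. The name `median` is just for illustration; Python's built-in `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the sorted data; the average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # middle shared by two numbers

even_set = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]        # 10 values
odd_set = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]      # 11 values, unsorted

print(median(even_set))   # 4.5
print(median(odd_set))    # 4
```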

Mode

• The score that occurs most frequently
  - Bimodal
  - Multimodal
  - No Mode

• The only measure of central tendency that can be used with nominal data

Examples

a. 5 5 5 3 1 5 1 4 3 5        • Mode is 5
b. 2 2 2 3 4 5 6 6 6 7 9      • Bimodal - 2 & 6
c. 2 3 6 7 8 9 10             • No Mode
d. 2 2 3 3 3 4                • Mode is 3
e. 2 2 3 3 4 4 5 5            • No Mode
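The five examples can be checked with a short function. The function name `modes` is hypothetical, and the "no mode" rule (every value occurs equally often) is an assumption inferred from examples c and e; note that Python's own `statistics.multimode` would instead return all values in those two cases.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means 'no mode'."""
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: if every distinct value is equally frequent, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

for label, data in examples.items():
    result = modes(data)
    print(label, result if result else "no mode")
```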



Page 22: Introduction to Biostatistics

Sample Mean X

The Average or Arithmetic MeanAdd up data then divide by sample size (n)The sample size n is the number of observations (pieces of data)

1048793 ExampleSystolic blood pressures (mmHg)

X1 = 120X2 = 80X3 = 90X4 = 110X5 = 95n = 5

micro is pronounced lsquomursquo and denotes the mean of all values in a population

Notation

x is pronounced lsquox-barrsquo and denotes the mean of a set of

Sample values

(sigma) denotes the summation of a set of values

x is the variable usually used to represent the individual data values

n represents the number of data values in a sample

N represents the number of data values in a population

x =

Definitions

MeanMean

the value obtained by adding the scores and dividing the total by the the value obtained by adding the scores and dividing the total by the number of scoresnumber of scores

n x

Sample

Nmicro = xPopulation

Notes on Sample Mean

Also called sample average or arithmetic mean

Sensitive to extreme values - One data point could make a great change in

sample mean

Why is it called the sample meanndash To distinguish it from population mean

Population Versus Sample

Population - The entire group you want information about

- For example: the blood pressure of all 20-year-old male university students in South Africa

Sample - A part of the population from which we actually collect information and draw conclusions about the whole population

- For example: a sample of blood pressures (n=50) of 20-year-old male university students in South Africa

The sample mean $\bar{X}$ is not the population mean $\mu$

Population Versus Sample

We don't know the population mean $\mu$ but would like to know it
We draw a sample from the population
We calculate the sample mean $\bar{X}$
How close is $\bar{X}$ to $\mu$?
Statistical theory will tell us how close $\bar{X}$ is to $\mu$
Statistical inference is the process of trying to draw conclusions about the population from the sample
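The steps above can be imitated with a small simulation. In real studies the population is never fully observed; here a made-up population is generated so that the sample mean can be compared with the population mean. The population size, its centre (120 mmHg) and its spread (10 mmHg) are assumptions chosen only for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10 000 simulated systolic pressures (mmHg).
# The centre of 120 and spread of 10 are made-up illustration values.
population = [rng.gauss(120, 10) for _ in range(10_000)]
population_mean = sum(population) / len(population)   # mu

# Draw a sample of n = 50 without replacement and compute the sample mean.
sample = rng.sample(population, 50)
sample_mean = sum(sample) / len(sample)               # x-bar

print(f"population mean = {population_mean:.1f}")
print(f"sample mean     = {sample_mean:.1f}")
print(f"difference      = {sample_mean - population_mean:+.1f}")
```

Changing the seed gives a different sample and a slightly different sample mean, which is exactly the variability that statistical inference has to account for.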

Weighted Mean

$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$

Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
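A short sketch of the weighted mean for a course mark follows. The three components, their marks and their weights are invented for illustration; only the formula (sum of weight times value, divided by the sum of the weights) comes from the slide.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

# Hypothetical course: marks (%) and how much each component counts
marks = [70, 80, 60]      # test, assignment, exam
weights = [20, 30, 50]    # percentage contribution of each component

course_mark = weighted_mean(marks, weights)
plain_mean = sum(marks) / len(marks)

print(course_mark, plain_mean)
```

Because the exam counts for half of the total, the weighted mark (68) sits closer to the exam mark than the unweighted mean (70) does.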

Geometric Means

These are histograms rotated 90° and box plots

Note how the log transformation gives a symmetric distribution
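The transcript does not preserve the slide's own definition of the geometric mean. A common definition, assumed here, is the exponential of the arithmetic mean of the logged values (equivalently, the n-th root of the product of n positive values), which ties in with the log transformation mentioned above. The data below are hypothetical.

```python
import math

def geometric_mean(values):
    """exp of the arithmetic mean of the logs; values must all be positive."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical right-skewed measurements
titres = [1, 10, 100]

arithmetic = sum(titres) / len(titres)   # pulled upwards by the value 100
geometric = geometric_mean(titres)       # close to 10, up to rounding

print(arithmetic, geometric)
```

On the log scale the three values are evenly spaced, so their mean on that scale corresponds to the middle value once it is transformed back.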

- 1 1 3 3 4 5 5 5 5 5
  no exact middle (shared by two numbers)
  MEDIAN is 4.5: $\frac{4 + 5}{2} = 4.5$

- 5 5 5 3 1 5 1 4 3 5 2
  1 1 2 3 3 4 5 5 5 5 5 (in order)
  exact middle: MEDIAN is 4
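The two median examples above can be reproduced with a small function. This is a minimal sketch that assumes a non-empty list of numbers; the function name is chosen here for illustration.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

even_case = [1, 1, 3, 3, 4, 5, 5, 5, 5, 5]     # 10 values: no exact middle
odd_case = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5, 2]   # 11 values, not yet in order

print(median(even_case))
print(median(odd_case))
```

The data must be put in order first; taking the middle of the unsorted list would give the wrong answer for the second example.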

Mode

The score that occurs most frequently

Bimodal
Multimodal
No Mode

The only measure of central tendency that can be used with nominal data

Examples

a) 5 5 5 3 1 5 1 4 3 5 - Mode is 5

b) 2 2 2 3 4 5 6 6 6 7 9 - Bimodal (2 and 6)

c) 2 3 6 7 8 9 10 - No mode

d) 2 2 3 3 3 4 - Mode is 3

e) 2 2 3 3 4 4 5 5 - No mode
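The five mode examples can be checked with a short function. It is a sketch under one assumption taken from the examples themselves: when every distinct value occurs equally often (as in c and e), the data are reported as having no mode. The function name is illustrative, and a non-empty list is assumed.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] if all values are equally frequent."""
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []  # every distinct value occurs equally often: no mode
    return sorted(v for v, c in counts.items() if c == highest)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

for label, data in examples.items():
    print(label, modes(data) or "no mode")
```

A list with two entries corresponds to a bimodal data set, and more than two to a multimodal one.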

Shapes of the Distribution

Distribution Characteristics

Example: Height of students in the class

Example: Serum cholesterol level

Example: Birth weight of newborn babies

Some visual ways to summarize data

Tables
- Frequency table

Graphs
- Histograms
- Bar graphs
- Box plots
- Line plots
- Scatter graphs

Charts
- Bar chart
- Pie diagram

Frequency Tables

Summarizes a variable with counts and percentages

The variable is categorical

Note that you can take a continuous variable and create categories with it

How do you create categories for a continuous variable?
- Choose cutoffs that are biologically meaningful
- Natural breaks in the data

Example of frequency table

When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data. When the data are divided into groups (classes), they are called grouped data. The classes have to be decided according to the range of the data and the size of the class. The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table. The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
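The definitions above (class, frequency, cumulative frequency) can be illustrated with a small grouped frequency table. The ten blood pressure values and the class width of 10 mmHg are hypothetical, and this simple sketch leaves out any class that contains no observations.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical systolic blood pressures (mmHg)
values = [112, 118, 121, 125, 127, 130, 134, 138, 141, 149]
width = 10  # class size in mmHg (an arbitrary choice for this sketch)

# Assign each value to the lower limit of its class, then count per class.
counts = Counter((v // width) * width for v in values)

lowers = sorted(counts)
freqs = [counts[lo] for lo in lowers]
cumulative = list(accumulate(freqs))   # running total of the frequencies

table = [(f"{lo}-{lo + width - 1}", f, c)
         for lo, f, c in zip(lowers, freqs, cumulative)]

print("Class     Freq  Cum freq")
for label, f, c in table:
    print(f"{label:<9} {f:>4}  {c:>8}")
```

The last cumulative frequency always equals the total number of observations, which is a handy check on the table.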

Graphical Summaries

Histograms
- Continuous or ordinal data on horizontal axis

Bar Graphs
- Nominal data
- No order to horizontal axis

Box Plots
- Continuous data

Histogram

A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.

Bar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of a set of mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch the other bars.

Difference between bar chart and histogram

Bar charts for categories that are separate

Histograms if you got categories by dividing up continuous data

Bars do not touch; histogram rectangles do touch

Line graph

If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, infant mortality rate, etc.) are to be displayed, this is better done with line graphs than with histograms.

Scatter plot

Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each plotted point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on a single straight line (the regression line).
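The slide describes correlation only in visual terms. One common way to put a number on it, assumed here and not named in the lecture, is the Pearson correlation coefficient r, which equals 1 or -1 when all points lie on one straight line and is near 0 when the points are widely scattered. The data below are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
on_a_line = [3, 5, 7, 9, 11]   # y = 2x + 1: every point on one straight line
scattered = [5, 1, 4, 2, 3]    # hypothetical values with no clear pattern

r_line = pearson_r(x, on_a_line)
r_scattered = pearson_r(x, scattered)

print(round(r_line, 3), round(r_scattered, 3))
```

The function assumes both variables actually vary; if all x values (or all y values) are identical, the denominator is zero and r is undefined.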

Survival curve

Pie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.

Pictures of Data: Continuous Variables

Histograms
Means and medians do not tell the whole story
- Differences in spread (variability)
- Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
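The four steps above can be sketched as a small text histogram. The lengths of stay and the choice of three bins are hypothetical, and the function assumes the values are not all identical (otherwise the bin width would be zero).

```python
def histogram_counts(values, bins):
    """Count observations in `bins` equal-width intervals spanning the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The largest value would fall just past the last bin; keep it in the last bin.
        index = min(int((v - lo) / width), bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

# Hypothetical lengths of stay (days)
stays = [2, 3, 5, 7, 8, 9, 11, 12, 13, 15, 20]
edges, counts = histogram_counts(stays, bins=3)

# "Draw" the histogram with one star per observation and label each interval.
for i, count in enumerate(counts):
    label = f"{edges[i]:>4.0f} to {edges[i + 1]:<4.0f}"
    print(label, "*" * count)
```

Each interval here includes its lower edge and excludes its upper edge, except the last, which also includes the maximum. Other conventions exist, so the rule used should be stated when reporting a histogram.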

Pictures of Data: Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing the distribution of continuous data in multiple groups - can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
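The hinges and median that define the box can be computed with Python's standard `statistics` module. The lengths of stay below are hypothetical. Software packages use slightly different percentile conventions, so the exact hinges may differ a little from one program to another; the `inclusive` method used here is one such convention, not necessarily the one behind the lecture's plots.

```python
import statistics

# Hypothetical hospital lengths of stay (days), including one long stay
stays = [1, 2, 3, 4, 5, 7, 9, 12, 30]

# n=4 asks for the three cut points that split the data into quarters.
q1, med, q3 = statistics.quantiles(stays, n=4, method="inclusive")

inside_box = [s for s in stays if q1 <= s <= q3]

print(f"lower hinge = {q1}, median = {med}, upper hinge = {q3}")
print(f"{len(inside_box)} of {len(stays)} values lie inside the box")
```

The long stay of 30 days sits far above the upper hinge, which is the kind of outlying value a box plot makes easy to spot.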

Hospital Length of Stay

Box plot: Length of Stay

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.

Common tricks used to mislead
- The problem of scaling
- The Advertiser's Graph
- The transformed graph
- The chart with too much data

Which graph to use

Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name    Y-axis          X-axis
Histogram     Count           Category
Scatterplot   Continuous      Continuous
Dot Plot      Continuous      Category
Box Plot      Percentiles     Category
Line Plot     Mean or value   Category

Example of MCQ 1

The arithmetic mean of a set of values:

a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram:

a) Can be used instead of a pie chart to display categorical data

b) Is similar to a bar chart but there are no gaps between the bars

c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar

d) Can be used to display either a frequency or a relative frequency distribution

e) Is used to show the relationship between two variables

Any questions?



Page 25: Introduction to Biostatistics

Notes on Sample Mean

Also called sample average or arithmetic mean

Sensitive to extreme values - one data point can change the sample mean greatly

Why is it called the sample mean? To distinguish it from the population mean
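The sensitivity to extreme values can be seen in a small sketch. The lengths of stay below are invented for illustration; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical lengths of stay in days (invented for illustration).
stays = [2, 3, 3, 4, 5]
# The same data with one extreme value added.
stays_with_outlier = stays + [60]

mean_before = mean(stays)
mean_after = mean(stays_with_outlier)

print("mean without outlier:", mean_before)
print("mean with outlier:   ", round(mean_after, 2))
print("median with outlier: ", median(stays_with_outlier))
```

A single extreme observation pulls the mean far from the bulk of the data, while the median barely moves.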

Population Versus Sample

Population - The entire group you want information about

- For example: the blood pressure of all 20-year-old male university students in South Africa

Sample - A part of the population from which we actually collect information and draw conclusions about the whole population

- For example: a sample of blood pressures (n=50) of 20-year-old male university students in South Africa

The sample mean X̄ is not the population mean μ

Population Versus Sample

We don't know the population mean μ but would like to know it

We draw a sample from the population

We calculate the sample mean X̄

How close is X̄ to μ?

Statistical theory will tell us how close X̄ is to μ

Statistical inference is the process of trying to draw conclusions about the population from the sample
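The idea of drawing a sample and comparing its mean with the population mean can be sketched with simulated data. Everything here is an assumption made for illustration: the population is 10,000 simulated blood pressures drawn from a normal distribution with mean 120 and standard deviation 12, which is not a claim about real students.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated systolic blood pressures (mmHg).
population = [random.gauss(120, 12) for _ in range(10_000)]
mu = mean(population)  # population mean, unknown in real studies

# Draw a sample of n = 50 without replacement and compute its mean.
sample = random.sample(population, 50)
x_bar = mean(sample)

print(f"population mean    = {mu:.1f}")
print(f"sample mean (n=50) = {x_bar:.1f}")
print(f"difference         = {x_bar - mu:.1f}")
```

In practice only the sample is observed; the simulation simply makes the gap between the two means visible.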

Weighted Mean

$$\bar{x} = \frac{\sum (w \cdot x)}{\sum w}$$

Your grades in many courses are weighted means (averages). In other words, some things count (are weighted) more than others.
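As a minimal sketch of the course-grade example, the weighted mean is the sum of each weight times its value, divided by the sum of the weights. The scores and weights below are hypothetical, and `weighted_mean` is an illustrative helper rather than a library function.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight

# Hypothetical marks for an assignment, a test and an exam.
scores = [70, 80, 90]
# The exam counts twice as much as each of the other two components.
weights = [1, 1, 2]

print("unweighted mean:", sum(scores) / len(scores))
print("weighted mean:  ", weighted_mean(scores, weights))
```

Because the exam carries more weight, the weighted mean sits closer to the exam mark than the plain average does.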

Geometric Means

These are histograms rotated 90° and box plots

Note how the log transformation gives a symmetric distribution
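The link between the log transformation and the geometric mean can be sketched as follows: take logs, average them, and transform back. The three values are invented for illustration; `geometric_mean` is the standard-library function (Python 3.8 or later), shown only as a cross-check.

```python
import math
from statistics import geometric_mean, mean

# Hypothetical right-skewed measurements (invented for illustration).
values = [1, 10, 100]

# Average on the log scale, then back-transform with exp.
logs = [math.log(v) for v in values]
gm_from_logs = math.exp(mean(logs))

print("arithmetic mean:", mean(values))
print("geometric mean :", round(gm_from_logs, 6))
print("library value  :", round(geometric_mean(values), 6))
```

For skewed positive data the geometric mean is smaller than the arithmetic mean, because the large values have less pull on the log scale.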

• 1 1 3 3 4 5 5 5 5 5: no exact middle, shared by two numbers

MEDIAN is 4.5

(4 + 5) / 2 = 4.5

• 5 5 5 3 1 5 1 4 3 5 2
• 1 1 2 3 3 4 5 5 5 5 5 (in order)

exact middle: MEDIAN is 4
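The two median examples can be checked with a short sketch. `median_by_hand` is an illustrative helper that follows the rule used in the examples (sort, then take the middle value, or the average of the two middle values); `statistics.median` is the standard-library equivalent.

```python
from statistics import median

def median_by_hand(values):
    """Sort the values and return the middle one (or the mean of the middle two)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                      # odd count: exact middle
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: shared middle

even_case = [5, 5, 5, 3, 1, 5, 1, 4, 3, 5]   # 10 values
odd_case = even_case + [2]                   # 11 values

print(median_by_hand(even_case), median(even_case))
print(median_by_hand(odd_case), median(odd_case))
```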

Mode

The score that occurs most frequently

Bimodal, multimodal, or no mode

The only measure of central tendency that can be used with nominal data

Examples

a. 5 5 5 3 1 5 1 4 3 5: Mode is 5

b. 2 2 2 3 4 5 6 6 6 7 9: Bimodal - 2 & 6

c. 2 3 6 7 8 9 10: No Mode

d. 2 2 3 3 3 4: Mode is 3

e. 2 2 3 3 4 4 5 5: No Mode
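The five example data sets (a to e) can be checked with a small sketch. `modes` is an illustrative helper, not a library function, and it assumes the convention the examples appear to use: when several distinct values all occur equally often, the data set is reported as having no mode.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when every distinct value ties."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # Assumed convention: several distinct values, all equally frequent -> no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}

for label, data in examples.items():
    print(label, modes(data) or "no mode")
```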

Shapes of the Distribution

Distribution Characteristics

Example: Height of students in the class

Example: Serum cholesterol level

Example: Birth weight of newborn babies

Some visual ways to summarize data

Tables
• Frequency table

Graphs
• Histograms
• Bar graphs
• Box plots
• Line plots
• Scatter graphs

Charts
• Bar chart
• Pie diagram

Frequency Tables

Summarizes a variable with counts and percentages

The variable is categorical

Note that you can take a continuous variable and create categories with it

How do you create categories for a continuous variable?
• Choose cutoffs that are biologically meaningful
• Natural breaks in the data

Example of frequency table

When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data.

When the data are divided into groups (classes), they are called grouped data.

The classes have to be decided according to the range of the data and the size of the class.

The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table.

The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
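The grouping, frequency and cumulative frequency described here can be sketched as follows. The ages are invented for illustration, `frequency_table` is a hypothetical helper, and the sketch assumes equal-width classes and that no value lies below `start`.

```python
from collections import Counter
from itertools import accumulate

def frequency_table(values, class_width, start):
    """Return rows of (lower, upper, frequency, cumulative frequency).

    Each class covers lower <= value < upper. Assumes all values >= start.
    """
    counts = Counter((v - start) // class_width for v in values)
    n_classes = max(counts) + 1
    freqs = [counts.get(i, 0) for i in range(n_classes)]
    cumulative = list(accumulate(freqs))  # running total of frequencies
    return [
        (start + i * class_width, start + (i + 1) * class_width, f, c)
        for i, (f, c) in enumerate(zip(freqs, cumulative))
    ]

# Hypothetical ages in years (invented for illustration).
ages = [21, 23, 25, 28, 31, 34, 34, 37, 42, 45]

rows = frequency_table(ages, class_width=10, start=20)
for lower, upper, freq, cum in rows:
    pct = 100 * freq / len(ages)
    print(f"{lower}-{upper - 1}: frequency={freq} ({pct:.0f}%), cumulative={cum}")
```

The cumulative frequency of the last class always equals the total number of observations, which is a quick check on the table.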

Graphical Summaries

Histograms
• Continuous or ordinal data on horizontal axis

Bar Graphs
• Nominal data
• No order to horizontal axis

Box Plots
• Continuous data

Histogram

A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.

Bar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of a set of mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width, and they should not touch the other bars.

Difference between bar chart and histogram

Bar charts for categories that are separate

Histograms if you got categories by dividing up continuous data

Bars do not touch; histogram rectangles do touch

Line graph

If the mid-points of the tops of the bars of a histogram are connected by a line, and the bars are omitted from the display, the resulting graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, it is better done with line graphs rather than histograms.

Scatter plot

Also called a scattergram. This is a method of displaying the distribution of two variables in relation to one another. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each plotted point thus has two values (coordinates), from the Y and X axis scales. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on a straight line (the regression line).
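The contrast between a wide scatter and a perfect straight-line relationship can be put into numbers with a correlation coefficient. The sketch below uses Pearson's r, which is an assumption on our part since the slides do not name a specific measure; `pearson_r` is an illustrative helper and both data sets are invented.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_line = [3, 5, 7, 9, 11]     # exactly y = 2x + 1: all points on one line
y_scatter = [2, 9, 1, 8, 4]   # widely scattered around any line

r_line = pearson_r(x, y_line)
r_scatter = pearson_r(x, y_scatter)

print("points on a line:", round(r_line, 3))
print("wide scatter:    ", round(r_scatter, 3))
```

Note that the perfectly correlated points lie on the line y = 2x + 1, not on the 45-degree diagonal; what matters is that the relationship is exactly linear.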

Survival curve

Pie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.
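Since each slice is proportional to its share of the whole, the percentage and the slice angle follow directly from the counts. This is a minimal sketch: the feeding categories and counts are invented for illustration, and `pie_slices` is a hypothetical helper.

```python
def pie_slices(counts):
    """Map each category to (percentage of total, slice angle in degrees)."""
    total = sum(counts.values())
    return {
        label: (100 * n / total, 360 * n / total)
        for label, n in counts.items()
    }

# Hypothetical counts of infants by feeding method (invented for illustration).
feeding = {"Breastfed": 20, "Formula": 10, "Mixed": 10}

slices = pie_slices(feeding)
for label, (percent, angle) in slices.items():
    print(f"{label}: {percent:.0f}% of the pie, {angle:.0f} degrees")
```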

Pictures of Data: Continuous Variables

Histograms
• Means and medians do not tell the whole story
• Differences in spread (variability)
• Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
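The steps above can be sketched in a few lines: divide the range into equal-width bins, count the observations in each, then draw and label. The data are invented for illustration, `histogram_counts` is a hypothetical helper, and the "drawing" is a plain-text stand-in for a real plot.

```python
def histogram_counts(values, n_bins):
    """Split the data range into n_bins equal-width bins and count each bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range has zero width")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would fall just past the last bin, so clamp it in.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical measurements (invented for illustration).
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

edges, counts = histogram_counts(data, n_bins=4)
for left, right, count in zip(edges, edges[1:], counts):
    # One '#' per observation, with the bin interval as the label.
    print(f"{left:4.1f} to {right:4.1f} | {'#' * count}")
```

The choice of four bins here is arbitrary; different bin widths can make the same data look quite different.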

Pictures of Data: Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing the distribution of continuous data in multiple groups - can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution.

The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
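The three values that define the box (25th percentile, median, 75th percentile) can be computed with the standard-library `statistics.quantiles` function. This is a sketch under assumptions: the lengths of stay are invented, and the `"inclusive"` method is one of several percentile conventions, so other software may give slightly different hinges.

```python
from statistics import quantiles

# Hypothetical lengths of stay in days (invented for illustration).
stays = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the three cut points that split the data into quarters.
q1, med, q3 = quantiles(stays, n=4, method="inclusive")
iqr = q3 - q1  # height of the box

print("lower hinge (25th percentile):", q1)
print("median:                       ", med)
print("upper hinge (75th percentile):", q3)
print("box height (IQR):             ", iqr)
```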

Hospital Length of Stay

Box plot: Length of Stay

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled." - M J Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.

Common tricks used to mislead
• The problem of scaling
• The Advertiser's Graph
• The transformed graph
• The chart with too much data

Which graph to use

Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name    Y-axis          X-axis
Histogram     Count           Category
Scatterplot   Continuous      Continuous
Dot Plot      Continuous      Category
Box Plot      Percentiles     Category
Line Plot     Mean or value   Category

Example of MCQ 1

The arithmetic mean of a set of values

a) Is a particular type of average

b) Is a useful summary measure of location if the data are skewed to the right

c) Coincides with the median if the distribution of the data is symmetrical

d) Is always greater than the median

e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram

a) Can be used instead of a pie chart to display categorical data

b) Is similar to a bar chart but there are no gaps between the bars

c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar

d) Can be used to display either a frequency or a relative frequency distribution

e) Is used to show the relationship between two variables

Any questions?

Page 28: Introduction to Biostatistics

Weighted Mean

x =w

(w bull x)

Your grade in many courses are weighted means (averages) In other words some things count (are weighted) more than others

Geometric MeansGeometric Means

These are histograms rotated 90ordm and box plots

Note how the log transformation gives a symmetric distribution

bull 1 1 3 3 4 5 5 5 5 5 no exact middle -- shared by two numbers

MEDIAN is 45

4 + 5

2= 45

bull 5 5 5 3 1 5 1 4 3 5 2 bull 1 1 2 3 3 4 5 5 5 5 5 (in order)

exact middle MEDIAN is 4

Mode Mode

The score that occurs most frequentlyThe score that occurs most frequently

BimodalMultimodalNo Mode

The only measure of central tendency that can be used The only measure of central tendency that can be used with with nominal datadata

Examples

bull Mode is 5Mode is 5

bull Bimodal ndash 2 amp 6Bimodal ndash 2 amp 6

bull No ModeNo Mode

a 5 5 5 3 1 5 1 4 3 5

b 2 2 2 3 4 5 6 6 6 7 9

c 2 3 6 7 8 9 10

d 2 2 3 3 3 4

e 2 2 3 3 4 4 5 5

bull Mode is 3

bull No Mode

Shapes of the Distribution

Shapes of the Distribution

Distribution Characteristics

Shapes of the Distribution

Example Height of students in the class

Shapes of the Distribution

Example Serum cholesterol level

Shapes of the Distribution

Example Birth weight of newborn babies

Shapes of the Distribution

Some visual ways to summarize data

Tables
• Frequency table

Graphs
• Histograms
• Bar graphs
• Box plots
• Line plots
• Scatter graphs

Charts
• Bar chart
• Pie diagram

Frequency Tables

Summarizes a variable with counts and percentages

The variable is categorical. Note that you can take a continuous variable and create categories with it

How do you create categories for a continuous variable?
• Choose cutoffs that are biologically meaningful
• Natural breaks in the data

Example of frequency table

When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data. When the data are divided into groups (classes), they are called grouped data. The classes have to be decided according to the range of the data and the size of the class. The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table. The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
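A grouped frequency table with percentages and cumulative frequencies can be sketched as below. The birth weights are hypothetical, and the cutoffs (2500 g and 4000 g) are only one example of clinically motivated class boundaries; `frequency_table` is an illustrative name.

```python
from bisect import bisect_right


def frequency_table(values, cutoffs, labels):
    """Return rows of (class label, frequency, percentage, cumulative frequency).

    cutoffs must be sorted; a value equal to a cutoff falls in the higher class.
    """
    if len(labels) != len(cutoffs) + 1:
        raise ValueError("need exactly one more label than cutoffs")
    counts = [0] * len(labels)
    for v in values:
        counts[bisect_right(cutoffs, v)] += 1
    total = len(values)
    rows, cumulative = [], 0
    for label, count in zip(labels, counts):
        cumulative += count
        rows.append((label, count, 100 * count / total, cumulative))
    return rows


# Hypothetical birth weights in grams.
weights_g = [2100, 2450, 2500, 3100, 3300, 3650, 3900, 4000, 4200, 2900]
rows = frequency_table(weights_g, [2500, 4000], ["<2500 g", "2500-3999 g", ">=4000 g"])
for label, freq, pct, cum in rows:
    print(f"{label:<12} {freq:>3} {pct:>6.1f}% {cum:>4}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check that no value was dropped.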

Graphical Summaries

Histograms
• Continuous or ordinal data on horizontal axis

Bar Graphs
• Nominal data
• No order to horizontal axis

Box Plots
• Continuous data

Histogram

A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.

Bar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of a set of mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch the other bars.

Difference between bar chart and histogram

• Bar charts for categories that are separate
• Histograms if you got categories by dividing up continuous data
• Bars do not touch; histogram rectangles do touch

Line graph

If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resulting graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, it is better done with line graphs rather than histograms.

Scatter plot

Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, all the points fall on a straight line (the regression line).
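How tightly the points of a scatter plot follow a straight line is commonly summarised by Pearson's correlation coefficient. The sketch below is one way to compute it by hand; the lecture text does not name a specific coefficient, so treating "correlation" as Pearson's r is an assumption, and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
on_a_line = [2 * x + 1 for x in xs]   # every point lies on one straight line
scattered = [3, 1, 4, 1, 5]           # hypothetical widely scattered values
r_line = pearson_r(xs, on_a_line)
r_scattered = pearson_r(xs, scattered)
print(r_line, r_scattered)
```

Points that lie exactly on a rising line give r = 1 (up to rounding), and on a falling line r = -1; the widely scattered values give an r much closer to 0.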

Survival curve

Pie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.
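Because the area of a sector is proportional to its central angle, each category's slice can be worked out from its share of the total. A minimal sketch, with hypothetical category counts and an illustrative helper name `pie_sectors`:

```python
def pie_sectors(counts):
    """Map each category to (percentage of total, sector angle in degrees)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {
        category: (100 * n / total, 360 * n / total)
        for category, n in counts.items()
    }


# Hypothetical counts for three categories.
sectors = pie_sectors({"A": 50, "B": 30, "C": 20})
for category, (percent, angle) in sectors.items():
    print(f"{category}: {percent:.0f}% of the pie, {angle:.0f} degrees")
```

The percentages add up to 100 and the angles to 360, matching the idea that the whole circle represents the total population.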

Pictures of Data: Continuous Variables

Histograms
• Means and medians do not tell the whole story
• Differences in spread (variability)
• Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
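The first two steps above (equal-width bins, then counting) can be sketched as follows. The data are hypothetical, the number of bins is an arbitrary choice, and `histogram_counts` is an illustrative name; the text "drawing" is only a stand-in for a real plot.

```python
def histogram_counts(values, n_bins):
    """Split the data range into n_bins equal-width intervals and count per bin."""
    lo, hi = min(values), max(values)
    if hi == lo or n_bins < 1:
        raise ValueError("need a non-zero data range and at least one bin")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + k * width for k in range(n_bins + 1)]
    return edges, counts


# Hypothetical observations.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = histogram_counts(data, 4)

# Crude text histogram with labelled intervals.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:>4.1f} to {right:<4.1f} | {'#' * count}")
```

Changing the number of bins changes the picture, so it is worth trying a few values before settling on one.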

Pictures of Data: Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing distribution of continuous data in multiple groups; can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore, 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
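The three numbers that define the box (25th percentile, median, 75th percentile) can be computed with Python's standard `statistics.quantiles`. Percentile conventions differ between textbooks and software, so the `method="inclusive"` choice below is an assumption and other packages may give slightly different hinges; the lengths of stay are hypothetical.

```python
import statistics

# Hypothetical hospital lengths of stay, in days.
stay_days = [1, 2, 2, 3, 4, 5, 6, 8, 12]

# n=4 asks for the three cut points that split the data into quarters.
lower_hinge, median, upper_hinge = statistics.quantiles(
    stay_days, n=4, method="inclusive"
)
box_height = upper_hinge - lower_hinge  # the interquartile range

print(lower_hinge, median, upper_hinge, box_height)
```

The box would be drawn from the lower hinge to the upper hinge with a line at the median; the long stay of 12 days sits well above the box, which is how a box plot hints at right skew.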

Hospital Length of Stay

Box plot: Length of Stay

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people so that such misadventures can be avoided.

Common tricks used to mislead
• The problem of scaling
• The Advertiser's Graph
• The transformed graph
• The chart with too much data

Which graph to use

Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category

Example of MCQ 1

The arithmetic mean of a set of values

a) Is a particular type of average

b) Is a useful summary measure of location if the data are skewed to the right

c) Coincides with the median if the distribution of the data is symmetrical

d) Is always greater than the median

e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram

a) Can be used instead of a pie chart to display categorical data

b) Is similar to a bar chart but there are no gaps between the bars

c) Contains contiguous bars, with the height of each bar being proportional to the frequency of the observations in the range specified by the bar

d) Can be used to display either a frequency or a relative frequency distribution

e) Is used to show the relationship between two variables

Any questions?

Page 31: Introduction to Biostatistics

Mode Mode

The score that occurs most frequentlyThe score that occurs most frequently

BimodalMultimodalNo Mode

The only measure of central tendency that can be used The only measure of central tendency that can be used with with nominal datadata

Examples

bull Mode is 5Mode is 5

bull Bimodal ndash 2 amp 6Bimodal ndash 2 amp 6

bull No ModeNo Mode

a 5 5 5 3 1 5 1 4 3 5

b 2 2 2 3 4 5 6 6 6 7 9

c 2 3 6 7 8 9 10

d 2 2 3 3 3 4

e 2 2 3 3 4 4 5 5

bull Mode is 3

bull No Mode

Shapes of the Distribution

Shapes of the Distribution

Distribution Characteristics

Shapes of the Distribution

Example Height of students in the class

Shapes of the Distribution

Example Serum cholesterol level

Shapes of the Distribution

Example Birth weight of newborn babies

Shapes of the Distribution

Some visual ways to summarize Some visual ways to summarize datadata

TablesTablesFrequency tableFrequency table

GraphsGraphsHistogramsHistograms

Bar graphsBar graphs

Box plotsBox plots

Line plotsLine plots

Scatter graphsScatter graphs

ChartsChartsBar chartBar chart

Pie diagramPie diagram

Frequency TablesFrequency Tables

Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages

The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable

and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable

Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data

Example of frequency tableExample of frequency table

When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class

Graphical SummariesGraphical Summaries

HistogramsHistograms Continuous or ordinal data on horizontal axisContinuous or ordinal data on horizontal axis

Bar GraphsBar Graphs Nominal dataNominal data

No order to horizontal axisNo order to horizontal axis

Box PlotsBox Plots Continuous dataContinuous data

HistogramHistogram

A histogram is a graphic representation of the frequency distribution of a variable Vertical rectangles (bars) are drawn in such a way that their bases lie

on a linear scale representing different intervals and their heights are

proportional to the frequencies of the values within each of the intervals

Bar ChartBar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of mutually exclusive categories The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis The heights of the bars correspond to the frequencies The bars should be of equal width and they should not be touching me other bars

Difference between bar chart and Difference between bar chart and histogramhistogram

Bar charts for categories that are separateBar charts for categories that are separate

Histograms if you got categories by Histograms if you got categories by dividing up continuous datadividing up continuous data

Bars do not touch histogram rectangles Bars do not touch histogram rectangles do touch do touch

Line graphLine graph

If the mid-points of the top of the bars of a histogram are connected together by a line and if the bars were omitted from the display the resultant graph will be a line

graph (also called a frequency polygon) Line graphs are good at showing trends over a period of time When trends of rates (eg death rate Infant Mortality Rate etc) are to be displayed it is better done with

line graphs rather than histograms

Scatter plotScatter plot

Also called a scattergram This a method of displaying the distribution of two variables in relation to each other another The value of one variables is measured on the X axis and the values of the other on the Y axis The variables have to be on a continuous scale Each plot thus has two values (coordinates) from the Y and X axis scales A wide scatter of the plots denotes poor correlation between the two variables If the two variables are perfectly correlated then all the plots will fall on the diagonal (regression line)

Survival curveSurvival curve

Pie chartPie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments each representing a category or subset of data (part of the whole) The amount for each category is proportional to the area of the sector (slice of the pie) The total area of the circle is 100 and it represents the total population that is being shown

Pictures of DataContinuous Variables

Histograms Means and medians do not tell whole story Differences in spread (variability) Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
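
The first two steps above (dividing the range into equal-width bins and counting the observations in each) can be sketched as follows. The data are hypothetical lengths of stay in days, the function name is made up for illustration, and putting the largest value into the last bin is one common convention rather than something the slides specify.

```python
def histogram_counts(data, n_bins):
    """Split the range of data into n_bins equal-width bins and count each bin."""
    low, high = min(data), max(data)
    if high == low:
        raise ValueError("all values are identical; no range to divide")
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for value in data:
        # The maximum value would land one past the end, so clamp it into the last bin.
        index = min(int((value - low) / width), n_bins - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(n_bins + 1)]
    return edges, counts


stays = [1, 2, 2, 3, 3, 3, 4, 5, 7, 9]  # hypothetical lengths of stay (days)
edges, counts = histogram_counts(stays, 4)
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:4.1f} to {right:4.1f}: {'*' * count}")
```

The printed rows of asterisks are a crude text histogram; a plotting library would draw the same counts as touching rectangles with labelled axes.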

Pictures of Data: Histograms [example figures not reproduced]

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing the distribution of continuous data in multiple groups, which can be plotted side by side

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
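
The numbers a box plot is drawn from can be computed directly. The sketch below uses Python's standard `statistics` module on made-up data. Percentile conventions differ between textbooks and software, so the `inclusive` method used here is an assumption, not necessarily the definition the lecture had in mind.

```python
import statistics

# Hypothetical lengths of stay in days, with one outlying value (illustrative only).
stays = [2, 3, 4, 5, 6, 7, 8, 9, 30]

# Three cut points dividing the data into quarters: lower hinge, median, upper hinge.
q1, median, q3 = statistics.quantiles(stays, n=4, method="inclusive")

print(f"lower hinge (25th percentile): {q1}")
print(f"median:                        {median}")
print(f"upper hinge (75th percentile): {q3}")
print(f"minimum and maximum:           {min(stays)}, {max(stays)}")
```

The box would run from the lower hinge to the upper hinge with a line at the median, and the value of 30 would stand out as an outlying point well beyond the box.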

Hospital Length of Stay

Box plot: Length of Stay [figures not reproduced]

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.

Common tricks used to mislead

The problem of scaling

The Advertiser's Graph

The transformed graph

The chart with too much data

Which graph to use

Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name    Y-axis          X-axis
Histogram     Count           Category
Scatterplot   Continuous      Continuous
Dot Plot      Continuous      Category
Box Plot      Percentiles     Category
Line Plot     Mean or value   Category

Example of MCQ 1

The arithmetic mean of a set of values

a) Is a particular type of average

b) Is a useful summary measure of location if the data are skewed to the right

c) Coincides with the median if the distribution of the data is symmetrical

d) Is always greater than the median

e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram

a) Can be used instead of a pie chart to display categorical data

b) Is similar to a bar chart but there are no gaps between the bars

c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar

d) Can be used to display either a frequency or a relative frequency distribution

e) Is used to show the relationship between two variables

Any questions?

Page 32: Introduction to Biostatistics

Examples

a  5 5 5 3 1 5 1 4 3 5       Mode is 5

b  2 2 2 3 4 5 6 6 6 7 9     Bimodal (2 and 6)

c  2 3 6 7 8 9 10            No mode

d  2 2 3 3 3 4               Mode is 3

e  2 2 3 3 4 4 5 5           No mode
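
The five examples above can be checked with a short sketch. The function name is hypothetical, and the rule that a data set has no mode when every value occurs equally often is inferred from examples c and e rather than stated in the slides.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when all values are equally frequent."""
    counts = Counter(values)
    top = max(counts.values())
    if len(counts) > 1 and all(count == top for count in counts.values()):
        return []  # every value occurs equally often: treated as "no mode"
    return sorted(value for value, count in counts.items() if count == top)


examples = {
    "a": [5, 5, 5, 3, 1, 5, 1, 4, 3, 5],
    "b": [2, 2, 2, 3, 4, 5, 6, 6, 6, 7, 9],
    "c": [2, 3, 6, 7, 8, 9, 10],
    "d": [2, 2, 3, 3, 3, 4],
    "e": [2, 2, 3, 3, 4, 4, 5, 5],
}
results = {label: modes(data) for label, data in examples.items()}
for label, found in results.items():
    print(label, found if found else "no mode")
```

A result with two values, as in example b, corresponds to a bimodal data set.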

Shapes of the Distribution

Distribution Characteristics

Example: Height of students in the class

Example: Serum cholesterol level

Example: Birth weight of newborn babies

Some visual ways to summarize data

Tables: Frequency table

Graphs: Histograms, Bar graphs, Box plots, Line plots, Scatter graphs

Charts: Bar chart, Pie diagram

Frequency Tables

Summarizes a variable with counts and percentages

The variable is categorical

Note that you can take a continuous variable and create categories with it

How do you create categories for a continuous variable?

Choose cutoffs that are biologically meaningful

Natural breaks in the data

Example of frequency table

When raw data are arranged with frequencies, they are said to form a frequency table for ungrouped data. When the data are divided into groups (classes), they are called grouped data. The classes have to be decided according to the range of the data and the size of each class. The number of observations lying in a particular class is called its frequency, and the table showing classes with frequencies is called a frequency table. The total of the frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class.
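
A minimal sketch of a grouped frequency table with cumulative frequencies follows. The birth weights and class boundaries are hypothetical, and treating each class as including its lower boundary but not its upper one is an assumed convention.

```python
from itertools import accumulate


def frequency_table(values, class_edges):
    """Return rows of (lower, upper, frequency, cumulative frequency).

    Each class covers lower <= value < upper (an assumed convention).
    """
    classes = list(zip(class_edges[:-1], class_edges[1:]))
    frequencies = [sum(low <= v < high for v in values) for low, high in classes]
    cumulative = list(accumulate(frequencies))
    return [
        (low, high, freq, cum)
        for (low, high), freq, cum in zip(classes, frequencies, cumulative)
    ]


weights = [2.1, 2.4, 2.9, 3.0, 3.2, 3.3, 3.6, 3.8, 4.1]  # hypothetical birth weights (kg)
table = frequency_table(weights, [2.0, 2.5, 3.0, 3.5, 4.0, 4.5])

print("class (kg)   frequency   cumulative")
for low, high, freq, cum in table:
    print(f"{low:.1f} to {high:.1f}   {freq:9d}   {cum:10d}")
```

The cumulative frequency of the last class equals the total number of observations, which is a quick check that no value fell outside the classes.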

Graphical Summaries

Histograms: Continuous or ordinal data on horizontal axis

Bar Graphs: Nominal data; no order to horizontal axis

Box Plots: Continuous data

Histogram

A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.

Bar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of a set of mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and should not touch one another.


Page 35: Introduction to Biostatistics

Distribution Characteristics

Shapes of the Distribution

Example Height of students in the class

Shapes of the Distribution

Example Serum cholesterol level

Shapes of the Distribution

Example Birth weight of newborn babies

Shapes of the Distribution

Some visual ways to summarize Some visual ways to summarize datadata

TablesTablesFrequency tableFrequency table

GraphsGraphsHistogramsHistograms

Bar graphsBar graphs

Box plotsBox plots

Line plotsLine plots

Scatter graphsScatter graphs

ChartsChartsBar chartBar chart

Pie diagramPie diagram

Frequency TablesFrequency Tables

Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages

The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable

and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable

Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data

Example of frequency tableExample of frequency table

When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class

Graphical SummariesGraphical Summaries

HistogramsHistograms Continuous or ordinal data on horizontal axisContinuous or ordinal data on horizontal axis

Bar GraphsBar Graphs Nominal dataNominal data

No order to horizontal axisNo order to horizontal axis

Box PlotsBox Plots Continuous dataContinuous data

HistogramHistogram

A histogram is a graphic representation of the frequency distribution of a variable Vertical rectangles (bars) are drawn in such a way that their bases lie

on a linear scale representing different intervals and their heights are

proportional to the frequencies of the values within each of the intervals

Bar ChartBar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of mutually exclusive categories The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis The heights of the bars correspond to the frequencies The bars should be of equal width and they should not be touching me other bars

Difference between bar chart and histogram

Bar charts are for categories that are separate

Histograms are for categories obtained by dividing up continuous data

Bars in a bar chart do not touch; histogram rectangles do touch

Line graph

If the mid-points of the tops of the bars of a histogram are connected by a line and the bars are omitted from the display, the resulting graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, infant mortality rate) are to be displayed, this is better done with line graphs than with histograms.
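The construction of a frequency polygon from a histogram can be sketched as follows. The function name `frequency_polygon`, the class boundaries, and the frequencies are assumptions for illustration; the returned points are what a plotting tool would join with straight lines.

```python
def frequency_polygon(bin_edges, frequencies):
    """Return (midpoint, frequency) pairs, one per histogram bar."""
    if len(bin_edges) != len(frequencies) + 1:
        raise ValueError("need exactly one more edge than frequencies")
    midpoints = [(lo + hi) / 2 for lo, hi in zip(bin_edges, bin_edges[1:])]
    return list(zip(midpoints, frequencies))


# Hypothetical histogram: heights in cm grouped into 10 cm classes.
edges = [140, 150, 160, 170, 180]
freqs = [2, 7, 9, 3]

points = frequency_polygon(edges, freqs)
print(points)
```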

Scatter plot

Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each plotted point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, all the points fall on a straight line (the regression line).
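To put a number on how tightly the points cluster around a line, one common choice is the Pearson correlation coefficient. The lecture text does not name a specific coefficient, so its use here is an assumption, and both data sets are made up. The sketch compares points that lie exactly on a line with points that are widely scattered.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


ages = [1, 2, 3, 4, 5]
on_a_line = [2 * a + 10 for a in ages]  # every point lies on one straight line
scattered = [12, 19, 13, 20, 15]        # same x values, widely scattered y

r_line = pearson_r(ages, on_a_line)
r_scattered = pearson_r(ages, scattered)
print(round(r_line, 3), round(r_scattered, 3))
```

A coefficient close to 1 or -1 corresponds to points hugging a straight line; a coefficient near 0 corresponds to a wide scatter.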

Survival curve

Pie chart

This is a circular diagram (it can be shown in 2-D or 3-D) divided into segments, each representing a category or subset of the data (part of the whole). The amount for each category is proportional to the area of its sector (slice of the pie). The total area of the circle is 100% and represents the total population being shown.
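Since each sector is proportional to its category's share of the total, the percentage and the sector angle follow directly from the counts. The sketch below assumes made-up admission counts and a hypothetical helper `pie_sectors`.

```python
def pie_sectors(counts):
    """Map each category to (percentage of total, sector angle in degrees)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return {
        category: (100 * n / total, 360 * n / total)
        for category, n in counts.items()
    }


# Hypothetical causes of admission (made-up counts).
admissions = {"Pneumonia": 50, "Diarrhoea": 30, "Malaria": 15, "Other": 5}

sectors = pie_sectors(admissions)
for category, (percent, angle) in sectors.items():
    print(f"{category:<10}{percent:6.1f}%{angle:7.1f} degrees")
```

The percentages add up to 100 and the angles to 360, which is the sense in which the whole circle represents the total population.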

Pictures of Data: Continuous Variables

Histograms

Means and medians do not tell the whole story

Differences in spread (variability)

Differences in shape of the distribution
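A small illustration of why the mean and median alone are not enough: the two made-up samples below share the same mean and the same median but differ greatly in spread. The standard deviation is used as the measure of spread here, which is an assumption of this sketch.

```python
from statistics import mean, median, stdev

# Two hypothetical samples (made-up values).
narrow = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

for name, sample in (("narrow", narrow), ("wide", wide)):
    print(f"{name:<7} mean={mean(sample)} median={median(sample)} "
          f"sd={stdev(sample):.1f}")
```

A histogram of each sample would make the difference visible at a glance, even though the two summary measures of location are identical.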

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
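The first two steps above (equal-width bins, then counts per bin) can be sketched as follows; the text bars stand in for the drawing step. The function name `histogram_counts`, the choice of four bins, and the lengths of stay are assumptions made for illustration.

```python
def histogram_counts(values, n_bins):
    """Split the range of values into n_bins equal-width bins and count each."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("all values are identical; cannot form bins")
    counts = [0] * n_bins
    for v in values:
        # The maximum value would index one past the end, so clamp it
        # into the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical lengths of stay in days (made-up values).
stays = [1, 2, 2, 3, 3, 3, 4, 5, 7, 9]

edges, counts = histogram_counts(stays, 4)
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:4.1f} to {right:4.1f} | {'#' * n}")
```

Each bin here includes its lower edge and excludes its upper edge, except the last bin, which also includes the maximum. Whatever rule is used, it should be stated so that boundary values are counted exactly once.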

Pictures of Data: Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing the distribution of continuous data in multiple groups: box plots can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
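The numbers behind a box plot can be computed with Python's standard library. This is a sketch under stated assumptions: the lengths of stay are made up, the helper `box_plot_summary` is hypothetical, and the quartiles use the "inclusive" method of `statistics.quantiles`; other software may define the percentiles slightly differently and give slightly different hinges.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Return the minimum, quartiles and maximum used to draw a box plot."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,          # lower hinge (25th percentile)
        "median": med,     # line across the box
        "q3": q3,          # upper hinge (75th percentile)
        "max": max(values),
    }


# Hypothetical lengths of stay in days (made-up values).
stays = [1, 2, 2, 3, 3, 4, 5, 7, 9]

summary = box_plot_summary(stays)
print(summary)
```

In this sample the upper hinge is further from the median than the lower hinge is, which is how a box plot hints at a distribution skewed to the right.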

Hospital Length of Stay

Box plot: Length of Stay

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.

Common tricks used to mislead

The problem of scaling

The Advertiser's Graph

The transformed graph

The chart with too much data

Which graph to use

Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category

Example of MCQ 1

The arithmetic mean of a set of values

a) Is a particular type of average

b) Is a useful summary measure of location if the data are skewed to the right

c) Coincides with the median if the distribution of the data is symmetrical

d) Is always greater than the median

e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram

a) Can be used instead of a pie chart to display categorical data

b) Is similar to a bar chart but there are no gaps between the bars

c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar

d) Can be used to display either a frequency or a relative frequency distribution

e) Is used to show the relationship between two variables

Any questions?

Page 39: Introduction to Biostatistics

Shapes of the Distribution

Some visual ways to summarize Some visual ways to summarize datadata

TablesTablesFrequency tableFrequency table

GraphsGraphsHistogramsHistograms

Bar graphsBar graphs

Box plotsBox plots

Line plotsLine plots

Scatter graphsScatter graphs

ChartsChartsBar chartBar chart

Pie diagramPie diagram

Frequency TablesFrequency Tables

Summarizes a variable with counts and Summarizes a variable with counts and percentagespercentages

The variable is categorical The variable is categorical Note that you can take a continuous variable Note that you can take a continuous variable

and create categories with itand create categories with itHow do you create categories for a continuous How do you create categories for a continuous variablevariable

Choose cutoffs that are biologically meaningfulChoose cutoffs that are biologically meaningful Natural breaks in the dataNatural breaks in the data

Example of frequency tableExample of frequency table

When raw data are arranged with frequencies they are said to form a frequency table for ungrouped dataWhen the data are divided into groups classes they are called grouped dataThe classes have to be decided according to the range of data and size of classThe number of observations lying in a particular class is called its frequency and the table showing classes with frequencies is called a frequency table The total of frequencies of a particular class and of all classes prior to that class is called the cumulative frequency of that class

Graphical Summaries

Histograms: continuous or ordinal data on horizontal axis

Bar Graphs: nominal data; no order to horizontal axis

Box Plots: continuous data

Histogram

A histogram is a graphic representation of the frequency distribution of a variable. Vertical rectangles (bars) are drawn in such a way that their bases lie on a linear scale representing different intervals, and their heights are proportional to the frequencies of the values within each of the intervals.

Bar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation falls into exactly one of several mutually exclusive categories. The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis. The heights of the bars correspond to the frequencies. The bars should be of equal width and they should not touch the other bars.

Difference between bar chart and histogram

Bar charts for categories that are separate

Histograms if you got categories by dividing up continuous data

Bars do not touch; histogram rectangles do touch

Line graph

If the mid-points of the tops of the bars of a histogram are connected together by a line, and if the bars are omitted from the display, the resultant graph will be a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, it is better done with line graphs rather than histograms.
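A minimal sketch of how the points of a frequency polygon follow from a histogram: each point sits at the mid-point of a class interval, at the height of that bar. The intervals and frequencies below are hypothetical.

```python
# Hypothetical histogram: class intervals (lower, upper) and their frequencies.
intervals = [(0, 10), (10, 20), (20, 30), (30, 40)]
frequencies = [2, 5, 8, 3]

# A frequency polygon joins the mid-point of the top of each bar.
polygon = [((low + high) / 2, f) for (low, high), f in zip(intervals, frequencies)]
print(polygon)
```

Joining these points in order with straight lines, and leaving out the bars, gives the line graph described above.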

Scatter plot

Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, then all the points will fall on the diagonal (regression line).
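To make "poor" and "perfect" correlation concrete, the sketch below computes Pearson's correlation coefficient r for two small invented data sets. Pearson's r is not named on the slide; it is used here as one common way to quantify how closely the points follow a straight line, and the function name pearson_r is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_line = [2 * v + 1 for v in x]  # every point lies exactly on a straight line
y_noisy = [4, 1, 5, 2, 3]        # widely scattered points (invented values)

r_line = pearson_r(x, y_line)
r_noisy = pearson_r(x, y_noisy)
print(r_line, r_noisy)
```

A value of r close to 1 or -1 means the points lie close to a line, and a value near 0 corresponds to the wide scatter described above.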

Survival curve

Pie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100%, and it represents the total population that is being shown.
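Because each sector's area is proportional to its share of the whole, the percentage and the sector angle follow directly from the counts. The sketch below uses hypothetical category counts.

```python
# Hypothetical counts per category, invented for illustration only.
counts = {"A": 20, "B": 10, "AB": 5, "O": 15}
total = sum(counts.values())

# Share of the whole as a percentage, and the matching sector angle in degrees.
percentages = {k: 100 * v / total for k, v in counts.items()}
angles = {k: 360 * v / total for k, v in counts.items()}

for k in counts:
    print(f"{k:<3} {percentages[k]:5.1f}%  {angles[k]:6.1f} degrees")
```

The percentages sum to 100 and the angles sum to 360, so the whole circle stands for the whole population.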

Pictures of Data: Continuous Variables

Histograms

Means and medians do not tell the whole story

Differences in spread (variability)

Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
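The four steps above can be sketched in plain Python as a text histogram. The data (lengths of stay in days) are invented, and the choice of four bins is an assumption made only to keep the output short.

```python
# Hypothetical lengths of stay in days, invented for illustration only.
data = [1, 2, 2, 3, 3, 3, 4, 5, 5, 6, 8, 9, 12, 15, 19]

n_bins = 4  # assumed number of intervals
low, high = min(data), max(data)
width = (high - low) / n_bins  # equal-width bins

# Step 1: bin edges dividing the range into equal-width intervals.
edges = [low + i * width for i in range(n_bins + 1)]

# Step 2: count observations per bin; the maximum value goes into the last bin.
counts = [0] * n_bins
for value in data:
    index = min(int((value - low) / width), n_bins - 1)
    counts[index] += 1

# Steps 3 and 4: draw the bars and label the scales (text version).
print("Length of stay (days)   Frequency")
for i, c in enumerate(counts):
    print(f"{edges[i]:5.1f} to {edges[i + 1]:5.1f}          {'#' * c} ({c})")
```

With these invented values most observations fall in the first bin and a few long stays stretch out to the right, which is the kind of shape that a mean or median alone would not reveal.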

Pictures of Data: Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing the distribution of continuous data in multiple groups; can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
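The numbers a box plot is built from can be computed with Python's standard statistics module, as in the sketch below. The data are invented, and the "inclusive" percentile method is an assumed choice: different textbooks and software packages define quartiles slightly differently, so hinges may not match exactly across tools.

```python
from statistics import median, quantiles

# Hypothetical lengths of stay in days, invented for illustration only.
data = [1, 2, 2, 3, 3, 3, 4, 5, 5, 6, 8, 9, 12, 15, 19]

# Cut points that split the sorted data into four equal parts.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

lower_hinge = q1          # 25th percentile: bottom of the box
upper_hinge = q3          # 75th percentile: top of the box
box_median = q2           # line across the box
iqr = upper_hinge - lower_hinge  # height of the box (interquartile range)

print(min(data), lower_hinge, box_median, upper_hinge, max(data))
```

Here the median sits closer to the lower hinge than to the upper hinge, which is how a box plot signals right skew.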

Hospital Length of Stay

Box plot: Length of Stay

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.

Common tricks used to mislead

The problem of scaling

The Advertiser's Graph

The transformed graph

The chart with too much data

Which graph to use

Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics

Graph Name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category

Example of MCQ 1

The arithmetic mean of a set of values

a) Is a particular type of average

b) Is a useful summary measure of location if the data are skewed to the right

c) Coincides with the median if the distribution of the data is symmetrical

d) Is always greater than the median

e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram

a) Can be used instead of a pie chart to display categorical data

b) Is similar to a bar chart but there are no gaps between the bars

c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar

d) Can be used to display either a frequency or a relative frequency distribution

e) Is used to show the relationship between two variables

Any questions?

Page 43: Introduction to Biostatistics

Graphical SummariesGraphical Summaries

HistogramsHistograms Continuous or ordinal data on horizontal axisContinuous or ordinal data on horizontal axis

Bar GraphsBar Graphs Nominal dataNominal data

No order to horizontal axisNo order to horizontal axis

Box PlotsBox Plots Continuous dataContinuous data

HistogramHistogram

A histogram is a graphic representation of the frequency distribution of a variable Vertical rectangles (bars) are drawn in such a way that their bases lie

on a linear scale representing different intervals and their heights are

proportional to the frequencies of the values within each of the intervals

Bar ChartBar Chart

A bar chart is a method of presenting discrete data organized in such a way that each observation can fall into one of mutually exclusive categories The frequencies (or percentages) are listed along the Y axis and the categories of the variable along the X axis The heights of the bars correspond to the frequencies The bars should be of equal width and they should not be touching me other bars

Difference between bar chart and Difference between bar chart and histogramhistogram

Bar charts for categories that are separateBar charts for categories that are separate

Histograms if you got categories by Histograms if you got categories by dividing up continuous datadividing up continuous data

Bars do not touch histogram rectangles Bars do not touch histogram rectangles do touch do touch

Line graph

If the mid-points of the tops of the bars of a histogram are connected by a line, and the bars are omitted from the display, the resultant graph is a line graph (also called a frequency polygon). Line graphs are good at showing trends over a period of time. When trends of rates (e.g. death rate, Infant Mortality Rate, etc.) are to be displayed, this is better done with line graphs than with histograms.
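The construction of a frequency polygon can be sketched in a few lines: take the mid-point of each histogram interval and pair it with that interval's count. The interval edges and counts below are hypothetical, and `frequency_polygon` is a name chosen for this example.

```python
def frequency_polygon(edges, counts):
    """Pair the mid-point of each interval with its frequency."""
    midpoints = [(left + right) / 2 for left, right in zip(edges, edges[1:])]
    return list(zip(midpoints, counts))

# Hypothetical histogram: four equal-width intervals and their counts.
edges = [1, 3, 5, 7, 9]
counts = [3, 4, 1, 2]

points = frequency_polygon(edges, counts)
for x, y in points:
    print(f"mid-point {x:.1f}: frequency {y}")
```

Joining these points in order with straight lines gives the line graph described above.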

Scatter plot

Also called a scattergram. This is a method of displaying the distribution of two variables in relation to each other. The values of one variable are measured on the X axis and the values of the other on the Y axis. The variables have to be on a continuous scale. Each point thus has two values (coordinates), one from the X axis scale and one from the Y axis scale. A wide scatter of the points denotes poor correlation between the two variables. If the two variables are perfectly correlated, all the points fall on a straight line (the regression line).
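To make the link between scatter and correlation concrete, the sketch below computes Pearson's correlation coefficient, one common measure of linear association that the slide itself does not name. All data values are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical X values and two hypothetical sets of Y values.
ages = [1, 2, 3, 4, 5]
on_a_line = [2 * a + 10 for a in ages]   # every point lies on one straight line
scattered = [14, 11, 17, 12, 16]         # points widely scattered

r_line = pearson_r(ages, on_a_line)
r_scattered = pearson_r(ages, scattered)
print(f"points on a line: r = {r_line:.2f}")
print(f"wide scatter:     r = {r_scattered:.2f}")
```

A coefficient close to 1 (or -1) corresponds to points lying near a straight line, while a coefficient near 0 corresponds to a wide scatter.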

Survival curve

Pie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.
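The proportionality described above can be worked through directly: each category's share of the total gives both its percentage and its sector angle out of 360 degrees. The feeding categories and counts below are hypothetical, as is the helper name `pie_sectors`.

```python
def pie_sectors(counts):
    """Map each category to (percentage of total, sector angle in degrees)."""
    total = sum(counts.values())
    return {
        label: (100 * count / total, 360 * count / total)
        for label, count in counts.items()
    }

# Hypothetical category counts (illustrative values only).
feeding = {"Breast": 50, "Formula": 30, "Mixed": 20}

sectors = pie_sectors(feeding)
for label, (percent, angle) in sectors.items():
    print(f"{label:8s} {percent:5.1f}%  {angle:6.1f} degrees")
```

Because every observation belongs to exactly one category, the percentages sum to 100 and the angles to 360.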

Pictures of Data: Continuous Variables

Histograms

Means and medians do not tell the whole story

Differences in spread (variability)

Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each interval

Draw the histogram

Label scales
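The steps above can be sketched in plain Python. This is a minimal illustration that assumes the data are not all identical (so the interval width is non-zero) and that places the maximum value in the last interval; the length-of-stay values are invented, and `histogram_counts` is a name chosen for the example.

```python
def histogram_counts(values, n_bins):
    """Count observations in n_bins equal-width intervals spanning the data range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        raise ValueError("all values are identical; cannot form intervals")
    counts = [0] * n_bins
    for v in values:
        # The maximum value goes into the last interval rather than a new one.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts

# Hypothetical lengths of stay in days (illustrative values only).
stays = [1, 2, 2, 3, 3, 3, 4, 5, 7, 9]

edges, counts = histogram_counts(stays, n_bins=4)

# Draw a text histogram with labelled intervals.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:4.1f} to {right:4.1f} days | {'#' * count}")
```

The choice of four intervals is arbitrary here; too few intervals hide the shape of the distribution and too many make it look ragged.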

Pictures of Data: Histograms [three example figures, not reproduced in this transcript]

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing the distribution of continuous data in multiple groups: box plots can be plotted side by side

Box plot

A box plot provides an excellent visual summary of many important aspects of a distribution. The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distribution. The median is shown as a line across the box. Therefore 1/4 of the distribution is between this line and the top of the box, and 1/4 of the distribution is between this line and the bottom of the box.
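The numbers a box plot is drawn from can be computed with Python's standard `statistics` module. This is a sketch only: software packages use slightly different conventions for percentiles and hinges, and the "inclusive" method used here is one of them. The data are hypothetical.

```python
import statistics

def box_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical lengths of stay in days (illustrative values only).
stays = [1, 2, 2, 3, 3, 3, 4, 5, 7, 9]

low, q1, median, q3, high = box_summary(stays)
print(f"box from {q1} to {q3}, median line at {median}")
print(f"data range from {low} to {high}")
```

In this example the median sits closer to the bottom of the box than to the top, and the maximum lies much further from the box than the minimum, which is how a box plot shows a distribution skewed to the right.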

Hospital Length of Stay

Box plot: Length of Stay [figures not reproduced in this transcript]

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank and so appealing that the careless are easily fooled." - M J Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.

Common tricks used to mislead:

The problem of scaling

The Advertiser's Graph

The transformed graph

The chart with too much data

Which graph to use

Statistical methods depend on the "form" of a set of data, which can be assessed with some common useful graphics.

Graph name     Y-axis           X-axis
Histogram      Count            Category
Scatterplot    Continuous       Continuous
Dot plot       Continuous       Category
Box plot       Percentiles      Category
Line plot      Mean or value    Category

Example of MCQ 1

The arithmetic mean of a set of values:

a) Is a particular type of average

b) Is a useful summary measure of location if the data are skewed to the right

c) Coincides with the median if the distribution of the data is symmetrical

d) Is always greater than the median

e) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram:

a) Can be used instead of a pie chart to display categorical data

b) Is similar to a bar chart but there are no gaps between the bars

c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar

d) Can be used to display either a frequency or a relative frequency distribution

e) Is used to show the relationship between two variables

Any questions?

Page 48: Introduction to Biostatistics

Scatter plotScatter plot

Also called a scattergram This a method of displaying the distribution of two variables in relation to each other another The value of one variables is measured on the X axis and the values of the other on the Y axis The variables have to be on a continuous scale Each plot thus has two values (coordinates) from the Y and X axis scales A wide scatter of the plots denotes poor correlation between the two variables If the two variables are perfectly correlated then all the plots will fall on the diagonal (regression line)

Survival curveSurvival curve

Pie chartPie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments each representing a category or subset of data (part of the whole) The amount for each category is proportional to the area of the sector (slice of the pie) The total area of the circle is 100 and it represents the total population that is being shown

Pictures of DataContinuous Variables

Histograms Means and medians do not tell whole story Differences in spread (variability) Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales

Pictures of Data Histograms

Pictures of Data Histograms

Pictures of Data Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into distribution shape in terms of skewness and outlying values

Very nice tool for easily comparing distribution of continuous data in multiple groups ndash can be plotted side by side

Box plotBox plotA box plot provides an excellent visual summary of many important aspects of a distribution The box stretches from the lower hinge (defined as the 25th percentile) to the upper hinge (the 75th percentile) and therefore contains the middle half of the scores in the distributionThe median is shown as a line across the box Therefore 14 of the distribution is between this line and the top of the box and 14 of the distribution is between this line and the bottom of the box

Hospital Length of Stay

Box plot Length of Stay

Box plot Length of Stay

Misuse of graphicsMisuse of graphics

It pays to be wide awake in studying any graph The It pays to be wide awake in studying any graph The thing looks so simple so frank and so appealing that thing looks so simple so frank and so appealing that the careless are easily fooled - M J Moroney the careless are easily fooled - M J Moroney

Graphs and charts are often misused The honest Graphs and charts are often misused The honest researcher must have a good handle on how graphs can researcher must have a good handle on how graphs can be used to deliberately mislead people so that such be used to deliberately mislead people so that such misadventures can be avoided misadventures can be avoided

Common tricks used to mislead Common tricks used to mislead The problem of scalingThe problem of scaling The Advertisers Graph The Advertisers Graph The transformed graph The transformed graph The chart with too much dataThe chart with too much data

Which graph to useWhich graph to use

Statistical methods depend on the ldquoformrdquo of a set of data which can be assessed with some common useful graphics

Graph Name Y-axis X-axis

Histogram Count Category

Scatterplot Continuous Continuous

Dot Plot Continuous Category

Box Plot Percentiles Category

Line Plot Mean or value Category

Example of MCQ 1Example of MCQ 1

The arithmetic mean of a set of values

a) Is a particular type of averageb) Is a useful summary measure of location if the data are skewed to the rightc) Coincides with the median if the distribution of the data is symmetricald) Is always greater than the mediane) Cannot be calculated if the data set contains both positive and negative values

Example of MCQ 2

A histogram:

a) Can be used instead of a pie chart to display categorical data

b) Is similar to a bar chart but there are no gaps between the bars

c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar

d) Can be used to display either a frequency or a relative frequency distribution

e) Is used to show the relationship between two variables

Any questions?


Survival curve

Pie chart

This is a circular diagram (can be shown as 2-D or 3-D) divided into segments, each representing a category or subset of data (part of the whole). The amount for each category is proportional to the area of the sector (slice of the pie). The total area of the circle is 100% and it represents the total population that is being shown.
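The proportionality between category size and sector size can be worked out by hand. The sketch below, with made-up admission counts and a hypothetical helper `sector_sizes`, converts counts into percentages of the whole and the corresponding sector angles (out of 360 degrees).

```python
def sector_sizes(counts):
    """Map each category to (percentage of total, sector angle in degrees)."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {
        category: (count * 100 / total, count * 360 / total)
        for category, count in counts.items()
    }


# Hypothetical admissions by diagnosis (illustrative only).
admissions = {"Pneumonia": 50, "Diarrhoea": 30, "Malnutrition": 20}

sectors = sector_sizes(admissions)
for category, (percent, angle) in sectors.items():
    print(f"{category}: {percent:.0f}% of the pie, {angle:.0f} degrees")
```

The percentages always sum to 100 and the angles to 360, which is the sense in which the whole circle represents the total population.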

Pictures of Data: Continuous Variables

Histograms

Means and medians do not tell the whole story:
Differences in spread (variability)
Differences in shape of the distribution

How to Make a Histogram

Divide range of data into intervals (bins) of equal width

Count the number of observations in each class

Draw the histogram

Label scales
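The steps above can be sketched in a few lines of code. This is an illustrative version only: the function name `histogram_counts`, the choice of four bins and the data are assumptions, and the "drawing" is a simple text display with a labelled interval on each row.

```python
def histogram_counts(values, n_bins):
    """Split the data range into n_bins equal-width intervals and count observations."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; cannot form intervals")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls on the last edge, so place it in the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical lengths of stay in days (illustrative only).
stay_days = [2, 3, 3, 4, 5, 5, 6, 7, 8, 8, 9, 10, 12, 14, 18]

edges, counts = histogram_counts(stay_days, n_bins=4)

# Draw a text histogram: one row per interval, one asterisk per observation.
print("Interval (days)   Count")
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} - {right:5.1f}     {'*' * count} ({count})")
```

Dividing each count by the total number of observations would give the relative frequency version of the same histogram.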

Pictures of Data: Histograms

Box plot

Another common visual display tool is the box plot

Gives good insight into the shape of a distribution in terms of skewness and outlying values

A very useful tool for comparing the distribution of continuous data across multiple groups: the box plots can be plotted side by side
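To illustrate how a box plot picks out outlying values when comparing groups, the sketch below flags points lying more than 1.5 times the interquartile range beyond the box. That 1.5 rule is a common plotting convention and is an assumption here, not something stated in the lecture; the ward data and the function name `box_plot_outliers` are also invented.

```python
import statistics


def box_plot_outliers(values, k=1.5):
    """Return values lying more than k * IQR below the lower or above the upper quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical lengths of stay (days) in two wards.
groups = {
    "Ward A": [2, 3, 3, 4, 4, 5, 5, 6, 7],
    "Ward B": [1, 2, 2, 3, 3, 4, 5, 9, 30],
}

outliers_by_group = {name: box_plot_outliers(values) for name, values in groups.items()}

for name, values in groups.items():
    print(name, "median:", statistics.median(values), "outliers:", outliers_by_group[name])
```

Placed side by side, the two summaries show similar medians but a single extreme stay in the second group.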

Example of MCQ 2Example of MCQ 2

A histogram A histogram

a) Can be used instead of a pie chart to display categorical a) Can be used instead of a pie chart to display categorical datadata

b) Is similar to a bar chart but there are no gaps between the b) Is similar to a bar chart but there are no gaps between the barsbars

c) Contains contiguous bars with the height of each bar being c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the proportional to the frequency of the observations in the range specified by the barrange specified by the bar

d) Can be used to display either a frequency or a relative d) Can be used to display either a frequency or a relative frequency distributionfrequency distribution

e) Is used to show the relationship between two variablese) Is used to show the relationship between two variables

Any questionsAny questions

Page 58: Introduction to Biostatistics

Hospital Length of Stay


Page 59: Introduction to Biostatistics

Box plot Length of Stay


Page 60: Introduction to Biostatistics

Box plot Length of Stay


Page 61: Introduction to Biostatistics

Misuse of graphics

"It pays to be wide awake in studying any graph. The thing looks so simple, so frank, and so appealing that the careless are easily fooled." - M J Moroney

Graphs and charts are often misused. The honest researcher must have a good handle on how graphs can be used to deliberately mislead people, so that such misadventures can be avoided.

Common tricks used to mislead:
- The problem of scaling
- The Advertiser's Graph
- The transformed graph
- The chart with too much data


Page 62: Introduction to Biostatistics

Which graph to use

Statistical methods depend on the "form" of a set of data, which can be assessed with some common, useful graphics.

Graph Name     Y-axis           X-axis
Histogram      Count            Category (class interval)
Scatterplot    Continuous       Continuous
Dot Plot       Continuous       Category
Box Plot       Percentiles      Category
Line Plot      Mean or value    Category


Page 63: Introduction to Biostatistics

Example of MCQ 1

The arithmetic mean of a set of values

a) Is a particular type of average
b) Is a useful summary measure of location if the data are skewed to the right
c) Coincides with the median if the distribution of the data is symmetrical
d) Is always greater than the median
e) Cannot be calculated if the data set contains both positive and negative values
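One way to explore the options above is to compute the mean and median for a few small data sets. The sketch below uses Python's standard library; all three data sets are hypothetical and chosen only to show a symmetrical, a right-skewed and a left-skewed case.

```python
import statistics

# Hypothetical data sets (illustrative values only).
symmetric = [-3, -1, 0, 1, 3]            # includes negative and positive values
right_skewed = [1, 2, 2, 3, 3, 4, 40]    # one very large value in the right tail
left_skewed = [-40, 2, 3, 3, 4, 4, 5]    # one very small value in the left tail

results = {}
for name, data in [
    ("symmetric", symmetric),
    ("right_skewed", right_skewed),
    ("left_skewed", left_skewed),
]:
    # Store (mean, median) so the two measures of location can be compared.
    results[name] = (statistics.mean(data), statistics.median(data))
    print(name, results[name])
```

In these examples the single extreme value pulls the mean towards the tail while the median barely moves, and the mean is computed without difficulty when the signs are mixed.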


Page 64: Introduction to Biostatistics

Example of MCQ 2

A histogram

a) Can be used instead of a pie chart to display categorical data

b) Is similar to a bar chart but there are no gaps between the bars

c) Contains contiguous bars with the height of each bar being proportional to the frequency of the observations in the range specified by the bar

d) Can be used to display either a frequency or a relative frequency distribution

e) Is used to show the relationship between two variables
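To make the terms "frequency" and "relative frequency" concrete, here is a minimal Python sketch that tabulates both for equal-width class intervals. The function name `histogram_table`, the bin width and the ages are all assumptions made for illustration; they do not come from the lecture.

```python
from collections import Counter


def histogram_table(values, bin_width):
    """Return (start, end, frequency, relative frequency) for each interval."""
    # Map every value to the lower edge of its interval [start, start + width).
    counts = Counter((v // bin_width) * bin_width for v in values)
    lowest = (min(values) // bin_width) * bin_width
    highest = (max(values) // bin_width) * bin_width
    rows = []
    start = lowest
    while start <= highest:
        freq = counts[start]  # Counter returns 0 for empty intervals
        rows.append((start, start + bin_width, freq, freq / len(values)))
        start += bin_width    # intervals are contiguous, so bars have no gaps
    return rows


# Hypothetical ages in years (illustrative values only).
ages = [1, 2, 2, 3, 5, 6, 6, 7, 8, 12]
table = histogram_table(ages, 5)
for start, end, freq, rel in table:
    print(f"{start}-{end}: frequency={freq}, relative frequency={rel:.2f}")
```

Empty intervals are kept in the table so that the bars drawn from it stay contiguous along the axis.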


Page 65: Introduction to Biostatistics

Any questions?

