PRESENTED BY: DR. RAJ KUMAR SINGH (JR-1), DEPTT. OF ORTHODONTICS AND DENTAL ANATOMY
SUPERVISOR: DR. SANJEEV KUMAR VERMA, CHAIRMAN, DEPTT. OF ORTHODONTICS AND DENTAL ANATOMY, DR. Z.A. DENTAL COLLEGE, AMU, ALIGARH
CO-SUPERVISOR: DR. MD. SAIF KHAN, LECTURER, DEPTT. OF PERIODONTICS, DR. Z.A. DENTAL COLLEGE, AMU, ALIGARH
Overview of seminar
Introduction to medical computing
Role of medical computing
Introduction to statistics
How to use statistics
Role of statistics
Conclusion
What does "computers in medicine" mean?
The Computer Meets Medicine and Biology:
Emergence of a Discipline
After taking this course, you should know the answers to these questions:
Why is information management a central issue in biomedical research and clinical practice?
What are integrated information-management environments, and how might we expect them to affect the practice of medicine and biomedical research in coming years?
What do we mean by the terms medical computer science, medical computing, medical informatics, clinical informatics, nursing informatics, bioinformatics, and health informatics?
Why should health professionals and students of the health professions learn about medical-informatics concepts and informatics applications?
How has the development of mini-computers, microprocessors, and the Internet changed the nature of biomedical computing?
How is medical informatics related to clinical practice, biomedical engineering, molecular biology, decision science, information science, and computer science?
Role of computing
Medical decision making: probabilistic medical reasoning.
Patient care systems.
Patient monitoring systems.
Computer aided surgery.
Computer based patient record systems.
Clinical decision support systems.
The internet.
Standards in medical informatics.
Imaging modalities.
Image management systems.
Telemedicine.
Bioinformatics.
Conventional data collection for clinical trial
Medical records
Data sheets
Computer database
Analyses
Results
WHAT IS STATISTICS
Introduction
Statistics is a science that comprises data collection methods, processing of data into useful information, and utilising this information in taking decisions with least error.
Medical statistics: a collection of statistical procedures particularly well suited to the analysis of healthcare-related data.
Medicine is an empirical science that depends on observations.
Medical data are necessary for any medical decision, be it for diagnosis, treatment planning or prognosis: some information must be available about the patient.
Data are also needed to justify decisions for medico-legal or ethical reasons.
Thus, medical decisions must be evidence based.
Uncertainties in medicine arise mainly due to
1) biological variability 2) environmental variability 3) sampling fluctuations 4) chance variability 5) instrument variability
To deal with the enormous uncertainties that pervade all aspects of medical practice, a separate science has developed, called biostatistics.
It provides methods to measure uncertainties by probabilities, and helps to control their impact on medical practice by laying down principles for choosing decisions that judiciously combine the probabilities with judgement.
How to use statistics
1) Develop an underlying question of interest
2) Generate a hypothesis
3) Design a study
4) Collect data
5) Analyze data: descriptive statistics, statistical inference
Hypothesis: a tentative assumption of the study, or the expected results of the study.
It should be very specific and limited to the piece of research in hand, because it has to be tested.
The role of the hypothesis is to guide the researcher by delimiting the area of research and keeping him or her on the right track.
Develop study design
Research question
Study sample
Sample size
Enrollment/follow-up strategies
On-going monitoring
Sampling
A sample is that part of the target population which is actually enquired upon or investigated.
Types of sampling:
1) simple random 2) systematic random 3) stratified random 4) cluster random 5) multistage random
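The first three sampling types listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 patient IDs, the two strata and the function names are hypothetical and do not come from the seminar.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Simple random: every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_random_sample(population, n, seed=0):
    """Systematic random: random start, then every k-th unit (k = N // n)."""
    k = len(population) // n
    start = random.Random(seed).randrange(k)
    return [population[start + i * k] for i in range(n)]


def stratified_random_sample(strata, fraction, seed=0):
    """Stratified random: draw the same fraction at random from each stratum."""
    rng = random.Random(seed)
    sample = []
    for units in strata.values():
        size = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, size))
    return sample


# Hypothetical target population: patient IDs 1..100 (an assumption).
patients = list(range(1, 101))
# Hypothetical strata: first 60 IDs in one group, last 40 in another.
strata = {"group_a": patients[:60], "group_b": patients[60:]}

srs = simple_random_sample(patients, 10)
systematic = systematic_random_sample(patients, 10)
stratified = stratified_random_sample(strata, 0.1)
```

Cluster and multistage sampling follow the same pattern, except that whole groups (for example hospitals) are drawn first and units are then taken within the selected groups.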
Existing data
Primary data are those which one elicits directly from individual patients, subjects or other units (such as hospitals or laboratories).
Secondary data are those that are elicited by others.
Secondary data sources include disease-specific databases on the web, medical literature, and records of surveys and registrations done by the government.
Generation of new data
Existing data may be incomplete and insufficient to provide answers to specific questions.
For these, data are specially generated through new surveys and experiments.
Basically there are two types of studies to generate new data: descriptive and analytical.
In either setup, it is necessary that a sample of subjects is studied.
Data collection designs
Objective: descriptive, analytical
Method: surveys, observational, experimental
Time frame: prospective (cohort, cause to effect), retrospective (effect to cause), cross-sectional (one point in time)
Setting: animal trial
Describing data with tables
1) frequency table 2) relative and cumulative frequency 3) grouped frequency 4) open-ended groups 5) cross-tabulation
Frequency table (variable and frequency)
Mortality (%)    Tally                        No. of ICUs
11.2-15.1        1, 1, 1, 1, 1, 1, 1, 1, 1    9
15.2-20.1        1, 1, 1, 1, 1, 1, 1, 1       8
20.2-25.1        1, 1, 1, 1, 1                5
25.2-30.1        1, 1, 1                      3
30.2-35.1        1                            1
Relative and cumulative frequency
Parity    No. of women    Percentage (relative frequency)    Cumulative percentage
0         5               12.5                               12.5
1         6               15                                 27.5
2         14              35                                 62.5
3         10              25                                 87.5
4         3               7.5                                95
7         1               2.5                                97.5
8         1               2.5                                100
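The arithmetic behind the relative and cumulative columns can be checked with a short Python sketch. The counts are read from the parity table above (5, 6, 14, 10, 3, 1 and 1 women for parity 0, 1, 2, 3, 4, 7 and 8); that reading of the flattened slide, and the function name, are assumptions.

```python
def frequency_table(counts):
    """Return rows of (value, count, relative %, cumulative %)."""
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for value, count in counts.items():
        relative = 100.0 * count / total   # relative frequency in percent
        cumulative += relative             # running total of the percentages
        rows.append((value, count, relative, cumulative))
    return rows


# Parity counts as read from the table in this seminar (assumed reading).
parity_counts = {0: 5, 1: 6, 2: 14, 3: 10, 4: 3, 7: 1, 8: 1}
table = frequency_table(parity_counts)

for parity, n, rel, cum in table:
    print(f"{parity:>6} {n:>6} {rel:>8.1f} {cum:>8.1f}")
```

The last cumulative percentage must always be 100, which is a quick check that no category has been missed.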
Cross tabulation
Two variables within a single group of individuals
                 2 or fewer children
Caries           Yes               No                Totals
Occlusal         21 (84%) (66)     11 (73%) (34)     32 (100)
Proximal         4 (16%) (50)      4 (27%) (50)      8 (100)
Totals           25 (100%)         15 (100%)         40
In each cell, the first percentage is within the column and the second is within the row.
Describing data with charts
1) Charting nominal data
(1) the pie chart (2) the simple bar chart (3) the cluster bar chart (4) the stacked bar chart
2) Charting ordinal data
(1) the pie chart (2) the bar chart
3) Charting discrete metric data
4) Charting continuous metric data
(1) the histogram
Pie chart
4-5 categories
One variable
Start at 0, in the same order as the table
Pie chart: hair color of children receiving d-phenothrin
blonde, 18, 18%
brown, 55, 57%
red, 4, 4%
dark, 21, 21%
Simple bar diagram
Clustered bar diagram
[Figure: cluster percentage bar chart of hair color (blonde, brown, red, dark) for those receiving malathion and d-phenothrin]
Histogram
[Figure: Exercise 3-5, histogram of the percentage age distribution of pregnant women (thrombosis cases); age groups 19, 20-24, 25-29, 30-34, 35]
Step chart
[Figure: Exercise 3.8, step chart of the cumulative percentage of infants]
Charting cumulative ordinal or discrete metric data
Cumulative frequency curve
[Figure: Exercise 3.9, ogive. Percentage cumulative frequency curves of age for male suicide attempters and later succeeders; age groups 15-24, 25-34, 35-44, 45-54, 55-64, 65-74, 75-84, > 85; series: attempting suicide, later successful]
Data collection, types and quality
Evidence-based decisions are only as good as the evidence itself.
Thus it is important that the data gathered for creating evidence are correct.
Methods such as interview, examination and investigations are available.
The investigator must decide which method is best for a particular piece of information.
Data can be either quantitative or qualitative.
Qualitative data can be on a nominal scale or an ordinal scale.
Quantitative data are on a metric scale.
Nominal scale data
Can be allocated into one of a number of categories, with no meaningful order.
Examples: blood type, sex (male/female)
Ordinal scale data
Can be allocated to one of a number of categories, and the categories can be put in a meaningful order.
Example: very satisfied, satisfied, neutral, unsatisfied, very unsatisfied
Discrete metric data
Countable variables, in integer form: numbers of things.
Examples: age, numbers of men
Continuous metric data
Measurable variables, which may be rounded to the nearest integer.
Examples: kg, m, mmHg, hour, years
Quality of data is assessed in terms of the validity and reliability of the measurements, or of the tools used to obtain the data.
Validity: the ability of a tool to correctly measure the characteristic that it purports to measure.
For tests, this is assessed in terms of sensitivity and specificity, and positive and negative predictivities.
Reliability: the ability to give the same result when used repeatedly in identical conditions.
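The four validity measures named above for tests can be computed from a 2 x 2 table of test result against true disease status. The sketch below uses the standard definitions; the counts and the function name are hypothetical and are not data from the seminar.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Validity measures of a test from a 2 x 2 table.

    tp: diseased, test positive      fp: not diseased, test positive
    fn: diseased, test negative      tn: not diseased, test negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # positives among the diseased
        "specificity": tn / (tn + fp),  # negatives among the non-diseased
        "ppv": tp / (tp + fp),          # diseased among test positives
        "npv": tn / (tn + fn),          # non-diseased among test negatives
    }


# Hypothetical counts (an assumption, for illustration only).
result = diagnostic_accuracy(tp=45, fp=10, fn=5, tn=90)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe the test itself, whereas the predictivities also depend on how common the disease is in the group tested.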
Statistical analyses
Descriptive statistics: describe the sample
Inference: make inferences about the population. Primarily performed in two ways: 1) hypothesis testing 2) estimation (more important!)
Prediction
Descriptive statistics
Descriptive statistics are a way of summarizing the complexity of the data with a single number.
A. For one variable ("univariate analysis"): measures of CENTRAL TENDENCY (averages) and of DISPERSION or variance around that average. Examples: means, modes, medians, standard deviation, quartiles.
B. Descriptive statistics for the strength of relationship between two variables (bivariate analysis) or among a set of variables (multivariate analysis) are measures of ASSOCIATION or correlation.
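The univariate measures listed under A can be obtained with Python's standard `statistics` module. The sketch below is illustrative only: the eight readings are hypothetical values, not data from the seminar.

```python
import statistics

# Hypothetical readings (for example systolic blood pressure in mmHg).
values = [110, 120, 120, 125, 130, 135, 140, 150]

# Measures of central tendency
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Measures of dispersion
sd = statistics.stdev(values)                    # sample SD (n - 1 in the denominator)
q1, q2, q3 = statistics.quantiles(values, n=4)   # quartiles; q2 is the median
value_range = max(values) - min(values)

print(f"mean={mean}, median={median}, mode={mode}")
print(f"SD={sd:.2f}, quartiles=({q1}, {q2}, {q3}), range={value_range}")
```

`statistics.quantiles` uses one of several conventions for interpolating quartiles, so other software may give slightly different values for the first and third quartiles.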
Measure of central tendency
Nominal & Ordinal: frequencies, percents, medians, modes (all)
Interval & Ratio: means
Measure of dispersion
Nominal & Ordinal (qualitative): range, quartiles
Interval & Ratio (quantitative): standard deviation
Measure of association
Nominal & Ordinal: cross-tabulation; non-parametric measures (Phi, Gamma, Eta, Lambda, Tau-B etc.)
Interval & Ratio: Pearson's R
Measure of significance
Nominal & Ordinal: Chi-square
Interval & Ratio: t-test, ANOVA (F-ratio)
Inferential statistics
Inferential statistics are measures of the SIGNIFICANCE of the relationship between two or more variables. Significance refers to the probability that the findings could be attributed to sampling error. The appropriate statistic depends on the LEVEL OF MEASUREMENT OF THE DEPENDENT VARIABLE (and of the independent variable).
Parameters
Summary measures, such as the mean and standard deviation, can be obtained for a sample as well as for an entire population.
Summary measures, when obtained for the entire target population, are called parameters.
The values of parameters are hardly ever known, because nobody has the time and resources to study the entire population.
When parameter values are unknown, as is almost invariably the case, it becomes necessary to fall back on samples to get some tangible lead regarding the characteristics of the population.
Measures such as the mean and SD, when obtained for sample subjects, are called statistics.
Standard deviation and normal mean
Tests of parametric significance
1) Student t-test: for comparison of means between 2 groups
2) ANOVA F-test: for comparison of means in three or more groups
(Both of the above tests require that the means follow a Gaussian distribution, and hence are called parametric tests.)
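As a rough illustration of the Student t-test for two groups, the sketch below computes the pooled-variance t statistic, t = (mean1 - mean2) / sqrt(sp^2 (1/n1 + 1/n2)), using only the standard library. The two groups of scores are hypothetical, and the equal-variance (pooled) form is an assumption; the seminar does not say which variant is meant.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Two-sample Student t statistic with pooled variance, plus its df."""
    n1, n2 = len(group_a), len(group_b)
    m1, m2 = statistics.mean(group_a), statistics.mean(group_b)
    v1, v2 = statistics.variance(group_a), statistics.variance(group_b)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se_diff, n1 + n2 - 2


# Hypothetical scores in two groups (an assumption, not seminar data).
group_a = [5, 6, 7, 8, 9]
group_b = [3, 4, 5, 6, 7]

t_value, df = pooled_t_statistic(group_a, group_b)
print(f"t = {t_value:.3f} on {df} degrees of freedom")
```

The statistic is then compared with the tabulated critical value for the same degrees of freedom (2.306 for 8 degrees of freedom at the two-sided 5% level), so a t of 2.0 here would not reach significance.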
Nonparametric tests
When the sample size is very small and the distribution is skewed, parametric tests cannot be used.
In such cases, non-parametric tests (less powerful than parametric tests) are used.
For paired data, the non-parametric tests commonly used are the sign test and the Wilcoxon signed rank test.
For unpaired two-sample data, the non-parametric test is the Mann-Whitney test.
Another important non-parametric test is the Chi-square test (used for nominal data), a test of proportions.
It is used to test the significance of the association of two or more qualitative characteristics.
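The Chi-square statistic compares observed counts with the counts expected if the two characteristics were unrelated. The sketch below applies the usual formula, the sum of (observed - expected)^2 / expected, to the caries cross-tabulation shown earlier in this seminar (21, 11, 4 and 4); the function name is hypothetical, and using that table here is an illustrative choice rather than something the seminar does.

```python
def chi_square_statistic(table):
    """Pearson Chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables are independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Rows: occlusal, proximal caries; columns: 2 or fewer children yes, no.
observed = [[21, 11], [4, 4]]
chi2, df = chi_square_statistic(observed)
print(f"Chi-square = {chi2:.3f} on {df} degree(s) of freedom")
```

With 1 degree of freedom the 5% critical value is 3.841, so this table gives no evidence of an association. One expected count here is only 3, so the Chi-square approximation is rough and an exact test would normally be preferred for a table this small.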
Point estimation and standard error
It is a reality that samples will, in all likelihood, differ from one another.
Even so, there is rarely a need for a second sample in scientific endeavours, provided the first is chosen with due precautions such as random selection and inclusion of a sufficient number of individuals.
In such cases, summary measures based on one sample alone are considered good estimates of the respective characteristics of the target population.
These are called point estimates.
Although point estimates obtained from a carefully drawn sample are fairly representative of population parameters, uncertainties arising out of sampling variation must be taken into account.
The S.E. (standard error) of the mean quantifies these uncertainties.
Point estimates are reliable only when the SE is small.
Confidence interval
When the SE is large, an interval estimate should be obtained.
This is also called a confidence interval.
It is the range that is very likely to contain the parameter value.
This likelihood is called the confidence level.
Generally a 95% confidence level is used.
The 95% CI is obtained as statistic ± 2 SE of that statistic.
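For a sample mean, the SE is the sample SD divided by the square root of the sample size, and the 95% CI follows the rule given above (statistic plus or minus 2 SE). The sketch below is a minimal illustration; the nine measurements and the function name are hypothetical.

```python
import math
import statistics


def mean_confidence_interval(sample, multiplier=2.0):
    """Point estimate, SE and approximate 95% CI for a mean.

    multiplier=2.0 follows the rule of thumb in the text; 1.96 is the
    more exact value for large samples.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)   # SE of the mean
    return mean, se, (mean - multiplier * se, mean + multiplier * se)


# Hypothetical measurements (an assumption, for illustration only).
sample = [4, 6, 8, 10, 12, 8, 6, 10, 8]
mean, se, (lower, upper) = mean_confidence_interval(sample)
print(f"mean = {mean}, SE = {se:.3f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

For small samples the multiplier is usually taken from the t distribution rather than fixed at 2, which widens the interval.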
Null hypothesis
It is the hypothesis that says there is no difference, or that asserts the existing knowledge or claim, and it is tested for refutation by the study.
For example: the newer drug B is not better than the existing drug A for relieving toothache.
A null hypothesis is sought to be refuted by conducting a study.
A null hypothesis is either rejected or not rejected; it is never accepted.
The alternative hypothesis is the assertion that is accepted when the null is rejected.
Note that the alternative is accepted when the null is rejected, but nothing is accepted when the null is not rejected.
Evidence against the null
In medical studies, evidence is provided in terms of the results of a trial conducted on some patients, or observations regarding natural occurrences in one or more groups of people.
The evidence is considered sufficient against the null hypothesis if
1) the study is unbiased 2) there are no confounders that can affect the findings 3) the sample size is sufficient to inspire confidence in the results, and sampling fluctuations are minimal
Type I error and P-values
Type I error: a true null hypothesis is rejected because of wrong evidence provided by the data.
This is a serious error.
The probability of type I error is called the P-value.
Thus, the P-value is the chance that the presence of a difference is concluded when actually there is none.
It is this type I error that later forces a ban on some drugs after they are licensed for marketing.
The maximum tolerable probability of type I error is called the significance level.
It is denoted by α and is fixed in advance, generally at 0.05 (5 percent).
The P-value is calculated on the basis of the data, whereas α is fixed in advance.
When P