INTRODUCTION TO STATISTICAL
ANALYSIS
Dr. N. Sowmya, M.Sc, M.Phil, Ph.D,
Associate Professor & Head,
Department of Home Science,
Quaid-E-Millath Govt. College for Women,
Chennai-02
CONTENTS
Descriptive and inferential statistics
Types of Variables
Analysis of Data
Parametric and non-parametric tests
t-test, ANOVA, Correlation and linear regression
analysis
Demonstration of the above analyses using SPSS
STATISTICS-AN OVERVIEW
An area of Science concerned with extraction
of information from numerical data and its
use in making inferences about the
population from which it is obtained.
VARIABLES
A characteristic that varies from one subject
to another or from one unit to another.
ANALYSIS OF DATA
POINTS TO REMEMBER
Standard deviation (S.D.) is a measure of variability
& explains how far the observations typically deviate from
the mean
Rule of thumb for non-negative data: if the S.D. is more than 1/2 the mean, the data are unlikely to be normally distributed.
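A minimal sketch of how the mean and S.D. can be computed and the half-the-mean rule of thumb checked. The intake values are hypothetical, made up purely for illustration; the sample S.D. (dividing by n - 1) is assumed.

```python
import statistics

# Hypothetical sample: daily energy intake (kcal) of 8 subjects.
intake = [1800, 2100, 1950, 2200, 2050, 1900, 2000, 2000]

mean = statistics.mean(intake)
sd = statistics.stdev(intake)  # sample S.D. (divides by n - 1)

# Rule-of-thumb check: is the S.D. below half the mean?
sd_below_half_mean = sd < mean / 2

print(f"mean = {mean:.1f}, S.D. = {sd:.1f}, S.D. < mean/2: {sd_below_half_mean}")
```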
INFERENTIAL STATISTICS
Hypothesis testing - A way of organising & presenting evidence that helps
to reach a conclusion
PARAMETRIC TESTS
They assume certain properties of the population like normal
distribution, equal variances across groups etc.
Powerful statistical tests.
Important tests-t test, F test (ANOVA), Pearson correlation and linear
regression
When the dependent variable is continuous, parametric tests can be used.
NON-PARAMETRIC TESTS
Do not assume a particular distribution (such as the
normal distribution) for the population
Used when the distribution is not a normal
distribution (skewed)
When the dependent variable is categorical or
ordinal, non parametric tests can be used
Ex - Chi-square test, Mann-Whitney U test,
Kruskal-Wallis test
t-TEST
Used to compare the means of groups
TYPES
• One sample test
• T-test for 2 independent samples (uncorrelated)
• T-test for paired samples (correlated)
• One sample t test - compares the mean of a single group of observations
with a specified value. Ex. Compare the mean dietary intake with the RDA
• Student’s t test - compares the means of 2 independent samples.
Independent variable - one nominal variable with 2 levels -
ex. boys/girls, smoking/non-smoking workers
Dependent variable - ex. marks obtained by the students, BMI, blood
glucose etc
Assumptions
The 2 groups should be independent.
The independent variable is categorical & contains only 2 levels.
The dependent variable is normally distributed in each group.
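The independent samples t-test can be sketched as follows, assuming the pooled-variance (Student's) form. The function name `independent_t` and the marks are hypothetical, chosen only for illustration. The sketch returns the t statistic and degrees of freedom, which are then compared with a tabulated critical value.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """Return (t, degrees of freedom) for Student's t-test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # Standard error of the difference between the two means.
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / se, df

# Hypothetical marks obtained by two independent groups of students.
boys = [62, 70, 68, 75, 65]
girls = [72, 78, 74, 80, 76]

t, df = independent_t(boys, girls)
print(f"t = {t:.3f}, df = {df}")
```

With 8 degrees of freedom the two-sided 5% critical value from a t table is 2.306, so an absolute t value larger than that would suggest a difference between the group means.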
VARIABLES FOR INDEPENDENT SAMPLE t - TEST
PAIRED t-TEST
Same individuals are studied more than once in
different circumstances.
Ex-measurements made on the same people before
and after intervention.
Condition - the outcome variable should be continuous, and the
paired differences should be approximately normally distributed.
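A paired t-test is equivalent to a one sample t-test on the within-person differences, tested against zero. The sketch below takes that route; `one_sample_t` is a hypothetical helper, and the before/after values are invented for illustration.

```python
import math
import statistics

def one_sample_t(values, reference):
    """Return (t, degrees of freedom) comparing the mean of values with reference."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return (statistics.mean(values) - reference) / se, n - 1

# Hypothetical measurements on the same 6 people before and after an intervention.
before = [82, 90, 76, 88, 95, 79]
after = [78, 85, 75, 84, 90, 77]

# The paired test works on the difference for each individual.
differences = [b - a for b, a in zip(before, after)]
t, df = one_sample_t(differences, 0)
print(f"mean difference = {statistics.mean(differences)}, t = {t:.3f}, df = {df}")
```

The same helper covers the one sample case directly, for example comparing observed dietary intakes with an RDA value passed as `reference`.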
ANOVA
The F statistic used in ANOVA is a ratio of two estimates of
variance.
Used to compare the means of more than 2 groups.
Examines the difference among groups.
Considers the variation across all groups at once
It is called ANOVA because although means are compared,
the comparisons are made using estimates of variance.
TYPE OF DATA REQUIRED
Independent variable - one nominal variable with >2 levels. Ex -
income level - Low/medium/high
Dependent variable - continuous variable. Ex. height, weight
Assumptions
-The samples are random & independent of each other.
-The independent variable is categorical & contains more than
2 levels.
-The dependent variable is normally distributed in each group.
ANOVA separates the variation in all the data into 2
parts
The variation between each group mean & the overall mean
for all the groups (the between group variability).
The variation between each study participant & that
participant’s group mean (the within group variability).
If the between group variability is much greater than the within
group variability, there are likely to be differences
between group means.
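The split into between group and within group variability can be sketched as a one-way ANOVA computed by hand. The function name `one_way_anova_f` and the three income-level groups are hypothetical, for illustration only.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    # Between group variability: group means around the overall mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within group variability: each participant around their own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, k - 1, n - k

# Hypothetical body weights (kg) for three income levels.
low = [48, 50, 52]
medium = [54, 55, 56]
high = [58, 60, 62]

f, df_between, df_within = one_way_anova_f([low, medium, high])
print(f"F = {f:.2f}, df = ({df_between}, {df_within})")
```

A large F means the variation between group means is large relative to the variation inside the groups; the value is compared with a tabulated F critical value for the two degrees of freedom.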
MEASURES OF RELATIONSHIP
CORRELATION
Bivariate data - where 2 variables are measured from each subject. Ex.
Height & Age
The relationship between such variables is called Correlation.
Represented as “r” & ranges from -1 to 1 (Pearson’s Coefficient of
correlation)
REGRESSION
Describes the relation between the values of 2 variables.
Can predict the value of one variable using the value of the other variable.
CORRELATION
• Correlation analysis is used to determine if there is
a relationship between 2 variables (ex. weight &
blood glucose levels) and the strength of association
between them.
• Correlation analysis also determines the direction
of relationship - whether it is positive or negative
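Pearson's r can be sketched directly from its definition: the co-variation of the two variables divided by the product of their individual variations. The function name `pearson_r` and the weight and glucose values are hypothetical, for illustration only.

```python
import math

def pearson_r(x, y):
    """Return Pearson's coefficient of correlation between x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical bivariate data: weight (kg) and blood glucose (mg/dL) per subject.
weight = [55, 60, 65, 70, 75]
glucose = [85, 88, 95, 97, 105]

r = pearson_r(weight, glucose)
direction = "positive" if r > 0 else "negative"
print(f"r = {r:.3f} ({direction} relationship)")
```

The sign of r gives the direction of the relationship, and its distance from 0 gives the strength of the association.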
LINEAR REGRESSION
Used when the relationship between variables is linear
Simple linear regression - one independent & one
dependent variable - ex. age and height.
Multiple linear regression - more than one independent
variable - ex. effect of age, height and BMI on BP levels.
Describes the relation between 2 variables
Indicates the impact of the independent
variable on the dependent variable
Predicts the value of one variable using the
value of the other variable for an individual.
ex. Given a value of age, corresponding
cholesterol levels can be predicted
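A minimal sketch of simple linear regression by least squares, following the age and cholesterol example. The function name `fit_line` and the data are hypothetical, made up for illustration; the slope indicates the change in the dependent variable per unit change in the independent variable.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: age (years) and cholesterol (mg/dL).
age = [30, 40, 50, 60, 70]
cholesterol = [170, 185, 200, 210, 235]

slope, intercept = fit_line(age, cholesterol)

# Predict the cholesterol level for an individual aged 45.
predicted = intercept + slope * 45
print(f"cholesterol = {intercept:.1f} + {slope:.2f} * age; at age 45: {predicted:.2f}")
```

Predictions are only meaningful within the range of ages used to fit the line.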
TO SUM UP
• Statistics can be used to describe the
population characteristics and for
making inferences
• For normal distributions, parametric
tests like t- test and ANOVA are used
• Statistical methods like correlation
and regression are used to study
relationship between variables
• SPSS can be used effectively for all the
above analyses