7/29/2019 92465757 Statistical Techniques for Analyzing Quantitative Data
Statistical Techniques for
Analyzing Quantitative Data
Maryam Ramezani
Values in Computer Technology
CSC 426
Outline
Statistics in Research
Exploring and Organizing a Data Set
Nature of the Data: Nominal, Ordinal, Interval, Ratio
Normal and Non-Normal Distributions
Descriptive Statistics
Inferential Statistics
Statistical Software Packages
Role of Statistics in Research
With statistics, we can summarize large bodies of data, make predictions about future trends, and determine when different experimental treatments have led to significantly different outcomes.
Statistics are among the most powerful tools in the researcher's toolbox.
How do statistics come into research?
In quantitative research we use numbers to
represent physical or nonphysical
phenomena
We use statistics to summarize and interpret
numbers
Exploring and Organizing a Data Set
Look at your data and find ways of organizing them.
Example: test scores for 11 children. What do you see?
Ruth 96, Robert 60, Chuck 68, Margaret 88,
Tom 56, Mary 92, Ralph 64, Bill 72, Alice 80,
Adam 76, Kathy 84
Exploring and Organizing a Data Set
Student    Score
Ruth       96
Robert     60
Chuck      68
Margaret   88
Tom        56
Mary       92
Ralph      64
Bill       72
Alice      80
Adam       76
Kathy      84

Alphabetical Order
Student    Score
Adam       76
Alice      80
Bill       72
Chuck      68
Kathy      84
Margaret   88
Mary       92
Ralph      64
Robert     60
Ruth       96
Tom        56

Student    Score
Alice      80
Kathy      84
Margaret   88
Mary       92
Ruth       96

Adam       76
Bill       72
Chuck      68
Ralph      64
Robert     60
Tom        56
[Figure: chart of the test scores; vertical axis from 0 to 120, horizontal axis from 0 to 15]
Using Computer Spreadsheets to Organize
and Analyze Data
Sorting
Graphing
Formulas
What-ifs
Save, store, recall, and update information
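The sorting step can also be sketched outside a spreadsheet. The Python below is only an illustration using the eleven test scores from the earlier example; the variable names are assumptions and not part of the slides.

```python
# Test scores from the example, in the order they were given.
scores = {
    "Ruth": 96, "Robert": 60, "Chuck": 68, "Margaret": 88,
    "Tom": 56, "Mary": 92, "Ralph": 64, "Bill": 72,
    "Alice": 80, "Adam": 76, "Kathy": 84,
}

# Alphabetical order by student name.
by_name = sorted(scores.items())

# Descending order by score.
by_score = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in by_score:
    print(f"{name:<10}{score}")
```

Sorting the same records by different keys is often the quickest way to spot a pattern that the original order hides.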
Functions of Statistics
Descriptive statistics:
describe what the data look like
Inferential statistics:
draw inferences about a large population from small samples
Considering the Nature of the Data
Continuous or discrete
Nominal, ordinal, interval or ratio scale
Normal or non-normal distribution
Continuous versus Discrete Variables
Continuous data: take on any value within a finite or infinite interval. You can count, order, and measure continuous data.
Example: height, weight, temperature, the amount of sugar in an orange, the time required to run a mile.
Discrete data: the values or observations are distinct and separate, i.e., they can be counted (1, 2, 3, ...).
Example: the number of kittens in a litter; the number of patients in a doctor's surgery; the number of flaws in one metre of cloth; gender (male, female); blood group (O, A, B, AB).
Nominal Data
The numbers are simply labels. You can count, but not order or measure, nominal data.
Example: males could be coded as 0, females as 1; the marital status of an individual could be coded as Y if married, N if single.
Classification data, e.g., m/f
No ordering, e.g., it makes no sense to state that M > F
Arbitrary labels, e.g., m/f, 0/1, etc.
Ordinal Data
Ordered, but the differences between values are not meaningful.
E.g., Likert scales: rank your degree of satisfaction on a scale of 1 to 5.
The difference in satisfaction expressed by giving a rating of 2 rather than 1 might be much less than the difference expressed by giving a rating of 4 rather than 3.
You can count and order, but not measure, ordinal data.
Interval Data
Ordered, constant scale, but no natural zero
Differences make sense, but ratios do not
E.g., temperature: 30 - 20 = 20 - 10, but 20 degrees is not twice as hot as 10 degrees!
E.g., dates: the time interval between the starts of years 1981 and 1982 is the same as that between 1983 and 1984, namely 365 days. The zero point, year 1 AD, is arbitrary; time did not begin then.
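A short sketch of why ratios mislead on an interval scale, assuming the temperature example refers to degrees Celsius (the slide does not name the unit). Converting to kelvin, which has a true zero, shows how small the actual ratio is. The helper name `celsius_to_kelvin` is illustrative.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to kelvin, a scale with a true zero."""
    return celsius + 273.15

# Differences are meaningful on an interval scale.
diff_a = 30 - 20
diff_b = 20 - 10

# The naive ratio of the Celsius readings suggests "twice as hot".
naive_ratio = 20 / 10

# The ratio on a scale with a natural zero tells a different story.
kelvin_ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(diff_a == diff_b)        # equal differences
print(naive_ratio)             # 2.0
print(round(kelvin_ratio, 3))  # only slightly above 1
```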
Ratio Data
Like interval data but has true zero
Ordered, Constant scale, natural zero
e.g., height, weight, age, length
Normal and Non-Normal Distributions
Normal Distribution
Non-Normal Distributions
Skewed to the Left (Negatively Skewed)
Skewed to the Right (Positively Skewed)
Leptokurtic and Platykurtic
Distributions
Descriptive Statistics
Descriptive statistics describe data:
Points of central tendency
Amount of variability
Relation of different variables to each other
Points of Central Tendency: Mean
Measuring center: if the n observations are $x_1, x_2, \ldots, x_n$, the arithmetic mean is
$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$
Geometric Mean
E.g., biological growth, population growth
Measure of Central Tendency
Mode: the most frequently occurring score is identified. Used with data on nominal, ordinal, interval, and ratio scales.
Median: the midpoint of the data is identified. Used with data on ordinal, interval, and ratio scales.
Arithmetic mean: all scores are added and the sum is divided by the number of scores. Used with data on interval and ratio scales.
Geometric mean: all scores are multiplied together, and the nth root of their product is computed. Used with data on ratio scales.
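As a rough illustration, the four measures can be computed with Python's standard `statistics` module (Python 3.8 or later). The data are the eleven test scores from the earlier example; treating them as ratio-scale data for the geometric mean is an assumption made only for this sketch.

```python
import statistics

# The eleven test scores from the earlier example.
scores = [96, 60, 68, 88, 56, 92, 64, 72, 80, 76, 84]

# multimode returns every value tied for most frequent; here each score
# occurs exactly once, so all eleven scores are returned.
modes = statistics.multimode(scores)

median = statistics.median(scores)            # midpoint of the sorted data
mean = statistics.mean(scores)                # sum divided by the count
geo_mean = statistics.geometric_mean(scores)  # nth root of the product

print("modes:", modes)
print("median:", median)
print("arithmetic mean:", mean)
print("geometric mean:", round(geo_mean, 2))
```

For positive data that are not all equal, the geometric mean is always smaller than the arithmetic mean, which is one reason it suits growth rates.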
Measures of Variability
How great is the spread?
Range = highest score - lowest score
The quartiles: the pth percentile of a distribution is the value such that p percent of the observations fall at or below it.
The 50th percentile = median, M
The 25th percentile = first quartile, Q1
The 75th percentile = third quartile, Q3
Interquartile range: IQR = Q3 - Q1
Example:
13 13 16 19 21 21 23 23 24 26 26 27 27 27 28 28 30 30
M = ?, Q1 = ?, Q3 = ?
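One way to work the example is sketched below. It assumes the "median of each half" convention for quartiles (Q1 is the median of the observations below the overall median position, Q3 the median of those above it). Other conventions exist and can give slightly different values; the function name `quartiles` is illustrative.

```python
import statistics

data = [13, 13, 16, 19, 21, 21, 23, 23, 24, 26, 26, 27, 27, 27, 28, 28, 30, 30]

def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves convention.

    Assumes at least two observations. When n is odd, the middle
    observation is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[-half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

q1, m, q3 = quartiles(data)
print("Q1 =", q1, "M =", m, "Q3 =", q3)
print("range =", max(data) - min(data))
print("IQR =", q3 - q1)
```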
Measures of Variability
Standard Deviation
$s = \sqrt{\dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
Standardized score
$z = \dfrac{x - \mu}{\sigma}$
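A minimal sketch of both quantities, using the eleven test scores from the earlier example. The standard deviation uses the n - 1 divisor. For the standardized score, the sample mean and sample standard deviation stand in for the population values, which is an assumption made for illustration only.

```python
import math

scores = [96, 60, 68, 88, 56, 92, 64, 72, 80, 76, 84]

n = len(scores)
mean = sum(scores) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
s = math.sqrt(variance)

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

# Ruth's score of 96, standardized against the sample itself.
z_ruth = z_score(96, mean, s)

print("s =", round(s, 2))
print("z for a score of 96 =", round(z_ruth, 2))
```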
Measure of Relationship: Correlation
Correlation indicates the strength and direction of a linear relationship between two variables.
See page 266 for other examples of correlation statistics.
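The slide does not name a specific statistic, so the sketch below uses the Pearson product-moment coefficient, probably the most common choice for a linear relationship. The `hours` and `marks` data are hypothetical and exist only to exercise the function.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: hours studied and marks obtained.
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 70, 74]

r = pearson_r(hours, marks)
print("r =", round(r, 3))
```

The sign of r gives the direction of the relationship and its magnitude (between 0 and 1) gives the strength.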
Notes about Correlation
Finding a substantial correlation between two characteristics requires reasonably valid and reliable measurement of both.
Correlation does not indicate causation.
Examples of Using Statistics in Computer Science
Conceptual Representation of User Transactions or Sessions
Rows: session/user data. Columns: pageviews/objects.
          A     B     C     D     E     F
user0    15     5     0     0     0   185
user1     0     0    32     4     0     0
user2    12     0     0    56   236     0
user3     9    47     0     0     0   134
user4     0     0    23    15     0     0
user5    17     0     0   157    69     0
user6    24    89     0     0     0   354
user7     0     0    78    27     0     0
user8     7     0    45    20   127     0
user9     0    38    57     0     0    15
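A sketch of how such a user-by-pageview matrix might be held in code and summarized with simple descriptive statistics. The slide does not say what the cell values measure, so they are treated here as generic weights; the names `sessions`, `totals`, and `visitors` are assumptions.

```python
pages = ["A", "B", "C", "D", "E", "F"]

# One row per user/session, one column per pageview/object.
sessions = {
    "user0": [15, 5, 0, 0, 0, 185],
    "user1": [0, 0, 32, 4, 0, 0],
    "user2": [12, 0, 0, 56, 236, 0],
    "user3": [9, 47, 0, 0, 0, 134],
    "user4": [0, 0, 23, 15, 0, 0],
    "user5": [17, 0, 0, 157, 69, 0],
    "user6": [24, 89, 0, 0, 0, 354],
    "user7": [0, 0, 78, 27, 0, 0],
    "user8": [7, 0, 45, 20, 127, 0],
    "user9": [0, 38, 57, 0, 0, 15],
}

# Total weight accumulated by each page across all users.
totals = {page: sum(row[i] for row in sessions.values())
          for i, page in enumerate(pages)}

# Number of users with a non-zero entry for each page.
visitors = {page: sum(1 for row in sessions.values() if row[i] > 0)
            for i, page in enumerate(pages)}

for page in pages:
    print(page, totals[page], visitors[page])
```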
Inferential Statistics
We use sample statistics as estimates of population parameters.
The quality of all statistical analysis depends on the quality of the sample data.
[Figure: a sample drawn from a population]
Random sampling: every unit in the population has an equal chance of being chosen.
A random sample should represent the population well, so sample statistics from a random sample should provide reasonable estimates of population parameters.
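Simple random sampling can be sketched with Python's standard library. The population below is simulated and entirely hypothetical (10,000 values with a mean near 140 and a standard deviation near 7), and the seed is fixed only so that the run is repeatable.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 10,000 measurements.
population = [rng.gauss(140, 7) for _ in range(10_000)]

# Simple random sample without replacement: every unit has an equal
# chance of being chosen.
sample = rng.sample(population, 200)

pop_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print("population mean:", round(pop_mean, 2))
print("sample mean:    ", round(sample_mean, 2))
```

The sample mean will usually land close to the population mean but not exactly on it; that gap is the sampling error discussed in the slides that follow.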
Some definitions
Parameter: describes a population
Statistic: describes a sample
              Sample statistic    Population parameter
Mean          $\bar{x}$           $\mu$
Proportion    $p$                 $P$
Variance      $s^2$               $\sigma^2$
Number        $n$                 $N$
A parameter is a characteristic or quality of a population that is constant in concept; its value, however, is variable.
Example: the radius is a parameter of a circle.
Inferential Statistics
Estimate a population parameter from a
random sample
Test statistically based hypotheses
Inferential Statistics: Estimate a
Population Parameter from Sample
All sample statistics have some error in estimating population parameters.
Example: estimate the mean height of 10-year-old boys in Chicago. Sample: 200 boys.
How close is the sample mean to the population mean?
We don't know, but we do know that:
The means of an infinite number of samples form a normal distribution.
The population mean equals the average (mean) of all the sample means.
The standard deviation of the sampling distribution (the standard error) is directly related to the standard deviation of the characteristic in question for the overall population.
Standard Error
The standard error tells us how much the sample mean varies from one sample to another when all samples are the same size and are drawn randomly from the same population.
Standard error:
$\sigma_M = \dfrac{\sigma}{\sqrt{n}}$
n is the size of each sample and $\sigma$ is the population standard deviation, which we don't have!
We use the standard deviation of the sample instead:
$\sigma_M = \dfrac{s}{\sqrt{n-1}}$
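A small sketch of the estimate described on this slide, the sample standard deviation divided by the square root of n - 1. Many textbooks instead divide by the square root of n when s is already computed with the n - 1 divisor, so treat the formula here as the slide's convention rather than the only one. The data are the eleven test scores from the earlier example.

```python
import math
import statistics

def standard_error(sample):
    """Estimated standard error of the mean, following the slide: s / sqrt(n - 1)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return s / math.sqrt(n - 1)

scores = [96, 60, 68, 88, 56, 92, 64, 72, 80, 76, 84]
se = standard_error(scores)
print("standard error =", round(se, 2))
```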
Accuracy of the Estimator
As in many problems, there is a trade-off between accuracy and dollars.
What do we get for our money if we invest in obtaining a larger sample size?
n = 100?
n = 200?
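The trade-off can be made concrete with the relationship between standard error and sample size: the standard error shrinks with the square root of n, so doubling the sample does not halve the error. The population standard deviation of 10 used below is hypothetical.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean for a known population standard deviation."""
    return sigma / math.sqrt(n)

sigma = 10.0  # hypothetical population standard deviation

for n in (100, 200, 400):
    print(f"n = {n:>3}: standard error = {standard_error(sigma, n):.3f}")
```

Going from n = 100 to n = 200 cuts the standard error by only about 29 percent; it takes n = 400 to cut it in half.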
Point versus Interval Estimate
A point estimate is a single value (a point) taken from a sample and used to estimate the corresponding parameter of a population.
$\bar{x}$, $s$, $s^2$ and $r$ estimate $\mu$, $\sigma$, $\sigma^2$ and $\rho$, respectively.
An interval estimate is a range of values (an interval) within whose limits a population parameter probably lies.
We say that we are 95% confident that the unknown population mean lies in the interval
$\left(\bar{x} - \dfrac{2\sigma}{\sqrt{n}},\ \bar{x} + \dfrac{2\sigma}{\sqrt{n}}\right)$
This is the 95% confidence interval for $\mu$.
In only 5% of all samples does this interval fail to contain $\mu$; that is, 5% of all samples give inaccurate results.
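The interval above can be computed directly. The sketch keeps the slide's multiplier of 2 (a rounding of the 1.96 that comes from the normal distribution) and assumes the population standard deviation is known; the numbers passed in are hypothetical.

```python
import math

def confidence_interval_95(sample_mean, sigma, n):
    """Approximate 95% confidence interval: sample mean plus or minus 2 * sigma / sqrt(n)."""
    margin = 2 * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical values: sample mean 140, population std 7, sample size 196.
low, high = confidence_interval_95(sample_mean=140.0, sigma=7.0, n=196)
print(f"95% confidence interval: ({low}, {high})")
```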
Testing Hypothesis
Confidence intervals are used when the goal of our analysis is to estimate an unknown parameter in the population.
A second goal of a statistical analysis is to verify some claim about the population on the basis of the data.
Research hypothesis ≠ statistical hypothesis
A test of significance is a procedure to assess the truth about a hypothesis using the observed data. The results of the test are expressed in terms of a probability that measures how well the data support the hypothesis.
Example
To determine whether the mean nicotine content of a brand of cigarettes is greater than the advertised value of 1.4 milligrams, a health advocacy group takes a sample of 500 cigarettes and measures the amount of nicotine in the sample.
Sample values: the sample average of nicotine = 1.51 mg
The standard deviation = 1.016
The estimated amount of nicotine is 1.51 mg, based on the sample values.
The standard error of the sample average is
S.E. = s.d./sqrt(n-1) = 0.045
Is there an actual difference between the sample value (1.51 mg) and the advertised value (1.4 mg)? Or is it just due to sampling error?
To answer this question we need a test of significance.
Stating Hypotheses
The null hypothesis $H_0$ expresses the idea that the observed difference is due to chance. It is a statement of no effect or no difference, and is expressed in terms of the population parameter.
Let $\mu$ denote the true average amount of nicotine. $H_0: \mu = 1.4$ mg
The alternative hypothesis $H_a$ represents the idea that the difference is real. It is expressed as the statement we hope or suspect is true instead of the null hypothesis.
The alternative hypothesis states that the cigarettes contain a higher amount of nicotine, that is: $H_a: \mu > 1.4$ mg
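One way the nicotine example might be carried through is a one-sided z test using the normal approximation, sketched below. The slides give the sample size (500), sample mean (1.51), standard deviation (1.016), and null value (1.4), but they do not name the test or a significance level, so both the choice of a z test and any cutoff applied to the resulting probability are assumptions. `NormalDist` needs Python 3.8 or later.

```python
import math
from statistics import NormalDist

n = 500             # cigarettes sampled
sample_mean = 1.51  # observed mean nicotine content (mg)
sample_sd = 1.016   # observed standard deviation
mu_0 = 1.4          # advertised value, the null hypothesis

# Standard error as computed on the slide: s.d. / sqrt(n - 1), about 0.045.
se = sample_sd / math.sqrt(n - 1)

# Test statistic: how many standard errors the sample mean lies above mu_0.
z = (sample_mean - mu_0) / se

# One-sided p-value: probability of a z at least this large if H0 is true.
p_value = 1 - NormalDist().cdf(z)

print("standard error =", round(se, 3))
print("z =", round(z, 2))
print("one-sided p-value =", round(p_value, 4))
```

A small p-value means a difference this large would rarely arise from sampling error alone, which counts as evidence against the null hypothesis.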
General comments on stating hypotheses
It is not easy to state the null and the alternative hypotheses!
The hypotheses are statements about the population values.
The alternative hypothesis $H_a$ is often called the researcher's hypothesis, because it is the hypothesis we are interested in.
A significance test is a test against the null hypothesis.
Often we set $H_a$ first and then $H_0$ is defined as the opposite statement!
Errors in Hypothesis testing
Type I error: the null hypothesis is rejected when it is in fact true; that is, $H_0$ is wrongly rejected.
Type II error: the null hypothesis $H_0$ is not rejected when it is in fact false.
Meta-Analysis
"Meta-analysis refers to the analysis of analyses...the statistical analysis of a large collection of analysis results from individual studies for the purpose of integrating the findings." (Glass, 1976, p. 3)
Conduct a fairly extensive search for relevant studies
Identify appropriate studies to include in the meta-analysis
Convert each study's results to a common statistical index
Using Statistical Software Packages
SPSS
SAS
MATLAB Statistics Toolbox
SYSTAT, Minitab, StatView, Statistica
Interpreting the Data
Relating the findings to the original research problem and to the
specific research questions and hypotheses
Relating the findings to preexisting literature, concepts, theories and
research results.
Determining whether the findings have practical significance as well
as statistical significance
Identifying limitations of the study