Business Research Methods
Introduction to Data Analysis
Data Analysis Process
STAGES OF DATA ANALYSIS
EDITING → CODING → DATA ENTRY → ERROR CHECKING AND VERIFICATION → DATA ANALYSIS
Introduction
Preparation of Data: Editing, Handling Blank Responses, Coding, Categorization and Data Entry
These activities ensure the accuracy of the data and its conversion from raw form to reduced data.
Exploring, Displaying and Examining Data
Breaking down, inspecting and rearranging data to start the search for meaningful descriptions, patterns and relationships.
Editing
The process of checking and adjusting the data
• For omissions
• For legibility
• For consistency
and readying them for coding and storage
Editing
IN-HOUSE EDITING
FIELD EDITING
Reasons for Editing
Criteria
• Consistent
• Uniformly entered
• Arranged for simplification
• Complete
• Accurate
Birth Year Recorded By Interviewer
1873?
1973 MORE LIKELY
Coding
Involves assigning numbers or other symbols to answers so the responses can be grouped into a limited number of classes or categories.
Example:
“M” for Male and “F” for Female
“1” for Male and “2” for Female
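The numeric coding idea above can be sketched in a few lines of Python. This is only an illustration: the dictionary name GENDER_CODES, the helper code_responses and the missing-value code 9 are assumptions made for the example, not part of the original material.

```python
# Hypothetical codebook entry: numeric codes for a gender item.
GENDER_CODES = {"Male": 1, "Female": 2}
MISSING_CODE = 9  # assumed code for blank or unrecognized answers


def code_responses(responses, codes, missing=MISSING_CODE):
    """Map raw answers to numeric codes, falling back to the missing code."""
    coded = []
    for answer in responses:
        # Normalize spacing and capitalization before looking up the code.
        key = answer.strip().title() if isinstance(answer, str) else None
        coded.append(codes.get(key, missing))
    return coded


raw = ["male", "Female ", "", "MALE", None]
coded = code_responses(raw, GENDER_CODES)
print(coded)  # [1, 2, 9, 1, 9]
```

Keeping the code table in one place makes it easy to copy the same mapping into the code book later.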
Numeric vs Alphanumeric
Open ended questions
Check accuracy by using 10% of responses
Coding Rules
Categories should be
• Appropriate to the research problem
• Exhaustive
• Mutually exclusive
• Derived from one classification principle
Appropriateness
Let’s say your population is students at institutions of higher learning
What is your age group?
• 15 – 25 years
• 26 – 35 years
• 36 – 45 years
• Above 45 years
Exhaustiveness
What is your race?
• Malay
• Chinese
• Indian
• Others
Mutual Exclusivity
What is your occupation type?
• Professional • Crafts
• Managerial • Operatives
• Sales • Unemployed
• Clerical • Housewife
• Others
Single Dimension
What is your occupation type?
• Professional • Crafts
• Managerial • Operatives
• Sales • Unemployed
• Clerical • Housewife
• Others
Coding Open-ended Responses
Coding Open Ended Questions
Handling Blank Responses
How do we take care of missing responses?
If more than 25% of the responses are missing, throw out the questionnaire
Other ways of handling
• Use the midpoint of the scale
• Ignore (system missing)
• Mean of those responding
• Mean of the respondent
• Random number
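A minimal sketch of some of the options listed above, assuming a 5-point scale and blank answers stored as None. The function names and the sample answers are hypothetical; only the 25% rule and the replacement strategies come from the slide.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5   # assumed 5-point scale
MAX_MISSING_SHARE = 0.25      # "more than 25% missing" rule from the slide


def keep_questionnaire(answers):
    """Return False when more than 25% of the items are blank (None)."""
    missing = sum(1 for a in answers if a is None)
    return missing / len(answers) <= MAX_MISSING_SHARE


def fill_with_midpoint(answers):
    """Replace blanks with the midpoint of the scale."""
    midpoint = (SCALE_MIN + SCALE_MAX) / 2
    return [midpoint if a is None else a for a in answers]


def fill_with_respondent_mean(answers):
    """Replace blanks with the mean of this respondent's other answers."""
    own_mean = mean(a for a in answers if a is not None)
    return [own_mean if a is None else a for a in answers]


def fill_with_item_mean(answers_by_respondent, item_index):
    """Replace blanks in one item with the mean of those who answered it."""
    column = [row[item_index] for row in answers_by_respondent]
    item_mean = mean(a for a in column if a is not None)
    return [item_mean if a is None else a for a in column]


respondent = [4, None, 5, 3, 4, 4, 5, 3]   # hypothetical answers
print(keep_questionnaire(respondent))
print(fill_with_midpoint(respondent))
print(fill_with_respondent_mean(respondent))
```

Which replacement is appropriate depends on the study; the "ignore (system missing)" option simply leaves the None values in place and lets the analysis software skip them.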
Code Book
Identifies each variable
Provides a variable’s description
Identifies each code name and position on storage medium
Sample SPSS Codebook
Data Entry
• Database programs
• Optical recognition
• Digital/barcodes
• Voice recognition
• Keyboarding
Data Transformation
Weights
Assigning numbers to responses on a pre-determined rule
Respecification of the Variable
Transforming existing data to form new variables or items
Recode
Compute
Scale Transformation
Reason for Transformation
• to improve interpretation and compatibility with other data sets
• to enhance symmetry and stabilize spread
• to improve the linear relationship between the variables (standardized score)
$$ z_i = \frac{X_i - \bar{X}}{s} $$
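As a small illustration of the standardized score, the sketch below subtracts the sample mean from each value and divides by the sample standard deviation. The function name standardize and the ages list are made up for the example.

```python
from statistics import mean, stdev


def standardize(values):
    """Return z-scores: (value - sample mean) / sample standard deviation."""
    x_bar = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return [(x - x_bar) / s for x in values]


ages = [19, 25, 33, 41, 53]  # hypothetical raw values
z_scores = standardize(ages)
print([round(z, 2) for z in z_scores])
```

After the transformation the scores have a mean of 0 and a standard deviation of 1, which is what makes variables measured on different scales comparable.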
Characteristics of Distributions
Summarizing Distributions with Shape
Parameter & Statistics
Variable                     Population   Sample
Mean                         µ            X̄
Proportion                   π            p
Variance                     σ²           s²
Standard deviation           σ            s
Size                         N            n
Standard error of the mean   σ_X̄          S_X̄
Statistical Testing Procedures
Stages
1. State null hypothesis
2. Choose statistical test
3. Select level of significance
4. Compute difference value
5. Obtain critical test value
6. Interpret the test
Hypotheses
Null
H0: µ = 50 mpg
H0: µ ≤ 50 mpg
H0: µ ≥ 50 mpg
Alternate
HA: µ ≠ 50 mpg
HA: µ > 50 mpg
HA: µ < 50 mpg
Accept/Reject
Accept/Reject
How to Select a Test
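Measurement scale: Nominal
• One-sample case: Binomial; χ² one-sample test
• Two-sample tests, related samples: McNemar
• Two-sample tests, independent samples: Fisher exact test; χ² two-samples test
• k-sample tests, related samples: Cochran Q
• k-sample tests, independent samples: χ² for k samples
Measurement scale: Ordinal
• One-sample case: Kolmogorov-Smirnov one-sample test; Runs test
• Two-sample tests, related samples: Sign test; Wilcoxon matched-pairs test
• Two-sample tests, independent samples: Median test; Mann-Whitney U; Kolmogorov-Smirnov; Wald-Wolfowitz
• k-sample tests, related samples: Friedman two-way ANOVA
• k-sample tests, independent samples: Median extension; Kruskal-Wallis one-way ANOVA
Measurement scale: Interval and Ratio
• One-sample case: t-test; Z test
• Two-sample tests, related samples: t-test for paired samples
• Two-sample tests, independent samples: t-test; Z test
• k-sample tests, related samples: Repeated-measures ANOVA
• k-sample tests, independent samples: One-way ANOVA; n-way ANOVA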
Research Model
• Attitude (5 items)
• Subjective norm (4 items)
• Perceived Behavioral Control (4 items)
• Intention to Share Information (5 items)
• Actual Sharing of Information (3 items)
Reliability - Command
Reliability
Reliability Statistics
Cronbach's Alpha   N of Items
.977               5
Item-Total Statistics
Item      Scale Mean if Item Deleted   Scale Variance if Item Deleted   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted
Att1      15.25                        6.681                            .973                               .965
Att2      15.26                        6.560                            .925                               .972
Att3      15.24                        6.906                            .929                               .972
Att4      15.21                        6.825                            .900                               .975
Att5      15.25                        6.555                            .935                               .970
Question:
How reliable are our instruments?
Reliability
Reliability Statistics
Cronbach's Alpha   N of Items
.912               4
Item-Total Statistics
Item      Scale Mean if Item Deleted   Scale Variance if Item Deleted   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted
Sn1       11.20                        4.243                            .761                               .900
Sn2       11.03                        4.135                            .855                               .868
Sn3       11.00                        4.021                            .856                               .867
Sn4       11.21                        4.250                            .736                               .909
Reliability
Reliability Statistics
Cronbach's Alpha   N of Items
.919               4
Item-Total Statistics
Item      Scale Mean if Item Deleted   Scale Variance if Item Deleted   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted
Pbc1      10.48                        4.984                            .814                               .895
Pbc2      10.45                        4.793                            .826                               .892
Pbc3      10.43                        5.042                            .809                               .897
Pbc4      10.40                        5.246                            .814                               .897
Reliability
Reliability Statistics
Cronbach's Alpha   N of Items
.966               5
Item-Total Statistics
Item      Scale Mean if Item Deleted   Scale Variance if Item Deleted   Corrected Item-Total Correlation   Cronbach's Alpha if Item Deleted
Intent1   15.28                        6.591                            .951                               .951
Intent2   15.28                        6.612                            .888                               .961
Intent3   15.29                        6.553                            .901                               .959
Intent4   15.28                        6.716                            .877                               .962
Intent5   15.24                        6.445                            .904                               .958
Table in Report
Variable    N of Items   Items Deleted   Alpha
Attitude    5            -               0.977
SN          4            -               0.912
Pbcontrol   4            -               0.919
Intention   5            -               0.966
Actual      3            -               0.933
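For readers who want to see where an alpha value comes from, here is a minimal sketch of the usual Cronbach's alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the total score). The raw item responses behind the slides are not available, so the three item lists below are invented purely for illustration.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order."""
    k = len(items)
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical answers of five respondents to three items.
item1 = [4, 5, 3, 4, 2]
item2 = [4, 4, 3, 5, 2]
item3 = [5, 5, 2, 4, 3]
alpha = cronbach_alpha([item1, item2, item3])
print(round(alpha, 3))
```

Statistical packages report the same quantity; the sketch only makes the arithmetic visible.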
Example - Recoding
Perceived Enjoyment
PE1 The actual process of using Instant Messenger is pleasant   1 2 3 4 5 6 7
PE2 I have fun using Instant Messenger   1 2 3 4 5 6 7
PE3 Using Instant Messenger bores me   1 2 3 4 5 6 7
PE4 Using Instant Messenger provides me with a lot of enjoyment   1 2 3 4 5 6 7
PE5 I enjoy using Instant Messenger   1 2 3 4 5 6 7
Recoding
Recoding
Data before Transformation
Data after Transformation
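PE3 is the only negatively worded item in the Perceived Enjoyment scale, so it is presumably the one being recoded here. A minimal sketch of reverse coding on a 7-point scale, with made-up responses and a hypothetical helper name:

```python
SCALE_MIN, SCALE_MAX = 1, 7  # 7-point scale used by the PE items


def reverse_code(score):
    """Flip a score so that 1 becomes 7, 2 becomes 6, and so on."""
    if not SCALE_MIN <= score <= SCALE_MAX:
        raise ValueError(f"score {score} is outside the scale")
    return SCALE_MIN + SCALE_MAX - score


pe3 = [1, 2, 7, 4, 6]                        # hypothetical raw answers
pe3_recoded = [reverse_code(s) for s in pe3]
print(pe3_recoded)  # [7, 6, 1, 4, 2]
```

After recoding, a high score on every PE item means high enjoyment, so the items can be averaged into one scale.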
Frequencies - Command
Frequencies
Gender
                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Male     144         75.0      75.0            75.0
        Female   48          25.0      25.0            100.0
        Total    192         100.0     100.0
Current Position
                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid   Technician      34          17.7      17.7            17.7
        Engineer        66          34.4      34.4            52.1
        Sr Engineer     54          28.1      28.1            80.2
        Manager         32          16.7      16.7            96.9
        Above manager   6           3.1       3.1             100.0
        Total           192         100.0     100.0
Question:
1. Is our sample representative?
2. Data entry error
Table in Report
                  Frequency   Percentage
Gender
  Male            144         75.0
  Female          48          25.0
Position
  Technician      34          17.7
  Engineer        66          34.4
  Sr Engineer     54          28.1
  Manager         32          16.7
  Above manager   6           3.1
Descriptives - Command
Descriptives
Descriptive Statistics
Variable                              N     Minimum   Maximum   Mean     Std. Deviation   Skewness   Std. Error   Kurtosis   Std. Error
Age                                   192   19        53        33.39    8.823            .667       .175         -.557      .349
Years working in the organization     192   1         18        5.36     4.435            1.448      .175         1.333      .349
Total years of working experience     192   1         28        9.04     7.276            1.051      .175         -.025      .349
Attitude                              192   2.00      5.00      3.8104   .64548           -.480      .175         .242       .349
subjective                            192   2.00      5.00      3.7031   .67034           -.101      .175         .755       .349
Pbcontrol                             192   2.00      5.00      3.4792   .73672           .015       .175         -.028      .349
Intention                             192   2.00      5.00      3.8188   .63877           -.528      .175         .687       .349
Actual                                192   2.33      5.00      4.0625   .58349           -.361      .175         -.328      .349
Valid N (listwise)                    192
Question:
1. Is there variation in our data?
2. What is the level of the phenomenon we are measuring?
Table in Report
Variable             Mean   Std. Deviation
Attitude             3.81   0.65
Subjective Norm      3.70   0.67
Behavioral Control   3.48   0.74
Intention            3.82   0.64
Actual               4.06   0.58
Chi Square Test - Command
Crosstabulation
Gender * Intention Level Crosstabulation
                                             Intention Level
Gender                                       Low      High     Total
Male     Count                               110      34       144
         % within Gender                     76.4%    23.6%    100.0%
         % within Intention Level            70.5%    94.4%    75.0%
         % of Total                          57.3%    17.7%    75.0%
Female   Count                               46       2        48
         % within Gender                     95.8%    4.2%     100.0%
         % within Intention Level            29.5%    5.6%     25.0%
         % of Total                          24.0%    1.0%     25.0%
Total    Count                               156      36       192
         % within Gender                     81.3%    18.8%    100.0%
         % within Intention Level            100.0%   100.0%   100.0%
         % of Total                          81.3%    18.8%    100.0%
Chi-Square Tests
                               Value      df   Asymp. Sig. (2-sided)   Exact Sig. (2-sided)   Exact Sig. (1-sided)
Pearson Chi-Square             8.934(b)   1    .003
Continuity Correction(a)       7.704      1    .006
Likelihood Ratio               11.274     1    .001
Fisher's Exact Test                                                    .002                   .001
Linear-by-Linear Association   8.888      1    .003
N of Valid Cases               192
a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 9.00.
Question:
Is level of sharing dependent on gender?
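To make the Pearson chi-square value less of a black box, the sketch below recomputes it from the four observed counts in the crosstabulation (110, 34, 46, 2). The function name is hypothetical, and the p-value shortcut via the complementary error function is valid only for 1 degree of freedom, which is the case for a 2x2 table.

```python
from math import erfc, sqrt


def pearson_chi_square(table):
    """Return (chi-square, df, smallest expected count) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
            min_expected = min(min_expected, expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, min_expected


# Rows: Male, Female. Columns: Low, High intention level.
observed = [[110, 34], [46, 2]]
chi2, df, min_expected = pearson_chi_square(observed)
p_value = erfc(sqrt(chi2 / 2))  # upper-tail probability, valid for df = 1 only
print(round(chi2, 3), df, min_expected, round(p_value, 3))
```

The result should agree with the Pearson Chi-Square row and with footnote b of the SPSS output (minimum expected count of 9.00).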
T-test - Command
t-test
(2 Independent)
Group Statistics
            Gender   N     Mean     Std. Deviation   Std. Error Mean
Intention   Male     144   3.9000   .60302           .05025
            Female   48    3.5750   .68619           .09904
Independent Samples Test (Intention)
                                              Equal variances assumed   Equal variances not assumed
Levene's Test for Equality of Variances
  F                                           3.591
  Sig.                                        .060
t-test for Equality of Means
  t                                           3.122                     2.926
  df                                          190                       72.729
  Sig. (2-tailed)                             .002                      .005
  Mean Difference                             .32500                    .32500
  Std. Error Difference                       .10410                    .11106
  95% Confidence Interval of the Difference
    Lower                                     .11965                    .10364
    Upper                                     .53035                    .54636
Question:
Does intention to share vary by gender?
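Both rows of the Independent Samples Test can be recomputed from the Group Statistics alone. The sketch below is an illustration with a hypothetical function name; it returns the t statistic and degrees of freedom for the pooled-variance version and for the unequal-variance (Welch) version.

```python
from math import sqrt


def independent_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (pooled t, pooled df, Welch t, Welch df) from group summaries."""
    diff = mean1 - mean2
    v1, v2 = sd1 ** 2, sd2 ** 2
    # Equal variances assumed: pool the two sample variances.
    df_pooled = n1 + n2 - 2
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / df_pooled
    t_pooled = diff / sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Equal variances not assumed: Welch's t with Satterthwaite df.
    a, b = v1 / n1, v2 / n2
    t_welch = diff / sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t_pooled, df_pooled, t_welch, df_welch


# Male and female summaries taken from the Group Statistics table.
t_pooled, df_pooled, t_welch, df_welch = independent_t(
    3.9000, 0.60302, 144, 3.5750, 0.68619, 48
)
print(round(t_pooled, 3), df_pooled, round(t_welch, 3), round(df_welch, 1))
```

Because Levene's test is not significant here (Sig. = .060), the "equal variances assumed" row is the one normally read.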
Paired t-test - Command
t-test
(2 Dependent)
Paired Samples Statistics
                     Mean     N     Std. Deviation   Std. Error Mean
Pair 1   Intention   3.8188   192   .63877           .04610
         Actual      4.0625   192   .58349           .04211
Paired Samples Correlations
                              N     Correlation   Sig.
Pair 1   Intention & Actual   192   .817          .000
Paired Samples Test
Pair 1: Intention - Actual
Paired Differences
  Mean                                        -.24375
  Std. Deviation                              .37326
  Std. Error Mean                             .02694
  95% Confidence Interval of the Difference   -.29688 to -.19062
t                                             -9.049
df                                            191
Sig. (2-tailed)                               .000
Question:
Are there differences between intention to
share and actual sharing behavior?
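The paired t statistic is just the mean difference divided by its standard error. A minimal sketch using the summary figures from the Paired Samples Test output (the function name is an assumption for the example):

```python
from math import sqrt


def paired_t(mean_diff, sd_diff, n):
    """Return (standard error, t, df) for a paired-samples t-test."""
    std_error = sd_diff / sqrt(n)
    return std_error, mean_diff / std_error, n - 1


# Mean and standard deviation of the Intention - Actual differences, N = 192.
std_error, t_value, df = paired_t(-0.24375, 0.37326, 192)
print(round(std_error, 5), round(t_value, 3), df)
```

The negative sign only reflects the order of subtraction: actual sharing scores are higher than intention scores on average.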
One Way ANOVA - Command
One way ANOVA
(k independent)
ANOVA
Intention
                 Sum of Squares   df    Mean Square   F       Sig.
Between Groups   7.864            4     1.966         5.247   .001
Within Groups    70.068           187   .375
Total            77.933           191
Intention
Duncan(a,b)
                            Subset for alpha = .05
Current Position    N       1        2
Engineer            66      3.6424
Manager             32      3.6625
Technician          34      3.8941
Sr Engineer         54      4.0000
Above manager       6                4.5333
Sig.                        .101     1.000
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 19.157.
b. The group sizes are unequal. The harmonic mean of the group sizes is used. Type I error levels are not guaranteed.
Question:
Does intention vary by position?
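The F ratio in a one-way ANOVA table is the between-groups mean square divided by the within-groups mean square. The short sketch below recomputes it from the sums of squares and degrees of freedom in the output; the function name is hypothetical.

```python
def anova_f(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F ratio)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within


# Sums of squares and df from the ANOVA table for Intention by position.
ms_between, ms_within, f_ratio = anova_f(7.864, 4, 70.068, 187)
print(round(ms_between, 3), round(ms_within, 3), round(f_ratio, 3))
```

With five position groups and 192 respondents, the degrees of freedom are 5 - 1 = 4 between groups and 192 - 5 = 187 within groups.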
Correlation - Command
Correlation
(Interval/ratio)
Question:
Are the variables related?
Correlations
                                   Attitude   subjective   Pbcontrol   Intention   Actual
Attitude     Pearson Correlation   1          .697**       .212**      .808**      .606**
             Sig. (2-tailed)                  .000         .003        .000        .000
             N                     192        192          192         192         192
subjective   Pearson Correlation   .697**     1            -.052       .653**      .552**
             Sig. (2-tailed)       .000                    .471        .000        .000
             N                     192        192          192         192         192
Pbcontrol    Pearson Correlation   .212**     -.052        1           .281**      .031
             Sig. (2-tailed)       .003       .471                     .000        .665
             N                     192        192          192         192         192
Intention    Pearson Correlation   .808**     .653**       .281**      1           .817**
             Sig. (2-tailed)       .000       .000         .000                    .000
             N                     192        192          192         192         192
Actual       Pearson Correlation   .606**     .552**       .031        .817**      1
             Sig. (2-tailed)       .000       .000         .665        .000
             N                     192        192          192         192         192
**. Correlation is significant at the 0.01 level (2-tailed).
Table Presentation
Attitude subjective Pbcontrol Intention Actual
Attitude 1
subjective .740** 1
Pbcontrol .201** -.047 1
Intention .885** .662** .326** 1
Actual .660** .553** .059 .805** 1
*p< 0.05, **p< 0.01
Multiple Regression - Command
Question:
Which variables can explain the intention to
share?
Variables Entered/Removed(b)
Model   Variables Entered                    Variables Removed   Method
1       Pbcontrol, subjective, Attitude(a)   .                   Enter
a. All requested variables entered.
b. Dependent Variable: Intention
Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .832(a)   .693       .688                .35703                       1.501
a. Predictors: (Constant), Pbcontrol, subjective, Attitude
b. Dependent Variable: Intention
Multiple Regression
ANOVA(b)
Model 1      Sum of Squares   df    Mean Square   F         Sig.
Regression   53.968           3     17.989        141.127   .000(a)
Residual     23.964           188   .127
Total        77.933           191
a. Predictors: (Constant), Pbcontrol, subjective, Attitude
b. Dependent Variable: Intention
Coefficients(a)
              Unstandardized Coefficients   Standardized Coefficients                    Collinearity Statistics
Model 1       B        Std. Error           Beta                        t        Sig.    Tolerance   VIF
(Constant)    .191     .197                                             .971     .333
Attitude      .601     .059                 .607                        10.103   .000    .453        2.210
subjective    .227     .056                 .238                        4.043    .000    .472        2.116
Pbcontrol     .143     .037                 .165                        3.821    .000    .877        1.140
a. Dependent Variable: Intention
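Several figures in the regression output are simple functions of others. As a rough check, the sketch below derives R Square, Adjusted R Square and F from the ANOVA sums of squares, and VIF as the reciprocal of Tolerance. The function names are made up for the illustration, and small differences in the third decimal are expected because the printed inputs are already rounded.

```python
def regression_summary(ss_regression, ss_residual, n, k):
    """Return (R square, adjusted R square, F) for n cases and k predictors."""
    ss_total = ss_regression + ss_residual
    r2 = ss_regression / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    f_ratio = (ss_regression / k) / (ss_residual / (n - k - 1))
    return r2, adj_r2, f_ratio


def vif(tolerance):
    """Variance inflation factor is the reciprocal of tolerance."""
    return 1 / tolerance


# Values from the ANOVA table: 192 cases, 3 predictors.
r2, adj_r2, f_ratio = regression_summary(53.968, 23.964, 192, 3)
print(round(r2, 3), round(adj_r2, 3), round(f_ratio, 2))

# Tolerances from the Coefficients table.
for name, tolerance in [("Attitude", 0.453), ("subjective", 0.472), ("Pbcontrol", 0.877)]:
    print(name, round(vif(tolerance), 2))
```

All three VIF values are far below the commonly used cut-off of 10, which is the usual reading of this part of the output.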
Assumptions (Multicollinearity)
Collinearity Diagnostics(a)
                                                 Variance Proportions
Model   Dimension   Eigenvalue   Condition Index   (Constant)   Attitude   subjective   Pbcontrol
1       1           3.936        1.000             .00          .00        .00          .00
        2           .043         9.581             .00          .02        .10          .55
        3           .013         17.195            .91          .19        .02          .21
        4           .008         22.890            .09          .79        .88          .24
a. Dependent Variable: Intention
Assumptions (Outliers)
Casewise Diagnostics(a)
Case Number   Std. Residual   Intention   Predicted Value   Residual
70            3.152           5.00        3.8748            1.12520
82            4.042           5.00        3.5570            1.44295
83            3.071           4.20        3.1037            1.09631
166           3.152           5.00        3.8748            1.12520
178           4.042           5.00        3.5570            1.44295
179           3.071           4.20        3.1037            1.09631
a. Dependent Variable: Intention
After Removing Outliers
Model Summary(b)
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .900(a)   .810       .807                .27373                       1.725
a. Predictors: (Constant), Pbcontrol, subjective, Attitude
b. Dependent Variable: Intention
Coefficients(a)
              Unstandardized Coefficients   Standardized Coefficients                    Collinearity Statistics
Model 1       B        Std. Error           Beta                        t        Sig.    Tolerance   VIF
(Constant)    .067     .153                                             .441     .659
Attitude      .758     .050                 .784                        15.281   .000    .396        2.523
subjective    .085     .047                 .091                        1.801    .073    .412        2.426
Pbcontrol     .145     .029                 .173                        5.015    .000    .875        1.143
a. Dependent Variable: Intention
ANOVA(b)
Model 1      Sum of Squares   df    Mean Square   F         Sig.
Regression   58.261           3     19.420        259.182   .000(a)
Residual     13.637           182   .075
Total        71.898           185
a. Predictors: (Constant), Pbcontrol, subjective, Attitude
b. Dependent Variable: Intention
Assumptions – Advanced Diagnostics
(Hair et al., 2006)
Residuals Statistics(a)
                                    Minimum   Maximum   Mean     Std. Deviation   N
Predicted Value                     2.1329    4.9380    3.8188   .53156           192
Std. Predicted Value                -3.172    2.106     .000     1.000            192
Standard Error of Predicted Value   .027      .111      .048     .020             192
Adjusted Predicted Value            2.1423    4.9493    3.8179   .53167           192
Residual                            -.96087   1.44295   .00000   .35421           192
Std. Residual                       -2.691    4.042     .000     .992             192
Stud. Residual                      -2.731    4.253     .001     1.012            192
Deleted Residual                    -.98909   1.59761   .00086   .36911           192
Stud. Deleted Residual              -2.779    4.461     .004     1.031            192
Mahal. Distance                     .130      17.495    2.984    3.453            192
Cook's Distance                     .000      .485      .011     .051             192
Centered Leverage Value             .001      .092      .016     .018             192
a. Dependent Variable: Intention
Assumptions (Normality)
[Figure: Histogram of Regression Standardized Residual (x-axis) against Frequency (y-axis). Dependent Variable: Intention. Mean = -1.99E-17, Std. Dev. = 0.992, N = 192]
Assumptions
(Normality of the Error term)
[Figure: Normal P-P Plot of Regression Standardized Residual. Observed Cum Prob (x-axis) against Expected Cum Prob (y-axis). Dependent Variable: Intention]
Assumptions (Constant Variance)
[Figure: Scatterplot of Regression Studentized Residual (y-axis) against Intention (x-axis). Dependent Variable: Intention]
Assumptions (Linearity)
[Figure: Partial Regression Plot of Intention (y-axis) against Attitude (x-axis). Dependent Variable: Intention]
Assumptions (Linearity)
[Figure: Partial Regression Plot of Intention (y-axis) against subjective (x-axis). Dependent Variable: Intention]
Assumptions (Linearity)
[Figure: Partial Regression Plot of Intention (y-axis) against Pbcontrol (x-axis). Dependent Variable: Intention]
Table Presentation
Variable            Dependent = Intention (Standardized Beta)
Attitude            0.607**
Subjective Norm     0.238**
Perceived Control   0.165**
R2                  0.693
Adjusted R2         0.688
F Value             141.13
D-W                 1.501
*p< 0.05, **p< 0.01