psyc3010 lecture 13
This week:
(1) PSYC3010 - overview
(2) SECATs
(3) Discussion of exam and distribution of practice exam
(4) A bit on logistic regression
(5) Interconnections between ANOVA and regression
Howell ch 16, pp. 604-617
last week: mixed anova
Exam consult
Please post and answer ?s in the discussion forum (which I will also monitor periodically).
Consult times for me for the exam will be:
- Monday 20 June 4-5pm
- Friday 17 June 8-10am
- Monday 13 June 1-3pm
- Monday 6 June 3-4pm
Or by appointment
(1) what have we "added" in 3010 from 2010?
course overview
PSYC2010: designs involving one factor or one predictor
PSYC3010: designs involving multiple factors, predictors, or categorical variables
PSYC2010
- one-way between-subjects ANOVA
- one-way within-subjects ANOVA
PSYC3010
factorial ANOVA
- between-subjects: 2-way, 3-way
- within-subjects
- mixed
- blocking (and ANCOVA)
PSYC2010
- bivariate correlation and regression
PSYC3010
multiple regression
- standard
- hierarchical
  - as control technique
  - assessing mediation
  - assessing moderation
PSYC3010 learning objectives
1. Generate research designs for questions involving multiple IVs / predictors, based on methodological and practical considerations.
2. Identify the statistical analyses that are appropriate for research designs involving multiple IVs / predictors.
3. Identify the key terms and conceptual principles relevant to statistical techniques involving multiple IVs / predictors.
4. Plan and execute (omnibus and follow-up) tests in statistical analyses involving multiple IVs / predictors.
5. Interpret results from these statistical analyses, identifying the implications of the results for hypotheses and research questions.
6. Report and discuss the results of these analyses, following standard conventions in Psychology.
7. Use your statistics knowledge to develop and enrich your work as a psychologist.
The purpose of statistics
To understand the shape of the data.
To understand meaningful questions and assess meaningless ranting:
- "Women mature faster than men"; "Men are stronger"
  - What's the standard deviation? Is the difference reliable? Is it even going to be significant in the population?
  - What's the effect size? What portion of the variance in the data does gender account for?
  - What are other factors associated with gender to control for (e.g., via ANCOVA)? [ANOVA is not causation!]
  - What other factors might moderate this effect? (interactions!)
The purpose of statisticsThe purpose of statisticsMeaningful ?s and meaningless rantingMeaningful ?s and meaningless ranting–– The wealthier you are, the happier you are!The wealthier you are, the happier you are!
•• Is that relationship reliable, is it significant in the Is that relationship reliable, is it significant in the population?population?
•• What is the effect size?What is the effect size?•• What other factors might need to be controlled for? What other factors might need to be controlled for?
[Correlation is not causation!][Correlation is not causation!]•• What other factors might moderate this effect? What other factors might moderate this effect?
(interactions!)(interactions!)•• Is this really a linear effect ?Is this really a linear effect ?
To read psych articles, need to know statistics To read psych articles, need to know statistics ––now you can read most & understand themnow you can read most & understand themmore broadly, it’s difficult to understand human more broadly, it’s difficult to understand human variability meaningfully without understanding variability meaningfully without understanding what variability and differences are and are not.what variability and differences are and are not.
when do you use which analysis?
Need to consider the type of variables: independent (predictor) and dependent (criterion).

Predictors                  Criterion     Method
Categorical                 Continuous    ANOVA; MR
Categorical & Continuous    Continuous    MR
Continuous                  Continuous    MR
Continuous                  Categorical   Logistic Regression
Categorical                 Categorical   Log-linear Analysis
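The decision table above is simple enough to encode directly. A toy lookup in Python (the dictionary keys and the helper name are my own, purely illustrative):

```python
# Map (predictor type, criterion type) to the analysis from the table above.
# Labels paraphrase the slide; this is just a lookup, not a statistics tool.
ANALYSIS_TABLE = {
    ("categorical", "continuous"): "ANOVA; MR",
    ("categorical & continuous", "continuous"): "MR",
    ("continuous", "continuous"): "MR",
    ("continuous", "categorical"): "Logistic Regression",
    ("categorical", "categorical"): "Log-linear Analysis",
}

def choose_analysis(predictors, criterion):
    """Return the method the decision table suggests."""
    return ANALYSIS_TABLE[(predictors, criterion)]
```

For example, continuous predictors with a dichotomous criterion point to logistic regression, which is exactly the case taken up in section (4).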
The multivariate universe:
Before 3010:
- Single explanations
- Barely grasp difference between correlations and group differences
- Tendency to rely too much on p-values
After 3010:
- Multiple explanations
- Explanations that interact, or are inter-related
- Variables considered jointly, so you can see interactions and inter-relationships, and explain more than considering each alone
- Strong understanding of correlations and group differences
- Understanding the key idea of effect sizes
(2) SECATs

Knowing artists, you think you know all about Prima Donnas: boy!, just wait till you hear scientists get up and sing.
- W. H. Auden
(3) exam review: content, structure, & study tips
content of exam
primarily assesses conceptual material from lectures:
- moving between research questions, design, hypotheses, and analytic choices
- partitioning variance (systematic and error)
- structural models (Xij = ...)
- steps in analysis (omnibus and follow-up tests)
- key terms and principles in analysis
- interpretation of statistics
  - description of results (e.g., F and p values provided)
  - no SPSS output
- calculating degrees of freedom
structure of exam
50 multiple-choice questions
- 1 mark each
- content spread across Lectures 1-12
10 mins perusal + 2 hours working time
formula sheet included
- does not include df calculations
- posted on Blackboard now
study resources on Blackboard
"Lecture Materials"
- recordings from all lectures
- slides from all lectures
- review notes for all lectures
  - key concepts and principles that you should know from each lecture
"Practice Materials"
- practice exam questions
- tips for answering multiple-choice questions on a closed-book test
how to study for the exam I
revise lecture content (strategically)
- go over lecture notes and listen to lecture recordings
- use the Review Notes to work out which principles and concepts you must understand and memorise
  - dot-points in Review Notes are listed in the same order as the concepts and principles in the lecture
- tutorial notes / textbook readings may clarify things, but if you understand everything in the lecture slides, don't worry about the tutorial / textbook content
how to study for the exam II
be prepared for the exam questions
- the exam questions will ask you to apply your knowledge from the lectures
- it is very important to complete the practice questions
  - PDFs for ANOVA and MR and ANCOVA
  - PDF for practice exam
  - practice quizzes reopened online; why not keep going until you get them all right?
- it may also be useful to look at the tips for multiple-choice questions on a closed-book exam
important logistical details
what you are allowed to bring to the exam:
- non-programmable calculator (be aware of need to have approved model / sticker)
- unmarked non-electronic dictionary (you know, a book)
check with UQ Central Examinations for the list of things you are not allowed to bring in
double-check the exam date / time / venue
arrive at least 15 minutes before the exam
be sure to have your ID card at the exam
Practice exam
More practice MC questions.
Answers may be discussed in the PSYC3010 forum.
You are also welcome to attend Winnifred's consult (times listed on an earlier slide).
(4) A bit on logistic regression
[Figure: scatterplot of Life Satisfaction (0-10) against Number of Social Events Attended (0-10), with a fitted straight line]

Multiple regression = continuous IVs and DVs, each normally distributed. We fit the data with a linear model: the straight line minimising the discrepancy between Y and Y hat.
Logistic regression = continuous IVs and a categorical (0, 1) DV. Obviously (a) Y is not normally distributed and (b) a straight line fits these data poorly.

[Figure: Mortality within 5 years (1 = dead) plotted against Number of Social Events Attended, with a poorly fitting straight line]
Accordingly we fit the data with a logistic model: the S-shaped curve (a.k.a. sigmoidal curve) that best predicts whether an observation will be in one group (0) versus another (1).

[Figure: the same mortality data with a fitted S-shaped logistic curve]
Conceptual similarities: interpreting logistic R2 and R2 change
In SPSS for logistic regression, you get R2 estimates labelled Cox & Snell R2 and Nagelkerke R2.
- These are two ways of understanding the "variance" in dichotomous (0, 1) DVs.
- No convention exists regarding which to report. C&S is the more conservative one and Nagelkerke the more liberal; at the moment Nagelkerke R2 is more common.
Hierarchical logistic regression can be performed.
- SPSS will output C&S and N R2 for each model, but you need to subtract the earlier R2 from the later one to get the R2 change per block.
R2 and R2 change are tested with chi-square (χ2) tests, not F-tests, but the structure of the write-up is identical. Both χ2 for the model and for the block are reported; R2 change must be calculated by hand from the output.
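Both pseudo-R2s can be computed directly from the -2 log-likelihoods that SPSS prints in the Model Summary. A sketch of the standard formulas (the function name is my own, and the example values are round numbers rather than the output shown later):

```python
import math

def pseudo_r2(neg2ll_null, neg2ll_model, n):
    """Cox & Snell and Nagelkerke pseudo-R^2 from the -2 log-likelihoods
    of an intercept-only (null) model and a fitted logistic model."""
    model_chi2 = neg2ll_null - neg2ll_model           # the omnibus chi-square
    cox_snell = 1.0 - math.exp(-model_chi2 / n)
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)  # C&S upper bound (< 1)
    nagelkerke = cox_snell / max_cox_snell            # rescaled toward a 0-1 range
    return cox_snell, nagelkerke

cs, nk = pseudo_r2(neg2ll_null=270.0, neg2ll_model=255.0, n=200)
```

Nagelkerke is always at least as large as Cox & Snell, which is why it reads as the more "liberal" estimate.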
E.g. output and write-up
Logistic Regression
Block 0: Beginning Block

Variables in the Equation
                     B     S.E.   Wald   df  Sig.  Exp(B)
Step 0  Constant   -.235   .143  2.673    1  .102   .791

Block 1: Method = Enter

Omnibus Tests of Model Coefficients
                Chi-square  df  Sig.
Step 1  Step        .856     2  .652
        Block       .856     2  .652
        Model       .856     2  .652

Model Summary
        -2 Log        Cox & Snell   Nagelkerke
Step    likelihood    R Square      R Square
1       269.553a        .004          .006
a. Estimation terminated at iteration number 3 because parameter estimates changed by less than .001.

Variables in the Equation
                        B     S.E.   Wald   df  Sig.  Exp(B)
Step 1a  c_age        -.026   .034   .613    1  .434   .974
         ec_women(1)  -.171   .313   .298    1  .585   .843
         Constant     -.118   .260   .204    1  .651   .889
a. Variable(s) entered on step 1: c_age, ec_women.
"A hierarchical logistic regression was conducted predicting whether or not participants took political action from demographic factors (Block 1) and attitude strength (Block 2). Table 1 describes the means, standard deviations, and intercorrelations. The entry of the demographics did not increase the variance accounted for, Nagelkerke R2 = .01, χ2 (2) = 0.86, p = .652 [snip]"
Block 2: Method = Enter

Omnibus Tests of Model Coefficients
                Chi-square  df  Sig.
Step 1  Step      14.475     1  .000
        Block     14.475     1  .000
        Model     15.331     3  .002

Model Summary
        -2 Log        Cox & Snell   Nagelkerke
Step    likelihood    R Square      R Square
1       255.078a        .075          .100
a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.

Variables in the Equation
                        B      S.E.    Wald   df  Sig.  Exp(B)
Step 1a  c_age        -.045   .035    1.622    1  .203   .956
         ec_women(1)  -.054   .327     .028    1  .868   .947
         atstr_sc      .404   .110   13.439    1  .000  1.498
         Constant    -1.073   .379    8.015    1  .005   .342
a. Variable(s) entered on step 1: atstr_sc.
E.g. output and write-up

"However, the entry of attitude strength in Block 2 significantly increased the variance accounted for, Nagelkerke R2 change = .09, χ2 (1) = 14.48, p < .001. [snip] The final model accounted for only 10% of the variance in action, however, χ2 (3) = 15.33, p = .002."

Note: the difference between the -2LL in this model (255.078) and the first model (269.553) equals the chi-square value (14.475). Some reviewers prefer reporting -2LL over R2.
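That note can be checked with a few lines of Python; for df = 1 the chi-square p-value follows from the standard error function, so no stats library is needed:

```python
import math

def chi2_sf_df1(x):
    """P(chi-square with df = 1 exceeds x), via erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

# -2 log-likelihood values from the two Model Summary tables
neg2ll_block1 = 269.553
neg2ll_block2 = 255.078
chi2_change = neg2ll_block1 - neg2ll_block2   # 14.475, the Block chi-square
p_value = chi2_sf_df1(chi2_change)            # p < .001, as reported
```

(The reported Block test has df = 1 because only one predictor, attitude strength, entered at Block 2.)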
Return to the data

[Figure: the mortality-by-social-events data again, with the fitted logistic curve]
Interpreting logistic coefficients
Error = still deviations from the (S-shaped) line, but now involves misclassification (e.g., predicted dead when in fact alive). Instead of being normally distributed, errors also trend towards a 0, 1 distribution.
Instead of describing and reporting unstandardised coefficients, report Exp(B). This coefficient is tested with a Wald test, not a t-test, but the structure of the write-up is the same.
Exp(B) coefficients don't describe the change in the DV for a 1-unit change in the IV; they describe the change in the odds of being (1) compared to (0) for every unit increase in the IV:
- Exp(B) = 1.00: no change in the odds of being dead within 5 years for every 1 more social event
- Exp(B) = 2.50: odds of being dead within 5 years are multiplied by 2.5 (an increase of 150%) for every 1 more social event attended
- Exp(B) = .80: odds of death within 5 years are multiplied by .8 (much more useful to say they decrease by 20% [1 - .8 = .2]) for every 1 more social event attended
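The arithmetic behind those bullet points, sketched in Python (the helper names are my own):

```python
import math

def odds_ratio(b):
    """Exp(B): the multiplicative change in the odds of being in
    group 1 for every 1-unit increase in the predictor."""
    return math.exp(b)

def percent_change_in_odds(b):
    """The same quantity expressed as a percentage change in the odds."""
    return (math.exp(b) - 1.0) * 100.0
```

So a B of 0 gives Exp(B) = 1.00 (no change), a positive B gives Exp(B) above 1 (odds increase), and a negative B gives Exp(B) below 1 (odds decrease).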
E.g. output and write-up

(Block 0 and Block 1 output as shown earlier.)
"A hierarchical logistic regression was conducted predicting whether or not participants took political action from demographic factors (Block 1) and attitude strength (Block 2). Table 1 describes the means, standard deviations, and intercorrelations. The entry of the demographics did not increase the variance accounted for, Nagelkerke R2 = .01, χ2 (2) = 0.86, p = .652, and inspection of the coefficients revealed that neither age nor gender was significantly linked to action, Wald tests < .30, ps > .584."
(Block 2 output as shown earlier.)
E.g. output and write-up

"However, the entry of attitude strength in Block 2 significantly increased the variance accounted for, Nagelkerke R2 change = .09, χ2 (1) = 14.48, p < .001. Specifically, on a scale from 0 to 5, every additional unit of attitude strength multiplied the odds of political action by 1.5 (a 50% increase), Exp(B) = 1.50, Wald = 13.44, p < .001. The final model accounted for only 10% of the variance in action, however, χ2 (3) = 15.33, p = .002."
Logistic regression is seen quite often, e.g.:
- clinical psychology (what factors predict becoming schizophrenic, recurrence of depression?)
- social (predict attending a rally, getting divorced?)
- org psych (predict quitting the firm / being promoted?)
Occasionally other statistics are reported, but the above would serve in a journal article at the moment.
Also can have multiple categories on the DV: use multinomial logistic regression.
So worth knowing.
Field spells it all out rather nicely and goes through SPSS.
Covered in Howell section 15.14 (5th & 6th ed).
But not assessed on the exam!
Also note: log-linear analysis is in Howell ch 17, but we won't get around to covering this (as psychs you will come across logistic regression far more frequently).
(5) Interconnections between ANOVA and regression
[Diagram:
ANOVA & t-tests (between/within)  ->  Factorial ANOVA (between/within & mixed)
Bivariate (simple) correlation    ->  Multiple Regression
...multivariate methods...]
experimental vs. correlational research
this is what many will tell you about the differences between anova vs correlational designs:
anova designs
- the only research strategy in which causation can be inferred: the factor can be said to "cause" changes in the DV
- this is because the IV is manipulated
correlational research
- cannot be used to infer causality
- this is because variables are not manipulated, just measured
experimental vs. correlational research
this is misleading because:
it confuses research methodology (PSYC3042) with statistical methodology (PSYC3010), and it assumes that the benefits of experimental research transfer automatically to anova
- the differences between experimental and correlational research involve random assignment to levels of the IV vs observation of natural / measured levels of the IV
- these have NOTHING to do with the differences between anova and regression, which involve partitioning variance between factors and within, versus between a regression line and observations
- ANOVA can be carried out statistically with regression analyses; t-tests can be carried out with correlations
- all of these statistical techniques are generalisations of one underlying model, the general linear model (GLM)
The General Linear Model
What is it?

X_ijk = μ + α_j + β_k + αβ_jk + e_ijk
X_ij = μ + α_j + π_i + e_ij
Y = b1X + b2Z + b3XZ + c + e
The General Linear Model
What is it?
a system of linear equations which can be used to model data; quite similar to the T1000:
- powerful!
- versatile!
- can execute a range of operations!
- can take on a variety of appearances!
- provides the basis for just about every parametric statistical test we know (OK, weak link there)
Read Cronbach, 1968 for more
magic tricks!
it is fairly easy to show that:
1. a t-test is a correlation
2. factorial anova is a standard regression problem
3. ancova is a hierarchical regression problem
4. interactions in anova are identical to those in MMR
correlation and the t-test
you may have heard of a point-biserial correlation (Howell pp. 297-305)
- this is a special case of correlation where one of the variables is dichotomous (e.g., gender) and the other is continuous (e.g., height)
- the other name for a point-biserial correlation is an independent samples t-test
Heights of males and females: this is how we are used to seeing the data laid out when we are doing hand calculations for a t-test:

Females   Males
150       165
160       170
165       180
155       175

but we know that SPSS would prefer that we lay the data out like this (hmmm, looks familiar):

Gender   Height
1        150
1        160
1        165
1        155
2        165
2        170
2        180
2        175
so let's run our t-test

Independent Samples Test (HEIGHT)
                              Levene's Test         t-test for Equality of Means
                              F       Sig.       t       df      Sig. (2-tailed)
Equal variances assumed      .000    1.000    -3.286      6          .017
Equal variances not assumed                   -3.286    6.000        .017

t(6) = 3.29, p = .017
now run as a correlation (just as if we had two continuous variables)

Correlations
                               GENDER    HEIGHT
GENDER  Pearson Correlation      1        .802*
        Sig. (2-tailed)          .         .017
        N                        8         8
HEIGHT  Pearson Correlation     .802*      1
        Sig. (2-tailed)         .017       .
        N                        8         8
*. Correlation is significant at the 0.05 level (2-tailed).

r = .802, p = .017, r2 = .643
p value is the same as in the t-test
re-run as an anova (to get estimates of effect size)

Tests of Between-Subjects Effects
Dependent Variable: HEIGHT
Source             Type III SS    df   Mean Square       F       Sig.   Partial Eta Squared
Corrected Model      450.000a      1      450.000      10.800    .017        .643
Intercept         217800.000      1   217800.000    5227.200    .000        .999
GENDER               450.000      1      450.000      10.800    .017        .643
Error                250.000      6       41.667
Total             218500.000      8
Corrected Total      700.000      7
a. R Squared = .643 (Adjusted R Squared = .583)

F(1,6) = 10.8, p = .017, η2 = .643
p value is again the same
partial η2 = r2 (from previous slide)
F (i.e., 10.8) = t2 (i.e., 3.29²)
now run as a regression (just for the sake of comparison)

Model Summary
Model     R      R Square   Adjusted R Square   Std. Error of the Estimate
1       .802a     .643           .583                6.45497
a. Predictors: (Constant), GENDER

ANOVAb
Model 1        Sum of Squares   df   Mean Square      F      Sig.
Regression        450.000        1     450.000      10.800   .017a
Residual          250.000        6      41.667
Total             700.000        7
a. Predictors: (Constant), GENDER
b. Dependent Variable: HEIGHT

R2 = .643, F(1,6) = 10.8, p = .017
R2 = partial η2 = r2; F and p are the same
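The whole equivalence can be verified from the raw heights with a few lines of Python; this is just the textbook formulas, no stats package needed:

```python
import math

# Height data from the slides: females coded 1, males coded 2.
gender = [1, 1, 1, 1, 2, 2, 2, 2]
height = [150, 160, 165, 155, 165, 170, 180, 175]
n = len(height)

# Point-biserial correlation = plain Pearson r on the coded variable.
mx, my = sum(gender) / n, sum(height) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(gender, height))
sxx = sum((x - mx) ** 2 for x in gender)
syy = sum((y - my) ** 2 for y in height)
r = sxy / math.sqrt(sxx * syy)           # .802

# The equivalences shown across the last few slides:
df = n - 2
t = r * math.sqrt(df / (1 - r ** 2))     # 3.286, the independent-samples t
F = t ** 2                               # 10.8, the one-way ANOVA F
r_squared = r ** 2                       # .643 = R^2 = partial eta^2
```

Same data, same numbers, four different output formats.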
an additional slide to consolidate: structural models
First, to help interpretation, re-run the MR using dummy coding (female = 1, male = 0). Can use the structural model to calc means:

From the t-test:
Group Statistics (height)
gender   N     Mean      Std. Deviation   Std. Error Mean
male     4   172.5000       6.45497           3.22749
female   4   157.5000       6.45497           3.22749

From the regression:
Coefficientsa
Model 1        Unstandardized B   Std. Error   Standardized Beta      t      Sig.
(Constant)        172.500           3.227                           53.447   .000
gender            -15.000           4.564           -.802           -3.286   .017
a. Dependent Variable: height

Y hat = a + B1X1
So, for men (coded as zero), Y hat = 172.50 - (15.00*0) = 172.50
And for women (coded as one), Y hat = 172.50 - (15.00*1) = 157.50
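Those two predictions are just the structural model evaluated at the two dummy codes; a trivial sketch:

```python
# Coefficients from the regression output above: intercept a = 172.50,
# slope B1 = -15.00, with dummy coding female = 1, male = 0.
a, b1 = 172.50, -15.00

def y_hat(dummy_code):
    """Predicted height from the dummy-coded gender regression."""
    return a + b1 * dummy_code

male_mean = y_hat(0)     # 172.50, the male group mean
female_mean = y_hat(1)   # 157.50, the female group mean
```

With dummy coding, the intercept is the mean of the group coded 0, and the slope is the difference between the two group means.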
explanation
a t-test, or an anova between two groups, is just a special case of correlation,
- which in turn is just a special case of regression,
- which is a representation of the General Linear Model
SPSS did the same* thing in all four analyses; it just presented the output in different ways
*(strictly speaking, bivariate correlations and t-tests are not executions of the GLM; they are calculated using 'shortcuts' that achieve the same basic results)
hierarchical regression and ancova
in ancova our goal was to remove the effects of a covariate before examining our treatment effect
in hierarchical regression, the idea was to examine the contribution of a set of variables at step 2 after accounting for prediction at step 1
- as it turns out, both are basically doing the same thing!
let’s go back to our height data – and include age as a covariate:
the data are laid out as we would for an ancova or a hierarchical regression
  Sex   Age   Height
  1     16    150
  1     18    160
  1     17    165
  1     17    155
  2     16    165
  2     17    170
  2     18    180
  2     17    175
Tests of Between-Subjects Effects
Dependent Variable: HEIGHT

  Source            Type III SS   df   Mean Square   F        Sig.   Partial Eta Squared
  Corrected Model   606.250(a)    2    303.125       16.167   .007   .866
  Intercept         47.690        1    47.690        2.543    .172   .337
  AGE               156.250       1    156.250       8.333    .034   .625
  GENDER            450.000       1    450.000       24.000   .004   .828
  Error             93.750        5    18.750
  Total             218500.000    8
  Corrected Total   700.000       7

a. R Squared = .866 (Adjusted R Squared = .812)
first run as an ancova
for gender, F(1,5) = 24.00, p = .004
this is the effect after controlling for age
Model Summary

  Model   R       R Square   R Square Change   F Change   df1   df2   Sig. F Change
  1       .472a   .223       .223              1.724      1     6     .237
  2       .931b   .866       .643              24.000     1     5     .004

a. Predictors: (Constant), AGE
b. Predictors: (Constant), AGE, GENDER
now run as a hierarchical regression
Fch(1,5) = 24.00, p = .004
this is the effect after controlling for age
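The F-change can be reproduced by hand from the two R² values in the model summary – a sketch; the small discrepancy from 24.00 is just rounding in the reported R²s:

```python
# F-change for adding gender at step 2, from the R-squared values on the slide
r2_step1 = 0.223           # step 1: AGE only
r2_step2 = 0.866           # step 2: AGE + GENDER
n, k_full = 8, 2           # 8 cases, 2 predictors in the full model

df1 = 1                    # one predictor added at step 2
df2 = n - k_full - 1       # = 5
f_change = ((r2_step2 - r2_step1) / df1) / ((1 - r2_step2) / df2)
print(round(f_change, 2))  # 23.99 -- SPSS's 24.00, within rounding of the R-squares
```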
Minor diffs in output
there are some minor differences in presentation:
– in our ancova we are given η²p = .828, but in regression the R²ch was .643
– η²p actually corresponds to the squared partial correlation for gender: .91² = .828
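A quick check of that correspondence, using the SS values from the ANCOVA table and the partial correlation from the coefficients table:

```python
# Partial eta-squared for gender from the ANCOVA table on the slide...
ss_gender, ss_error = 450.0, 93.75
eta2_p = ss_gender / (ss_gender + ss_error)

# ...and the squared partial correlation for gender from the regression output
partial_r = 0.910

print(round(eta2_p, 3))          # 0.828
print(round(partial_r ** 2, 3))  # 0.828 -- the same quantity
```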
Coefficients

  Model   Predictor    Standardized Beta   t       Sig.   Partial   Part
  1       (Constant)                       .725    .496
          AGE          .472                1.313   .237   .472      .472
  2       (Constant)                       .976    .374
          AGE          .472                2.887   .034   .791      .472
          GENDER       .802                4.899   .004   .910      .802

a. Dependent Variable: HEIGHT
Minor diffs in output
– in our ancova the test for age is given as F(1,5) = 8.33, p = .034
– this actually corresponds to the test of the coefficient for age in the full model at step 2:
  • remember t² = F (2.887² = 8.33)
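A one-line check of the t² = F identity, using the values from the two output tables:

```python
# The age test in the ANCOVA is the squared t for age at step 2 of the regression
t_age = 2.887                # t for AGE in the full (step 2) model
print(round(t_age ** 2, 2))  # 8.33 -- the ANCOVA's F(1,5) for age
```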
explanation
ancova and hierarchical regression achieve the same broad purpose
some minor differences in the output simply reflect defaults which have been programmed into SPSS
– e.g., as effect sizes have only recently become emphasised for anova, these don’t line up as you would expect with the ones for regression, but the link is in there somewhere!
interactions – MMR vs anova
testing interactions in anova and MMR look incredibly different
– this is just because they have different histories
– essentially they are doing the same thing
2 categorical variables
going back to our height data, let’s say we wanted to examine the interaction between maternal diet and gender in the prediction of height
– factor A is gender (M/F)
– factor B is maternal diet (healthy, unhealthy)
(N = 16)
Tests of Between-Subjects Effects
Dependent Variable: HEIGHT

  Source            Type III SS   df   Mean Square   F           Sig.
  Corrected Model   950.000(a)    3    316.667       8.444       .003
  Intercept         435600.000    1    435600.000    11616.000   .000
  GENDER            625.000       1    625.000       16.667      .002
  DIET              100.000       1    100.000       2.667       .128
  GENDER * DIET     225.000       1    225.000       6.000       .031
  Error             450.000       12   37.500
  Total             437000.000    16
  Corrected Total   1400.000      15

a. R Squared = .679 (Adjusted R Squared = .598)
anova – the way we know
F(1,12) = 6.00, p = .031
MMR
in our MMR lecture we talked briefly about categorical variables in MMR – they can get a bit tricky
but with dichotomous variables it is dead easy
– enter additive effects (gender and diet) at step 1
– interaction term (gender*diet) at step 2
MMR
Model Summary

  Model   R       R Square   Adjusted R Square   Std. Error of the Estimate   R Square Change   F Change   df1   df2   Sig. F Change
  1       .720a   .518       .444                7.20577                      .518              6.981      2     13    .009
  2       .824b   .679       .598                6.12372                      .161              6.000      1     12    .031

a. Predictors: (Constant), DIET, GENDER
b. Predictors: (Constant), DIET, GENDER, INT
Fch(1,12) = 6.00, p = .031
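A sketch of the MMR setup and the F-change arithmetic. The 0/1 coding shown is a hypothetical layout (the slides don’t list the 16 raw scores); the R² values come from the model summary:

```python
import numpy as np

# Hypothetical 0/1 layout for the MMR run on 16 cases
gender = np.repeat([0, 1], 8)              # 0 = male, 1 = female
diet = np.tile(np.repeat([0, 1], 4), 2)    # 0 = healthy, 1 = unhealthy
interaction = gender * diet                # product term entered at step 2

# F-change for the interaction, from the R-squared values in the model summary
r2_step1, r2_step2, df2 = 0.518, 0.679, 12
f_change = (r2_step2 - r2_step1) / ((1 - r2_step2) / df2)

print(int(interaction.sum()))  # 4 -- only one cell (female, unhealthy) carries 1s
print(round(f_change, 2))      # 6.02 -- SPSS's 6.00, within rounding of the R-squares
```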
implications
the GLM has been behind the scenes for just about all of the statistical methods examined in PSYC3010
we stick to a lot of these conventions about when to use ANOVA instead of regression for practical reasons
by understanding the common links through all these analyses we can be less rigid in our use of these tools
here are some of the comparisons we can make
hypothesis testing
in anova we test the hypothesis that our manipulations have had a significant effect on our DV
H0: μ1 = μ2 = μ3
– the null hypothesis – no differences among treatment means
H1: the null hypothesis is false
– the alternative hypothesis – there is at least one difference among treatment means
in regression we test the hypothesis that our predictors are accounting for a significant amount of variance in our criterion
H0: the relationship between the criterion and the set of predictors is zero
H1: the relationship between the criterion and the set of predictors is not zero
variance partitioning
in anova we want to partition the total variance out into effects and error terms
– main effects and interactions compared to error
– the goal is to attribute a significant and substantial proportion of variance in our DV to our effects
in regression we want to model our data by finding the line/plane of best fit, i.e., the one that minimises errors of prediction
– the model can then be described in terms of additive effects and interactions, which are compared to error
– the goal is to explain as significant and substantial a proportion of variance in our criterion as possible
effect size
in anova we can quantify the amount of the total variance which each effect accounts for
– eta-squared (sample estimate)
– omega-squared (population estimate)
in regression we can quantify the amount of variance that our model accounts for
– R² (sample estimate)
– R² adjusted (population estimate)
– sr² (importance of individual predictor)
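These estimates are easy to compute by hand – e.g., for the gender × diet interaction, using the SS values from the earlier 2×2 ANOVA table:

```python
# Effect sizes for the gender x diet interaction, from the slide's ANOVA table
ss_effect, ss_total, ms_error, df_effect = 225.0, 1400.0, 37.5, 1

eta2 = ss_effect / ss_total                                          # sample estimate
omega2 = (ss_effect - df_effect * ms_error) / (ss_total + ms_error)  # population estimate

print(round(eta2, 3))    # 0.161 -- matches the R-square change in the MMR run
print(round(omega2, 3))  # 0.13  -- smaller, as population estimates always are
```

Note that eta-squared for the interaction equals the R² change when the product term is added in MMR, which is the GLM link again.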
complex relationships
in anova we can test for 2-way or 3-way interactions (and beyond!)
– the effect of factor A on the DV changes over levels of factor B
– follow these up with simple effects – i.e., examine the effect of A on the DV at each level of B
in regression we can test for 2-way or 3-way interactions (and beyond!)
– the relationship between X and Y varies over values of Z
– follow these up with simple slopes – i.e., examine the relationship between X and Y at high and low conditional values of Z
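A minimal sketch of simple slopes, using hypothetical noise-free data with a built-in interaction (all names and values here are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical data constructed with a known X*Z interaction, to show that a
# simple slope is just b1 + b3*z0 at a chosen conditional value z0 of Z
x, z = np.meshgrid(np.arange(-2.0, 3.0), np.arange(-2.0, 3.0))
x, z = x.ravel(), z.ravel()
y = 2 + 0.5 * x + 0.3 * z + 0.8 * x * z   # the true X slope varies with Z

# Fit y = b0 + b1*x + b2*z + b3*(x*z)
X = np.column_stack([np.ones(x.size), x, z, x * z])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

print(round(b1 + b3 * -1.0, 2))  # -0.3 -> slope of X on Y at low Z (z0 = -1)
print(round(b1 + b3 * +1.0, 2))  # 1.3  -> slope of X on Y at high Z (z0 = +1)
```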
increasing power
in anova we can employ a number of statistical and methodological techniques:
– blocking on a concomitant factor
– remove individual differences (i.e., use a within-subjects design)
– include a covariate (i.e., use ancova)
in regression we also have some similar techniques at our disposal:
– partial the effect of another variable out first (i.e., use hierarchical regression – similar to ancova)
– improve measurement (e.g., measure subjects with the most reliable measures – i.e., higher alpha)
The multivariate universe:
Before 3010:
– Single explanations
– Barely grasp the difference between correlations and group differences
– Tendency to rely too much on p-values
After 3010:
– Multiple explanations
– Explanations that interact, or are inter-related
– Variables considered jointly, so you can see interactions and inter-relationships explain more than considering each alone
– Strong understanding of correlations and group differences
– Understanding the key idea of effect sizes
In the tutes: No tutes!
In future:
Consult times for me for the exam will be:
Monday 20 June 4–5pm
Friday 17 June 8–10am
Monday 13 June 1–3pm
Monday 6 June 3–4pm
Or by appointment
Every effort will be made to post the A2 marks online by Friday 18 June, 5pm, although this cannot be guaranteed
Assignment feedback sheets can be picked up from Winnifred by appointment
Thank you!
Good luck on the exam