
An introduction to the psych package: Part I:

data entry and data description

William Revelle
Department of Psychology
Northwestern University

October 4, 2020

Contents

0.1 Jump starting the psych package – a guide for the impatient
0.2 Psychometric functions are summarized in the second vignette

1 Overview of this and related documents

2 Getting started

3 Basic data analysis
  3.1 Getting the data by using read.file
  3.2 Data input from the clipboard
  3.3 Basic descriptive statistics
    3.3.1 Outlier detection using outlier
    3.3.2 Basic data cleaning using scrub
    3.3.3 Recoding categorical variables into dummy coded variables
  3.4 Simple descriptive graphics
    3.4.1 Scatter Plot Matrices
    3.4.2 Density or violin plots
    3.4.3 Means and error bars
    3.4.4 Error bars for tabular data
    3.4.5 Two dimensional displays of means and errors
    3.4.6 Back to back histograms
    3.4.7 Correlational structure
    3.4.8 Heatmap displays of correlational structure
  3.5 Testing correlations
  3.6 Polychoric, tetrachoric, polyserial and biserial correlations

4 Multilevel modeling
  4.1 Decomposing data into within and between level correlations using statsBy
  4.2 Generating and displaying multilevel data
  4.3 Factor analysis by groups

5 Multiple Regression, mediation, moderation and set correlations
  5.1 Multiple regression from data or correlation matrices
  5.2 Mediation and Moderation analysis
  5.3 Set Correlation

6 Converting output to APA style tables using LaTeX

7 Miscellaneous functions

8 Data sets

9 Development version and a users guide

10 Psychometric Theory

11 SessionInfo

0.1 Jump starting the psych package – a guide for the impatient

You have installed psych (section 2) and you want to use it without reading much more. What should you do?

1. Activate the psych package and the psychTools package:

library(psych)

library(psychTools)

2. Input your data (section 3.1). There are two ways to do this:

• Find and read standard files using read.file. This will open a search window for your operating system which you can use to find the file. If the file has a suffix of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData, then the file will be opened and the data will be read in (or loaded in the case of Rda files).

myData <- read.file()   # find the appropriate file using your normal operating system

• Alternatively, go to your friendly text editor or data manipulation program (e.g., Excel) and copy the data to the clipboard. Include a first line that has the variable labels. Paste it into psych using the read.clipboard.tab command:

myData <- read.clipboard.tab()   # if on the clipboard

Note that there are a number of options for read.clipboard for reading in Excel based files, lower triangular files, etc.
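For instance, a minimal sketch of one of those options (assuming a lower triangular correlation matrix has been copied to the clipboard; the object name is arbitrary):

myR <- read.clipboard.lower()   # reads a lower triangular matrix and returns the full square (symmetric) matrix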

3. Make sure that what you just read is right. Describe it (section 3.3) and perhaps look at the first and last few lines. If you have multiple groups, try describeBy.

dim(myData)   # what are the dimensions of the data?

describe(myData)   # or

describeBy(myData, group=mygroup)   # for descriptive statistics by groups

headTail(myData)   # show the first and last n lines of a file

4. Look at the patterns in the data. If you have fewer than about 12 variables, look at the SPLOM (Scatter Plot Matrix) of the data using pairs.panels (section 3.4.1). Then use the outlier function to detect outliers.

pairs.panels(myData)

outlier(myData)

5. Note that you might have some weird subjects, probably due to data entry errors. Either edit the data by hand (use the edit command) or just scrub the data (section 3.3.2).

cleaned <- scrub(myData, max=9)   # e.g., change anything greater than 9 to NA

6. Graph the data with error bars for each variable (section 3.4.3).

error.bars(myData)

7. Find the correlations of all of your data. lowerCor will, by default, find the pairwise correlations, round them to 2 decimals, and display the lower off diagonal matrix.

• Descriptively (just the values) (section 3.4.7)

r <- lowerCor(myData)   # the correlation matrix, rounded to 2 decimals

• Graphically (section 3.4.8). Another way is to show a heat map of the correlations with the correlation values included.

corPlot(r)   # examine the many options for this function

• Inferentially (the values, the ns, and the p values) (section 3.5)

corr.test(myData)

8. Apply various regression models.

Several functions are meant to do multiple regressions, either from the raw data or from a variance/covariance matrix or a correlation matrix. This is discussed in more detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables. If we have the raw data, we can also find the interaction term (x1 * x2). Although we can find the regressions from just a correlation matrix, we cannot find the interaction (moderation effect) unless given raw data.

myData <- sat.act

colnames(myData) <- c("mod1", "med1", "x1", "x2", "y1", "y2")

setCor(y1 + y2 ~ x1 + x2 + x1*x2, data = myData)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. We specify the mediation variable by enclosing it in parentheses, and show the moderation by the standard multiplication. For the purpose of this demonstration, we do the bootstrap with just 50 iterations. The default is 5000. We use data which was downloaded from the supplementary material for Hayes (2013), https://www.afhayes.com/public/hayes2013data.zip.

mediate(reaction ~ cond + (import) + (pmi), data = Tal_Or, n.iter = 50)

We can also find the moderation effect by adding in a product term.

• mediate will take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. By default, we find the raw regressions and mean center. If we specify zero=FALSE, we do not mean center the data. If we specify std=TRUE, we find the standardized regressions.

mediate(respappr ~ prot2*sexism + (sexism), data=Garcia, zero=FALSE, n.iter=50,
    main="Moderated mediation (not mean centered)")

0.2 Psychometric functions are summarized in the second vignette

Many additional functions, particularly designed for basic and advanced psychometrics, are discussed more fully in the Overview Vignette, which may be downloaded from https://personality-project.org/r/psych/vignettes/overview.pdf. A brief review of the functions available is included here. In addition, there are helpful tutorials for Finding omega, How to score scales and find reliability, and for Using psych for factor analysis at https://personality-project.org/r.

• Test for the number of factors in your data using parallel analysis (fa.parallel) or Very Simple Structure (vss).

fa.parallel(myData)

vss(myData)

• Factor analyze (see section 4.1) the data with a specified number of factors (the default is 1); the default method is minimum residual, and the default rotation for more than one factor is oblimin. There are many more possibilities, such as minres (section 4.1.1), alpha factoring, and wls. Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979) (see section 4.1.6). Also consider a hierarchical factor solution to find coefficient ω.

fa(myData)

iclust(myData)

omega(myData)

If you prefer to do a principal components analysis, you may use the principal function. The default is one component.

principal(myData)

• Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function. Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function, and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.

alpha(myData)   # score all of the items as part of one scale

myKeys <- make.keys(nvar=20, list(first = c(1,-3,5,-7,8:10), second=c(2,4,-6,11:15,-16)))

my.scores <- scoreItems(myKeys, myData)   # form several scales

my.scores   # show the highlights of the results

At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading this entire vignette as well as the Overview Vignette to be helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.

1 Overview of this and related documents

The psych package (Revelle, 2018) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at https://personality-project.org/r/book/.

Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.

Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes six methods of factor analysis (minimum residual, principal axis, alpha factoring, weighted least squares, generalized least squares, and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP), or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.

Bifactor and hierarchical factor structures may be estimated by using Schmid-Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.

Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust), and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.

For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations as well as the between group correlation.

multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject. mlArrange converts wide data frames to long data frames suitable for multilevel modeling.

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.

This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at https://personality-project.org/r/psych_manual.pdf.) The vignette psych for sem at https://personality-project.org/r/psych_for_sem.pdf discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at https://personality-project.org/r/psych/vignettes/psych_for_sem.pdf.)

In addition, there are a growing number of "HowTo"s at the personality project. Currently these include:

1. An introduction (vignette) of the psych package

2. An overview (vignette) of the psych package

3. Installing R and some useful packages

4. Using R and the psych package to find ωh and ωt

5. Using R and the psych package for factor analysis and principal components analysis

6. Using the scoreItems function to find scale scores and scale statistics

7. Using mediate and setCor to do mediation, moderation and regression analysis

For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at https://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at https://personality-project.org/r/book.

2 Getting started

Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.

The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just a few packages:

install.packages(c("GPArotation", "mnormt"))

Alternatively, many packages for psychometrics can be downloaded at once using the "Psychometrics" task view:

install.packages("ctv")

library(ctv)

task.views("Psychometrics")

Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

Remember, to run any of the psych functions it is necessary to make the package active by using the library command:

library(psych)

library(psychTools)

The other packages, once installed, will be called automatically by psych.

It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)
library(psychTools)}

3.1 Getting the data by using read.file

Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) convenient, a helpful alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData, the file will be read correctly.

my.data <- read.file()

If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

my.data <- read.file(widths = c(4, rep(1, 35)))   # will read in a file without a header row
   # and 36 fields, the first of which is 4 columns wide, the rest of which are 1 column each

If the file is a RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata), the object will be loaded. Depending on what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT), it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).
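For example (a minimal sketch; the file itself is whatever you choose in the search window):

my.data <- read.file(header=FALSE)   # for a file whose first line is data rather than variable names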

To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

# this will keep the value labels for .sav files
my.spss <- read.file(use.value.labels=TRUE)

3.2 Data input from the clipboard

There are, of course, many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:
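As a minimal sketch of the base R approach just mentioned (assuming the copied data are tab delimited and include a header row):

my.pc.data <- read.table("clipboard", header=TRUE, sep="\t")       # on a PC
my.mac.data <- read.table(pipe("pbpaste"), header=TRUE, sep="\t")  # on a Mac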

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

my.data <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

> my.data <- read.clipboard(sep="\t")   # define the tab option, or
> my.tab.data <- read.clipboard.tab()   # just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column each, the last 4 are 3 columns).

> my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))

3.3 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).

> library(psych)
> library(psychTools)
> data(sat.act)
> describe(sat.act)   # basic descriptive statistics
          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> # basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+                      skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)
        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...      ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 to 120. All values of columns 3-5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).

> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x
       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
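As a minimal sketch (the major vector and the my.data object here are hypothetical):

major <- c("psych", "biology", "psych", "econ", "biology")
dummy.code(major)                   # a matrix with one 0/1 column per category
new.data <- char2numeric(my.data)   # convert character/factor columns of a data frame to numeric codes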


> png("outlier.png")
> d2 <- outlier(sat.act, cex=.8)
> dev.off()

null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMs) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMs) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks', we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: a depressing movie, a frightening movie, a neutral movie, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color. (The code for creating these scales is shown after Figures 2 and 3.)


> png("pairs.panels.png")
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()

null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

> png("affect.png")
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+      main="Affect varies by movies")
> dev.off()

null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

> keys <- make.keys(msq[1:75], list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",
+           "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png("msq.png")
> pairs.panels(scores$scores, smoother=TRUE,
+      main = "Density distributions of four measures of affect")
> dev.off()

Using the pairs.panels function to graphically show relationships (not shown in the interests of space). The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).

> png("violin.png")
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")
> dev.off()

null device
          1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data, with error bars based upon the standard error of a proportion (σp = √(pq/N)); a worked example follows this list.

error.crosses draws the confidence intervals for an x set and a y set of the same size.
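For example, a proportion of .30 based upon N = 100 observations has a standard error of √(.30 × .70/100) ≈ .046, and thus an approximate 95% confidence interval of .30 ± .09.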

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+       labels=c("Male","Female"), ylab="SAT score", xlab="")


Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+       main="Proportion of sample by education level")


Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+      xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+      cex=2, colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+      ylab="Negative Affect", pch=16, cex=2, colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png("bibars.png")
> bi.bars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()

null device
          1

Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.

> png("histo.png")
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()

null device
          1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities we can see that the males are slightly over represented in the younger ranges.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix:

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education 1.00
age       0.61  1.00
ACT       0.16  0.15  1.00
SATV      0.02 -0.06  0.61  1.00
SATQ      0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education 1.00
age       0.52  1.00
ACT       0.16  0.08  1.00
SATV      0.07 -0.03  0.53  1.00
SATQ      0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

> png("corplot.png")
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()

null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.

Correlation tests

Call:r.test(n = 50, r12 = 0.3)

Test of significance of a correlation

t value 2.18 with probability < 0.034

and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests

Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)

Test of difference between two independent correlations

z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests

Call:r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)

Test of difference between two correlated correlations

t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests

Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)

Test of difference between two dependent correlations

z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices

Call:cortest(R1 = sat.act)

Chi Square value 1325.42 with df = 15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()

[Figure 14 plot: a bivariate normal distribution with rho = 0.5 and phi = 0.28, cut at τ on X and at Τ on Y into the four quadrants (X > τ, Y > Τ), (X < τ, Y > Τ), (X > τ, Y < Τ), (X < τ, Y < Τ), with the marginal normal density dnorm(x) and its cut point τ shown below.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
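A minimal sketch of these two functions (the choice of variables is purely illustrative; biserial expects a continuous x and a dichotomous y, polyserial a continuous x and a polytomous y):

biserial(sat.act$SATV, sat.act$gender)        # biserial correlation of SATV with the two category gender variable
polyserial(sat.act$SATV, sat.act$education)   # polyserial correlation of SATV with the six category education variable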

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
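A minimal sketch (burt is supplied as a correlation matrix; the object name is arbitrary):

eigen(burt)$values                # inspect the eigen values; a negative value shows the matrix is not positive definite
burt.smooth <- cor.smooth(burt)   # adjust and rescale the eigen values to give a positive definite matrix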

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

rxy = ηxwg * ηywg * rxywg + ηxbg * ηybg * rxybg     (1)

where rxy is the normal correlation, which may be decomposed into within group and between group correlations, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
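For example, a minimal sketch of the first of these (education is the grouping variable; the pooled within group and between group correlations are returned in the rwg and rbg elements):

sb <- statsBy(sat.act, group="education", cors=TRUE)
round(sb$rwg, 2)   # the pooled within group correlations
round(sb$rbg, 2)   # the between group correlations (based upon the group means)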

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out: Sentences Vocabulary
and are included in the calculation of df1 and df2.

DV = Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.64 0.41

DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2 Ruw R2uw
Suffixes 0.46 0.21 0.6 0.37

DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.46 0.21

DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.52 0.27

DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.46 0.21

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.73

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.53     0.12          0.11      0.08         0.14
Suffixes                       0.12     0.61          0.01      0.03         0.04
Letter.Series                  0.11     0.01          0.79      0.31         0.39
Pedigrees                      0.08     0.03          0.31      0.70         0.23
Letter.Group                   0.14     0.04          0.39      0.23         0.79

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 28   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03   Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed; the default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.

41

> mediate.diagram(preacher)

[Path diagram, "Mediation model": THERAPY to SATIS directly and through ATTRIB, with c = 0.76, c' = 0.43, and paths of 0.82 (THERAPY to ATTRIB) and 0.4 (ATTRIB to SATIS).]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

42

> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)

> setCor.diagram(preacher)

[Path diagram, "Regression Models": THERAPY and ATTRIB predicting SATIS with regression weights of 0.43 and 0.4, and a correlation of 0.21 between the two predictors.]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

43
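The moderated mediation output below was presumably produced by a call of this general form (reconstructed from the Call line that follows; only 50 bootstrap iterations are used for speed):

mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia, n.iter = 50,
        main = "Moderated mediation (mean centered)")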

Mediation/Moderation Analysis

Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46 S.E. = 0.22 t = 6.77 df= 126 with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46 S.E. = 0.22 t = 6.73 df= 125 with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02 Lower CI = -0.04 Upper CI = 0.03
Total effect(c) of prot2*sexism on respappr = 0.81 S.E. = 0.28 t = 2.89 df= 126 with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81 S.E. = 0.28 t = 2.87 df= 125 with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03 Lower CI = -0.07 Upper CI = 0.07
R = 0.54 R2 = 0.3 F = 17.53 on 3 and 125 DF p-value: 1.91e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

[Path diagram, "Moderated mediation (mean centered)": prot2 and the product term prot2*sexism predicting respappr directly and through sexism, with c = 1.46 and c' = 1.46 for prot2, c = 0.81 and c' = 0.81 for prot2*sexism, and near-zero paths through the mediator.]

Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.

44

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}
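To make the formula concrete, here is a minimal sketch of the computation (not the psych implementation; the helper name set.cor.sketch is made up, and it assumes the Thurstone correlation matrix supplied with the psych data sets). The eigenvalues of this product matrix are the squared canonical correlations, so the product term gives the set correlation directly:

set.cor.sketch <- function(R, x, y) {   # R: full correlation matrix; x, y: column indices of the two sets
   Rxx <- R[x, x, drop = FALSE]
   Ryy <- R[y, y, drop = FALSE]
   Rxy <- R[x, y, drop = FALSE]
   lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values)  # squared canonical correlations
   1 - prod(1 - lambda)                 # Cohen's set correlation R^2
}
set.cor.sketch(Thurstone, x = 1:4, y = 5:9)   # roughly .69, matching the earlier setCor output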

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")

> model1 <- lm(ACT ~ gender + education + age, data = sat.act)

> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor

> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT
 slope se t p lower.ci upper.ci VIF
(Intercept) 0.00 0.07 0.00 1.000 -0.28 0.28 1.00
gender -0.26 0.08 -3.17 0.087 -0.61 0.09 1.55
education 0.56 0.07 7.72 0.016 0.25 0.87 1.22
age -0.64 0.08 -8.28 0.014 -0.97 -0.31 1.38

Residual Standard Error = 0.15 with 2 degrees of freedom

Multiple Regression
 R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
ACT 1 0.99 0.98 0.96 0.98 0 76.34 3 2 0.013

DV = SATV
 slope se t p lower.ci upper.ci VIF
(Intercept) 0.00 0.07 0.00 1.000 -0.31 0.31 1.00
gender 0.10 0.09 1.11 0.380 -0.28 0.48 1.55
education 0.72 0.08 9.21 0.012 0.39 1.06 1.22
age -0.82 0.08 -9.76 0.010 -1.18 -0.46 1.38

Residual Standard Error = 0.16 with 2 degrees of freedom

Multiple Regression
 R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
SATV 0.99 0.99 0.89 0.79 0.97 0 65.18 3 2 0.0151

DV = SATQ
 slope se t p lower.ci upper.ci VIF
(Intercept) 0.00 0.04 0.00 1.000 -0.19 0.19 1.00
gender -0.52 0.05 -9.73 0.010 -0.76 -0.29 1.55
education 0.40 0.05 8.32 0.014 0.19 0.60 1.22
age -0.47 0.05 -9.24 0.012 -0.69 -0.25 1.38

Residual Standard Error = 0.1 with 2 degrees of freedom

Multiple Regression
 R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p
SATQ 1 1 1 0.99 0.99 0 177.11 3 2 0.00562

46

Various estimates of between set correlations

Squared Canonical Correlations

[1] 1.000 0.988 0.013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 0.67

Cohen's Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohen's Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric: that is, the R2 is the same independent of the direction of the relationship.
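As a quick check of this symmetry, the two sets in the matrix-input example above can be exchanged (this assumes the covariance matrix C created earlier is still available). The individual regressions differ, but the set correlation R2 and the unweighted correlation between the sets are the same in both directions:

setCor(c(4:6), c(1:3), data = C, n.obs = 700)   # ACT, SATV, SATQ predicted from the demographics
setCor(c(1:3), c(4:6), data = C, n.obs = 700)   # the same two sets with the roles reversed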

6 Converting output to APA style tables using LATEX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
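A table such as Table 2 can be generated with something along these lines (a sketch only: the exact argument names of fa2latex may differ by version, and in recent releases the LaTeX helpers live in psychTools rather than psych):

f3 <- fa(Thurstone, nfactors = 3)   # the 3 factor solution of the Thurstone correlations summarized in Table 2
fa2latex(f3, caption = "fa2latex", heading = "A factor analysis table from the psych package in R")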

47

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable        MR1   MR2   MR3   h2    u2    com
Sentences       0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary      0.89  0.06 -0.03  0.84  0.16  1.01
SentCompletion  0.83  0.04  0.00  0.73  0.27  1.00
FirstLetters    0.00  0.86  0.00  0.73  0.27  1.00
4LetterWords   -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes        0.18  0.63 -0.08  0.50  0.50  1.20
LetterSeries    0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees       0.37 -0.05  0.47  0.50  0.50  1.93
LetterGroup    -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings     2.64  1.86  1.5

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score. (A short example of this and the next few helpers follows this list.)

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
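A few of these helpers in action (a minimal illustration with made-up values; assumes psych is loaded):

fisherz(0.5)                    # Fisher z transformation of r = .5
fisherz2r(fisherz(0.5))         # and back to r
geometric.mean(c(1, 2, 4, 8))   # geometric mean of a vector
harmonic.mean(c(1, 2, 4, 8))    # harmonic mean of the same vector
headTail(sat.act)               # first and last few rows of a data frame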

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone33, Holzinger, Bechtoldt1, Bechtoldt2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

50

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
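These data sets ship with psych (and psychTools), so a quick, illustrative look at a few of them might be:

dim(bfi)            # 2800 rows: 25 items plus 3 demographic variables
describe(sat.act)   # the demonstration set used for the regression examples above
headTail(epi.bfi)   # first and last rows of the EPI/BFI scale scores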

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41

52

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52-64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733-745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578-595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801-824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.

57


  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 2: An introduction to the psych package: Part I: data entry ... · myData

4 Multilevel modeling 3641 Decomposing data into within and between level correlations using statsBy 3642 Generating and displaying multilevel data 3743 Factor analysis by groups 37

5 Multiple Regression mediation moderation and set correlations 3851 Multiple regression from data or correlation matrices 3852 Mediation and Moderation analysis 4153 Set Correlation 45

6 Converting output to APA style tables using LATEX 47

7 Miscellaneous functions 48

8 Data sets 49

9 Development version and a users guide 51

10 Psychometric Theory 51

11 SessionInfo 51

2

01 Jump starting the psych packagendasha guide for the impatient

You have installed psych (section 2) and you want to use it without reading much moreWhat should you do

1 Activate the psych package and the psychTools package

library(psych)

library(psychTools)

2 Input your data (section 31) There are two ways to do this

bull Find and read standard files using readfile This will open a search windowfor your operating system which you can use to find the file If the file has asuffix of text txt TXT csv dat data sav xpt XPT r R rds Rdsrda Rda rdata Rdata or RData then the file will be opened and the datawill be read in (or loaded in the case of Rda files)

myData lt- readfile() find the appropriate file using your normal operating system

bull Alternatively go to your friendly text editor or data manipulation program(eg Excel) and copy the data to the clipboard Include a first line that has thevariable labels Paste it into psych using the readclipboardtab command

myData lt- readclipboardtab() if on the clipboard

Note that there are number of options for readclipboard for reading in Excelbased files lower triangular files etc

3 Make sure that what you just read is right Describe it (section 33) and perhapslook at the first and last few lines If you have multiple groups try describeBy

dim(myData) What are the dimensions of the data

describe(myData) or

describeBy(myDatagroups=mygroups) for descriptive statistics by groups

headTail(myData) show the first and last n lines of a file

4 Look at the patterns in the data If you have fewer than about 12 variables lookat the SPLOM (Scatter Plot Matrix) of the data using pairspanels (section 341)Then use the outlier function to detect outliers

pairspanels(myData)

outlier(myData)

5 Note that you might have some weird subjects probably due to data entry errorsEither edit the data by hand (use the edit command) or just scrub the data (section332)

cleaned lt- scrub(myData max=9) eg change anything great than 9 to NA

6 Graph the data with error bars for each variable (section 343)

errorbars(myData)

3

7 Find the correlations of all of your data lowerCor will by default find the pairwisecorrelations round them to 2 decimals and display the lower off diagonal matrix

bull Descriptively (just the values) (section 347)

r lt- lowerCor(myData) The correlation matrix rounded to 2 decimals

bull Graphically (section 348) Another way is to show a heat map of the correla-tions with the correlation values included

corPlot(r) examine the many options for this function

bull Inferentially (the values the ns and the p values) (section 35)

corrtest(myData)

8 Apply various regression models

Several functions are meant to do multiple regressions either from the raw data orfrom a variancecovariance matrix or a correlation matrix This is discussed in moredetail in the ldquoHow To use mediate and setCor to do mediation moderation andregression analysis tutorial

bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables If wehave the raw data we can also find the interaction term (x1 x2) Althoughwe can find the regressions from just a correlation matrix we can not find theinteraction (moderation effect) unless given raw data

myData lt- satact

colnames(myData) lt- c(mod1med1x1x2y1y2)

setCor(y1 + y2 ~ x1 + x2 + x1x2 data = myData)

bull mediate will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables mediatedthrough a mediation variable It then tests the mediation effect using a bootstrap We specify the mediation variable by enclosing it in parentheses andshow the moderation by the standard multiplication For the purpose of thisdemonstration we do the boot strap with just 50 iterations The default is5000 We use the data from which was downloaded from the supplementarymaterial for Hayes (2013) httpswwwafhayescompublichayes2013datazip

mediate(reaction ~ cond + (import) + (pmi) data =Tal_Orniter=50)

We can also find the moderation effect by adding in a product term

bull mediate will take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple xvariables mediated through a mediation variable It then tests the mediation

4

effect using a boot strap By default we find the raw regressions and meancenter If we specify zero=FALSE we do not mean center the data If wespecify std=TRUE we find the standardized regressions

mediate(respappr ~ prot sexism +(sexism)data=Garciazero=FALSE niter=50

main=Moderated mediation (not mean centered))

02 Psychometric functions are summarized in the second vignette

Many additional functions particularly designed for basic and advanced psychomet-rics are discussed more fully in the Overview Vignette which may be downloadedfrom httpspersonality-projectorgrpsychvignettesoverviewpdf Abrief review of the functions available is included here In addition there are helpfultutorials for Finding omega How to score scales and find reliability and for Usingpsych for factor analysis at httpspersonality-projectorgr

bull Test for the number of factors in your data using parallel analysis (faparallel)or Very Simple Structure (vss)

faparallel(myData)

vss(myData)

bull Factor analyze (see section 41) the data with a specified number of factors(the default is 1) the default method is minimum residual the default rotationfor more than one factor is oblimin There are many more possibilities suchas minres (section 411) alpha factoring and wls Compare the solution toa hierarchical cluster analysis using the ICLUST algorithm (Revelle 1979) (seesection 416) Also consider a hierarchical factor solution to find coefficient ω)

fa(myData)

iclust(myData)

omega(myData)

If you prefer to do a principal components analysis you may use the principalfunction The default is one component

principal(myData)

bull Some people like to find coefficient α as an estimate of reliability This maybe done for a single scale using the alpha function Perhaps more useful is theability to create several scales as unweighted averages of specified items usingthe scoreItems function and to find various estimates of internal consistencyfor these scales find their intercorrelations and find scores for all the subjects

alpha(myData) score all of the items as part of one scale

myKeys lt- makekeys(nvar=20list(first = c(1-35-7810)second=c(24-61115-16)))

myscores lt- scoreItems(myKeysmyData) form several scales

myscores show the highlights of the results

At this point you have had a chance to see the highlights of the psych package and to dosome basic (and advanced) data analysis You might find reading this entire vignette aswell as the Overview Vignette to be helpful to get a broader understanding of what can be

5

done in R using the psych Remember that the help command () is available for everyfunction Try running the examples for each help page

6

1 Overview of this and related documents

The psych package (Revelle 2018) has been developed at Northwestern University since2005 to include functions most useful for personality psychometric and psychological re-search The package is also meant to supplement a text on psychometric theory (Revelleprep) a draft of which is available at httpspersonality-projectorgrbook

Some of the functions (eg readfile readclipboard describe pairspanels scat-terhist errorbars multihist bibars) are useful for basic data entry and descrip-tive analyses

Psychometric applications emphasize techniques for dimension reduction including factoranalysis cluster analysis and principal components analysis The fa function includes sixmethods of factor analysis (minimum residual principal axis alpha factoring weightedleast squares generalized least squares and maximum likelihood factor analysis) PrincipalComponents Analysis (PCA) is also available through the use of the principal or pca

functions Determining the number of factors or components to extract may be done byusing the Very Simple Structure (Revelle and Rocklin 1979) (vss) Minimum AveragePartial correlation (Velicer 1976) (MAP) or parallel analysis (faparallel) criteria Theseand several other criteria are included in the nfactors function Two parameter ItemResponse Theory (IRT) models for dichotomous or polytomous items may be found byfactoring tetrachoric or polychoric correlation matrices and expressing the resultingparameters in terms of location and discrimination using irtfa

Bifactor and hierarchical factor structures may be estimated by using Schmid Leimantransformations (Schmid and Leiman 1957) (schmid) to transform a hierarchical factorstructure into a bifactor solution (Holzinger and Swineford 1937) Higher order modelscan also be found using famulti

Scale construction can be done using the Item Cluster Analysis (Revelle 1979) (iclust)function to determine the structure and to calculate reliability coefficients α (Cronbach1951)(alpha scoreItems scoremultiplechoice) β (Revelle 1979 Revelle and Zin-barg 2009) (iclust) and McDonaldrsquos ωh and ωt (McDonald 1999) (omega) Guttmanrsquos sixestimates of internal consistency reliability (Guttman (1945) as well as additional estimates(Revelle and Zinbarg 2009) are in the guttman function The six measures of Intraclasscorrelation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available

For data with a a multilevel structure (eg items within subjects across time or itemswithin subjects across groups) the describeBy statsBy functions will give basic descrip-tives by group StatsBy also will find within group (or subject) correlations as well as thebetween group correlation

multilevelreliability mlr will find various generalizability statistics for subjects overtime and items mlPlot will graph items over for each subject mlArrange converts wide

7

data frames to long data frames suitable for multilevel modeling

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairspanels cor-relation ldquoheat mapsrdquo (corPlot) factor cluster and structural diagrams using fadiagramiclustdiagram structurediagram and hetdiagram as well as item response charac-teristics and item and test information characteristic curves plotirt and plotpoly

This vignette is meant to give an overview of the psych package That is it is meantto give a summary of the main functions in the psych package with examples of howthey are used for data description dimension reduction and scale construction The ex-tended user manual at psych_manualpdf includes examples of graphic output and moreextensive demonstrations than are found in the help menus (Also available at https

personality-projectorgrpsych_manualpdf) The vignette psych for sem athttpspersonalty-projectorgrpsych_for_sempdf discusses how to use psychas a front end to the sem package of John Fox (Fox et al 2012) (The vignette isalso available at httpspersonality-projectorgrpsychvignettespsych_for_

sempdf)

In addition there are a growing number of ldquoHowTordquos at the personality project Currentlythese include

1 An introduction (vignette) of the psych package

2 An overview (vignette) of the psych package

3 Installing R and some useful packages

4 Using R and the psych package to find omegah and ωt

5 Using R and the psych for factor analysis and principal components analysis

6 Using the scoreItems function to find scale scores and scale statistics

7 Using mediate and setCor to do mediation moderation and regression analysis

For a step by step tutorial in the use of the psych package and the base functions inR for basic personality research see the guide for using R for personality research athttpspersonalitytheoryorgrrshorthtml For an introduction to psychometrictheory with applications in R see the draft chapters at httpspersonality-project

orgrbook)

2 Getting started

Some of the functions described in the Overview Vignette require other packages This isnot the case for the functions listed in this Introduction Particularly useful for rotatingthe results of factor analyses (from eg fa factorminres factorpa factorwls

8

or principal) or hierarchical factor models using omega or schmid is the GPArotationpackage These and other useful packages may be installed by first installing and thenusing the task views (ctv) package to install the ldquoPsychometricsrdquo task view but doing itthis way is not necessary

The ldquoPsychometricsrdquo task view will install a large number of useful packages To installthe bare minimum for the examples in this vignette it is necessary to install just 3 pack-ages

installpackages(list(c(GPArotationmnormt)

Alternatively many packages for psychometric can be downloaded at once using the ldquoPsy-chometricsrdquo task view

installpackages(ctv)

library(ctv)

taskviews(Psychometrics)

Because of the difficulty of installing the package Rgraphviz alternative graphics have beendeveloped and are available as diagram functions If Rgraphviz is available some functionswill take advantage of it An alternative is to useldquodotrdquooutput of commands for any externalgraphics package that uses the dot language

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptivestatistics

Remember to run any of the psych functions it is necessary to make the package activeby using the library command

library(psych)

library(psychTools)

The other packages once installed will be called automatically by psych

It is possible to automatically load psych and other functions by creating and then savinga ldquoFirstrdquo function eg

First lt- function(x) library(psych)

library(psychTools)

31 Getting the data by using readfile

Although many find copying the data to the clipboard and then using the readclipboardfunctions (see below) a helpful alternative is to read the data in directly This can be done

9

using the readfile function which calls filechoose to find the file and then based uponthe suffix of the file chooses the appropriate way to read it For files with suffixes of texttxt TXT csv dat data sav xpt XPT r R rds Rds rda Rda rdata Rdataor RData the file will be read correctly

mydata lt- readfile()

If the file contains Fixed Width Format (fwf) data the column information can be specifiedwith the widths command

mydata lt- readfile(widths = c(4rep(135)) will read in a file without a header row

and 36 fields the first of which is 4 colums the rest of which are 1 column each

If the file is a RData file (with suffix of RData Rda rda Rdata or rdata) the objectwill be loaded Depending what was stored this might be several objects If the file is asav file from SPSS it will be read with the most useful default options (converting the fileto a dataframe and converting character fields to numeric) Alternative options may bespecified If it is an export file from SAS (xpt or XPT) it will be read csv files (commaseparated files) normal txt or text files data or dat files will be read as well These areassumed to have a header row of variable labels (header=TRUE) If the data do not havea header row you must specify readfile(header=FALSE)

To read SPSS files and to keep the value labels specify usevaluelabels=TRUE

this will keep the value labels for sav files

myspss lt- readfile(usevaluelabels=TRUE)

32 Data input from the clipboard

There are of course many ways to enter data into R Reading from a local file usingreadtable is perhaps the most preferred However many users will enter their datain a text editor or spreadsheet program and then want to copy and paste into R Thismay be done by using readtable and specifying the input file as ldquoclipboardrdquo (PCs) orldquopipe(pbpaste)rdquo (Macs) Alternatively the readclipboard set of functions are perhapsmore user friendly

readclipboard is the base function for reading data from the clipboard

readclipboardcsv for reading text that is comma delimited

readclipboardtab for reading text that is tab delimited (eg copied directly from anExcel file)

readclipboardlower for reading input of a lower triangular matrix with or without adiagonal The resulting object is a square matrix

10

readclipboardupper for reading input of an upper triangular matrix

readclipboardfwf for reading in fixed width fields (some very old data sets)

For example given a data set copied to the clipboard from a spreadsheet just enter thecommand

mydata lt- readclipboard()

This will work if every data field has a value and even missing data are given some values(eg NA or -999) If the data were entered in a spreadsheet and the missing valueswere just empty cells then the data should be read in as a tab delimited or by using thereadclipboardtab function

gt mydata lt- readclipboard(sep=t) define the tab option or

gt mytabdata lt- readclipboardtab() just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format)copy to the clipboard and then specify the width of each field (in the example below thefirst variable is 5 columns the second is 2 columns the next 5 are 1 column the last 4 are3 columns)

gt mydata lt- readclipboardfwf(widths=c(52rep(15)rep(34))

33 Basic descriptive statistics

Once the data are read in then describe or describeBy will provide basic descriptivestatistics arranged in a data frame format Consider the data set satact which in-cludes data from 700 web based participants on 3 demographic variables and 3 abilitymeasures

describe reports means standard deviations medians min max range skew kurtosisand standard errors for integer or real data Non-numeric data although the statisticsare meaningless will be treated as if numeric (based upon the categorical coding ofthe data) and will be flagged with an

describeBy reports descriptive statistics broken down by some categorizing variable (eggender age etc)

gt library(psych)

gt library(psychTools)

gt data(satact)

gt describe(satact) basic descriptive statistics

vars n mean sd median trimmed mad min max range skew kurtosis se

gender 1 700 165 048 2 168 000 1 2 1 -061 -162 002

education 2 700 316 143 3 331 148 0 5 5 -068 -007 005

age 3 700 2559 950 22 2386 593 13 65 52 164 242 036

ACT 4 700 2855 482 29 2884 445 3 36 33 -066 053 018

11

SATV 5 700 61223 11290 620 61945 11861 200 800 600 -064 033 427

SATQ 6 687 61022 11564 620 61725 11861 200 800 600 -059 -002 441

These data may then be analyzed by groups defined in a logical statement or by some othervariable Eg break down the descriptive data for males or females These descriptivedata can also be seen graphically using the errorbarsby function (Figure 5) By settingskew=FALSE and ranges=FALSE the output is limited to the most basic statistics

gt basic descriptive statistics by a grouping variable

gt describeBy(satactsatact$genderskew=FALSEranges=FALSE)

Descriptive statistics by group

group 1

vars n mean sd se

gender 1 247 100 000 000

education 2 247 300 154 010

age 3 247 2586 974 062

ACT 4 247 2879 506 032

SATV 5 247 61511 11416 726

SATQ 6 245 63587 11602 741

---------------------------------------------------------------------------

group 2

vars n mean sd se

gender 1 453 200 000 000

education 2 453 326 135 006

age 3 453 2545 937 044

ACT 4 453 2842 469 022

SATV 5 453 61066 11231 528

SATQ 6 442 59600 11307 538

The output from the describeBy function can be forced into a matrix form for easy analysisby other programs In addition describeBy can group by several grouping variables at thesame time

gt samat lt- describeBy(satactlist(satact$gendersatact$education)

+ skew=FALSEranges=FALSEmat=TRUE)

gt headTail(samat)

item group1 group2 vars n mean sd se

gender1 1 1 0 1 27 1 0 0

gender2 2 2 0 1 30 2 0 0

gender3 3 1 1 1 20 1 0 0

gender4 4 2 1 1 25 2 0 0

ltNAgt ltNAgt ltNAgt

SATQ9 69 1 4 6 51 6359 10412 1458

SATQ10 70 2 4 6 86 59759 10624 1146

SATQ11 71 1 5 6 46 65783 8961 1321

SATQ12 72 2 5 6 93 60672 10555 1095

331 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the mul-tivariate centroid of the data That is find the squared Mahalanobis distance for eachdata point and then compare these to the expected values of χ2 This produces a Q-Q

12

(quantle-quantile) plot with the n most extreme data points labeled (Figure 1) The outliervalues are in the vector d2

332 Basic data cleaning using scrub

If after describing the data it is apparent that there were data entry errors that need tobe globally replaced with NA or only certain ranges of data will be analyzed the data canbe ldquocleanedrdquo using the scrub function

Consider a data set of 10 rows of 12 columns with values from 1 - 120 All values of columns3 - 5 that are less than 30 40 or 50 respectively or greater than 70 in any of the threecolumns will be replaced with NA In addition any value exactly equal to 45 will be setto NA (max and isvalue are set to one value here but they could be a different value forevery column)

gt x lt- matrix(1120ncol=10byrow=TRUE)

gt colnames(x) lt- paste(V110sep=)gt newx lt- scrub(x35min=c(304050)max=70isvalue=45newvalue=NA)

gt newx

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

[1] 1 2 NA NA NA 6 7 8 9 10

[2] 11 12 NA NA NA 16 17 18 19 20

[3] 21 22 NA NA NA 26 27 28 29 30

[4] 31 32 33 NA NA 36 37 38 39 40

[5] 41 42 43 44 NA 46 47 48 49 50

[6] 51 52 53 54 55 56 57 58 59 60

[7] 61 62 63 64 65 66 67 68 69 70

[8] 71 72 NA NA NA 76 77 78 79 80

[9] 81 82 NA NA NA 86 87 88 89 90

[10] 91 92 NA NA NA 96 97 98 99 100

[11] 101 102 NA NA NA 106 107 108 109 110

[12] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.

333 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.

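A minimal sketch of both helpers, using a small made up vector of majors and a tiny hypothetical data frame (none of these values come from the package data sets):

major <- c("Psych", "Biology", "Psych", "Econ", "Biology")
dummy.code(major)                 # one 0/1 column per category
tiny <- data.frame(gender = c("Male", "Female", "Female"),
                   education = c("High School", "some College", "in college"))
char2numeric(tiny)                # character columns recoded as numeric values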

> png('outlier.png')
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.

34 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)

341 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data) it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",

> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+     main="Affect varies by movies")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+          "upset", "hostile", "irritable")))

> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+     main="Density distributions of four measures of affect")
>
> dev.off()

Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
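For that comparison, a one line sketch (repeating the title used above) would be:

pairs.panels(scores$scores, smoother=FALSE,
    main="Density distributions of four measures of affect")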

342 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).

> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
          1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval. (A minimal call is sketched just after this list.)

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σp = √(pq/N)).

error.crosses draw the confidence intervals for an x set and a y set of the same size.
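A minimal sketch of the simplest of these, applied to the two SAT scores (the axis label and title here are illustrative choices, not taken from the text):

error.bars(sat.act[5:6], eyes=FALSE, ylab="SAT score", main="95% confidence limits")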

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflect any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

344 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+     labels=c("Male","Female"), ylab="SAT score", xlab="")


Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+     main="Proportion of sample by education level")


Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+     cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+     ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call which examines the effect of the same grouping variable upon different measures. The size of the circles represent the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

346 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png('bibars.png')
> bi.bars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
          1

Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set. An alternative way of displaying these data is in the densityBy plot in the next figure.

> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
          1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.

347 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix

          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980) who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273

36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
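A small sketch of these ideas. The 2 x 2 table of frequencies is made up for illustration; burt is the correlation matrix mentioned above:

tab <- matrix(c(40, 10, 15, 35), 2, 2)   # hypothetical cell frequencies
tetrachoric(tab)                         # the latent correlation and the two thresholds
eigen(burt)$values                       # note the slightly negative eigen value
burt.s <- cor.smooth(burt)               # a smoothed, positive semi-definite version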

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics, and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

41 Decomposing data into within and between level correlations usingstatsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where rxy is the normal correlation which may be decomposed into a within group and between group correlations rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1 while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times, and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
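A minimal sketch of the first of these, asking statsBy for the within and between education level correlation matrices (cors=TRUE; the rwg and rbg elements hold the two matrices):

sb <- statsBy(sat.act, group="education", cors=TRUE)
round(sb$rwg, 2)   # pooled within group correlations
round(sb$rbg, 2)   # correlations of the group means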

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression mediation moderation and set cor-relations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388


SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression


R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

> round(sc$residual, 2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.53     0.12         0.11      0.08        0.14
Suffixes                   0.12     0.61         0.01      0.03        0.04
LetterSeries               0.11     0.01         0.79      0.31        0.39
Pedigrees                  0.08     0.03         0.31      0.70        0.23
LetterGroup                0.14     0.04         0.39      0.23        0.79


52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t = 2.5  df= 28  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t = 1.35  df= 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data=sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data=sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.

> mediate.diagram(preacher)


Figure 15: A mediated model taken from Preacher and Hayes, 2004, and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data=sobel, std=FALSE)
> setCor.diagram(preacher)


Figure 16: The conventional regression model for the Preacher and Hayes, 2004 data set, solved using the setCor function. Compare this to the previous figure.
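The moderated mediation output that follows comes from a call of roughly this form (the assignment to mod.med is illustrative; n.iter is set to 50 only for speed, as noted in the text):

mod.med <- mediate(respappr ~ prot2 * sexism + (sexism), data=Garcia,
    n.iter=50, main="Moderated mediation (mean centered)")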

Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46  S.E. = 0.22  t = 6.77  df= 126  with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  S.E. = 0.22  t = 6.73  df= 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04  Upper CI = 0.03
Total effect(c) of prot2*sexism on respappr = 0.81  S.E. = 0.28  t = 2.89  df= 126  with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81  S.E. = 0.28  t = 2.87  df= 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07  Upper CI = 0.07
R = 0.54 R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary


Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n}(1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> #compare with sector
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562


Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LATEX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable         MR1   MR2   MR3   h2   u2  com
Sentences       0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary      0.89  0.06 -0.03 0.84 0.16 1.01
SentCompletion  0.83  0.04  0.00 0.73 0.27 1.00
FirstLetters    0.00  0.86  0.00 0.73 0.27 1.00
4LetterWords   -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes        0.18  0.63 -0.08 0.50 0.50 1.20
LetterSeries    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees       0.37 -0.05  0.47 0.50 0.50 1.93
LetterGroup    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings     2.64  1.86  1.50

MR1  1.00 0.59 0.54
MR2  0.59 1.00 0.52
MR3  0.54 0.52 1.00
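The table above was produced by a call of roughly the following form; the three factor choice follows the Thurstone example used earlier, and the heading is the function's default (this is a sketch, not the exact code used for this document):

f3 <- fa(Thurstone, nfactors=3)   # the 3 factor solution of the Thurstone matrix
fa2latex(f3, heading="A factor analysis table from the psych package in R")
cor2latex(Thurstone)              # and an APA style correlation table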

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
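Quick sketches of a few of these helpers (the values in the comments are approximate):

fisherz(.5)                    # 0.55, the Fisher z equivalent of r = .5
fisherz2r(fisherz(.5))         # and back again
geometric.mean(c(1, 10, 100))  # 10
harmonic.mean(c(2, 4, 8))      # about 3.43
headTail(sat.act)              # first and last few rows, separated by ...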

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R version 402 (2020-06-22)

Platform x86_64-apple-darwin170 (64-bit)


Running under macOS Catalina 10156

Matrix products default

BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_209 psych_209

loaded via a namespace (and not attached)

[1] compiler_402 parallel_402 tools_402 foreign_08-80 tmvnsim_10-2 nlme_31-148

[7] mnormt_201 grid_402 lattice_020-41


References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psychometrika 26(4):405-432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness European Journal of Social Psychology 40(5):733-745

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686


McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New York 3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating indirect effects in simple mediation models Behavior Research Methods Instruments & Computers 36(4):717-731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of

55

Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

56

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

57




cleaned <- scrub(myData, max=9)  # e.g., change anything greater than 9 to NA

6. Graph the data with error bars for each variable (section 3.4.3).

error.bars(myData)

7. Find the correlations of all of your data. lowerCor will, by default, find the pairwise correlations, round them to 2 decimals, and display the lower off-diagonal matrix.

• Descriptively (just the values) (section 3.4.7)

r <- lowerCor(myData)  # the correlation matrix, rounded to 2 decimals

• Graphically (section 3.4.8). Another way is to show a heat map of the correlations, with the correlation values included.

corPlot(r)  # examine the many options for this function

• Inferentially (the values, the ns, and the p values) (section 3.5)

corr.test(myData)

8. Apply various regression models.

Several functions are meant to do multiple regressions, either from the raw data or from a variance/covariance or correlation matrix. This is discussed in more detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables. If we have the raw data, we can also find the interaction term (x1 * x2). Although we can find the regressions from just a correlation matrix, we can not find the interaction (moderation) effect unless given raw data.

myData <- sat.act
colnames(myData) <- c("mod1", "med1", "x1", "x2", "y1", "y2")
setCor(y1 + y2 ~ x1 + x2 + x1*x2, data = myData)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. We specify the mediation variable by enclosing it in parentheses, and show the moderation by the standard multiplication. For the purpose of this demonstration, we do the bootstrap with just 50 iterations; the default is 5,000. We use data downloaded from the supplementary material for Hayes (2013), https://www.afhayes.com/public/hayes2013data.zip.

mediate(reaction ~ cond + (import) + (pmi), data = Tal_Or, n.iter = 50)

We can also find the moderation effect by adding in a product term.

• mediate will take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap. By default, we find the raw regressions and mean center. If we specify zero=FALSE, we do not mean center the data. If we specify std=TRUE, we find the standardized regressions.

mediate(respappr ~ prot * sexism + (sexism), data = Garcia, zero = FALSE, n.iter = 50,
        main = "Moderated mediation (not mean centered)")

0.2 Psychometric functions are summarized in the second vignette

Many additional functions, particularly designed for basic and advanced psychometrics, are discussed more fully in the Overview Vignette, which may be downloaded from https://personality-project.org/r/psych/vignettes/overview.pdf. A brief review of the functions available is included here. In addition, there are helpful tutorials for Finding omega, How to score scales and find reliability, and for Using psych for factor analysis at https://personality-project.org/r.

• Test for the number of factors in your data using parallel analysis (fa.parallel) or Very Simple Structure (vss).

fa.parallel(myData)

vss(myData)

• Factor analyze (see section 4.1) the data with a specified number of factors (the default is 1). The default method is minimum residual; the default rotation for more than one factor is oblimin. There are many more possibilities, such as minres (section 4.1.1), alpha factoring, and wls. Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979) (see section 4.1.6). Also consider a hierarchical factor solution to find coefficient ω.

fa(myData)

iclust(myData)

omega(myData)

If you prefer to do a principal components analysis, you may use the principal function. The default is one component.

principal(myData)

• Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function. Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function, and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.

alpha(myData)  # score all of the items as part of one scale

myKeys <- make.keys(nvar=20, list(first = c(1,-3,5,-7,8:10), second = c(2,4,-6,11:15,-16)))

my.scores <- scoreItems(myKeys, myData)  # form several scales

my.scores  # show the highlights of the results

At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading this entire vignette as well as the Overview Vignette to be helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.
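For example, a minimal illustration of the help system using base R (any psych function name could be substituted here):

?pairs.panels              # open the help page for a single function
example(pairs.panels)      # run the examples from that help page
help(package = "psych")    # list all of the functions in the package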

1 Overview of this and related documents

The psych package (Revelle, 2018) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at https://personality-project.org/r/book.

Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bibars) are useful for basic data entry and descriptive analyses.

Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes six methods of factor analysis (minimum residual, principal axis, alpha factoring, weighted least squares, generalized least squares, and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP), or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.
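As a minimal sketch of these "how many factors?" criteria (using the bfi item data from psychTools; the calls below are illustrative, not prescriptive):

library(psych)
library(psychTools)
fa.parallel(bfi[, 1:25])        # parallel analysis: compare observed eigenvalues to random data
nfactors(bfi[, 1:25], n = 8)    # VSS, MAP, and related criteria for 1 to 8 factors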

Bifactor and hierarchical factor structures may be estimated by using Schmid-Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.

Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust), and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman, 1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.

For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations as well as the between group correlation.

multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject. mlArrange converts wide data frames to long data frames suitable for multilevel modeling.

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram and het.diagram, as well as item response characteristics and item and test information characteristic curves, plot.irt and plot.poly.

This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at https://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at https://personality-project.org/r/psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at https://personality-project.org/r/psych/vignettes/psych_for_sem.pdf.)

In addition, there are a growing number of "HowTo"s at the personality project. Currently these include:

1. An introduction (vignette) of the psych package

2. An overview (vignette) of the psych package

3. Installing R and some useful packages

4. Using R and the psych package to find ωh and ωt

5. Using R and the psych package for factor analysis and principal components analysis

6. Using the scoreItems function to find scale scores and scale statistics

7. Using mediate and setCor to do mediation, moderation and regression analysis

For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at https://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at https://personality-project.org/r/book.

2 Getting started

Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.

The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just a few packages:

install.packages(c("GPArotation", "mnormt"))

Alternatively, many packages for psychometrics can be downloaded at once using the "Psychometrics" task view:

install.packages("ctv")

library(ctv)

task.views("Psychometrics")

Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

Remember, to run any of the psych functions it is necessary to make the package active by using the library command:

library(psych)

library(psychTools)

The other packages, once installed, will be called automatically by psych.

It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)
   library(psychTools)}

3.1 Getting the data by using read.file

Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) convenient, a helpful alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of text, txt, TXT, csv, dat, data, sav, xpt, XPT, r, R, rds, Rds, rda, Rda, rdata, Rdata, or RData, the file will be read correctly.

my.data <- read.file()

If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

my.data <- read.file(widths = c(4, rep(1, 35)))  # will read in a file without a header row and 36 fields, the first of which is 4 columns wide, the rest of which are 1 column each

If the file is an RData file (with suffix of RData, Rda, rda, Rdata, or rdata) the object will be loaded. Depending what was stored, this might be several objects. If the file is a sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (xpt or XPT) it will be read. csv files (comma separated files), normal txt or text files, and data or dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).
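If you already know where the file lives you can skip the search window and pass the name directly; a brief sketch (the file names below are hypothetical):

myData <- read.file("study1.csv")                  # a csv file with a header row
myRaw  <- read.file("study2.dat", header = FALSE)  # a dat file without variable labels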

To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

# this will keep the value labels for sav files
my.spss <- read.file(use.value.labels=TRUE)

3.2 Data input from the clipboard

There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

my.data <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

> my.data <- read.clipboard(sep="\t")   # define the tab option, or
> my.tab.data <- read.clipboard.tab()   # just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column each, and the last 4 are 3 columns each).

> my.data <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))

3.3 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).

> library(psych)
> library(psychTools)
> data(sat.act)
> describe(sat.act)  # basic descriptive statistics
          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> # basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

 Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+                      skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)
        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...      ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
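A small sketch of how d2 might then be inspected, assuming the sat.act data set used elsewhere in this vignette:

d2 <- outlier(sat.act)                          # draws the Q-Q plot and returns the squared distances
sat.act[order(d2, decreasing = TRUE)[1:10], ]   # the raw data for the 10 most extreme cases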

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).

> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums gone down. Data cleaning and examination for outliers should be a routine part of any data analysis.

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
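A brief, hypothetical illustration of both functions (the variable names below are made up for the example):

major <- c("biology", "psych", "econ", "psych", "biology", "econ")
dummy.code(major)    # one 0/1 column for each category

df <- data.frame(gender = c("M", "F", "F", "M"), score = c(10, 12, 9, 14))
char2numeric(df)     # character (or factor) columns become numeric codes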

> png('outlier.png')
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data) it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: a depressing movie, a frightening movie, a neutral movie, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",

> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+              main="Affect varies by movies")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

+        "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+        "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+        "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+        "upset", "hostile", "irritable")))

> scores <- scoreItems(keys, msq[1:75])

> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+    main = "Density distributions of four measures of affect")
> dev.off()

Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).

> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
          1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between-variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data, with error bars based upon the standard error of a proportion (σp = √(pq/N)).

error.crosses draws the confidence intervals for an x set and a y set of the same size.

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[,6:10], epi.bfi$epilie < 4)

Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+               labels=c("Male","Female"), ylab="SAT score", xlab="")

Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+                main="Proportion of sample by education level")

Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds row wise percentages. The data can be converted to percentages (as shown) or shown by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+     cex=2, colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+     ylab="Negative Affect", pch=16, cex=2, colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))

Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png('bibars.png')
> bibars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
          1

Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.

> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
          1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off-diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off-diagonal matrix.
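For example, a minimal sketch using the sat.act data:

R <- cor(sat.act, use = "pairwise")   # the full, rectangular correlation matrix
lowerMat(R)                           # the same matrix, rounded and shown as a lower triangle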

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this.

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences, displaying one (below the diagonal) and the difference of the second from the first above the diagonal.

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
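The same Holm correction can be applied directly to any vector of p values with the base R p.adjust function; a small illustration with made-up p values:

p <- c(0.001, 0.01, 0.02, 0.04, 0.30)
p.adjust(p, method = "holm")         # the default adjustment used by corr.test
p.adjust(p, method = "bonferroni")   # a more conservative alternative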

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input.

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02  0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
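As a rough sketch of both estimators (using the bfi items, which are scored on a six point scale; the median split below is purely for illustration):

pc <- polychoric(bfi[, 1:5])     # polychoric correlations of five polytomous items
pc$rho                           # the estimated latent correlations
pc$tau                           # the estimated thresholds (cut points)

tetrachoric(ifelse(bfi[, 1:5] > 3, 1, 0))$rho   # tetrachorics after an (artificial) median split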

> draw.tetra()

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
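For example, the tetrachoric and polychoric functions just mentioned can be tried on the dichotomous ability items and the polytomous bfi items that come with the package (a minimal sketch):

library(psych)
library(psychTools)
tet <- tetrachoric(ability[, 1:4])   # tetrachoric correlations of four 0/1 scored items
round(tet$rho, 2)
poly <- polychoric(bfi[, 1:5])       # polychoric correlations of five polytomous items
round(poly$rho, 2)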

The correlation matrix resulting from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
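A minimal sketch of the smoothing step (cor.smooth is in psych; the burt matrix is distributed with the package, in psychTools in recent releases):

library(psych)
library(psychTools)
round(eigen(burt)$values, 3)    # the smallest eigenvalue of burt is (slightly) negative
burt.s <- cor.smooth(burt)      # adjust the offending eigenvalues and rescale
round(eigen(burt.s)$values, 3)  # all eigenvalues are now positive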

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$r_{xy} = \eta_{x_{wg}} \ast \eta_{y_{wg}} \ast r_{xy_{wg}} + \eta_{x_{bg}} \ast \eta_{y_{bg}} \ast r_{xy_{bg}}$   (1)

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
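A minimal sketch of the first case (the group argument and the cors option follow the statsBy help page; the rwg and rbg elements hold the pooled within group and the between group correlations):

library(psych)
sb <- statsBy(sat.act, group = "education", cors = TRUE)
sb                 # means, sds, ns, ICCs, and the within and between group correlations
round(sb$rwg, 2)   # pooled within group correlations
round(sb$rbg, 2)   # weighted between group correlations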

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)

faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
SentCompletion   0.02 3.00
FirstLetters     0.58 1.35

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
SentCompletion   0.05 3.00
FirstLetters     0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = LetterSeries
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
SentCompletion   0.04 3.00
FirstLetters     0.21 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterSeries 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
SentCompletion   0.21 3.00
FirstLetters     0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = LetterGroup
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
SentCompletion   0.08 3.00
FirstLetters     0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
LetterGroup  0.48 0.23 0.45  0.2

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2

Cohen's Set Correlation R2 = 0.69

Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out: Sentences Vocabulary
and are included in the calculation of df1 and df2

DV = FourLetterWords
                slope  VIF
SentCompletion   0.02 1.02
FirstLetters     0.58 1.02

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.58 0.33 0.64 0.41

DV = Suffixes
                slope  VIF
SentCompletion   0.05 1.02
FirstLetters     0.45 1.02

Multiple Regression
            R   R2 Ruw R2uw
Suffixes 0.46 0.21 0.6 0.37

DV = LetterSeries
                slope  VIF
SentCompletion   0.04 1.02
FirstLetters     0.21 1.02

Multiple Regression
                R   R2  Ruw R2uw
LetterSeries 0.21 0.04 0.46 0.21

DV = Pedigrees
                slope  VIF
SentCompletion   0.21 1.02
FirstLetters     0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.52 0.27

DV = LetterGroup
                slope  VIF
SentCompletion   0.08 1.02
FirstLetters     0.31 1.02

Multiple Regression
              R   R2  Ruw R2uw
LetterGroup 0.3 0.09 0.46 0.21

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21

Cohen's Set Correlation R2 = 0.42

Unweighted correlation between the two sets = 0.73

> round(sc$residual, 2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.53     0.12         0.11      0.08        0.14
Suffixes                   0.12     0.61         0.01      0.03        0.04
LetterSeries               0.11     0.01         0.79      0.31        0.39
Pedigrees                  0.08     0.03         0.31      0.70        0.23
LetterGroup                0.14     0.04         0.39      0.23        0.79


5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors ($x_{1, 2, \ldots, i}$) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
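The call that produces the following output and Figure 15 is, roughly, this sketch (the sobel data frame is the one constructed in the mediate examples; n.iter sets the number of bootstrap samples):

library(psych)
preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel, n.iter = 5000)
preacher                    # print the mediation/moderation summary shown below
mediate.diagram(preacher)   # draw the path diagram in Figure 15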

Mediation/Moderation Analysis

Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t = 2.5  df = 28  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7

R = 0.56  R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.

> mediate.diagram(preacher)

[Figure 15 appears here: the mediation model diagram, with the direct path from THERAPY to SATIS labeled c = 0.76 and c' = 0.43, and the indirect path through ATTRIB with coefficients 0.82 and 0.4.]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure 16 appears here: the "Regression Models" diagram, with paths THERAPY -> SATIS = 0.43 and ATTRIB -> SATIS = 0.4, and a correlation of 0.21 between the two predictors.]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.


Mediation/Moderation Analysis

Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia, niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46  S.E. = 0.22  t = 6.77  df = 126  with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04  Upper CI = 0.03
Total effect(c) of prot2*sexism on respappr = 0.81  S.E. = 0.28  t = 2.89  df = 126  with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81  S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07  Upper CI = 0.07

R = 0.54  R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary

[Figure 17 appears here: the "Moderated mediation (mean centered)" path diagram, with prot2 and prot2*sexism predicting respappr (c = 1.46, c' = 1.46 and c = 0.81, c' = 0.81) and near-zero paths through the mediator sexism (approximately -0.01, 0.07, 0.09, and 0.02).]

Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.


5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$R^2 = 1 - \prod_{i=1}^{n}(1 - \lambda_i)$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.
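As a quick arithmetic check, the formula can be applied to the squared canonical correlations reported for the first Thurstone example above (this uses only base R, not a psych function):

lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)  # squared canonical correlations from the earlier output
1 - prod(1 - lambda)   # about 0.69, the reported Cohen Set Correlation R2
mean(lambda)           # about 0.20, the reported average squared canonical correlation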

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT
             slope   se     t     p lower.ci upper.ci  VIF
(Intercept)   0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender       -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education     0.56 0.07  7.72 0.016     0.25     0.87 1.22
age          -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error = 0.15 with 2 degrees of freedom

Multiple Regression
      R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT   1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV = SATV
             slope   se     t     p lower.ci upper.ci  VIF
(Intercept)   0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender        0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education     0.72 0.08  9.21 0.012     0.39     1.06 1.22
age          -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error = 0.16 with 2 degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV = SATQ
             slope   se     t     p lower.ci upper.ci  VIF
(Intercept)   0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender       -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education     0.40 0.05  8.32 0.014     0.19     0.60 1.22
age          -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error = 0.1 with 2 degrees of freedom

Multiple Regression
     R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ 1  1   1 0.99        0.99        0    177.11   3   2 0.00562


Various estimates of between set correlations

Squared Canonical Correlations
[1] 1.000 0.988 0.013

Chisq of canonical correlations
[1] 32.517  4.433  0.013

Average squared canonical correlation = 0.67

Cohen's Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohen's Set Correlation  -Inf  12  -12.94

Unweighted correlation between the two sets = 0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
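A simple way to see this symmetry is to run the analysis in both directions (a sketch, reusing the covariance matrix C from above; only the set correlation statistics, not the individual regressions, are expected to match):

setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)   # as above
setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # sets swapped: same set correlation R2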

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
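Table 2 was produced along these lines (a sketch; the heading shown in the table is the fa2latex default, and in recent releases the *2latex functions are distributed with psychTools):

library(psych)
library(psychTools)
f3 <- fa(Thurstone, 3)   # a three factor solution of the 9 Thurstone ability variables
fa2latex(f3)             # write the LaTeX source for a table like Table 2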


Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

       MR1   MR2   MR3
MR1   1.00  0.59  0.54
MR2   0.59  1.00  0.52
MR3   0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
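A few of these helpers in action (a sketch):

library(psych)
fisherz(0.5)              # Fisher r to z: about 0.55
geometric.mean(c(2, 8))   # 4
harmonic.mean(c(2, 8))    # 3.2
headTail(sat.act)         # the first and last few rows of the data frame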

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
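All of these data sets are loaded along with the package and can be inspected directly; a brief sketch (bfi and spi are found in psych or psychTools, depending on the release):

library(psych)
library(psychTools)
dim(bfi)    # the 25 bfi items plus demographics for 2800 participants
dim(spi)    # the 135 spi items plus 10 demographics for 4000 participants
Thurstone   # the 9 x 9 correlation matrix of ability tests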

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering: an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure: alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components: an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


Page 4: An introduction to the psych package: Part I: data entry ... · myData

7 Find the correlations of all of your data lowerCor will by default find the pairwisecorrelations round them to 2 decimals and display the lower off diagonal matrix

bull Descriptively (just the values) (section 347)

r lt- lowerCor(myData) The correlation matrix rounded to 2 decimals

bull Graphically (section 348) Another way is to show a heat map of the correla-tions with the correlation values included

corPlot(r) examine the many options for this function

bull Inferentially (the values the ns and the p values) (section 35)

corrtest(myData)

8 Apply various regression models

Several functions are meant to do multiple regressions either from the raw data orfrom a variancecovariance matrix or a correlation matrix This is discussed in moredetail in the ldquoHow To use mediate and setCor to do mediation moderation andregression analysis tutorial

bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables If wehave the raw data we can also find the interaction term (x1 x2) Althoughwe can find the regressions from just a correlation matrix we can not find theinteraction (moderation effect) unless given raw data

myData lt- satact

colnames(myData) lt- c(mod1med1x1x2y1y2)

setCor(y1 + y2 ~ x1 + x2 + x1x2 data = myData)

bull mediate will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables mediatedthrough a mediation variable It then tests the mediation effect using a bootstrap We specify the mediation variable by enclosing it in parentheses andshow the moderation by the standard multiplication For the purpose of thisdemonstration we do the boot strap with just 50 iterations The default is5000 We use the data from which was downloaded from the supplementarymaterial for Hayes (2013) httpswwwafhayescompublichayes2013datazip

mediate(reaction ~ cond + (import) + (pmi) data =Tal_Orniter=50)

We can also find the moderation effect by adding in a product term

bull mediate will take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple xvariables mediated through a mediation variable It then tests the mediation

4

effect using a boot strap By default we find the raw regressions and meancenter If we specify zero=FALSE we do not mean center the data If wespecify std=TRUE we find the standardized regressions

mediate(respappr ~ prot sexism +(sexism)data=Garciazero=FALSE niter=50

main=Moderated mediation (not mean centered))

02 Psychometric functions are summarized in the second vignette

Many additional functions particularly designed for basic and advanced psychomet-rics are discussed more fully in the Overview Vignette which may be downloadedfrom httpspersonality-projectorgrpsychvignettesoverviewpdf Abrief review of the functions available is included here In addition there are helpfultutorials for Finding omega How to score scales and find reliability and for Usingpsych for factor analysis at httpspersonality-projectorgr

bull Test for the number of factors in your data using parallel analysis (faparallel)or Very Simple Structure (vss)

faparallel(myData)

vss(myData)

bull Factor analyze (see section 41) the data with a specified number of factors(the default is 1) the default method is minimum residual the default rotationfor more than one factor is oblimin There are many more possibilities suchas minres (section 411) alpha factoring and wls Compare the solution toa hierarchical cluster analysis using the ICLUST algorithm (Revelle 1979) (seesection 416) Also consider a hierarchical factor solution to find coefficient ω)

fa(myData)

iclust(myData)

omega(myData)

If you prefer to do a principal components analysis you may use the principalfunction The default is one component

principal(myData)

bull Some people like to find coefficient α as an estimate of reliability This maybe done for a single scale using the alpha function Perhaps more useful is theability to create several scales as unweighted averages of specified items usingthe scoreItems function and to find various estimates of internal consistencyfor these scales find their intercorrelations and find scores for all the subjects

alpha(myData) score all of the items as part of one scale

myKeys lt- makekeys(nvar=20list(first = c(1-35-7810)second=c(24-61115-16)))

myscores lt- scoreItems(myKeysmyData) form several scales

myscores show the highlights of the results

At this point you have had a chance to see the highlights of the psych package and to dosome basic (and advanced) data analysis You might find reading this entire vignette aswell as the Overview Vignette to be helpful to get a broader understanding of what can be

5

done in R using the psych Remember that the help command () is available for everyfunction Try running the examples for each help page

6

1 Overview of this and related documents

The psych package (Revelle 2018) has been developed at Northwestern University since2005 to include functions most useful for personality psychometric and psychological re-search The package is also meant to supplement a text on psychometric theory (Revelleprep) a draft of which is available at httpspersonality-projectorgrbook

Some of the functions (eg readfile readclipboard describe pairspanels scat-terhist errorbars multihist bibars) are useful for basic data entry and descrip-tive analyses

Psychometric applications emphasize techniques for dimension reduction including factoranalysis cluster analysis and principal components analysis The fa function includes sixmethods of factor analysis (minimum residual principal axis alpha factoring weightedleast squares generalized least squares and maximum likelihood factor analysis) PrincipalComponents Analysis (PCA) is also available through the use of the principal or pca

functions Determining the number of factors or components to extract may be done byusing the Very Simple Structure (Revelle and Rocklin 1979) (vss) Minimum AveragePartial correlation (Velicer 1976) (MAP) or parallel analysis (faparallel) criteria Theseand several other criteria are included in the nfactors function Two parameter ItemResponse Theory (IRT) models for dichotomous or polytomous items may be found byfactoring tetrachoric or polychoric correlation matrices and expressing the resultingparameters in terms of location and discrimination using irtfa

Bifactor and hierarchical factor structures may be estimated by using Schmid Leimantransformations (Schmid and Leiman 1957) (schmid) to transform a hierarchical factorstructure into a bifactor solution (Holzinger and Swineford 1937) Higher order modelscan also be found using famulti

Scale construction can be done using the Item Cluster Analysis (Revelle 1979) (iclust)function to determine the structure and to calculate reliability coefficients α (Cronbach1951)(alpha scoreItems scoremultiplechoice) β (Revelle 1979 Revelle and Zin-barg 2009) (iclust) and McDonaldrsquos ωh and ωt (McDonald 1999) (omega) Guttmanrsquos sixestimates of internal consistency reliability (Guttman (1945) as well as additional estimates(Revelle and Zinbarg 2009) are in the guttman function The six measures of Intraclasscorrelation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available

For data with a a multilevel structure (eg items within subjects across time or itemswithin subjects across groups) the describeBy statsBy functions will give basic descrip-tives by group StatsBy also will find within group (or subject) correlations as well as thebetween group correlation

multilevelreliability mlr will find various generalizability statistics for subjects overtime and items mlPlot will graph items over for each subject mlArrange converts wide

7

data frames to long data frames suitable for multilevel modeling

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairspanels cor-relation ldquoheat mapsrdquo (corPlot) factor cluster and structural diagrams using fadiagramiclustdiagram structurediagram and hetdiagram as well as item response charac-teristics and item and test information characteristic curves plotirt and plotpoly

This vignette is meant to give an overview of the psych package That is it is meantto give a summary of the main functions in the psych package with examples of howthey are used for data description dimension reduction and scale construction The ex-tended user manual at psych_manualpdf includes examples of graphic output and moreextensive demonstrations than are found in the help menus (Also available at https

personality-projectorgrpsych_manualpdf) The vignette psych for sem athttpspersonalty-projectorgrpsych_for_sempdf discusses how to use psychas a front end to the sem package of John Fox (Fox et al 2012) (The vignette isalso available at httpspersonality-projectorgrpsychvignettespsych_for_

sempdf)

In addition there are a growing number of ldquoHowTordquos at the personality project Currentlythese include

1 An introduction (vignette) of the psych package

2 An overview (vignette) of the psych package

3 Installing R and some useful packages

4 Using R and the psych package to find omegah and ωt

5 Using R and the psych for factor analysis and principal components analysis

6 Using the scoreItems function to find scale scores and scale statistics

7 Using mediate and setCor to do mediation moderation and regression analysis

For a step by step tutorial in the use of the psych package and the base functions inR for basic personality research see the guide for using R for personality research athttpspersonalitytheoryorgrrshorthtml For an introduction to psychometrictheory with applications in R see the draft chapters at httpspersonality-project

orgrbook)

2 Getting started

Some of the functions described in the Overview Vignette require other packages This isnot the case for the functions listed in this Introduction Particularly useful for rotatingthe results of factor analyses (from eg fa factorminres factorpa factorwls

8

or principal) or hierarchical factor models using omega or schmid is the GPArotationpackage These and other useful packages may be installed by first installing and thenusing the task views (ctv) package to install the ldquoPsychometricsrdquo task view but doing itthis way is not necessary

The ldquoPsychometricsrdquo task view will install a large number of useful packages To installthe bare minimum for the examples in this vignette it is necessary to install just 3 pack-ages

installpackages(list(c(GPArotationmnormt)

Alternatively many packages for psychometric can be downloaded at once using the ldquoPsy-chometricsrdquo task view

installpackages(ctv)

library(ctv)

taskviews(Psychometrics)

Because of the difficulty of installing the package Rgraphviz alternative graphics have beendeveloped and are available as diagram functions If Rgraphviz is available some functionswill take advantage of it An alternative is to useldquodotrdquooutput of commands for any externalgraphics package that uses the dot language

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptivestatistics

Remember to run any of the psych functions it is necessary to make the package activeby using the library command

library(psych)

library(psychTools)

The other packages once installed will be called automatically by psych

It is possible to automatically load psych and other functions by creating and then savinga ldquoFirstrdquo function eg

First lt- function(x) library(psych)

library(psychTools)

31 Getting the data by using readfile

Although many find copying the data to the clipboard and then using the readclipboardfunctions (see below) a helpful alternative is to read the data in directly This can be done

9

using the readfile function which calls filechoose to find the file and then based uponthe suffix of the file chooses the appropriate way to read it For files with suffixes of texttxt TXT csv dat data sav xpt XPT r R rds Rds rda Rda rdata Rdataor RData the file will be read correctly

mydata lt- readfile()

If the file contains Fixed Width Format (fwf) data the column information can be specifiedwith the widths command

mydata lt- readfile(widths = c(4rep(135)) will read in a file without a header row

and 36 fields the first of which is 4 colums the rest of which are 1 column each

If the file is a RData file (with suffix of RData Rda rda Rdata or rdata) the objectwill be loaded Depending what was stored this might be several objects If the file is asav file from SPSS it will be read with the most useful default options (converting the fileto a dataframe and converting character fields to numeric) Alternative options may bespecified If it is an export file from SAS (xpt or XPT) it will be read csv files (commaseparated files) normal txt or text files data or dat files will be read as well These areassumed to have a header row of variable labels (header=TRUE) If the data do not havea header row you must specify readfile(header=FALSE)

To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

# this will keep the value labels for sav files
my.spss <- read.file(use.value.labels=TRUE)

3.2 Data input from the clipboard

There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

my.data <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

> my.data <- read.clipboard(sep="\t")   # define the tab option, or

> my.tab.data <- read.clipboard.tab()   # just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column each, and the last 4 are 3 columns each).

> my.data <- read.clipboard.fwf(widths = c(5, 2, rep(1, 5), rep(3, 4)))

3.3 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).

> library(psych)
> library(psychTools)
> data(sat.act)
> describe(sat.act)   # basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> # basic descriptive statistics by a grouping variable

> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

Descriptive statistics by group

group 1

vars n mean sd se

gender 1 247 100 000 000

education 2 247 300 154 010

age 3 247 2586 974 062

ACT 4 247 2879 506 032

SATV 5 247 61511 11416 726

SATQ 6 245 63587 11602 741

---------------------------------------------------------------------------

group 2

vars n mean sd se

gender 1 453 200 000 000

education 2 453 326 135 006

age 3 453 2545 937 044

ACT 4 453 2842 469 022

SATV 5 453 61066 11231 528

SATQ 6 442 59600 11307 538

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+                      skew=FALSE, ranges=FALSE, mat=TRUE)

> headTail(sa.mat)

item group1 group2 vars n mean sd se

gender1 1 1 0 1 27 1 0 0

gender2 2 2 0 1 30 2 0 0

gender3 3 1 1 1 20 1 0 0

gender4 4 2 1 1 25 2 0 0

ltNAgt ltNAgt ltNAgt

SATQ9 69 1 4 6 51 6359 10412 1458

SATQ10 70 2 4 6 86 59759 10624 1146

SATQ11 71 1 5 6 46 65783 8961 1321

SATQ12 72 2 5 6 93 60672 10555 1095

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
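A minimal interactive call (the same call that generates Figure 1, but without the png device wrapper) keeps the distances for later use:

d2 <- outlier(sat.act)             # Q-Q plot of Mahalanobis D2 against the expected chi square values
sat.d2 <- data.frame(sat.act, d2)  # keep the distances together with the raw data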

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).

> x <- matrix(1:120, ncol = 10, byrow = TRUE)
> colnames(x) <- paste("V", 1:10, sep = "")
> new.x <- scrub(x, 3:5, min = c(30, 40, 50), max = 70, isvalue = 45, newvalue = NA)

> new.x

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

[1] 1 2 NA NA NA 6 7 8 9 10

[2] 11 12 NA NA NA 16 17 18 19 20

[3] 21 22 NA NA NA 26 27 28 29 30

[4] 31 32 33 NA NA 36 37 38 39 40

[5] 41 42 43 44 NA 46 47 48 49 50

[6] 51 52 53 54 55 56 57 58 59 60

[7] 61 62 63 64 65 66 67 68 69 70

[8] 71 72 NA NA NA 76 77 78 79 80

[9] 81 82 NA NA NA 86 87 88 89 90

[10] 91 92 NA NA NA 96 97 98 99 100

[11] 101 102 NA NA NA 106 107 108 109 110

[12] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums have come down. Data cleaning and examination for outliers should be a routine part of any data analysis.

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes", which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
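A brief illustrative sketch (the tiny data set used here is made up purely for demonstration):

major <- c("psych", "bio", "psych", "econ", "bio")   # a hypothetical categorical variable
dummy.code(major)                                    # one 0/1 column per category

tiny <- data.frame(gender = c("Male", "Female", "Female"),
                   education = c("HS", "some College", "in college"))  # made up categorical data
char2numeric(tiny)                                   # convert the character columns to numeric codes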


> png('outlier.png')
> d2 <- outlier(sat.act, cex=.8)
> dev.off()

null device

1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the x axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",


> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)  # combine the d2 statistics from before with the sat.act data frame
> pairs.panels(sat.d2, bg=c("yellow", "blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()

null device

1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.'), it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red", "black", "white", "blue")[affect$Film], pch=21,
+              main="Affect varies by movies")
> dev.off()

null device

1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

+          "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+          "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+          "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable")))

> scores <- scoreItems(keys, msq[1:75])

> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+              main="Density distributions of four measures of affect")
> dev.off()

Using the pairs.panels function to graphically show relationships (not shown in the interests of space). The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).

> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")
> dev.off()

null device

1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, the 25th and 75th percentiles, the entire range, and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data, with error bars based upon the standard error of a proportion (σp = √(pq/N)).

error.crosses draws the confidence intervals for an x set and a y set of the same size.

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence interval.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+               labels=c("Male", "Female"), ylab="SAT score", xlab="")


Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+                main="Proportion of sample by education level")


Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be shown as percentages (as shown) or as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+     cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+     ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png('bibars.png')
> bi.bars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()

null device

1

Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.

> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()

null device

1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

edctn age ACT SATV SATQ

education 100

age 061 100

ACT 016 015 100

SATV 002 -006 061 100

SATQ 008 004 060 068 100

> upper <- lowerCor(female[-1])

edctn age ACT SATV SATQ

education 100

age 052 100

ACT 016 008 100

SATV 007 -003 053 100

SATQ 003 -009 058 063 100

> both <- lowerUpper(lower, upper)
> round(both, 2)

education age ACT SATV SATQ

education NA 052 016 007 003

age 061 NA 008 -003 -009

ACT 016 015 NA 053 058

SATV 002 -006 061 NA 063

SATQ 008 004 060 068 NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

education age ACT SATV SATQ

education NA 009 000 -005 005

age 061 NA 007 -003 013

ACT 016 015 NA 008 002

SATV 002 -006 061 NA 005

SATQ 008 004 060 068 NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations, print with the short=FALSE option.

Correlation tests

Callrtest(n = 50 r12 = 03)

Test of significance of a correlation

t value 218 with probability lt 0034

and confidence interval 002 053

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests

Callrtest(n = 30 r12 = 04 r34 = 06)

Test of difference between two independent correlations

z value 099 with probability 032

3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests

Call[1] rtest(n = 103 r12 = 04 r23 = 01 r13 = 05 )

Test of difference between two correlated correlations

t value -089 with probability lt 037

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests

Callrtest(n = 103 r12 = 05 r34 = 06 r23 = 07 r13 = 05 r14 = 05

r24 = 08)

Test of difference between two dependent correlations

z value -12 with probability 023

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices

Callcortest(R1 = satact)

Chi Square value 132542 with df = 15 with probability lt 18e-273


3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
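A brief illustrative sketch of these functions (the ability, bfi, and burt data sets are described in section 8; which variables to use is, of course, up to the user):

rt <- tetrachoric(ability[, 1:4])   # tetrachoric correlations of dichotomous (0/1) ability items
rp <- polychoric(bfi[, 1:5])        # polychoric correlations of polytomous personality items
rs <- cor.smooth(burt)              # smooth a correlation matrix that is not positive semi-definite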

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable) it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}   (1)

where r_{xy} is the normal correlation, which may be decomposed into within group and between group correlations r_{xy_{wg}} and r_{xy_{bg}}, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
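A minimal sketch of the first of these analyses (the output element names follow the statsBy help page and should be checked against the installed version of psych):

sb <- statsBy(sat.act, group = "education", cors = TRUE)  # two level descriptives by education
sb$rwg   # pooled within group correlations
sb$rbg   # between group correlations (correlations of the group means)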

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25, 27)], group="education", cors=TRUE)

faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388


SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression


R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

> round(sc$residual, 2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079


5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x_{1,2,...,i}) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

MediationModeration Analysis

Call mediate(y = SATIS ~ THERAPY + (ATTRIB) data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total effect(c) of THERAPY on SATIS = 076 SE = 031 t = 25 df= 28 with p = 0019

Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t = 135 df= 27 with p = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 003 Upper CI = 07

R = 056 R2 = 031 F = 606 on 2 and 27 DF p-value 000272

To see the longer output specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) the regressions for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.

> mediate.diagram(preacher)


Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)


Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary


Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigen value of the eigen value decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)


(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562


Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
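A minimal sketch of the calls that could produce such a table (the heading text here is illustrative):

f3 <- fa(Thurstone, nfactors = 3)   # factor the 9 Thurstone variables
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
cor2latex(sat.act)                  # a correlation table in APA style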


Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3    h2    u2   com
Sentences        0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary       0.89  0.06 -0.03  0.84  0.16  1.01
Sent.Completion  0.83  0.04  0.00  0.73  0.27  1.00
First.Letters    0.00  0.86  0.00  0.73  0.27  1.00
4.Letter.Words  -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes         0.18  0.63 -0.08  0.50  0.50  1.20
Letter.Series    0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees        0.37 -0.05  0.47  0.50  0.50  1.93
Letter.Group    -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings      2.64  1.86  1.50

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
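A few of these helpers in action (a brief illustrative sketch; the input values are made up):

fisherz(0.30)                  # Fisher r to z transformation of a correlation
geometric.mean(c(2, 8, 32))    # geometric mean of a vector
headTail(sat.act)              # first and last lines of a data frame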

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

gt news(Version gt= 184package=psych)

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book) An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R version 4.0.2 (2020-06-22)

Platform: x86_64-apple-darwin17.0 (64-bit)

Running under: macOS Catalina 10.15.6

Matrix products: default

BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):

[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148

[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.
Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.
Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.
Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.
Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


effect using a bootstrap. By default, we find the raw regressions and mean center. If we specify zero=FALSE, we do not mean center the data. If we specify std=TRUE, we find the standardized regressions.

mediate(respappr ~ prot * sexism + (sexism), data=Garcia, zero=FALSE, n.iter=50,
        main="Moderated mediation (not mean centered)")

02 Psychometric functions are summarized in the second vignette

Many additional functions, particularly designed for basic and advanced psychometrics, are discussed more fully in the Overview Vignette, which may be downloaded from https://personality-project.org/r/psych/vignettes/overview.pdf. A brief review of the functions available is included here. In addition, there are helpful tutorials for Finding omega, How to score scales and find reliability, and for Using psych for factor analysis at https://personality-project.org/r/.

• Test for the number of factors in your data using parallel analysis (fa.parallel) or Very Simple Structure (vss).

fa.parallel(myData)

vss(myData)

• Factor analyze (see section 41) the data with a specified number of factors (the default is 1); the default method is minimum residual, and the default rotation for more than one factor is oblimin. There are many more possibilities, such as minres (section 411), alpha factoring, and wls. Compare the solution to a hierarchical cluster analysis using the ICLUST algorithm (Revelle, 1979) (see section 416). Also consider a hierarchical factor solution to find coefficient ω.

fa(myData)

iclust(myData)

omega(myData)

If you prefer to do a principal components analysis, you may use the principal function. The default is one component.

principal(myData)

• Some people like to find coefficient α as an estimate of reliability. This may be done for a single scale using the alpha function. Perhaps more useful is the ability to create several scales as unweighted averages of specified items using the scoreItems function, and to find various estimates of internal consistency for these scales, find their intercorrelations, and find scores for all the subjects.

alpha(myData)  # score all of the items as part of one scale

myKeys <- make.keys(20, list(first = c(1,-3,5,-7,8:10), second = c(2,4,-6,11:15,-16)))

my.scores <- scoreItems(myKeys, myData)  # form several scales

my.scores  # show the highlights of the results

At this point you have had a chance to see the highlights of the psych package and to do some basic (and advanced) data analysis. You might find reading this entire vignette as well as the Overview Vignette to be helpful to get a broader understanding of what can be done in R using the psych package. Remember that the help command (?) is available for every function. Try running the examples for each help page.

1 Overview of this and related documents

The psych package (Revelle, 2018) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at https://personality-project.org/r/book.

Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bi.bars) are useful for basic data entry and descriptive analyses.

Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes six methods of factor analysis (minimum residual, principal axis, alpha factoring, weighted least squares, generalized least squares and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP) or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.
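As a minimal sketch of these steps, using the 16 ICAR ability items from psychTools (the choice of data set and the cap on the number of factors are illustrative assumptions, not recommendations):

library(psych)
library(psychTools)
nfactors(ability, n=8)   # VSS, MAP, and related "how many factors" criteria, up to 8 factors
fa.parallel(ability)     # parallel analysis
irt.fa(ability)          # an IRT style solution found by factoring the tetrachoric correlations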

Bifactor and hierarchical factor structures may be estimated by using Schmid Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.

Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust) and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman (1945)), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.
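A minimal sketch of these reliability functions, again using the ability data set from psychTools purely for illustration (the choice of four factors for omega is an assumption, not a recommendation):

library(psych)
library(psychTools)
alpha(ability)              # coefficient alpha for the 16 ability items
omega(ability, nfactors=4)  # omega_h and omega_t via a Schmid-Leiman transformation
guttman(ability)            # Guttman's six estimates of reliability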

For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations as well as the between group correlation.

multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject; mlArrange converts wide data frames to long data frames suitable for multilevel modeling.

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.

This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at https://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at https://personality-project.org/r/psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at https://personality-project.org/r/psych/vignettes/psych_for_sem.pdf.)

In addition, there are a growing number of "HowTo"s at the personality project. Currently these include:

1 An introduction (vignette) of the psych package

2 An overview (vignette) of the psych package

3 Installing R and some useful packages

4 Using R and the psych package to find ωh and ωt

5 Using R and the psych for factor analysis and principal components analysis

6 Using the scoreItems function to find scale scores and scale statistics

7 Using mediate and setCor to do mediation moderation and regression analysis

For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at https://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at https://personality-project.org/r/book.

2 Getting started

Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid, is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.

The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just 3 packages:

install.packages(c("GPArotation", "mnormt"))

Alternatively, many packages for psychometrics can be downloaded at once using the "Psychometrics" task view:

install.packages("ctv")

library(ctv)

task.views("Psychometrics")

Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

Remember, to run any of the psych functions it is necessary to make the package active by using the library command:

library(psych)

library(psychTools)

The other packages once installed will be called automatically by psych

It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)
   library(psychTools)}

31 Getting the data by using read.file

Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) convenient, a helpful alternative is to read the data in directly. This can be done using the read.file function, which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of text, txt, TXT, csv, dat, data, sav, xpt, XPT, r, R, rds, Rds, rda, Rda, rdata, Rdata, or RData, the file will be read correctly.

my.data <- read.file()

If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

my.data <- read.file(widths = c(4, rep(1, 35)))  # will read in a file without a header row
# and 36 fields, the first of which is 4 columns wide, the rest of which are 1 column each

If the file is a RData file (with suffix of RData, Rda, rda, Rdata, or rdata), the object will be loaded. Depending on what was stored, this might be several objects. If the file is a sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (xpt or XPT) it will be read. csv files (comma separated files), normal txt or text files, data, or dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).

To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

# this will keep the value labels for sav files
my.spss <- read.file(use.value.labels=TRUE)

32 Data input from the clipboard

There are, of course, many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

my.data <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

> my.data <- read.clipboard(sep="\t")   # define the tab option, or
> my.tab.data <- read.clipboard.tab()   # just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns).

> my.data <- read.clipboard.fwf(widths=c(5, 2, rep(1, 5), rep(3, 4)))

33 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data), and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.)

> library(psych)
> library(psychTools)
> data(sat.act)
> describe(sat.act)  # basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> # basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+                      skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)

         item group1 group2 vars   n   mean     sd    se
gender1     1      1      0    1  27      1      0     0
gender2     2      2      0    1  30      2      0     0
gender3     3      1      1    1  20      1      0     0
gender4     4      2      1    1  25      2      0     0
...       ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9      69      1      4    6  51  635.9 104.12 14.58
SATQ10     70      2      4    6  86 597.59 106.24 11.46
SATQ11     71      1      5    6  46 657.83  89.61 13.21
SATQ12     72      2      5    6  93 606.72 105.55 10.95

331 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
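For example, a minimal sketch of finding the most extreme cases this way, using the sat.act data set described below (the choice of five cases is arbitrary):

d2 <- outlier(sat.act)                      # squared Mahalanobis distances (and a Q-Q plot)
worst <- order(d2, decreasing = TRUE)[1:5]  # indices of the five largest distances
sat.act[worst, ]                            # inspect those observations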

332 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or if only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).

> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.

333 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
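A minimal sketch of both steps; the small vector and data frame here are made up for illustration:

major <- c("psych", "biology", "econ", "psych", "econ")    # hypothetical categorical variable
dummy.code(major)                                           # one 0/1 column per category

tiny <- data.frame(gender    = c("Male", "Female", "Female"),       # hypothetical data frame
                   education = c("HS", "some College", "in college"))
char2numeric(tiny)                                          # character columns become numeric codes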

> png("outlier.png")
> d2 <- outlier(sat.act, cex=.8)
> dev.off()

null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

34 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)

341 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean with the axis length reflecting one standard deviation of the x and y variables is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",

> png("pairspanels.png")
> sat.d2 <- data.frame(sat.act, d2)  # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()

null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

> png("affect.png")
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+              main="Affect varies by movies")
> dev.off()

null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

+        "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+        "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+        "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+        "upset", "hostile", "irritable")))
> scores <- scoreItems(keys, msq[1:75])
> png("msq.png")
> pairs.panels(scores$scores, smoother=TRUE,
+              main="Density distributions of four measures of affect")
> dev.off()

Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

342 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).

> png("violin.png")
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")
> dev.off()

null device
          1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, and 25th and 75th percentiles, as well as the entire range and the density distribution.

343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of a proportion (σp = √(pq/N)).

error.crosses draw the confidence intervals for an x set and a y set of the same size.

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflect any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, even though 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

344 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)

Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+               labels=c("Male", "Female"), ylab="SAT score", xlab="")

Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+                main="Proportion of sample by education level")

Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+       xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+       cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+       ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))

Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call which examines the effect of the same grouping variable upon different measures. The size of the circles represent the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

346 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png("bibars.png")
> bi.bars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()

null device
          1

Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.

> png("histo.png")
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()

null device
          1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.

347 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age  ACT  SATV  SATQ
education        NA  0.09 0.00 -0.05  0.05
age            0.61    NA 0.07 -0.03  0.13
ACT            0.16  0.15   NA  0.08  0.02
SATV           0.02 -0.06 0.61    NA  0.05
SATQ           0.08  0.04 0.60  0.68    NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
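As a small illustration of the Holm correction itself (the raw p values here are made up), and of how to ask corr.test for unadjusted values:

p.adjust(c(.001, .01, .02, .04), method = "holm")   # Holm adjusted p values
ct <- corr.test(sat.act, adjust = "none")            # or report raw probabilities only
ct$p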

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

> png("corplot.png")
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()

null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18  with probability < 0.034
and confidence interval 0.02  0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99  with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests
Call:[1] r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
t value -0.89  with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
     r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2  with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices

Callcortest(R1 = satact)

Chi Square value 132542 with df = 15 with probability lt 18e-273


36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The drawtetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the drawcor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()

[Figure 14 shows the bivariate normal distribution of x and y cut at the thresholds τ and Τ, dividing the plane into the four quadrants X > τ, Y > Τ, etc.; in this example ρ = .5 while the resulting φ = .28.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixedcor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial and polyserial correlations.
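As a rough sketch of these calls (assuming the dichotomous ability items and the polytomous bfi items, which in recent versions are supplied by psychTools, are available):

library(psych)
library(psychTools)            # supplies the ability and bfi example data
tetrachoric(ability[, 1:5])    # 0/1 scored ability items
polychoric(bfi[, 1:5])         # 1-6 scored personality items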

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The corsmooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
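A sketch of that smoothing step, assuming the burt correlation matrix (also in psychTools in recent versions) is available:

library(psychTools)                 # supplies the burt correlation matrix
min(eigen(burt)$values)             # slightly negative: burt is not positive semi-definite
burt.smoothed <- cor.smooth(burt)   # adjust and rescale the eigen values
min(eigen(burt.smoothed)$values)    # now non-negative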

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable) it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

41 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}   (1)

where r_{xy} is the normal correlation, which may be decomposed into a within group and a between group correlation, r_{xy_{wg}} and r_{xy_{bg}}, and η (eta) is the correlation of the data with the within group values or the group means.

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1 while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4 and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2 and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

simmultilevel will generate simulated data with a multilevel structure

The statsByboot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBybootsummary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(satact)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
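A minimal sketch of the first of these analyses, grouping the satact data by education and asking for the correlations:

sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)    # pooled within group correlations
round(sb$rbg, 2)    # correlations of the group means (between groups)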

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.


sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)

faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call setCor(y = 59 x = 14 data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388


SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call setCor(y = 59 x = 34 data = Thurstone z = 12)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression


R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

> round(sc$residual, 2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079


52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediatediagram. The data are found in the example for mediate.
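The output below (and the diagram in Figure 15) comes from a call of roughly this form; the object name preacher matches the mediatediagram call shown with the figure, and the sobel data frame is the one constructed in the mediate help example:

preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)   # bootstrapped mediation
mediate.diagram(preacher)                                       # draw the path diagram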

MediationModeration Analysis

Call mediate(y = SATIS ~ THERAPY + (ATTRIB) data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total effect(c) of THERAPY on SATIS = 076 SE = 031 t = 25 df= 28 with p = 0019

Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t = 135 df= 27 with p = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 003 Upper CI = 07

R = 056 R2 = 031 F = 606 on 2 and 27 DF p-value 000272

To see the longer output specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.


> mediate.diagram(preacher)

[Figure 15 shows the mediation path diagram: total effect c = .76, direct effect c' = .43, with paths of .82 (THERAPY to ATTRIB) and .40 (ATTRIB to SATIS).]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCordiagram.


> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)

> setCor.diagram(preacher)

[Figure 16 shows the regression path diagram relating THERAPY and ATTRIB to SATIS, with path values of .43, .40, and .21.]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
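The moderated mediation output that follows was produced by a call of roughly this form (the Garcia data set is among the package example data; the object name mod.med is arbitrary, and n.iter was set to 50 only for speed):

mod.med <- mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia,
                   n.iter = 50, main = "Moderated mediation (mean centered)")
summary(mod.med)    # ask for the longer output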


MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

[Figure 17, titled "Moderated mediation (mean centered)", shows the path diagram for prot2, prot2*sexism, sexism, and respappr, with total and direct effects of 1.46 (prot2) and 0.81 (prot2*sexism) and essentially zero paths through sexism.]

Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.


53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigen value of the eigen value decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the satact data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")

> model1 <- lm(ACT ~ gender + education + age, data = sat.act)

> summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)


(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor

> # compare with setCor

> setCor(c(4:6), c(1:3), C, n.obs=700)

Call setCor(y = c(46) x = c(13) data = C nobs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562


Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
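As a quick check of this symmetry (a sketch, reusing the covariance matrix C from above), reversing the roles of the two sets should give the same set correlation:

setCor(y = 1:3, x = 4:6, data = C, n.obs = 700)   # same set correlation R2 as predicting 4:6 from 1:3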

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irtfa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.
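A rough sketch of how these are called; the fa call mirrors the three factor solution shown in Table 2, and each function writes the LaTeX to the console unless a file argument is given:

f3 <- fa(Thurstone, 3)        # three factor solution of the Thurstone correlations
fa2latex(f3)                  # APA style factor loading table
cor2latex(Thurstone)          # lower triangle of the correlation matrix
df2latex(describe(sat.act))   # any data frame, e.g., describe output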

An example of converting the output from fa to LaTeX appears in Table 2.


Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com

Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

blockrandom Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadianmean, circadiancor and circadianlinearcor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score


geometricmean also harmonicmean find the appropriate mean for working with different kinds of data.

ICC and cohenkappa are typically used to find the reliability for raters

headtail combines the head and tail functions to show the first and last lines of a dataset or output

topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

prep finds the probability of replication for an F, t, or r, and estimates effect size.

partialr partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setcor.)

rangeCorrection will correct correlations for restriction of range

reversecode will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
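A few of these in action (a quick sketch, not a complete tour):

fisherz(0.5)                  # the Fisher z corresponding to r = .5 (about 0.55)
geometric.mean(c(2, 4, 8))    # 4, the geometric mean
headTail(sat.act)             # first and last few rows of the sat.act data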

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone33, Holzinger, Bechtoldt1, Bechtoldt2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epibfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

satact Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epibfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see faextension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R version 402 (2020-06-22)

Platform x86_64-apple-darwin170 (64-bit)


Running under macOS Catalina 10156

Matrix products default

BLAS LibraryFrameworksRframeworkVersions40ResourcesliblibRblasdylib

LAPACK LibraryFrameworksRframeworkVersions40ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_209 psych_209

loaded via a namespace (and not attached)

[1] compiler_402 parallel_402 tools_402 foreign_08-80 tmvnsim_10-2 nlme_31-148

[7] mnormt_201 grid_402 lattice_020-41


References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness European Journal of Social Psychology 40(5) 733-745

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686


McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New York 3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition: The personality-cognition link In Chamorro-Premuzic style editors Gruszka A Matthews G and Szymura B, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control chapter 2 pages 27-49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133


Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144


Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49


hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48


corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51


psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9


outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49


table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47


  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 6: An introduction to the psych package: Part I: data entry ... · myData

done in R using the psych Remember that the help command () is available for everyfunction Try running the examples for each help page

6

1 Overview of this and related documents

The psych package (Revelle 2018) has been developed at Northwestern University since2005 to include functions most useful for personality psychometric and psychological re-search The package is also meant to supplement a text on psychometric theory (Revelleprep) a draft of which is available at httpspersonality-projectorgrbook

Some of the functions (eg readfile readclipboard describe pairspanels scat-terhist errorbars multihist bibars) are useful for basic data entry and descrip-tive analyses

Psychometric applications emphasize techniques for dimension reduction including factoranalysis cluster analysis and principal components analysis The fa function includes sixmethods of factor analysis (minimum residual principal axis alpha factoring weightedleast squares generalized least squares and maximum likelihood factor analysis) PrincipalComponents Analysis (PCA) is also available through the use of the principal or pca

functions Determining the number of factors or components to extract may be done byusing the Very Simple Structure (Revelle and Rocklin 1979) (vss) Minimum AveragePartial correlation (Velicer 1976) (MAP) or parallel analysis (faparallel) criteria Theseand several other criteria are included in the nfactors function Two parameter ItemResponse Theory (IRT) models for dichotomous or polytomous items may be found byfactoring tetrachoric or polychoric correlation matrices and expressing the resultingparameters in terms of location and discrimination using irtfa

Bifactor and hierarchical factor structures may be estimated by using Schmid Leimantransformations (Schmid and Leiman 1957) (schmid) to transform a hierarchical factorstructure into a bifactor solution (Holzinger and Swineford 1937) Higher order modelscan also be found using famulti

Scale construction can be done using the Item Cluster Analysis (Revelle 1979) (iclust)function to determine the structure and to calculate reliability coefficients α (Cronbach1951)(alpha scoreItems scoremultiplechoice) β (Revelle 1979 Revelle and Zin-barg 2009) (iclust) and McDonaldrsquos ωh and ωt (McDonald 1999) (omega) Guttmanrsquos sixestimates of internal consistency reliability (Guttman (1945) as well as additional estimates(Revelle and Zinbarg 2009) are in the guttman function The six measures of Intraclasscorrelation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available

For data with a a multilevel structure (eg items within subjects across time or itemswithin subjects across groups) the describeBy statsBy functions will give basic descrip-tives by group StatsBy also will find within group (or subject) correlations as well as thebetween group correlation

multilevelreliability mlr will find various generalizability statistics for subjects overtime and items mlPlot will graph items over for each subject mlArrange converts wide

7

data frames to long data frames suitable for multilevel modeling

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairspanels cor-relation ldquoheat mapsrdquo (corPlot) factor cluster and structural diagrams using fadiagramiclustdiagram structurediagram and hetdiagram as well as item response charac-teristics and item and test information characteristic curves plotirt and plotpoly

This vignette is meant to give an overview of the psych package That is it is meantto give a summary of the main functions in the psych package with examples of howthey are used for data description dimension reduction and scale construction The ex-tended user manual at psych_manualpdf includes examples of graphic output and moreextensive demonstrations than are found in the help menus (Also available at https

personality-projectorgrpsych_manualpdf) The vignette psych for sem athttpspersonalty-projectorgrpsych_for_sempdf discusses how to use psychas a front end to the sem package of John Fox (Fox et al 2012) (The vignette isalso available at httpspersonality-projectorgrpsychvignettespsych_for_

sempdf)

In addition there are a growing number of ldquoHowTordquos at the personality project Currentlythese include

1 An introduction (vignette) of the psych package

2 An overview (vignette) of the psych package

3 Installing R and some useful packages

4 Using R and the psych package to find omegah and ωt

5 Using R and the psych for factor analysis and principal components analysis

6 Using the scoreItems function to find scale scores and scale statistics

7 Using mediate and setCor to do mediation moderation and regression analysis

For a step by step tutorial in the use of the psych package and the base functions inR for basic personality research see the guide for using R for personality research athttpspersonalitytheoryorgrrshorthtml For an introduction to psychometrictheory with applications in R see the draft chapters at httpspersonality-project

orgrbook)

2 Getting started

Some of the functions described in the Overview Vignette require other packages This isnot the case for the functions listed in this Introduction Particularly useful for rotatingthe results of factor analyses (from eg fa factorminres factorpa factorwls

8

or principal) or hierarchical factor models using omega or schmid is the GPArotationpackage These and other useful packages may be installed by first installing and thenusing the task views (ctv) package to install the ldquoPsychometricsrdquo task view but doing itthis way is not necessary

The ldquoPsychometricsrdquo task view will install a large number of useful packages To installthe bare minimum for the examples in this vignette it is necessary to install just 3 pack-ages

installpackages(list(c(GPArotationmnormt)

Alternatively many packages for psychometric can be downloaded at once using the ldquoPsy-chometricsrdquo task view

installpackages(ctv)

library(ctv)

taskviews(Psychometrics)

Because of the difficulty of installing the package Rgraphviz alternative graphics have beendeveloped and are available as diagram functions If Rgraphviz is available some functionswill take advantage of it An alternative is to useldquodotrdquooutput of commands for any externalgraphics package that uses the dot language

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptivestatistics

Remember to run any of the psych functions it is necessary to make the package activeby using the library command

library(psych)

library(psychTools)

The other packages once installed will be called automatically by psych

It is possible to automatically load psych and other functions by creating and then savinga ldquoFirstrdquo function eg

First lt- function(x) library(psych)

library(psychTools)

31 Getting the data by using readfile

Although many find copying the data to the clipboard and then using the readclipboardfunctions (see below) a helpful alternative is to read the data in directly This can be done

9

using the readfile function which calls filechoose to find the file and then based uponthe suffix of the file chooses the appropriate way to read it For files with suffixes of texttxt TXT csv dat data sav xpt XPT r R rds Rds rda Rda rdata Rdataor RData the file will be read correctly

mydata lt- readfile()

If the file contains Fixed Width Format (fwf) data the column information can be specifiedwith the widths command

mydata lt- readfile(widths = c(4rep(135)) will read in a file without a header row

and 36 fields the first of which is 4 colums the rest of which are 1 column each

If the file is a RData file (with suffix of RData Rda rda Rdata or rdata) the objectwill be loaded Depending what was stored this might be several objects If the file is asav file from SPSS it will be read with the most useful default options (converting the fileto a dataframe and converting character fields to numeric) Alternative options may bespecified If it is an export file from SAS (xpt or XPT) it will be read csv files (commaseparated files) normal txt or text files data or dat files will be read as well These areassumed to have a header row of variable labels (header=TRUE) If the data do not havea header row you must specify readfile(header=FALSE)

To read SPSS files and to keep the value labels specify usevaluelabels=TRUE

this will keep the value labels for sav files

myspss lt- readfile(usevaluelabels=TRUE)

32 Data input from the clipboard

There are of course many ways to enter data into R Reading from a local file usingreadtable is perhaps the most preferred However many users will enter their datain a text editor or spreadsheet program and then want to copy and paste into R Thismay be done by using readtable and specifying the input file as ldquoclipboardrdquo (PCs) orldquopipe(pbpaste)rdquo (Macs) Alternatively the readclipboard set of functions are perhapsmore user friendly

readclipboard is the base function for reading data from the clipboard

readclipboardcsv for reading text that is comma delimited

readclipboardtab for reading text that is tab delimited (eg copied directly from anExcel file)

readclipboardlower for reading input of a lower triangular matrix with or without adiagonal The resulting object is a square matrix

10

readclipboardupper for reading input of an upper triangular matrix

readclipboardfwf for reading in fixed width fields (some very old data sets)

For example given a data set copied to the clipboard from a spreadsheet just enter thecommand

mydata lt- readclipboard()

This will work if every data field has a value and even missing data are given some values(eg NA or -999) If the data were entered in a spreadsheet and the missing valueswere just empty cells then the data should be read in as a tab delimited or by using thereadclipboardtab function

gt mydata lt- readclipboard(sep=t) define the tab option or

gt mytabdata lt- readclipboardtab() just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format)copy to the clipboard and then specify the width of each field (in the example below thefirst variable is 5 columns the second is 2 columns the next 5 are 1 column the last 4 are3 columns)

gt mydata lt- readclipboardfwf(widths=c(52rep(15)rep(34))

33 Basic descriptive statistics

Once the data are read in then describe or describeBy will provide basic descriptivestatistics arranged in a data frame format Consider the data set satact which in-cludes data from 700 web based participants on 3 demographic variables and 3 abilitymeasures

describe reports means standard deviations medians min max range skew kurtosisand standard errors for integer or real data Non-numeric data although the statisticsare meaningless will be treated as if numeric (based upon the categorical coding ofthe data) and will be flagged with an

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).

> library(psych)
> library(psychTools)
> data(sat.act)
> describe(sat.act)   # basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> # basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+                      skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)

        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...      ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95

331 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
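For instance, a minimal sketch (assuming the sat.act data and the d2 vector returned by outlier) of how the most extreme cases might be located:

d2 <- outlier(sat.act)                             # squared Mahalanobis distances (and a Q-Q plot)
head(sort(d2, decreasing = TRUE))                  # the most extreme cases
extreme <- d2 > qchisq(.999, df = ncol(sat.act))   # flag cases beyond the .999 quantile of chi-square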

332 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows and 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).

> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.

333 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
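A minimal sketch of both steps (the major and small.df objects here are made-up illustrations, not part of the psych data sets):

major <- c("biology", "psych", "stats", "psych", "biology")   # a hypothetical categorical variable
dummy.major <- dummy.code(major)                              # one 0/1 column per category
small.df <- data.frame(gender = c("Male", "Female", "Male"),
                       education = c("High School", "some College", "in college"))
char2numeric(small.df)                                        # character columns become numeric codes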


> png('outlier.png')
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.


34 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)

341 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data) it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

> keys <- make.keys(msq[1:75], list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",


> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.


> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+              main="Affect varies by movies")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.


+         "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+         "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+         "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+         "upset", "hostile", "irritable")))

> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+              main="Density distributions of four measures of affect")
> dev.off()

Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

342 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).


> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
          1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.


343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval (see the sketch following this list).

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of a proportion (σp = √(pq/N)).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
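As a minimal sketch of the first of these (the labels are just illustrative):

error.bars(sat.act[5:6], ylab = "SAT score", xlab = "", main = "95% confidence intervals")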

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the observed scores. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

344 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.


> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence interval.


> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+               labels=c("Male","Female"), ylab="SAT score", xlab="")


Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M","F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+                main="Proportion of sample by education level")


Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be shown as percentages (as shown) or as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.


345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2", data=affect[-c(1,20)], group="Film", labels=films,
+      xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+      cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2","NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+      ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


346 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png('bibars.png')
> bi.bars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
          1

Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.


> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
          1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.


347 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00
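A minimal sketch of the related lowerMat display, using the full rectangular matrix returned by cor:

R <- cor(sat.act, use = "pairwise")   # the full rectangular matrix from core R
lowerMat(R)                           # rounded to 2 digits and shown as a lower off-diagonal matrix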

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)
          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)


> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687

Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
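For example (a sketch):

ct <- corr.test(sat.act)
print(ct, short = FALSE)   # adds the confidence intervals for each correlation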


Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval  0.02  0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests
Call:[1] r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273


36 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()

[Figure 14 shows a bivariate normal distribution with ρ = .5 cut at thresholds τ (for X) and Τ (for Y) into a 2 x 2 table; the resulting φ = 0.28.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
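A minimal sketch of the smoothing step, assuming a tetrachoric matrix formed from the dichotomous ability items:

R <- tetrachoric(ability)$rho    # tetrachoric correlations of the 0/1 ability items
eigen(R)$values                  # negative eigenvalues would indicate a non-positive-definite matrix
R.smoothed <- cor.smooth(R)      # adjust the offending eigenvalues and return a smoothed matrix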

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

41 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = η_{x_{wg}} * η_{y_{wg}} * r_{xy_{wg}} + η_{x_{bg}} * η_{y_{bg}} * r_{xy_{bg}}     (1)

where r_{xy} is the normal correlation, which may be decomposed into a within group and a between group correlation, r_{xy_{wg}} and r_{xy_{bg}}, and η (eta) is the correlation of the data with the within group values or the group means.

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
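A minimal sketch of the first of these analyses:

sb <- statsBy(sat.act, group = "education", cors = TRUE)   # two-level descriptives grouped by education
sb$rwg   # pooled within-group correlations
sb$rbg   # weighted between-group correlations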

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.


sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first four.

> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

 DV =  FourLetterWords
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
SentCompletion   0.02 3.00
FirstLetters     0.58 1.35

 Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.69 0.48 0.59 0.34

 DV =  Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
SentCompletion   0.05 3.00
FirstLetters     0.45 1.35

 Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

 DV =  LetterSeries
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
SentCompletion   0.04 3.00
FirstLetters     0.21 1.35

 Multiple Regression
               R   R2  Ruw R2uw
LetterSeries 0.5 0.25 0.49 0.24

 DV =  Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
SentCompletion   0.21 3.00
FirstLetters     0.08 1.35

 Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.58 0.34 0.58 0.33

 DV =  LetterGroup
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
SentCompletion   0.08 3.00
FirstLetters     0.31 1.35

 Multiple Regression
                R   R2  Ruw R2uw
LetterGroup  0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohens Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
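For instance, a sketch of the same regression supplying the sample size (the value of 213 cases for the Thurstone matrix is an assumption used here only for illustration):

setCor(y = 5:9, x = 1:4, data = Thurstone, n.obs = 213)   # now also reports se, t, and p for each slope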

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

 The following variables were partialed out: Sentences Vocabulary
 and are included in the calculation of df1 and df2

 DV =  FourLetterWords
               slope  VIF
SentCompletion  0.02 1.02
FirstLetters    0.58 1.02

 Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.58 0.33 0.64 0.41

 DV =  Suffixes
               slope  VIF
SentCompletion  0.05 1.02
FirstLetters    0.45 1.02

 Multiple Regression
            R   R2 Ruw R2uw
Suffixes 0.46 0.21 0.6 0.37

 DV =  LetterSeries
               slope  VIF
SentCompletion  0.04 1.02
FirstLetters    0.21 1.02

 Multiple Regression
                R   R2  Ruw R2uw
LetterSeries 0.21 0.04 0.46 0.21

 DV =  Pedigrees
               slope  VIF
SentCompletion  0.21 1.02
FirstLetters    0.08 1.02

 Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.52 0.27

 DV =  LetterGroup
               slope  VIF
SentCompletion  0.08 1.02
FirstLetters    0.31 1.02

 Multiple Regression
               R   R2  Ruw R2uw
LetterGroup  0.3 0.09 0.46 0.21

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohens Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.73

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.53     0.12         0.11      0.08        0.14
Suffixes                   0.12     0.61         0.01      0.03        0.04
LetterSeries               0.11     0.01         0.79      0.31        0.39
Pedigrees                  0.08     0.03         0.31      0.70        0.23
LetterGroup                0.14     0.04         0.39      0.23        0.79


52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1,2,...i) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was  SATIS . The IV (X) was  THERAPY . The mediating variable(s) =  ATTRIB .

Total effect(c) of  THERAPY  on  SATIS  =  0.76   S.E. =  0.31  t  =  2.5  df=  28   with p =  0.019
Direct effect (c') of  THERAPY  on  SATIS  removing  ATTRIB  =  0.43   S.E. =  0.32  t  =  1.35  df=  27   with p =  0.19
Indirect effect (ab) of  THERAPY  on  SATIS  through  ATTRIB   =  0.33
Mean bootstrapped indirect effect =  0.32  with standard error =  0.17  Lower CI =  0.03   Upper CI =  0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value:  0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) the regressions for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) the regressions for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, niter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis. A sketch of the call appears below.
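The call that produces the moderated mediation output shown later in this section (it matches the Call line of that output):

mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia, niter = 50,
        main = "Moderated mediation (mean centered)")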


> mediate.diagram(preacher)


Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std=FALSE)
> setCor.diagram(preacher)


Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.


Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was  respappr . The IV (X) was  prot2 prot2*sexism . The mediating variable(s) =  sexism .

Total effect(c) of  prot2  on  respappr  =  1.46   S.E. =  0.22  t  =  6.77  df=  126   with p =  4.4e-10
Direct effect (c') of  prot2  on  respappr  removing  sexism  =  1.46   S.E. =  0.22  t  =  6.73  df=  125   with p =  5.5e-10
Indirect effect (ab) of  prot2  on  respappr  through  sexism   =  0
Mean bootstrapped indirect effect =  0  with standard error =  0.02  Lower CI =  -0.04   Upper CI =  0.03
Total effect(c) of  prot2*sexism  on  respappr  =  0.81   S.E. =  0.28  t  =  2.89  df=  126   with p =  0.0045
Direct effect (c') of  prot2*sexism  on  respappr  removing  sexism  =  0.81   S.E. =  0.28  t  =  2.87  df=  125   with p =  0.0048
Indirect effect (ab) of  prot2*sexism  on  respappr  through  sexism  =  0
Mean bootstrapped indirect effect =  0  with standard error =  0.03  Lower CI =  -0.07   Upper CI =  0.07
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value:  1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary


Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.


53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - λ_i)

where λ_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,    Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

 DV =  ACT
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender      -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education    0.56 0.07  7.72 0.016     0.25     0.87 1.22
age         -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error =  0.15  with  2  degrees of freedom

 Multiple Regression
    R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT 1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

 DV =  SATV
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender       0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education    0.72 0.08  9.21 0.012     0.39     1.06 1.22
age         -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error =  0.16  with  2  degrees of freedom

 Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

 DV =  SATQ
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender      -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education    0.40 0.05  8.32 0.014     0.19     0.60 1.22
age         -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error =  0.1  with  2  degrees of freedom

 Multiple Regression
     R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ 1  1   1 0.99        0.99        0    177.11   3   2 0.00562

Various estimates of between set correlations
Squared Canonical Correlations
[1] 1.000 0.988 0.013
Chisq of canonical correlations
[1] 325.17  44.33   0.013
 Average squared canonical correlation =  0.67
 Cohens Set Correlation R2  =  1
Shrunken Set Correlation R2  =  1
F and df of Cohens Set Correlation  -Inf 12 -12.94
Unweighted correlation between the two sets =  0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
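As a minimal sketch, the table could be generated along these lines (a three factor solution of the Thurstone correlations):

f3 <- fa(Thurstone, nfactors = 3)   # exploratory factor analysis of the 9 Thurstone variables
fa2latex(f3, heading = "A factor analysis table from the psych package in R")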


Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3    h2    u2   com
Sentences        0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary       0.89  0.06 -0.03  0.84  0.16  1.01
SentCompletion   0.83  0.04  0.00  0.73  0.27  1.00
FirstLetters     0.00  0.86  0.00  0.73  0.27  1.00
4LetterWords    -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes         0.18  0.63 -0.08  0.50  0.50  1.20
LetterSeries     0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees        0.37 -0.05  0.47  0.50  0.50  1.93
LetterGroup     -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings      2.64  1.86  1.50

      MR1  MR2  MR3
MR1  1.00 0.59 0.54
MR2  0.59 1.00 0.52
MR3  0.54 0.52 1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score


geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates the effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and there are 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon and Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org), Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52-64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733-745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578-595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801-824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.



1 Overview of this and related documents

The psych package (Revelle, 2018) has been developed at Northwestern University since 2005 to include functions most useful for personality, psychometric, and psychological research. The package is also meant to supplement a text on psychometric theory (Revelle, prep), a draft of which is available at https://personality-project.org/r/book.

Some of the functions (e.g., read.file, read.clipboard, describe, pairs.panels, scatter.hist, error.bars, multi.hist, bibars) are useful for basic data entry and descriptive analyses.

Psychometric applications emphasize techniques for dimension reduction including factor analysis, cluster analysis, and principal components analysis. The fa function includes six methods of factor analysis (minimum residual, principal axis, alpha factoring, weighted least squares, generalized least squares and maximum likelihood factor analysis). Principal Components Analysis (PCA) is also available through the use of the principal or pca functions. Determining the number of factors or components to extract may be done by using the Very Simple Structure (Revelle and Rocklin, 1979) (vss), Minimum Average Partial correlation (Velicer, 1976) (MAP) or parallel analysis (fa.parallel) criteria. These and several other criteria are included in the nfactors function. Two parameter Item Response Theory (IRT) models for dichotomous or polytomous items may be found by factoring tetrachoric or polychoric correlation matrices and expressing the resulting parameters in terms of location and discrimination using irt.fa.

Bifactor and hierarchical factor structures may be estimated by using Schmid Leiman transformations (Schmid and Leiman, 1957) (schmid) to transform a hierarchical factor structure into a bifactor solution (Holzinger and Swineford, 1937). Higher order models can also be found using fa.multi.

Scale construction can be done using the Item Cluster Analysis (Revelle, 1979) (iclust) function to determine the structure and to calculate reliability coefficients α (Cronbach, 1951) (alpha, scoreItems, score.multiple.choice), β (Revelle, 1979; Revelle and Zinbarg, 2009) (iclust) and McDonald's ωh and ωt (McDonald, 1999) (omega). Guttman's six estimates of internal consistency reliability (Guttman (1945), as well as additional estimates (Revelle and Zinbarg, 2009), are in the guttman function. The six measures of Intraclass correlation coefficients (ICC) discussed by Shrout and Fleiss (1979) are also available.

For data with a multilevel structure (e.g., items within subjects across time, or items within subjects across groups), the describeBy and statsBy functions will give basic descriptives by group. statsBy also will find within group (or subject) correlations as well as the between group correlation.

multilevel.reliability (mlr) will find various generalizability statistics for subjects over time and items. mlPlot will graph items over time for each subject, and mlArrange converts wide data frames to long data frames suitable for multilevel modeling.

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairs.panels, correlation "heat maps" (corPlot), factor, cluster, and structural diagrams using fa.diagram, iclust.diagram, structure.diagram and het.diagram, as well as item response characteristics and item and test information characteristic curves using plot.irt and plot.poly.

This vignette is meant to give an overview of the psych package. That is, it is meant to give a summary of the main functions in the psych package with examples of how they are used for data description, dimension reduction, and scale construction. The extended user manual at psych_manual.pdf includes examples of graphic output and more extensive demonstrations than are found in the help menus. (Also available at https://personality-project.org/r/psych_manual.pdf.) The vignette, psych for sem, at https://personality-project.org/r/psych_for_sem.pdf, discusses how to use psych as a front end to the sem package of John Fox (Fox et al., 2012). (The vignette is also available at https://personality-project.org/r/psych/vignettes/psych_for_sem.pdf.)

In addition, there are a growing number of "HowTo"s at the personality project. Currently these include:

1 An introduction (vignette) of the psych package

2 An overview (vignette) of the psych package

3 Installing R and some useful packages

4 Using R and the psych package to find ωh and ωt

5 Using R and the psych package for factor analysis and principal components analysis

6 Using the scoreItems function to find scale scores and scale statistics

7 Using mediate and setCor to do mediation, moderation and regression analysis

For a step by step tutorial in the use of the psych package and the base functions in R for basic personality research, see the guide for using R for personality research at https://personalitytheory.org/r/r.short.html. For an introduction to psychometric theory with applications in R, see the draft chapters at https://personality-project.org/r/book.

2 Getting started

Some of the functions described in the Overview Vignette require other packages. This is not the case for the functions listed in this Introduction. Particularly useful for rotating the results of factor analyses (from e.g., fa, factor.minres, factor.pa, factor.wls, or principal) or hierarchical factor models using omega or schmid, is the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.

The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just a few packages:

install.packages(c("GPArotation", "mnormt"))

Alternatively, many packages useful for psychometrics can be downloaded at once using the "Psychometrics" task view:

install.packages("ctv")

library(ctv)

task.views("Psychometrics")

Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use "dot" output of commands for any external graphics package that uses the dot language.

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

Remember, to run any of the psych functions it is necessary to make the package active by using the library command:

library(psych)

library(psychTools)

The other packages, once installed, will be called automatically by psych.

It is possible to automatically load psych and other functions by creating and then saving a ".First" function, e.g.,

.First <- function(x) {library(psych)

library(psychTools)}

3.1 Getting the data by using read.file

Although many find copying the data to the clipboard and then using the read.clipboard functions (see below) convenient, a helpful alternative is to read the data in directly. This can be done using the read.file function which calls file.choose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData, the file will be read correctly.

myData <- read.file()

If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

myData <- read.file(widths = c(4, rep(1, 35)))  #will read in a file without a header row
#  and 36 fields, the first of which is 4 columns wide, the rest of which are 1 column each

If the file is a RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata) the object will be loaded. Depending what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT) it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify read.file(header=FALSE).
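As a minimal sketch (with a hypothetical file path), the search window may also be skipped by giving read.file the file name directly; the suffix still determines how the file is read.

myData <- read.file("study1/myData.csv")               #read a csv file by name rather than browsing
rawData <- read.file("study1/raw.txt", header=FALSE)   #a text file without a header row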

To read SPSS files and to keep the value labels, specify use.value.labels=TRUE.

#this will keep the value labels for .sav files

my.spss <- read.file(use.value.labels=TRUE)

3.2 Data input from the clipboard

There are of course many ways to enter data into R. Reading from a local file using read.table is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using read.table and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the read.clipboard set of functions are perhaps more user friendly:

read.clipboard is the base function for reading data from the clipboard.

read.clipboard.csv for reading text that is comma delimited.

read.clipboard.tab for reading text that is tab delimited (e.g., copied directly from an Excel file).

read.clipboard.lower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

read.clipboard.upper for reading input of an upper triangular matrix.

read.clipboard.fwf for reading in fixed width fields (some very old data sets).

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

myData <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the read.clipboard.tab function.

> myData <- read.clipboard(sep="\t")   #define the tab option, or

> myTabData <- read.clipboard.tab()    #just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column each, and the last 4 are 3 columns).

> myData <- read.clipboard.fwf(widths=c(5, 2, rep(1, 5), rep(3, 4)))

3.3 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set sat.act which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).

> library(psych)

> library(psychTools)

> data(sat.act)

> describe(sat.act)   #basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> #basic descriptive statistics by a grouping variable

> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+     skew=FALSE, ranges=FALSE, mat=TRUE)

> headTail(sa.mat)

        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...      ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
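A minimal sketch of this workflow (assuming the sat.act data set is loaded): save the distances returned by outlier and sort them to see which cases are most extreme.

d2 <- outlier(sat.act)                         #squared Mahalanobis distance for each case, plus a Q-Q plot
sat.act[order(d2, decreasing=TRUE)[1:5], ]     #inspect the five most extreme cases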

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)

> x <- matrix(1:120, ncol=10, byrow=TRUE)

> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30, 40, 50), max=70, isvalue=45, newvalue=NA)

> new.x

       V1  V2 V3 V4 V5  V6  V7  V8  V9 V10
 [1,]   1   2 NA NA NA   6   7   8   9  10
 [2,]  11  12 NA NA NA  16  17  18  19  20
 [3,]  21  22 NA NA NA  26  27  28  29  30
 [4,]  31  32 33 NA NA  36  37  38  39  40
 [5,]  41  42 43 44 NA  46  47  48  49  50
 [6,]  51  52 53 54 55  56  57  58  59  60
 [7,]  61  62 63 64 65  66  67  68  69  70
 [8,]  71  72 NA NA NA  76  77  78  79  80
 [9,]  81  82 NA NA NA  86  87  88  89  90
[10,]  91  92 NA NA NA  96  97  98  99 100
[11,] 101 102 NA NA NA 106 107 108 109 110
[12,] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may be using biserial or point biserial (regular Pearson r) to show effect sizes, and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
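A minimal sketch of both steps, using hypothetical data (the variable names here are made up for illustration):

major <- c("psych", "biology", "econ", "psych", "biology")
dummy.code(major)        #one 0/1 column for each category

tinyData <- data.frame(education=c("High School", "some College", "in college"), score=c(12, 15, 11))
char2numeric(tinyData)   #character or factor columns become numeric codes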

> png('outlier.png')

> d2 <- outlier(sat.act, cex=.8)

> dev.off()

null device

1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: a depressing movie, a frightening movie, a neutral movie, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",

> png('pairspanels.png')

> sat.d2 <- data.frame(sat.act, d2)   #combine the d2 statistics from before with the sat.act data.frame

> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)

> dev.off()

null device

1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+    main="Affect varies by movies")

> dev.off()

null device

1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

+        "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+        "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+        "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+        "upset", "hostile", "irritable")))

> scores <- scoreItems(keys, msq[1:75])

> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+    main="Density distributions of four measures of affect")

> dev.off()

Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).

> png('violin.png')
> data(sat.act)

> violinBy(sat.act, 5:6, "gender", grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")

> dev.off()

null device

1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, the 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval. (A short example follows this list.)

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of a proportion (σp = √(pq/N)).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
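As a minimal sketch of the first of these (assuming the sat.act data set is loaded), error.bars may be called directly on the two SAT variables; the arguments shown are standard options of the function.

error.bars(sat.act[5:6], ylab="SAT score", main="95% confidence intervals")   #cat's eyes by default
error.bars(sat.act[5:6], sd=TRUE, eyes=FALSE)    #plot +/- one standard deviation instead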

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. (Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)

> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence interval.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+     labels=c("Male", "Female"), ylab="SAT score", xlab="")


Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))

> rownames(T) <- c("M", "F")

> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+     main="Proportion of sample by education level")


Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or plotted by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))

> data(affect)

> colors <- c("black", "red", "white", "blue")

> films <- c("Sad", "Horror", "Neutral", "Happy")

> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+    cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")

> op <- par(mfrow=c(1,1))


Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png('bibars.png')

> bibars(bfi, "age", "gender", ylab="Age", main="Age by males and females")

> dev.off()

null device

1

Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy, shown in the next figure.

> png('histo.png')
> data(sat.act)

> densityBy(bfi, "age", grp="gender")

> dev.off()

null device

1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.
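Since lowerMat is only mentioned in passing, here is a minimal sketch of its use (assuming the sat.act data set is loaded):

R <- cor(sat.act, use="pairwise")   #the full rectangular correlation matrix
lowerMat(R)                         #rounded to 2 digits, shown as a lower off diagonal matrix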

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)

> male <- subset(sat.act, sat.act$gender==1)

> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)

> round(both, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)

> round(diffs, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
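A minimal sketch of this (assuming the sat.act data set is loaded); adjust is a standard argument of corr.test, and printing with short=FALSE also shows the confidence intervals:

ct <- corr.test(sat.act, adjust="holm")   #correlations, sample sizes, and raw/adjusted p values
print(ct, short=FALSE)                    #include the confidence intervals of the correlations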

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")

> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)

> r.circ <- cor(circ)

> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))

> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")

> op <- par(mfrow=c(1,1))

> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see the confidence intervals of the correlations, print with the short=FALSE option.

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   #steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15  with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
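A minimal sketch with a hypothetical 2 x 2 table of cell frequencies (both functions accept such a table):

t2 <- matrix(c(40, 10, 20, 30), 2, 2)   #hypothetical cell frequencies
phi(t2)           #the phi coefficient on the observed table
tetrachoric(t2)   #the estimated latent correlation and the thresholds (tau)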

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set of burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
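A minimal sketch (assuming the burt correlation matrix is available once psych and psychTools are loaded):

data(burt)
min(eigen(burt)$values)     #the smallest eigenvalue is slightly negative for this matrix
burt.s <- cor.smooth(burt)  #adjust the offending eigenvalues and rescale
min(eigen(burt.s)$values)   #now non-negative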

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

rxy = ηxwg * ηywg * rxywg + ηxbg * ηybg * rxybg    (1)

where rxy is the normal correlation, which may be decomposed into a within group and between group correlation, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
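A minimal sketch of the first of these analyses (the object name is arbitrary; the rwg and rbg elements hold the within and between group correlations):

sb.ed <- statsBy(sat.act, group = "education", cors = TRUE)
sb.ed$rwg    # pooled within-group (within education level) correlations
sb.ed$rbg    # correlations of the group means (between education levels)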

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.63  0.4 0.58 0.34

DV = Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series  0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.58 0.34 0.58 0.33

DV = Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Group  0.48 0.23 0.45  0.2

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out: Sentences Vocabulary
 and are included in the calculation of df1 and df2

DV = Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.64 0.41

DV = Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2  Ruw R2uw
Suffixes 0.46 0.21  0.6 0.37

DV = Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                  R   R2  Ruw R2uw
Letter.Series  0.21 0.04 0.46 0.21

DV = Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.52 0.27

DV = Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Group   0.3 0.09 0.46 0.21

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.73

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.53     0.12          0.11      0.08         0.14
Suffixes                       0.12     0.61          0.01      0.03         0.04
Letter.Series                  0.11     0.01          0.79      0.31         0.39
Pedigrees                      0.08     0.03          0.31      0.70         0.23
Letter.Group                   0.14     0.04          0.39      0.23         0.79


5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c', respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c'. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
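The path algebra is simple. Using the values from the Preacher and Hayes example below (the output labels the total effect c and the direct effect c'):

a <- 0.82          # path from THERAPY (x) to ATTRIB (m)
b <- 0.40          # path from ATTRIB (m) to SATIS (y)
c <- 0.76          # total effect of x on y
ab <- a * b        # indirect effect, about 0.33
c.prime <- c - ab  # direct effect, about 0.43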

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 28  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis. (A sketch of the call that produces this moderated model is shown below.)
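A sketch of the call whose output appears with Figure 17 (the object name is arbitrary; n.iter is kept at 50 for speed):

mod.med <- mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia,
                   n.iter = 50, main = "Moderated mediation (mean centered)")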


> mediate.diagram(preacher)

[Mediation model diagram: THERAPY → ATTRIB → SATIS, with a = 0.82 (THERAPY to ATTRIB), b = 0.4 (ATTRIB to SATIS), total effect c = 0.76 and direct effect c' = 0.43.]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The total effect of Therapy on Satisfaction is .76, of which the indirect path through Attribution accounts for .33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Regression model diagram: THERAPY and ATTRIB predict SATIS, with coefficients 0.43 and 0.4; the correlation between the two predictors is shown as 0.21.]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.


Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 126  with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04  Upper CI = 0.03
Total effect (c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 126  with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07  Upper CI = 0.07
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary

[Moderated mediation (mean centered) path diagram: prot2 and prot2*sexism predict respappr through sexism; total effects c = 1.46 and 0.81, direct effects c' = 1.46 and 0.81; the paths involving the mediator are near zero (-0.01, 0.07, 0.09, 0.02).]

Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.


5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
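As a quick numerical check (a sketch; the eigenvalues here are the squared canonical correlations reported for the Thurstone example above):

lambda <- c(0.6280, 0.1478, 0.0076, 0.0049)   # squared canonical correlations
1 - prod(1 - lambda)                          # about 0.69, Cohen's set correlation R2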

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender      -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education    0.56 0.07  7.72 0.016     0.25     0.87 1.22
age         -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error = 0.15 with 2 degrees of freedom

Multiple Regression
    R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT 1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV = SATV
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender       0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education    0.72 0.08  9.21 0.012     0.39     1.06 1.22
age         -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error = 0.16 with 2 degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV = SATQ
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender      -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education    0.40 0.05  8.32 0.014     0.19     0.60 1.22
age         -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error = 0.1 with 2 degrees of freedom

Multiple Regression
     R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ 1  1   1 0.99        0.99        0    177.11   3   2 0.00562

Various estimates of between set correlations

Squared Canonical Correlations
[1] 1.000 0.988 0.013

Chisq of canonical correlations
[1] 32.517  4.433  0.013

Average squared canonical correlation =  0.67
Cohen's Set Correlation R2  =  1
Shrunken Set Correlation R2  =  1
F and df of Cohen's Set Correlation  -Inf 12 -12.94
Unweighted correlation between the two sets =  0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
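This symmetry is easy to verify by exchanging the two sets (a sketch, reusing the covariance matrix C defined above; both calls report the same set correlation):

sc.xy <- setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
sc.yx <- setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)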

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
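A sketch of the kind of call that produces such a table (the heading and caption strings here are illustrative):

f3 <- fa(Thurstone, nfactors = 3)
fa2latex(f3, heading = "A factor analysis table from the psych package in R",
         caption = "fa2latex")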


Table 2: fa2latex. A factor analysis table from the psych package in R

Variable            MR1    MR2    MR3    h2    u2   com
Sentences          0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary         0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion    0.83   0.04   0.00  0.73  0.27  1.00
First.Letters      0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes           0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series      0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees          0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group      -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings        2.64   1.86   1.5

       MR1   MR2   MR3
MR1   1.00  0.59  0.54
MR2   0.59  1.00  0.52
MR3   0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score


geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a dataset or output

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and an estimate of effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
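A few of these helpers in action (a sketch; the input values are arbitrary):

fisherz(0.5)                    # Fisher z transformation of r = 0.5
geometric.mean(c(1, 2, 4, 8))   # geometric mean of a vector
A <- diag(2)
B <- diag(3)
superMatrix(A, B)               # a 5 x 5 super matrix with A and B on the diagonal and 0s elsewhere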

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular, and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. & Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., & Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In N. G. Fielding, R. M. Lee, & G. Blank (Eds.), SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


Page 8: An introduction to the psych package: Part I: data entry ... · myData

data frames to long data frames suitable for multilevel modeling

Graphical displays include Scatter Plot Matrix (SPLOM) plots using pairspanels cor-relation ldquoheat mapsrdquo (corPlot) factor cluster and structural diagrams using fadiagramiclustdiagram structurediagram and hetdiagram as well as item response charac-teristics and item and test information characteristic curves plotirt and plotpoly

This vignette is meant to give an overview of the psych package That is it is meantto give a summary of the main functions in the psych package with examples of howthey are used for data description dimension reduction and scale construction The ex-tended user manual at psych_manualpdf includes examples of graphic output and moreextensive demonstrations than are found in the help menus (Also available at https

personality-projectorgrpsych_manualpdf) The vignette psych for sem athttpspersonalty-projectorgrpsych_for_sempdf discusses how to use psychas a front end to the sem package of John Fox (Fox et al 2012) (The vignette isalso available at httpspersonality-projectorgrpsychvignettespsych_for_

sempdf)

In addition there are a growing number of ldquoHowTordquos at the personality project Currentlythese include

1 An introduction (vignette) of the psych package

2 An overview (vignette) of the psych package

3 Installing R and some useful packages

4 Using R and the psych package to find omegah and ωt

5 Using R and the psych for factor analysis and principal components analysis

6 Using the scoreItems function to find scale scores and scale statistics

7 Using mediate and setCor to do mediation moderation and regression analysis

For a step by step tutorial in the use of the psych package and the base functions inR for basic personality research see the guide for using R for personality research athttpspersonalitytheoryorgrrshorthtml For an introduction to psychometrictheory with applications in R see the draft chapters at httpspersonality-project

orgrbook)

2 Getting started

Some of the functions described in the Overview Vignette require other packages This isnot the case for the functions listed in this Introduction Particularly useful for rotatingthe results of factor analyses (from eg fa factorminres factorpa factorwls

8

or principal) or hierarchical factor models using omega or schmid is the GPArotationpackage These and other useful packages may be installed by first installing and thenusing the task views (ctv) package to install the ldquoPsychometricsrdquo task view but doing itthis way is not necessary

The ldquoPsychometricsrdquo task view will install a large number of useful packages To installthe bare minimum for the examples in this vignette it is necessary to install just 3 pack-ages

installpackages(list(c(GPArotationmnormt)

Alternatively many packages for psychometric can be downloaded at once using the ldquoPsy-chometricsrdquo task view

installpackages(ctv)

library(ctv)

taskviews(Psychometrics)

Because of the difficulty of installing the package Rgraphviz alternative graphics have beendeveloped and are available as diagram functions If Rgraphviz is available some functionswill take advantage of it An alternative is to useldquodotrdquooutput of commands for any externalgraphics package that uses the dot language

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptivestatistics

Remember to run any of the psych functions it is necessary to make the package activeby using the library command

library(psych)

library(psychTools)

The other packages once installed will be called automatically by psych

It is possible to automatically load psych and other functions by creating and then savinga ldquoFirstrdquo function eg

First lt- function(x) library(psych)

library(psychTools)

31 Getting the data by using readfile

Although many find copying the data to the clipboard and then using the readclipboardfunctions (see below) a helpful alternative is to read the data in directly This can be done

9

using the readfile function which calls filechoose to find the file and then based uponthe suffix of the file chooses the appropriate way to read it For files with suffixes of texttxt TXT csv dat data sav xpt XPT r R rds Rds rda Rda rdata Rdataor RData the file will be read correctly

mydata lt- readfile()

If the file contains Fixed Width Format (fwf) data the column information can be specifiedwith the widths command

mydata lt- readfile(widths = c(4rep(135)) will read in a file without a header row

and 36 fields the first of which is 4 colums the rest of which are 1 column each

If the file is a RData file (with suffix of RData Rda rda Rdata or rdata) the objectwill be loaded Depending what was stored this might be several objects If the file is asav file from SPSS it will be read with the most useful default options (converting the fileto a dataframe and converting character fields to numeric) Alternative options may bespecified If it is an export file from SAS (xpt or XPT) it will be read csv files (commaseparated files) normal txt or text files data or dat files will be read as well These areassumed to have a header row of variable labels (header=TRUE) If the data do not havea header row you must specify readfile(header=FALSE)

To read SPSS files and to keep the value labels specify usevaluelabels=TRUE

this will keep the value labels for sav files

myspss lt- readfile(usevaluelabels=TRUE)

32 Data input from the clipboard

There are of course many ways to enter data into R Reading from a local file usingreadtable is perhaps the most preferred However many users will enter their datain a text editor or spreadsheet program and then want to copy and paste into R Thismay be done by using readtable and specifying the input file as ldquoclipboardrdquo (PCs) orldquopipe(pbpaste)rdquo (Macs) Alternatively the readclipboard set of functions are perhapsmore user friendly

readclipboard is the base function for reading data from the clipboard

readclipboardcsv for reading text that is comma delimited

readclipboardtab for reading text that is tab delimited (eg copied directly from anExcel file)

readclipboardlower for reading input of a lower triangular matrix with or without adiagonal The resulting object is a square matrix

10

readclipboardupper for reading input of an upper triangular matrix

readclipboardfwf for reading in fixed width fields (some very old data sets)

For example given a data set copied to the clipboard from a spreadsheet just enter thecommand

mydata lt- readclipboard()

This will work if every data field has a value and even missing data are given some values(eg NA or -999) If the data were entered in a spreadsheet and the missing valueswere just empty cells then the data should be read in as a tab delimited or by using thereadclipboardtab function

gt mydata lt- readclipboard(sep=t) define the tab option or

gt mytabdata lt- readclipboardtab() just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format)copy to the clipboard and then specify the width of each field (in the example below thefirst variable is 5 columns the second is 2 columns the next 5 are 1 column the last 4 are3 columns)

gt mydata lt- readclipboardfwf(widths=c(52rep(15)rep(34))

33 Basic descriptive statistics

Once the data are read in then describe or describeBy will provide basic descriptivestatistics arranged in a data frame format Consider the data set satact which in-cludes data from 700 web based participants on 3 demographic variables and 3 abilitymeasures

describe reports means standard deviations medians min max range skew kurtosisand standard errors for integer or real data Non-numeric data although the statisticsare meaningless will be treated as if numeric (based upon the categorical coding ofthe data) and will be flagged with an

describeBy reports descriptive statistics broken down by some categorizing variable (eggender age etc)

gt library(psych)

gt library(psychTools)

gt data(satact)

gt describe(satact) basic descriptive statistics

vars n mean sd median trimmed mad min max range skew kurtosis se

gender 1 700 165 048 2 168 000 1 2 1 -061 -162 002

education 2 700 316 143 3 331 148 0 5 5 -068 -007 005

age 3 700 2559 950 22 2386 593 13 65 52 164 242 036

ACT 4 700 2855 482 29 2884 445 3 36 33 -066 053 018

11

SATV 5 700 61223 11290 620 61945 11861 200 800 600 -064 033 427

SATQ 6 687 61022 11564 620 61725 11861 200 800 600 -059 -002 441

These data may then be analyzed by groups defined in a logical statement or by some othervariable Eg break down the descriptive data for males or females These descriptivedata can also be seen graphically using the errorbarsby function (Figure 5) By settingskew=FALSE and ranges=FALSE the output is limited to the most basic statistics

gt basic descriptive statistics by a grouping variable

gt describeBy(satactsatact$genderskew=FALSEranges=FALSE)

Descriptive statistics by group

group 1

vars n mean sd se

gender 1 247 100 000 000

education 2 247 300 154 010

age 3 247 2586 974 062

ACT 4 247 2879 506 032

SATV 5 247 61511 11416 726

SATQ 6 245 63587 11602 741

---------------------------------------------------------------------------

group 2

vars n mean sd se

gender 1 453 200 000 000

education 2 453 326 135 006

age 3 453 2545 937 044

ACT 4 453 2842 469 022

SATV 5 453 61066 11231 528

SATQ 6 442 59600 11307 538

The output from the describeBy function can be forced into a matrix form for easy analysisby other programs In addition describeBy can group by several grouping variables at thesame time

gt samat lt- describeBy(satactlist(satact$gendersatact$education)

+ skew=FALSEranges=FALSEmat=TRUE)

gt headTail(samat)

item group1 group2 vars n mean sd se

gender1 1 1 0 1 27 1 0 0

gender2 2 2 0 1 30 2 0 0

gender3 3 1 1 1 20 1 0 0

gender4 4 2 1 1 25 2 0 0

ltNAgt ltNAgt ltNAgt

SATQ9 69 1 4 6 51 6359 10412 1458

SATQ10 70 2 4 6 86 59759 10624 1146

SATQ11 71 1 5 6 46 65783 8961 1321

SATQ12 72 2 5 6 93 60672 10555 1095

331 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the mul-tivariate centroid of the data That is find the squared Mahalanobis distance for eachdata point and then compare these to the expected values of χ2 This produces a Q-Q

12

(quantle-quantile) plot with the n most extreme data points labeled (Figure 1) The outliervalues are in the vector d2

332 Basic data cleaning using scrub

If after describing the data it is apparent that there were data entry errors that need tobe globally replaced with NA or only certain ranges of data will be analyzed the data canbe ldquocleanedrdquo using the scrub function

Consider a data set of 10 rows of 12 columns with values from 1 - 120 All values of columns3 - 5 that are less than 30 40 or 50 respectively or greater than 70 in any of the threecolumns will be replaced with NA In addition any value exactly equal to 45 will be setto NA (max and isvalue are set to one value here but they could be a different value forevery column)

gt x lt- matrix(1120ncol=10byrow=TRUE)

gt colnames(x) lt- paste(V110sep=)gt newx lt- scrub(x35min=c(304050)max=70isvalue=45newvalue=NA)

gt newx

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

[1] 1 2 NA NA NA 6 7 8 9 10

[2] 11 12 NA NA NA 16 17 18 19 20

[3] 21 22 NA NA NA 26 27 28 29 30

[4] 31 32 33 NA NA 36 37 38 39 40

[5] 41 42 43 44 NA 46 47 48 49 50

[6] 51 52 53 54 55 56 57 58 59 60

[7] 61 62 63 64 65 66 67 68 69 70

[8] 71 72 NA NA NA 76 77 78 79 80

[9] 81 82 NA NA NA 86 87 88 89 90

[10] 91 92 NA NA NA 96 97 98 99 100

[11] 101 102 NA NA NA 106 107 108 109 110

[12] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased and the minimums havegone up but the maximums down Data cleaning and examination for outliers should be aroutine part of any data analysis

333 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (eg college major occupation ethnicity) are to be ana-lyzed using correlation or regression To do this one can form ldquodummy codesrdquo which aremerely binary variables for each category This may be done using dummycode Subse-quent analyses using these dummy coded variables may be using biserial or point biserial(regular Pearson r) to show effect sizes and may be plotted in eg spider plots

Alternatively sometimes data were coded originally as categorical (MaleFemale HighSchool some College in college etc) and you want to convert these columns of data tonumeric This is done by char2numeric

13

gt png( outlierpng )

gt d2 lt- outlier(satactcex=8)

gt devoff()

null device

1

Figure 1 Using the outlier function to graphically show outliers The y axis is theMahalanobis D2 the X axis is the distribution of χ2 for the same number of degrees offreedom The outliers detected here may be shown graphically using pairspanels (see2 and may be found by sorting d2

14

34 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well ascommunicating important results Scatter Plot Matrices (SPLOMS) using the pairspanelsfunction are useful ways to look for strange effects involving outliers and non-linearitieserrorbarsby will show group means with 95 confidence boundaries By default er-rorbarsby and errorbars will show ldquocats eyesrdquo to graphically show the confidencelimits (Figure 5) This may be turned off by specifying eyes=FALSE densityBy or vio-

linBy may be used to show the distribution of the data in ldquoviolinrdquo plots (Figure 4) (Theseare sometimes called ldquolava-lamprdquo plots)

341 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data The pairspanelsfunction adapted from the help menu for the pairs function produces xy scatter plots ofeach pair of variables below the diagonal shows the histogram of each variable on thediagonal and shows the lowess locally fit regression line as well An ellipse around themean with the axis length reflecting one standard deviation of the x and y variables is alsodrawn The x axis in each scatter plot represents the column variable the y axis the rowvariable (Figure 2) When plotting many subjects it is both faster and cleaner to set theplot character (pch) to be rsquorsquo (See Figure 2 for an example)

pairspanels will show the pairwise scatter plots of all the variables as well as his-tograms locally smoothed regressions and the Pearson correlation When plottingmany data points (as in the case of the satact data it is possible to specify that theplot character is a period to get a somewhat cleaner graphic However in this figureto show the outliers we use colors and a larger plot character If we want to indicatersquosignificancersquo of the correlations by the conventional use of rsquomagic astricksrsquo we can setthe stars=TRUE option

Another example of pairspanels is to show differences between experimental groupsConsider the data in the affect data set The scores reflect post test scores on positiveand negative affect and energetic and tense arousal The colors show the results for fourmovie conditions depressing frightening movie neutral and a comedy

Yet another demonstration of pairspanels is useful when you have many subjects andwant to show the density of the distributions To do this we will use the makekeys

and scoreItems functions (discussed in the second vignette) to create scales measuringEnergetic Arousal Tense Arousal Positive Affect and Negative Affect (see the msq helpfile) We then show a pairspanels scatter plot matrix where we smooth the data pointsand show the density of the distribution by color

gt keys lt- makekeys(msq[175]list(

+ EA = c(active energetic vigorous wakeful wideawake fullofpep

15

gt png( pairspanelspng )

gt satd2 lt- dataframe(satactd2) combine the d2 statistics from before with the satact dataframe

gt pairspanels(satd2bg=c(yellowblue)[(d2 gt 25)+1]pch=21stars=TRUE)

gt devoff()

null device

1

Figure 2 Using the pairspanels function to graphically show relationships The x axisin each scatter plot represents the column variable the y axis the row variable Note theextreme outlier for the ACT If the plot character were set to a period (pch=rsquorsquo) it wouldmake a cleaner graphic but in to show the outliers in color we use the plot characters 21and 22


> png('affect.png')
> pairs.panels(affect[14:17],bg=c("red","black","white","blue")[affect$Film],pch=21,
+ main="Affect varies by movies ")
> dev.off()

null device

1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.


+ "lively", "-sleepy", "-tired", "-drowsy"),
+ TA =c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+ "-placid", "-calm", "-at.rest"),
+ PA =c("active", "excited", "strong", "inspired", "determined", "attentive",
+ "interested", "enthusiastic", "proud", "alert"),
+ NAf =c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+ "upset", "hostile", "irritable" )) )
> scores <- scoreItems(keys,msq[1:75])

> png('msq.png')
> pairs.panels(scores$scores,smoother=TRUE,
+ main ="Density distributions of four measures of affect" )
> dev.off()

Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

342 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).


> png('violin.png')
> data(sat.act)
> violinBy(sat.act,5:6,"gender",grp.name=c("M","F"),main="Density Plot by gender for SAT V and Q")
> dev.off()

null device

1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.


343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the scores. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

344 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.


> data(epi.bfi)
> error.bars.by(epi.bfi[6:10],epi.bfi$epilie<4)


Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.


> error.bars.by(sat.act[5:6],sat.act$gender,bars=TRUE,
+ labels=c("Male","Female"),ylab="SAT score",xlab="")


Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act,table(gender,education))
> rownames(T) <- c("M","F")
> error.bars.tab(T,way="both",ylab="Proportion of Education Level",xlab="Level of Education",
+ main="Proportion of sample by education level")


Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.


345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+ xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+ cex=2,colors=colors, main ="Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+ ylab="Negative Affect", pch=16,cex=2,colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


346 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png( 'bibars.png' )
> bibars(bfi,"age","gender",ylab="Age",main="Age by males and females")
> dev.off()

null device

1

Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.


> png('histo.png')
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()

null device

1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities we can see that the males are slightly over represented in the younger ranges.


347 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act,sat.act$gender==2)
> male <- subset(sat.act,sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower,upper)
> round(both,2)
          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower,upper,diff=TRUE)
> round(diffs,2)
          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)


> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations print with the short=FALSE option


Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15   with probability < 1.8e-273


36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20,cuts=c(0,0))
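The tetrachoric function itself will accept either raw dichotomous data or a two by two table of cell frequencies. A minimal sketch (the cell counts here are made up purely for illustration and do not come from any of the data sets discussed in this vignette):

> tab <- matrix(c(40, 10, 20, 30), 2, 2)   #a hypothetical 2 x 2 table of frequencies
> tetrachoric(tab)                         #estimates rho and the two thresholds (tau)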

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
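A hedged sketch of such a call, using the newer mixedCor interface (the current name for the function referred to above); the particular choice of bfi variables here is illustrative only and is not taken from the text:

> data(bfi)
> r.mixed <- mixedCor(data=bfi, c="age", p=1:5, d="gender")  #continuous, polytomous, dichotomous
> lowerMat(r.mixed$rho)                                      #the resulting correlation matrix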

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
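A minimal sketch of that repair (this assumes the burt correlation matrix supplied with the psych/psychTools packages; cor.smooth reports which eigenvalues it adjusted):

> data(burt)                         #the Burt (1915) correlation matrix mentioned above
> burt.smoothed <- cor.smooth(burt)  #adjusts the offending eigenvalue and rebuilds the matrix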

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

41 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
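A hedged sketch of those two analyses (the grouping variable is named explicitly here, and cors=TRUE requests the within and between group correlation matrices, which are returned as the rwg and rbg elements of the result):

> sb.ed <- statsBy(sat.act, group="education", cors=TRUE)   #ability grouped by education
> round(sb.ed$rwg,2)         #pooled within group correlations
> round(sb.ed$rbg,2)         #correlations of the group means
> sb.film <- statsBy(affect[-1], group="Film", cors=TRUE)   #affect grouped by movie condition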

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388


SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2

Cohens Set Correlation R2 = 0.69

Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression


R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0.405 0.023

Average squared canonical correlation = 0.21

Cohens Set Correlation R2 = 0.42

Unweighted correlation between the two sets = 0.73

> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.53     0.12         0.11      0.08        0.14
Suffixes                   0.12     0.61         0.01      0.03        0.04
LetterSeries               0.11     0.01         0.79      0.31        0.39
Pedigrees                  0.08     0.03         0.31      0.70        0.23
LetterGroup                0.14     0.04         0.39      0.23        0.79


52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
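The call that produced the output below and the diagram in Figure 15 (reconstructed from the Call line in that output) was of the form:

> preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data=sobel)   #printing the result gives the summary below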

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76  SE = 0.31  t = 2.5  df= 28  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  SE = 0.32  t = 1.35  df= 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.00272

To see the longer output specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
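The call of that form (reconstructed from the Call line in the moderated mediation output that follows) is:

mediate(respappr ~ prot2 * sexism + (sexism), data=Garcia, n.iter=50,
        main="Moderated mediation (mean centered)")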


> mediate.diagram(preacher)


Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std=FALSE)
> setCor.diagram(preacher)


Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.


Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46  SE = 0.22  t = 6.77  df= 126  with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  SE = 0.22  t = 6.73  df= 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04  Upper CI = 0.03
Total effect(c) of prot2*sexism on respappr = 0.81  SE = 0.28  t = 2.89  df= 126  with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81  SE = 0.28  t = 2.87  df= 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07  Upper CI = 0.07
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.91e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary


Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.


53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> #compare with setCor
> setCor(c(4:6),c(1:3),C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562


Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
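A minimal sketch of the kind of call that produces a table like Table 2 (a three factor solution of the Thurstone correlations; in recent versions these LaTeX conversion functions are supplied by psychTools, which is loaded above):

> f3 <- fa(Thurstone,3)
> fa2latex(f3, heading="A factor analysis table from the psych package in R")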


Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score


geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance


of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686


McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of


Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133


Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144



rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49

62

table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63


Factor analyses (using fa or principal) or hierarchical factor models (using omega or schmid) make use of the GPArotation package. These and other useful packages may be installed by first installing and then using the task views (ctv) package to install the "Psychometrics" task view, but doing it this way is not necessary.

The "Psychometrics" task view will install a large number of useful packages. To install the bare minimum for the examples in this vignette, it is necessary to install just a few additional packages:

install.packages(c("GPArotation", "mnormt"))

Alternatively, many packages for psychometrics can be downloaded at once using the "Psychometrics" task view:

install.packages("ctv")

library(ctv)

task.views("Psychometrics")

Because of the difficulty of installing the package Rgraphviz, alternative graphics have been developed and are available as diagram functions. If Rgraphviz is available, some functions will take advantage of it. An alternative is to use the "dot" output of commands for any external graphics package that uses the dot language.

3 Basic data analysis

A number of psych functions facilitate the entry of data and finding basic descriptive statistics.

Remember, to run any of the psych functions it is necessary to make the package active by using the library command:

library(psych)

library(psychTools)

The other packages, once installed, will be called automatically by psych.

It is possible to automatically load psych and other packages by creating and then saving a ".First" function, e.g.,

.First <- function(x) {
  library(psych)
  library(psychTools)
}

31 Getting the data by using readfile

Although many find copying the data to the clipboard and then using the readclipboard functions (see below) helpful, an alternative is to read the data in directly. This can be done using the readfile function, which calls filechoose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. Files with suffixes of .text, .txt, .TXT, .csv, .dat, .data, .sav, .xpt, .XPT, .r, .R, .rds, .Rds, .rda, .Rda, .rdata, .Rdata, or .RData will be read correctly.

mydata <- read.file()

If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

mydata <- read.file(widths = c(4, rep(1, 35)))  # read in a file without a header row
   # and 36 fields, the first of which is 4 columns wide, the rest of which are 1 column each

If the file is an RData file (with suffix of .RData, .Rda, .rda, .Rdata, or .rdata) the object will be loaded. Depending on what was stored, this might be several objects. If the file is a .sav file from SPSS, it will be read with the most useful default options (converting the file to a data.frame and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (.xpt or .XPT) it will be read. .csv files (comma separated files), normal .txt or .text files, and .data or .dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify readfile(header=FALSE).

To read SPSS files and keep the value labels, specify use.value.labels=TRUE:

# this will keep the value labels for .sav files
my.spss <- read.file(use.value.labels=TRUE)

32 Data input from the clipboard

There are, of course, many ways to enter data into R. Reading from a local file using readtable is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste it into R. This may be done by using readtable and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the readclipboard set of functions is perhaps more user friendly:

readclipboard is the base function for reading data from the clipboard

readclipboardcsv for reading text that is comma delimited

readclipboardtab for reading text that is tab delimited (e.g., copied directly from an Excel file).

readclipboardlower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

readclipboardupper for reading input of an upper triangular matrix

readclipboardfwf for reading in fixed width fields (some very old data sets)

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

mydata <- read.clipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the readclipboardtab function.

> mydata <- read.clipboard(sep="\t")   # define the tab option, or
> mytabdata <- read.clipboard.tab()    # just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns wide, the second is 2 columns, the next 5 are 1 column each, and the last 4 are 3 columns).

> mydata <- read.clipboard.fwf(widths=c(5,2,rep(1,5),rep(3,4)))
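The lower triangular reader is particularly handy for entering published correlation matrices. A minimal sketch (the three variable matrix shown in the comments is made up purely for illustration):

# suppose this lower triangular matrix (including the diagonal) has been copied to the clipboard:
# 1.0
# 0.5  1.0
# 0.3  0.4  1.0
R <- read.clipboard.lower()   # returns the full, symmetric, square matrix
R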

33 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set satact which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis, and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.).

> library(psych)
> library(psychTools)
> data(sat.act)
> describe(sat.act)   # basic descriptive statistics

          vars   n   mean     sd median trimmed    mad min max range  skew kurtosis   se
gender       1 700   1.65   0.48      2    1.68   0.00   1   2     1 -0.61    -1.62 0.02
education    2 700   3.16   1.43      3    3.31   1.48   0   5     5 -0.68    -0.07 0.05
age          3 700  25.59   9.50     22   23.86   5.93  13  65    52  1.64     2.42 0.36
ACT          4 700  28.55   4.82     29   28.84   4.45   3  36    33 -0.66     0.53 0.18
SATV         5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ         6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the errorbarsby function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> # basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

Descriptive statistics by group
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
---------------------------------------------------------------------------
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+                      skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)

         item group1 group2 vars   n   mean     sd    se
gender1     1      1      0    1  27      1      0     0
gender2     2      2      0    1  30      2      0     0
gender3     3      1      1    1  20      1      0     0
gender4     4      2      1    1  25      2      0     0
...       ...   <NA>   <NA>  ... ...    ...    ...   ...
SATQ9      69      1      4    6  51  635.9 104.12 14.58
SATQ10     70      2      4    6  86 597.59 106.24 11.46
SATQ11     71      1      5    6  46 657.83  89.61 13.21
SATQ12     72      2      5    6  93 606.72 105.55 10.95

331 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
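Once d2 has been found, the most extreme cases may be listed by sorting on it. A minimal sketch using the satact data set:

library(psych)
library(psychTools)
d2 <- outlier(sat.act)                        # squared Mahalanobis distance for every case (also draws the Q-Q plot)
sat.act[order(d2, decreasing = TRUE)[1:5], ]  # show the five most extreme participants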

332 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 to 120. All values of columns 3–5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).

> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

[1] 1 2 NA NA NA 6 7 8 9 10

[2] 11 12 NA NA NA 16 17 18 19 20

[3] 21 22 NA NA NA 26 27 28 29 30

[4] 31 32 33 NA NA 36 37 38 39 40

[5] 41 42 43 44 NA 46 47 48 49 50

[6] 51 52 53 54 55 56 57 58 59 60

[7] 61 62 63 64 65 66 67 68 69 70

[8] 71 72 NA NA NA 76 77 78 79 80

[9] 81 82 NA NA NA 86 87 88 89 90

[10] 91 92 NA NA NA 96 97 98 99 100

[11] 101 102 NA NA NA 106 107 108 109 110

[12] 111 112 NA NA NA 116 117 118 119 120

Note that the number of non-missing subjects for those columns has decreased, the minimums have gone up, and the maximums have gone down. Data cleaning and examination for outliers should be a routine part of any data analysis.

333 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are merely binary variables for each category. This may be done using dummycode. Subsequent analyses using these dummy coded variables may use biserial or point biserial (regular Pearson r) correlations to show effect sizes and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
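A minimal sketch of both steps (the vector of majors is made up for illustration, and mydf stands for a hypothetical data frame with character columns):

library(psych)
major <- c("psych", "biology", "psych", "history", "biology", "psych")
dummy.code(major)                # one 0/1 column for each category
# new.df <- char2numeric(mydf)   # convert the character columns of a data frame to numeric codes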


> png('outlier.png')
> d2 <- outlier(sat.act, cex=.8)
> dev.off()
null device
          1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the x axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairspanels (see Figure 2) and may be found by sorting d2.


34 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairspanels function are useful ways to look for strange effects involving outliers and non-linearities. errorbarsby will show group means with 95% confidence boundaries. By default, errorbarsby and errorbars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)

341 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairspanels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairspanels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the satact data) it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairspanels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening, neutral, and a comedy.

Yet another demonstration of pairspanels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the makekeys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairspanels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

> keys <- make.keys(msq[1:75], list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",


> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairspanels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.


> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+              main="Affect varies by movies")
> dev.off()
null device
          1

Figure 3: Using the pairspanels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.


+              "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+            "upset", "hostile", "irritable")))

> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+              main="Density distributions of four measures of affect")
> dev.off()

Using the pairspanels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

342 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).


> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
          1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, the 25th and 75th percentiles, the entire range, and the density distribution.


343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions errorbars, errorbarsby, errorbarstab, and errorcrosses.

errorbars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval (a minimal example follows this list).

errorbarsby does the same but grouping the data by some condition

errorbarstab draws bar graphs from tabular data with error bars based upon the standard error of a proportion (σp = √(pq/n)).

errorcrosses draws the confidence intervals for an x set and a y set of the same size.
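A minimal example of the first of these, applied directly to a data frame:

library(psych)
library(psychTools)
data(sat.act)
# means with 95% confidence intervals ("cats eyes") for the first four variables
error.bars(sat.act[1:4], main = "95% confidence intervals of the means")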

The use of the errorbarsby function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable and conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the errorbars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

344 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the errorbarstab function.


> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 5: Using the errorbarsby function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence interval.


> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+               labels=c("Male","Female"), ylab="SAT score", xlab="")


Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+                main="Proportion of sample by education level")


Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds row wise percentages. The data can be converted to percentages (as shown) or shown as raw counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.


345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+      xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+      cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+      ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


346 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png('bibars.png')
> bibars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
          1

Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.


> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
          1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.


347 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix as well as displaying the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education 1.00
age       0.61  1.00
ACT       0.16  0.15  1.00
SATV      0.02 -0.06  0.61  1.00
SATQ      0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education 1.00
age       0.52  1.00
ACT       0.16  0.08  1.00
SATV      0.07 -0.03  0.53  1.00
SATQ      0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)


          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p, the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)


> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687

Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations print with the short=FALSE option


Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the satact data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15   with probability < 1.8e-273
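cortest will also compare two matrices. As a sketch, the male and female correlation matrices found earlier with lowerCor (with the group sizes taken from the describeBy output shown before) may be compared directly:

# do the male (lower) and female (upper) correlation matrices differ from each other?
cortest(lower, upper, n1 = 247, n2 = 453)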


36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The drawtetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the drawcor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation

If the data are a mix of continuous, polytomous and dichotomous variables, the mixedcor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
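A minimal sketch of the tetrachoric and polychoric functions, using the dichotomous ability items and the polytomous bfi items that are supplied with psychTools:

library(psych)
library(psychTools)
rt <- tetrachoric(ability[, 1:4])   # four dichotomous (0/1) ability items
round(rt$rho, 2)                    # the estimated latent correlations
rp <- polychoric(bfi[, 1:5])        # five 6-point bfi items
round(rp$rho, 2)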

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The corsmooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
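A minimal sketch of smoothing the burt matrix:

library(psych)
library(psychTools)
round(eigen(burt)$values, 3)    # the smallest eigen value is slightly negative
burt.s <- cor.smooth(burt)      # adjust the eigen values and rebuild the matrix
round(eigen(burt.s)$values, 3)  # all eigen values are now non-negative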

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

41 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_xy = η_xwg * η_ywg * r_xywg + η_xbg * η_ybg * r_xybg     (1)

where r_xy is the normal correlation, which may be decomposed into the within group and between group correlations r_xywg and r_xybg, and η (eta) is the correlation of the data with the within group values or the group means.

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

simmultilevel will generate simulated data with a multilevel structure

The statsByboot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBybootsummary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
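A minimal sketch of the first of these analyses, grouping the satact variables by level of education:

library(psych)
library(psychTools)
sb.ed <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb.ed$rwg, 2)   # pooled within-group (within education level) correlations
round(sb.ed$rbg, 2)   # between-group correlations (correlations of the group means)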

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.


sb <- statsBy(bfi[c(1:25, 27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first four.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords
               slope  VIF
Sentences       0.09 3.69
Vocabulary      0.09 3.88
SentCompletion  0.02 3.00
FirstLetters    0.58 1.35

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.69 0.48 0.59 0.34

DV = Suffixes
               slope  VIF
Sentences       0.07 3.69
Vocabulary      0.17 3.88
SentCompletion  0.05 3.00
FirstLetters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = LetterSeries
               slope  VIF
Sentences       0.25 3.69
Vocabulary      0.09 3.88
SentCompletion  0.04 3.00
FirstLetters    0.21 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterSeries 0.5 0.25 0.49 0.24

DV = Pedigrees
               slope  VIF
Sentences       0.21 3.69
Vocabulary      0.16 3.88
SentCompletion  0.21 3.00
FirstLetters    0.08 1.35

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.58 0.34 0.58 0.33

DV = LetterGroup
               slope  VIF
Sentences       0.20 3.69
Vocabulary     -0.02 3.88
SentCompletion  0.08 3.00
FirstLetters    0.31 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterGroup 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5–9 with variables 1–4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression


R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.53     0.12         0.11      0.08        0.14
Suffixes                   0.12     0.61         0.01      0.03        0.04
LetterSeries               0.11     0.01         0.79      0.31        0.39
Pedigrees                  0.08     0.03         0.31      0.70        0.23
LetterGroup                0.14     0.04         0.39      0.23        0.79


52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, 2, ... i) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediatediagram. The data are found in the example for mediate.
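The call that produces the output below (and the preacher object graphed in Figure 15) is, in sketch form, the following; the sobel data frame is the one constructed in the mediate help example:

preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)
preacher    # print the mediation results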

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 28   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.


> mediate.diagram(preacher)


Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCordiagram.


> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std=FALSE)
> setCor.diagram(preacher)


Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.
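The moderated mediation output shown next comes from a call of roughly this form (a sketch; Garcia is the protest data set supplied with psychTools, and only 50 bootstrap iterations are used for speed):

mod.garcia <- mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia,
                      n.iter = 50, main = "Moderated mediation (mean centered)")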


Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 126   with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04  Upper CI = 0.03
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 126   with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07  Upper CI = 0.07
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary


Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.


53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R2 = 1 − ∏_{i=1..n} (1 − λi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rxx⁻¹ Rxy Rxx⁻¹ Rxy⁻¹.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the satact data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562


Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
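This symmetry is easy to check. The following minimal sketch (assuming the sat.act data set supplied with psych) simply reverses which set plays the role of the predictors; both calls print the same set correlation.

R <- lowerCor(sat.act)                                  # correlation matrix of the six variables
sc1 <- setCor(y = 4:6, x = 1:3, data = R, n.obs = 700)  # ability measures from demographics
sc2 <- setCor(y = 1:3, x = 4:6, data = R, n.obs = 700)  # demographics from ability measures
sc1
sc2                                                     # the reported set correlation R2 matches sc1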

6 Converting output to APA style tables using LATEX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irtfa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

47

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable MR1 MR2 MR3 h2 u2 com

Sentences 0.91 -0.04 0.04 0.82 0.18 1.01
Vocabulary 0.89 0.06 -0.03 0.84 0.16 1.01
SentCompletion 0.83 0.04 0.00 0.73 0.27 1.00
FirstLetters 0.00 0.86 0.00 0.73 0.27 1.00
4LetterWords -0.01 0.74 0.10 0.63 0.37 1.04
Suffixes 0.18 0.63 -0.08 0.50 0.50 1.20
LetterSeries 0.03 -0.01 0.84 0.72 0.28 1.00
Pedigrees 0.37 -0.05 0.47 0.50 0.50 1.93
LetterGroup -0.06 0.21 0.64 0.53 0.47 1.23

SS loadings 2.64 1.86 1.5

MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00
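The calls that produce a table such as Table 2 look like the following. This is a minimal sketch assuming the default arguments; each function prints the LaTeX source to the console (or to a file if one is specified).

f3 <- fa(Thurstone, nfactors = 3)   # a three factor solution of the Thurstone correlations
fa2latex(f3)                        # LaTeX source for a factor table such as Table 2
cor2latex(Thurstone)                # lower diagonal correlation table
df2latex(describe(sat.act))         # any data frame, e.g., the output of describe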

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

blockrandom Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadianmean, circadiancor and circadianlinearcor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometricmean also harmonicmean find the appropriate mean for working with different kinds of data.

ICC and cohenkappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or dataframe.

prep finds the probability of replication for an F, t, or r, and estimates effect size.

partialr partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setcor.)

rangeCorrection will correct correlations for restriction of range.

reversecode will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
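A few of these helpers in action; a minimal illustrative sketch (the input values are arbitrary):

fisherz(0.5)                      # Fisher r to z transformation of a correlation
geometric.mean(c(1, 2, 4, 8))     # geometric mean
harmonic.mean(c(1, 2, 4, 8))      # harmonic mean
superMatrix(diag(2), diag(3))     # a 5 x 5 block matrix with 0s in the off diagonal blocks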

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone33, Holzinger, Bechtoldt1, Bechtoldt2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epibfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project, Revelle et al. (2016). These 135 items form part of the SAPA Personality Inventory.

satact Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epibfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR), Condon & Revelle (2014), and were collected as part of the SAPA web based assessment project (https://sapa-project.org), Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see faextension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
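All of these are loaded in the usual way with data(); a brief sketch:

data(bfi)                # 25 personality items plus gender, education, and age
dim(bfi)
data(sat.act)
headTail(sat.act)        # the first and last few rows
data(Thurstone)          # the 9 x 9 correlation matrix of ability tests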

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

gt installpackages(psych repos=httpspersonality-projectorgr type=source)

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

gt news(Version gt= 184package=psych)

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R version 402 (2020-06-22)

Platform x86_64-apple-darwin170 (64-bit)

51

Running under macOS Catalina 10156

Matrix products default

BLAS LibraryFrameworksRframeworkVersions40ResourcesliblibRblasdylib

LAPACK LibraryFrameworksRframeworkVersions40ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_209 psych_209

loaded via a namespace (and not attached)

[1] compiler_402 parallel_402 tools_402 foreign_08-80 tmvnsim_10-2 nlme_31-148

[7] mnormt_201 grid_402 lattice_020-41

52

References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance

53

of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

54

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of

55

Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

56

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

57

Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49

58

hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48

59

corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51

60

psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9

61

outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49

62

table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63


Files may be read using the readfile function, which calls filechoose to find the file and then, based upon the suffix of the file, chooses the appropriate way to read it. For files with suffixes of text, txt, TXT, csv, dat, data, sav, xpt, XPT, r, R, rds, Rds, rda, Rda, rdata, Rdata, or RData the file will be read correctly.

mydata lt- readfile()

If the file contains Fixed Width Format (fwf) data, the column information can be specified with the widths command.

mydata lt- readfile(widths = c(4rep(135)) will read in a file without a header row

and 36 fields, the first of which is 4 columns, the rest of which are 1 column each

If the file is a RData file (with suffix of RData, Rda, rda, Rdata, or rdata) the object will be loaded. Depending what was stored, this might be several objects. If the file is a sav file from SPSS, it will be read with the most useful default options (converting the file to a dataframe and converting character fields to numeric). Alternative options may be specified. If it is an export file from SAS (xpt or XPT) it will be read. csv files (comma separated files), normal txt or text files, data, or dat files will be read as well. These are assumed to have a header row of variable labels (header=TRUE). If the data do not have a header row, you must specify readfile(header=FALSE).

To read SPSS files and to keep the value labels specify usevaluelabels=TRUE

this will keep the value labels for sav files

myspss lt- readfile(usevaluelabels=TRUE)

32 Data input from the clipboard

There are of course many ways to enter data into R. Reading from a local file using readtable is perhaps the most preferred. However, many users will enter their data in a text editor or spreadsheet program and then want to copy and paste into R. This may be done by using readtable and specifying the input file as "clipboard" (PCs) or "pipe(pbpaste)" (Macs). Alternatively, the readclipboard set of functions are perhaps more user friendly:

readclipboard is the base function for reading data from the clipboard

readclipboardcsv for reading text that is comma delimited

readclipboardtab for reading text that is tab delimited (eg copied directly from anExcel file)

readclipboardlower for reading input of a lower triangular matrix with or without a diagonal. The resulting object is a square matrix.

readclipboardupper for reading input of an upper triangular matrix

readclipboardfwf for reading in fixed width fields (some very old data sets)

For example, given a data set copied to the clipboard from a spreadsheet, just enter the command

mydata lt- readclipboard()

This will work if every data field has a value and even missing data are given some values (e.g., NA or -999). If the data were entered in a spreadsheet and the missing values were just empty cells, then the data should be read in as tab delimited or by using the readclipboardtab function.

gt mydata lt- readclipboard(sep=t) define the tab option or

gt mytabdata lt- readclipboardtab() just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format), copy to the clipboard and then specify the width of each field (in the example below, the first variable is 5 columns, the second is 2 columns, the next 5 are 1 column, the last 4 are 3 columns).

gt mydata lt- readclipboardfwf(widths=c(5,2,rep(1,5),rep(3,4)))

33 Basic descriptive statistics

Once the data are read in, then describe or describeBy will provide basic descriptive statistics arranged in a data frame format. Consider the data set satact, which includes data from 700 web based participants on 3 demographic variables and 3 ability measures.

describe reports means, standard deviations, medians, min, max, range, skew, kurtosis and standard errors for integer or real data. Non-numeric data, although the statistics are meaningless, will be treated as if numeric (based upon the categorical coding of the data) and will be flagged with an *.

describeBy reports descriptive statistics broken down by some categorizing variable (e.g., gender, age, etc.)

gt library(psych)

gt library(psychTools)

gt data(satact)

gt describe(satact) basic descriptive statistics

vars n mean sd median trimmed mad min max range skew kurtosis se

gender 1 700 165 048 2 168 000 1 2 1 -061 -162 002

education 2 700 316 143 3 331 148 0 5 5 -068 -007 005

age 3 700 2559 950 22 2386 593 13 65 52 164 242 036

ACT 4 700 2855 482 29 2884 445 3 36 33 -066 053 018


SATV 5 700 61223 11290 620 61945 11861 200 800 600 -064 033 427

SATQ 6 687 61022 11564 620 61725 11861 200 800 600 -059 -002 441

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the errorbarsby function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

gt basic descriptive statistics by a grouping variable

gt describeBy(satactsatact$genderskew=FALSEranges=FALSE)

Descriptive statistics by group

group 1

vars n mean sd se

gender 1 247 100 000 000

education 2 247 300 154 010

age 3 247 2586 974 062

ACT 4 247 2879 506 032

SATV 5 247 61511 11416 726

SATQ 6 245 63587 11602 741

---------------------------------------------------------------------------

group 2

vars n mean sd se

gender 1 453 200 000 000

education 2 453 326 135 006

age 3 453 2545 937 044

ACT 4 453 2842 469 022

SATV 5 453 61066 11231 528

SATQ 6 442 59600 11307 538

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

gt samat lt- describeBy(satactlist(satact$gendersatact$education)

+ skew=FALSEranges=FALSEmat=TRUE)

gt headTail(samat)

item group1 group2 vars n mean sd se

gender1 1 1 0 1 27 1 0 0

gender2 2 2 0 1 30 2 0 0

gender3 3 1 1 1 20 1 0 0

gender4 4 2 1 1 25 2 0 0

ltNAgt ltNAgt ltNAgt

SATQ9 69 1 4 6 51 6359 10412 1458

SATQ10 70 2 4 6 86 59759 10624 1146

SATQ11 71 1 5 6 46 65783 8961 1321

SATQ12 72 2 5 6 93 60672 10555 1095

331 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
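For those who want to see what outlier is doing, the following sketch finds essentially the same distances by hand with the base R mahalanobis function (using complete cases only, so the values may differ slightly from those of outlier):

d2 <- outlier(sat.act)                          # squared Mahalanobis distances, with a Q-Q plot
X  <- na.omit(sat.act)                          # the same idea by hand, complete cases only
d2.hand <- mahalanobis(X, colMeans(X), cov(X))
quantile(d2.hand, c(.95, .99))                  # tail of the observed distances
qchisq(c(.95, .99), df = ncol(X))               # chi square quantiles with 6 df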

332 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).

gt x lt- matrix(1120ncol=10byrow=TRUE)

gt colnames(x) lt- paste(V110sep=)gt newx lt- scrub(x35min=c(304050)max=70isvalue=45newvalue=NA)

gt newx

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

[1] 1 2 NA NA NA 6 7 8 9 10

[2] 11 12 NA NA NA 16 17 18 19 20

[3] 21 22 NA NA NA 26 27 28 29 30

[4] 31 32 33 NA NA 36 37 38 39 40

[5] 41 42 43 44 NA 46 47 48 49 50

[6] 51 52 53 54 55 56 57 58 59 60

[7] 61 62 63 64 65 66 67 68 69 70

[8] 71 72 NA NA NA 76 77 78 79 80

[9] 81 82 NA NA NA 86 87 88 89 90

[10] 91 92 NA NA NA 96 97 98 99 100

[11] 101 102 NA NA NA 106 107 108 109 110

[12] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.

333 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are merely binary variables for each category. This may be done using dummycode. Subsequent analyses using these dummy coded variables may be using biserial or point biserial (regular Pearson r) to show effect sizes, and may be plotted in e.g. spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
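A small sketch of both functions on made-up variables (the vectors here are purely illustrative):

major <- c("psych", "econ", "psych", "bio", "econ")
dummy.code(major)                         # one 0/1 column per category
df <- data.frame(gender = c("M", "F", "F", "M"), score = c(10, 12, 9, 11))
char2numeric(df)                          # character columns recoded as numeric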


gt png( outlierpng )

gt d2 lt- outlier(satactcex=8)

gt devoff()

null device

1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairspanels (see Figure 2), and may be found by sorting d2.

14

34 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairspanels function are useful ways to look for strange effects involving outliers and non-linearities. errorbarsby will show group means with 95% confidence boundaries. By default, errorbarsby and errorbars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)

341 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairspanels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairspanels will show the pairwise scatter plots of all the variables, as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the satact data) it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairspanels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairspanels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the makekeys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairspanels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

gt keys lt- makekeys(msq[175]list(

+ EA = c(active energetic vigorous wakeful wideawake fullofpep

15

gt png( pairspanelspng )

gt satd2 lt- dataframe(satactd2) combine the d2 statistics from before with the satact dataframe

gt pairspanels(satd2bg=c(yellowblue)[(d2 gt 25)+1]pch=21stars=TRUE)

gt devoff()

null device

1

Figure 2: Using the pairspanels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

16

gt png(affectpng)gt pairspanels(affect[1417]bg=c(redblackwhiteblue)[affect$Film]pch=21

+ main=Affect varies by movies )

gt devoff()

null device

1

Figure 3: Using the pairspanels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

17

+ lively -sleepy -tired -drowsy)

+ TA =c(intense jittery fearful tense clutchedup -quiet -still

+ -placid -calm -atrest)

+ PA =c(active excited strong inspired determined attentive

+ interested enthusiastic proud alert)

+ NAf =c(jittery nervous scared afraid guilty ashamed distressed

+ upset hostile irritable )) )

gt scores lt- scoreItems(keysmsq[175])

gt png(msqpng)gt pairspanels(scores$scoressmoother=TRUE

gt main =Density distributions of four measures of affect )

gt

gt devoff()

Using the pairspanels function to graphically show relationships (not shown in the interests of space). The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

342 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).

18

gt png(violinpng)gt data(satact)

gt violinBy(satact56gendergrpname=c(M F)main=Density Plot by gender for SAT V and Q)

gt devoff()

null device

1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

19

343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions errorbars, errorbarsby, errorbarstab, and errorcrosses.

errorbars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

errorbarsby does the same, but grouping the data by some condition.

errorbarstab draws bar graphs from tabular data with error bars based upon the standard error of a proportion (σ_p = √(pq/N)).

errorcrosses draw the confidence intervals for an x set and a y set of the same size.

The use of the errorbarsby function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the errorbars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, even though 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

344 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the errorbarstab function.

20

gt data(epibfi)

gt errorbarsby(epibfi[610]epibfi$epilielt4)


Figure 5: Using the errorbarsby function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

21

gt errorbarsby(satact[56]satact$genderbars=TRUE

+ labels=c(MaleFemale)ylab=SAT scorexlab=)


Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

22

gt T lt- with(satacttable(gendereducation))

gt rownames(T) lt- c(MF)

gt errorbarstab(Tway=bothylab=Proportion of Education Levelxlab=Level of Education

+ main=Proportion of sample by education level)


Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a dataframe.

23

345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

24

gt op lt- par(mfrow=c(12))

gt data(affect)

gt colors lt- c(blackredwhiteblue)

gt films lt- c(SadHorrorNeutralHappy)

gt affectstats lt- errorCircles(EA2TA2data=affect[-c(120)]group=Filmlabels=films

+ xlab=Energetic Arousal ylab=Tense Arousalylim=c(1022)xlim=c(820)pch=16

+ cex=2colors=colors main = Movies effect on arousal)gt errorCircles(PA2NA2data=affectstatslabels=filmsxlab=Positive Affect

+ ylab=Negative Affect pch=16cex=2colors=colors main =Movies effect on affect)

gt op lt- par(mfrow=c(11))


Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect dataframe based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

25

346 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

data(bfi)gt png( bibarspng )

gt bibars(bfiagegenderylab=Agemain=Age by males and females)

gt devoff()

null device

1

Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set. An alternative way of displaying these data is in the densityBy plot in the next figure.

26

gt png(histopng)gt data(satact)

gt densityBy(bfiagegrp=gender)

gt devoff()

null device

1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.

27

347 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.

gt lowerCor(satact)

gendr edctn age ACT SATV SATQ

gender 100

education 009 100

age -002 055 100

ACT -004 015 011 100

SATV -002 005 -004 056 100

SATQ -017 003 -003 059 064 100

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

gt female lt- subset(satactsatact$gender==2)

gt male lt- subset(satactsatact$gender==1)

gt lower lt- lowerCor(male[-1])

edctn age ACT SATV SATQ

education 100

age 061 100

ACT 016 015 100

SATV 002 -006 061 100

SATQ 008 004 060 068 100

gt upper lt- lowerCor(female[-1])

edctn age ACT SATV SATQ

education 100

age 052 100

ACT 016 008 100

SATV 007 -003 053 100

SATQ 003 -009 058 063 100

gt both lt- lowerUpper(lowerupper)

gt round(both2)

education age ACT SATV SATQ

education NA 052 016 007 003

age 061 NA 008 -003 -009

ACT 016 015 NA 053 058

SATV 002 -006 061 NA 063

SATQ 008 004 060 068 NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

gt diffs lt- lowerUpper(lowerupperdiff=TRUE)

gt round(diffs2)

28

education age ACT SATV SATQ

education NA 009 000 -005 005

age 061 NA 007 -003 013

ACT 016 015 NA 008 002

SATV 002 -006 061 NA 005

SATQ 008 004 060 068 NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cortest function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corrtest function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corrtest and corrp, the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the padjust function for a discussion of Holm (1979) and other corrections.)
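A small sketch of what the Holm adjustment does, using the base R p.adjust function on a made-up vector of p values, and of how to see the confidence intervals mentioned below:

p <- c(0.001, 0.01, 0.02, 0.04, 0.20)   # hypothetical raw p values
p.adjust(p, method = "holm")            # Holm adjusted p values
ct <- corr.test(sat.act)
print(ct, short = FALSE)                # short=FALSE also shows the confidence intervals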

Testing the difference between any two correlations can be done using the rtest function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

gt rtest(503)

29

gt png(corplotpng)gt corPlot(Thurstonenumbers=TRUEupper=FALSEdiag=FALSEmain=9 cognitive variables from Thurstone)

gt devoff()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

30

gt png(circplotpng)gt circ lt- simcirc(24)

gt rcirc lt- cor(circ)

gt corPlot(rcircmain=24 variables in a circumplex)gt devoff()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

31

gt png(spiderpng)gt oplt- par(mfrow=c(22))

gt spider(y=c(161218)x=124data=rcircfill=TRUEmain=Spider plot of 24 circumplex variables)

gt op lt- par(mfrow=c(11))

gt devoff()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

32

Table 1: The corrtest function reports correlations, cell sizes, and raw and adjusted probability values. corrp reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

gt corrtest(satact)

Callcorrtest(x = satact)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations print with the short=FALSE option

33

Correlation tests

Callrtest(n = 50 r12 = 03)

Test of significance of a correlation

t value 218 with probability lt 0034

and confidence interval 002 053

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

gt rtest(3046)

Correlation tests

Callrtest(n = 30 r12 = 04 r34 = 06)

Test of difference between two independent correlations

z value 099 with probability 032

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

gt rtest(103451)

Correlation tests

Call[1] rtest(n = 103 r12 = 04 r23 = 01 r13 = 05 )

Test of difference between two correlated correlations

t value -089 with probability lt 037

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

gt rtest(103567558) steiger Case B

Correlation tests

Callrtest(n = 103 r12 = 05 r34 = 06 r23 = 07 r13 = 05 r14 = 05

r24 = 08)

Test of difference between two dependent correlations

z value -12 with probability 023

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero, or for comparing two matrices. Although obvious, cortest can be used to test whether the satact data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15  with probability < 1.8e-273
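Because cortest will also compare two correlation matrices, a natural follow-up (a sketch, not shown in the original text) is to ask whether the correlations among the remaining five variables differ for the 247 males and 453 females in sat.act:

R.male   <- cor(subset(sat.act, gender == 1)[-1], use = "pairwise")   # drop the constant gender column
R.female <- cor(subset(sat.act, gender == 2)[-1], use = "pairwise")
cortest(R.male, R.female, n1 = 247, n2 = 453)   # chi square test that the two matrices differ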


3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
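As a small illustration (a sketch using the ability data set of dichotomously scored ICAR items; this example is not part of the original text), the tetrachoric estimates are noticeably larger in absolute value than the φ (Pearson) correlations of the same items:

phi.r   <- cor(ability[, 1:4], use = "pairwise")    # phi coefficients of the observed 0/1 data
tetra.r <- tetrachoric(ability[, 1:4])$rho          # estimated latent (tetrachoric) correlations
round(tetra.r - phi.r, 2)                           # the tetrachorics are larger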

> draw.tetra()

[Figure 14 shows the bivariate normal distribution cut at x = τ and y = Τ, with rho = 0.5 and the resulting φ = 0.28, labeling the four quadrants X > τ/Y > Τ, X < τ/Y > Τ, X > τ/Y < Τ, and X < τ/Y < Τ, together with the marginal normal density dnorm(x).]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
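For example (a hedged sketch using the newer mixedCor form of the call; treating education as polytomous and the remaining sat.act variables as continuous is an assumption made for illustration):

r.mixed <- mixedCor(sat.act[-1], c = 2:5, p = 1)   # age, ACT, SATV, SATQ continuous; education polytomous
r.mixed$rho                                        # the resulting mixed correlation matrix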

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
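A minimal sketch of that example (assuming the burt correlation matrix from psychTools is available):

round(eigen(burt)$values, 3)     # note the slightly negative smallest eigenvalue
burt.s <- cor.smooth(burt)       # adjust the eigenvalues and rescale
round(eigen(burt.s)$values, 3)   # all eigenvalues are now positive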

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable) it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
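A minimal sketch of the first of these analyses (the grouping variable is given by name; the rwg and rbg elements hold the pooled within-group and the between-group correlations when cors=TRUE):

sb.ed <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb.ed$rwg, 2)   # pooled within-group correlations
round(sb.ed$rbg, 2)   # weighted between-group correlations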

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.


sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
SentCompletion   0.02 3.00
FirstLetters     0.58 1.35

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
SentCompletion   0.05 3.00
FirstLetters     0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = LetterSeries
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
SentCompletion   0.04 3.00
FirstLetters     0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
LetterSeries  0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
SentCompletion   0.21 3.00
FirstLetters     0.08 1.35

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.58 0.34 0.58 0.33

DV = LetterGroup
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
SentCompletion   0.08 3.00
FirstLetters     0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
LetterGroup  0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohens Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data=Thurstone, z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out: Sentences Vocabulary
and are included in the calculation of df1 and df2

DV = FourLetterWords
                slope  VIF
SentCompletion   0.02 1.02
FirstLetters     0.58 1.02

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.58 0.33 0.64 0.41

DV = Suffixes
                slope  VIF
SentCompletion   0.05 1.02
FirstLetters     0.45 1.02

Multiple Regression
            R   R2 Ruw R2uw
Suffixes 0.46 0.21 0.6 0.37

DV = LetterSeries
                slope  VIF
SentCompletion   0.04 1.02
FirstLetters     0.21 1.02

Multiple Regression
                R   R2  Ruw R2uw
LetterSeries 0.21 0.04 0.46 0.21

DV = Pedigrees
                slope  VIF
SentCompletion   0.21 1.02
FirstLetters     0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.52 0.27

DV = LetterGroup
                slope  VIF
SentCompletion   0.08 1.02
FirstLetters     0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
LetterGroup  0.3 0.09 0.46 0.21

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohens Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.73

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.53     0.12         0.11      0.08        0.14
Suffixes                   0.12     0.61         0.01      0.03        0.04
LetterSeries               0.11     0.01         0.79      0.31        0.39
Pedigrees                  0.08     0.03         0.31      0.70        0.23
LetterGroup                0.14     0.04         0.39      0.23        0.79


5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
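A sketch of the call that produces the output and Figure 15 below (it assumes the sobel data frame from the mediate example has been created; the number of bootstrap iterations is left at its default):

preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)
preacher                    # print the mediation results shown below
mediate.diagram(preacher)   # draw the path diagram of Figure 15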

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t = 2.5  df = 28  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56  R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary.

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed; the default number of boot straps is 5000. A sketch of this call appears below. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
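A hedged sketch of that call (the Garcia data set is assumed to be available from psychTools; the product term is written with *, and sexism appears both in the product and as the mediator, matching the Call echoed in the output below):

mod.med <- mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia,
                   n.iter = 50, main = "Moderated mediation (mean centered)")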


> mediate.diagram(preacher)

[Figure 15: the mediation model diagram, with THERAPY predicting SATIS (c = 0.76, c' = 0.43) and the indirect path through ATTRIB (paths 0.82 and 0.4).]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std=FALSE)
> setCor.diagram(preacher)

[Figure 16: the regression model diagram, with THERAPY and ATTRIB predicting SATIS (paths 0.43 and 0.4) and a correlation of 0.21 between the two predictors.]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.


Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46  S.E. = 0.22  t = 6.77  df = 126  with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04  Upper CI = 0.03
Total effect (c) of prot2*sexism on respappr = 0.81  S.E. = 0.28  t = 2.89  df = 126  with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81  S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07  Upper CI = 0.07
R = 0.54  R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary.

[Figure 17, titled "Moderated mediation (mean centered)": prot2 and prot2*sexism (correlated -0.01) predicting respappr through sexism, with total and direct effects c = 1.46 and c' = 1.46 for prot2, c = 0.81 and c' = 0.81 for prot2*sexism, and small paths involving sexism of 0.07, 0.09, and 0.02.]

Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.


5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n}(1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{xx}^{-1} R_{xy}^{-1}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender      -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education    0.56 0.07  7.72 0.016     0.25     0.87 1.22
age         -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error = 0.15 with 2 degrees of freedom

Multiple Regression
      R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT   1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV = SATV
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender       0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education    0.72 0.08  9.21 0.012     0.39     1.06 1.22
age         -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error = 0.16 with 2 degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV = SATQ
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender      -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education    0.40 0.05  8.32 0.014     0.19     0.60 1.22
age         -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error = 0.1 with 2 degrees of freedom

Multiple Regression
      R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ  1  1   1 0.99        0.99        0    177.11   3   2 0.00562

Various estimates of between set correlations
Squared Canonical Correlations
[1] 1.000 0.988 0.013
Chisq of canonical correlations
[1] 32517 4433 0013
Average squared canonical correlation = 0.67
Cohens Set Correlation R2 = 1
Shrunken Set Correlation R2 = 1
F and df of Cohens Set Correlation -Inf 12 -1294
Unweighted correlation between the two sets = 0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
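A quick way to see the symmetry claim (a sketch, not in the original text) is to reverse the roles of the two sets; the regression slopes change, but the Cohen set correlation R2 does not:

setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # demographics predicted from the ability tests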

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
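For instance (a hedged sketch; the heading argument is optional, and the assumption that Table 2 reports a three factor solution of the Thurstone correlations is made only for illustration):

f3 <- fa(Thurstone, 3)        # the kind of analysis reported in Table 2
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
cor2latex(sat.act)            # lower triangle correlation table in APA style
df2latex(describe(sat.act))   # any data frame, including describe output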

An example of converting the output from fa to LaTeX appears in Table 2.


Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable         MR1   MR2   MR3   h2    u2   com
Sentences        0.91 -0.04  0.04  0.82  0.18 1.01
Vocabulary       0.89  0.06 -0.03  0.84  0.16 1.01
SentCompletion   0.83  0.04  0.00  0.73  0.27 1.00
FirstLetters     0.00  0.86  0.00  0.73  0.27 1.00
4LetterWords    -0.01  0.74  0.10  0.63  0.37 1.04
Suffixes         0.18  0.63 -0.08  0.50  0.50 1.20
LetterSeries     0.03 -0.01  0.84  0.72  0.28 1.00
Pedigrees        0.37 -0.05  0.47  0.50  0.50 1.93
LetterGroup     -0.06  0.21  0.64  0.53  0.47 1.23

SS loadings      2.64  1.86  1.5

     MR1  MR2  MR3
MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates the effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
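A few one-line illustrations of these helpers (a sketch, with results rounded):

fisherz(.5)                     # 0.55, the Fisher z transform of r = .5
geometric.mean(c(1, 2, 4, 8))   # 2.83
harmonic.mean(c(1, 2, 4, 8))    # 2.13
superMatrix(diag(2), diag(3))   # a 5 x 5 block diagonal "super matrix"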

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and there are 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project, Revelle et al. (2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR), Condon & Revelle (2014), and were collected as part of the SAPA web based assessment project (https://sapa-project.org), Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. & Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis, 122 pp. Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., & Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In N. G. Fielding, R. M. Lee, & G. Blank (Eds.), SAGE Handbook of Online Research Methods (2nd ed.), chapter 37 (pp. 578–595). Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components--an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


Page 11: An introduction to the psych package: Part I: data entry ... · myData

readclipboardupper for reading input of an upper triangular matrix

readclipboardfwf for reading in fixed width fields (some very old data sets)

For example given a data set copied to the clipboard from a spreadsheet just enter thecommand

mydata lt- readclipboard()

This will work if every data field has a value and even missing data are given some values(eg NA or -999) If the data were entered in a spreadsheet and the missing valueswere just empty cells then the data should be read in as a tab delimited or by using thereadclipboardtab function

gt mydata lt- readclipboard(sep=t) define the tab option or

gt mytabdata lt- readclipboardtab() just use the alternative function

For the case of data in fixed width fields (some old data sets tend to have this format)copy to the clipboard and then specify the width of each field (in the example below thefirst variable is 5 columns the second is 2 columns the next 5 are 1 column the last 4 are3 columns)

gt mydata lt- readclipboardfwf(widths=c(52rep(15)rep(34))

33 Basic descriptive statistics

Once the data are read in then describe or describeBy will provide basic descriptivestatistics arranged in a data frame format Consider the data set satact which in-cludes data from 700 web based participants on 3 demographic variables and 3 abilitymeasures

describe reports means standard deviations medians min max range skew kurtosisand standard errors for integer or real data Non-numeric data although the statisticsare meaningless will be treated as if numeric (based upon the categorical coding ofthe data) and will be flagged with an

describeBy reports descriptive statistics broken down by some categorizing variable (eggender age etc)

gt library(psych)

gt library(psychTools)

gt data(satact)

gt describe(satact) basic descriptive statistics

vars n mean sd median trimmed mad min max range skew kurtosis se

gender 1 700 165 048 2 168 000 1 2 1 -061 -162 002

education 2 700 316 143 3 331 148 0 5 5 -068 -007 005

age 3 700 2559 950 22 2386 593 13 65 52 164 242 036

ACT 4 700 2855 482 29 2884 445 3 36 33 -066 053 018

11

SATV 5 700 61223 11290 620 61945 11861 200 800 600 -064 033 427

SATQ 6 687 61022 11564 620 61725 11861 200 800 600 -059 -002 441

These data may then be analyzed by groups defined in a logical statement or by some othervariable Eg break down the descriptive data for males or females These descriptivedata can also be seen graphically using the errorbarsby function (Figure 5) By settingskew=FALSE and ranges=FALSE the output is limited to the most basic statistics

gt basic descriptive statistics by a grouping variable

gt describeBy(satactsatact$genderskew=FALSEranges=FALSE)

Descriptive statistics by group

group 1

vars n mean sd se

gender 1 247 100 000 000

education 2 247 300 154 010

age 3 247 2586 974 062

ACT 4 247 2879 506 032

SATV 5 247 61511 11416 726

SATQ 6 245 63587 11602 741

---------------------------------------------------------------------------

group 2

vars n mean sd se

gender 1 453 200 000 000

education 2 453 326 135 006

age 3 453 2545 937 044

ACT 4 453 2842 469 022

SATV 5 453 61066 11231 528

SATQ 6 442 59600 11307 538

The output from the describeBy function can be forced into a matrix form for easy analysisby other programs In addition describeBy can group by several grouping variables at thesame time

gt samat lt- describeBy(satactlist(satact$gendersatact$education)

+ skew=FALSEranges=FALSEmat=TRUE)

gt headTail(samat)

item group1 group2 vars n mean sd se

gender1 1 1 0 1 27 1 0 0

gender2 2 2 0 1 30 2 0 0

gender3 3 1 1 1 20 1 0 0

gender4 4 2 1 1 25 2 0 0

ltNAgt ltNAgt ltNAgt

SATQ9 69 1 4 6 51 6359 10412 1458

SATQ10 70 2 4 6 86 59759 10624 1146

SATQ11 71 1 5 6 46 65783 8961 1321

SATQ12 72 2 5 6 93 60672 10555 1095

331 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the mul-tivariate centroid of the data That is find the squared Mahalanobis distance for eachdata point and then compare these to the expected values of χ2 This produces a Q-Q

12

(quantle-quantile) plot with the n most extreme data points labeled (Figure 1) The outliervalues are in the vector d2

332 Basic data cleaning using scrub

If after describing the data it is apparent that there were data entry errors that need tobe globally replaced with NA or only certain ranges of data will be analyzed the data canbe ldquocleanedrdquo using the scrub function

Consider a data set of 10 rows of 12 columns with values from 1 - 120 All values of columns3 - 5 that are less than 30 40 or 50 respectively or greater than 70 in any of the threecolumns will be replaced with NA In addition any value exactly equal to 45 will be setto NA (max and isvalue are set to one value here but they could be a different value forevery column)

gt x lt- matrix(1120ncol=10byrow=TRUE)

gt colnames(x) lt- paste(V110sep=)gt newx lt- scrub(x35min=c(304050)max=70isvalue=45newvalue=NA)

gt newx

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

[1] 1 2 NA NA NA 6 7 8 9 10

[2] 11 12 NA NA NA 16 17 18 19 20

[3] 21 22 NA NA NA 26 27 28 29 30

[4] 31 32 33 NA NA 36 37 38 39 40

[5] 41 42 43 44 NA 46 47 48 49 50

[6] 51 52 53 54 55 56 57 58 59 60

[7] 61 62 63 64 65 66 67 68 69 70

[8] 71 72 NA NA NA 76 77 78 79 80

[9] 81 82 NA NA NA 86 87 88 89 90

[10] 91 92 NA NA NA 96 97 98 99 100

[11] 101 102 NA NA NA 106 107 108 109 110

[12] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased and the minimums havegone up but the maximums down Data cleaning and examination for outliers should be aroutine part of any data analysis

333 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (eg college major occupation ethnicity) are to be ana-lyzed using correlation or regression To do this one can form ldquodummy codesrdquo which aremerely binary variables for each category This may be done using dummycode Subse-quent analyses using these dummy coded variables may be using biserial or point biserial(regular Pearson r) to show effect sizes and may be plotted in eg spider plots

Alternatively sometimes data were coded originally as categorical (MaleFemale HighSchool some College in college etc) and you want to convert these columns of data tonumeric This is done by char2numeric

13

gt png( outlierpng )

gt d2 lt- outlier(satactcex=8)

gt devoff()

null device

1

Figure 1 Using the outlier function to graphically show outliers The y axis is theMahalanobis D2 the X axis is the distribution of χ2 for the same number of degrees offreedom The outliers detected here may be shown graphically using pairspanels (see2 and may be found by sorting d2

14

34 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well ascommunicating important results Scatter Plot Matrices (SPLOMS) using the pairspanelsfunction are useful ways to look for strange effects involving outliers and non-linearitieserrorbarsby will show group means with 95 confidence boundaries By default er-rorbarsby and errorbars will show ldquocats eyesrdquo to graphically show the confidencelimits (Figure 5) This may be turned off by specifying eyes=FALSE densityBy or vio-

linBy may be used to show the distribution of the data in ldquoviolinrdquo plots (Figure 4) (Theseare sometimes called ldquolava-lamprdquo plots)

341 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data The pairspanelsfunction adapted from the help menu for the pairs function produces xy scatter plots ofeach pair of variables below the diagonal shows the histogram of each variable on thediagonal and shows the lowess locally fit regression line as well An ellipse around themean with the axis length reflecting one standard deviation of the x and y variables is alsodrawn The x axis in each scatter plot represents the column variable the y axis the rowvariable (Figure 2) When plotting many subjects it is both faster and cleaner to set theplot character (pch) to be rsquorsquo (See Figure 2 for an example)

pairspanels will show the pairwise scatter plots of all the variables as well as his-tograms locally smoothed regressions and the Pearson correlation When plottingmany data points (as in the case of the satact data it is possible to specify that theplot character is a period to get a somewhat cleaner graphic However in this figureto show the outliers we use colors and a larger plot character If we want to indicatersquosignificancersquo of the correlations by the conventional use of rsquomagic astricksrsquo we can setthe stars=TRUE option

Another example of pairspanels is to show differences between experimental groupsConsider the data in the affect data set The scores reflect post test scores on positiveand negative affect and energetic and tense arousal The colors show the results for fourmovie conditions depressing frightening movie neutral and a comedy

Yet another demonstration of pairspanels is useful when you have many subjects andwant to show the density of the distributions To do this we will use the makekeys

and scoreItems functions (discussed in the second vignette) to create scales measuringEnergetic Arousal Tense Arousal Positive Affect and Negative Affect (see the msq helpfile) We then show a pairspanels scatter plot matrix where we smooth the data pointsand show the density of the distribution by color

gt keys lt- makekeys(msq[175]list(

+ EA = c(active energetic vigorous wakeful wideawake fullofpep

15

gt png( pairspanelspng )

gt satd2 lt- dataframe(satactd2) combine the d2 statistics from before with the satact dataframe

gt pairspanels(satd2bg=c(yellowblue)[(d2 gt 25)+1]pch=21stars=TRUE)

gt devoff()

null device

1

Figure 2 Using the pairspanels function to graphically show relationships The x axisin each scatter plot represents the column variable the y axis the row variable Note theextreme outlier for the ACT If the plot character were set to a period (pch=rsquorsquo) it wouldmake a cleaner graphic but in to show the outliers in color we use the plot characters 21and 22

16

gt png(affectpng)gt pairspanels(affect[1417]bg=c(redblackwhiteblue)[affect$Film]pch=21

+ main=Affect varies by movies )

gt devoff()

null device

1

Figure 3 Using the pairspanels function to graphically show relationships The x axis ineach scatter plot represents the column variable the y axis the row variable The coloringrepresent four different movie conditions

17

+ "lively", "-sleepy", "-tired", "-drowsy"),
+ TA =c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+ "-placid", "-calm", "-at.rest"),
+ PA =c("active", "excited", "strong", "inspired", "determined", "attentive",
+ "interested", "enthusiastic", "proud", "alert"),
+ NAf =c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+ "upset", "hostile", "irritable")) )

> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+    main ="Density distributions of four measures of affect")
> dev.off()

Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).


> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
          1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.


3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval. (A short example follows this list.)

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σp = √(pq/N)).

error.crosses draw the confidence intervals for an x set and a y set of the same size.
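As a minimal sketch of the first of these (the choice of the SAT variables is just illustrative), the default cats eyes display for raw data might be produced with:

> error.bars(sat.act[5:6], ylab="SAT score", xlab="Scale", main="95% confidence intervals")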

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.


> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.


> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+    labels=c("Male","Female"), ylab="SAT score", xlab="")


Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M","F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+    main="Proportion of sample by education level")


Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.


3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+    cex=2, colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png('bibars.png')
> bi.bars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
          1

Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.


> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
          1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, returns (invisibly) the full correlation matrix, and displays the lower off diagonal matrix.

> lowerCor(sat.act)

gendr edctn age ACT SATV SATQ

gender 100

education 009 100

age -002 055 100

ACT -004 015 011 100

SATV -002 005 -004 056 100

SATQ -017 003 -003 059 064 100

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

edctn age ACT SATV SATQ

education 100

age 061 100

ACT 016 015 100

SATV 002 -006 061 100

SATQ 008 004 060 068 100

> upper <- lowerCor(female[-1])

edctn age ACT SATV SATQ

education 100

age 052 100

ACT 016 008 100

SATV 007 -003 053 100

SATQ 003 -009 058 063 100

> both <- lowerUpper(lower, upper)
> round(both, 2)

education age ACT SATV SATQ

education NA 052 016 007 003

age 061 NA 008 -003 -009

ACT 016 015 NA 053 058

SATV 002 -006 061 NA 063

SATQ 008 004 060 068 NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)


education age ACT SATV SATQ

education NA 009 000 -005 005

age 061 NA 007 -003 013

ACT 016 015 NA 008 002

SATV 002 -006 061 NA 005

SATQ 008 004 060 068 NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
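For instance, a minimal sketch of the Holm adjustment supplied by p.adjust (the raw probability values here are made up for illustration):

> p.adjust(c(.01, .02, .04, .20), method="holm")   #from the stats package
[1] 0.04 0.06 0.08 0.20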

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)


> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13 A spider plot can show circumplex structure very clearly Circumplex structuresare common in the study of affect


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations print with the short=FALSE option


Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   #Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15   with probability < 1.8e-273


3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()

(The original figure shows the bivariate normal distribution with ρ = .5 dichotomized at thresholds τ for both X and Y; the φ for the resulting two by two table is .28.)

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))
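To make the estimation concrete, a minimal sketch (the cell frequencies are hypothetical) finds a tetrachoric correlation directly from a two by two table of counts:

> tab <- matrix(c(40, 10, 15, 35), 2, 2)   #a hypothetical 2 x 2 table of frequencies
> tetrachoric(tab)   #estimates the latent correlation and the thresholds (taus)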

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
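A minimal sketch, assuming R.np is such a non-positive-definite tetrachoric or polychoric correlation matrix:

> eigen(R.np)$values   #inspect the eigen values; at least one will be negative
> R.sm <- cor.smooth(R.np)   #rescales the eigen values and returns a proper correlation matrix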

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable) it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}    (1)

where r_{xy} is the normal correlation which may be decomposed into a within group and a between group correlation, r_{xy_{wg}} and r_{xy_{bg}}, and \eta (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
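As a minimal sketch of the first of these analyses (the object name sb.ed is arbitrary; output not shown):

> sb.ed <- statsBy(sat.act, group="education", cors=TRUE)
> round(sb.ed$rwg, 2)   #the pooled within group correlations
> round(sb.ed$rbg, 2)   #the correlations of the group means (between groups)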

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.


sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388


SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data=Thurstone, z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression


R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

> round(sc$residual, 2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079


5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df= 28   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df= 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

  setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

  mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.


> mediate.diagram(preacher)


Figure 15: A mediated model taken from Preacher and Hayes, 2004, and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data=sobel, std=FALSE)
> setCor.diagram(preacher)


Figure 16: The conventional regression model for the Preacher and Hayes, 2004, data set solved using the setCor function. Compare this to the previous figure.


MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary


Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.


5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigen value of the eigen value decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{xx}^{-1} R_{xy}^{-1}.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)


(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562


Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
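A quick sketch of that symmetry (using the same covariance matrix C and n.obs as above; output not shown) simply reverses the roles of the two sets:

> setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   #the set correlation R2 is unchanged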

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.


Table 2: fa2latex
A factor analysis table from the psych package in R

Variable            MR1    MR2    MR3    h2    u2   com
Sentences          0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary         0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion    0.83   0.04   0.00  0.73  0.27  1.00
First.Letters      0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes           0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series      0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees          0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group      -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings        2.64   1.86   1.50

       MR1   MR2   MR3
MR1   1.00  0.59  0.54
MR2   0.59  1.00  0.52
MR3   0.54  0.52  1.00
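In the same spirit, a minimal sketch of cor2latex (output not shown) writes a lower-diagonal correlation table for the sat.act data set:

> cor2latex(sat.act, adjust="none", caption="Correlations of the sat.act data set")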

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score. (A short example of this and of harmonic.mean appears at the end of this list.)


geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
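As promised above, a brief sketch of two of these helpers:

> fisherz(.5)   #the Fisher r to z transformation of a correlation of .5
> harmonic.mean(c(2, 4, 8))   #the harmonic mean of three numbers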

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix


from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R version 402 (2020-06-22)

Platform x86_64-apple-darwin170 (64-bit)


Running under macOS Catalina 10156

Matrix products default

BLAS LibraryFrameworksRframeworkVersions40ResourcesliblibRblasdylib

LAPACK LibraryFrameworksRframeworkVersions40ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_209 psych_209

loaded via a namespace (and not attached)

[1] compiler_402 parallel_402 tools_402 foreign_08-80 tmvnsim_10-2 nlme_31-148

[7] mnormt_201 grid_402 lattice_020-41


References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Women's reactions to ingroup members who protest discriminatory treatment The importance of beliefs about inequality and response appropriateness European Journal of Social Psychology 40(5):733–745

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure Scandinavian Journal of Statistics 6(2):65–70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686


McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of


Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability Psychometrika 70(1):123–133


Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh Applied Psychological Measurement 30(2):121–144


table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63


SATV      5 700 612.23 112.90    620  619.45 118.61 200 800   600 -0.64     0.33 4.27
SATQ      6 687 610.22 115.64    620  617.25 118.61 200 800   600 -0.59    -0.02 4.41

These data may then be analyzed by groups defined in a logical statement or by some other variable. E.g., break down the descriptive data for males or females. These descriptive data can also be seen graphically using the error.bars.by function (Figure 5). By setting skew=FALSE and ranges=FALSE, the output is limited to the most basic statistics.

> #basic descriptive statistics by a grouping variable
> describeBy(sat.act, sat.act$gender, skew=FALSE, ranges=FALSE)

Descriptive statistics by group 
group: 1
          vars   n   mean     sd   se
gender       1 247   1.00   0.00 0.00
education    2 247   3.00   1.54 0.10
age          3 247  25.86   9.74 0.62
ACT          4 247  28.79   5.06 0.32
SATV         5 247 615.11 114.16 7.26
SATQ         6 245 635.87 116.02 7.41
--------------------------------------------------------------------------- 
group: 2
          vars   n   mean     sd   se
gender       1 453   2.00   0.00 0.00
education    2 453   3.26   1.35 0.06
age          3 453  25.45   9.37 0.44
ACT          4 453  28.42   4.69 0.22
SATV         5 453 610.66 112.31 5.28
SATQ         6 442 596.00 113.07 5.38

The output from the describeBy function can be forced into a matrix form for easy analysis by other programs. In addition, describeBy can group by several grouping variables at the same time.

> sa.mat <- describeBy(sat.act, list(sat.act$gender, sat.act$education),
+                      skew=FALSE, ranges=FALSE, mat=TRUE)
> headTail(sa.mat)
        item group1 group2 vars   n   mean     sd    se
gender1    1      1      0    1  27      1      0     0
gender2    2      2      0    1  30      2      0     0
gender3    3      1      1    1  20      1      0     0
gender4    4      2      1    1  25      2      0     0
...      ...   <NA>   <NA> <NA> ...    ...    ...   ...
SATQ9     69      1      4    6  51  635.9 104.12 14.58
SATQ10    70      2      4    6  86 597.59 106.24 11.46
SATQ11    71      1      5    6  46 657.83  89.61 13.21
SATQ12    72      2      5    6  93 606.72 105.55 10.95

3.3.1 Outlier detection using outlier

One way to detect unusual data is to consider how far each data point is from the multivariate centroid of the data. That is, find the squared Mahalanobis distance for each data point and then compare these to the expected values of χ2. This produces a Q-Q (quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.
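To see what outlier is doing, the following sketch (an illustration added here, not the vignette's own code) computes the squared Mahalanobis distances directly with base R and compares them to the corresponding χ2 quantiles:

# A sketch of the computation behind outlier(): squared Mahalanobis
# distances from the multivariate centroid, compared to chi-square quantiles.
x <- sat.act[complete.cases(sat.act), ]           # use complete cases only
d2 <- mahalanobis(x, colMeans(x), cov(x))         # squared distances
chi2.q <- qchisq(ppoints(nrow(x)), df = ncol(x))  # expected chi-square quantiles
qqplot(chi2.q, d2, xlab = "Expected chi-square quantiles",
       ylab = "Mahalanobis D2")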

3.3.2 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA (max and isvalue are set to one value here, but they could be a different value for every column).

> x <- matrix(1:120, ncol=10, byrow=TRUE)
> colnames(x) <- paste("V", 1:10, sep="")
> new.x <- scrub(x, 3:5, min=c(30,40,50), max=70, isvalue=45, newvalue=NA)
> new.x

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

[1] 1 2 NA NA NA 6 7 8 9 10

[2] 11 12 NA NA NA 16 17 18 19 20

[3] 21 22 NA NA NA 26 27 28 29 30

[4] 31 32 33 NA NA 36 37 38 39 40

[5] 41 42 43 44 NA 46 47 48 49 50

[6] 51 52 53 54 55 56 57 58 59 60

[7] 61 62 63 64 65 66 67 68 69 70

[8] 71 72 NA NA NA 76 77 78 79 80

[9] 81 82 NA NA NA 86 87 88 89 90

[10] 91 92 NA NA NA 96 97 98 99 100

[11] 101 102 NA NA NA 106 107 108 109 110

[12] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.

3.3.3 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are merely binary variables for each category. This may be done using dummy.code. Subsequent analyses using these dummy coded variables may be using biserial or point biserial (regular Pearson r) to show effect sizes and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
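As a small illustration (the vectors and data frame below are invented for this example and are not part of the psych data sets), dummy.code expands a categorical vector into one 0/1 column per category, and char2numeric converts character columns to numeric codes:

# Hypothetical categorical variable
major <- c("psych", "biology", "psych", "history")
dummy.code(major)        # one dummy coded (0/1) column per major

tiny <- data.frame(gender = c("Male", "Female", "Female"),
                   education = c("High School", "some College", "in college"))
char2numeric(tiny)       # character columns become numeric codes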


> png( 'outlier.png' )
> d2 <- outlier(sat.act, cex=.8)
> dev.off()

null device

1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2) and may be found by sorting d2.


3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean with the axis length reflecting one standard deviation of the x and y variables is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data) it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",


> png( 'pairspanels.png' )
> sat.d2 <- data.frame(sat.act, d2)   #combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()

null device

1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.


> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+              main="Affect varies by movies ")
> dev.off()

null device

1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.


+           "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+           "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+           "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+           "upset", "hostile", "irritable" )) )
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+       main = "Density distributions of four measures of affect ")
> dev.off()

Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).


> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")
> dev.off()

null device

1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.


3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

errorbarsby does the same but grouping the data by some condition

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of a proportion ($\sigma_p = \sqrt{pq/N}$).

errorcrosses draw the confidence intervals for an x set and a y set of the same size

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, even though 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.


> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.


> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+               labels=c("Male","Female"), ylab="SAT score", xlab="")


Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+                main="Proportion of sample by education level")


Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.


3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+     cex=2, colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+     ylab="Negative Affect", pch=16, cex=2, colors=colors, main ="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png( 'bibars.png' )
> bibars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()

null device

1

Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is in the densityBy plot in the next figure.


> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()

null device

1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)
          gendr edctn age   ACT   SATV  SATQ 
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this.

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])
          edctn age   ACT   SATV  SATQ 
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00
> upper <- lowerCor(female[-1])
          edctn age   ACT   SATV  SATQ 
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00
> both <- lowerUpper(lower, upper)
> round(both, 2)
          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal.

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)


          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
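To see what the Holm adjustment does on its own, the base R p.adjust function can be applied to a set of raw probabilities (the values below are invented for the illustration):

p.raw <- c(.001, .01, .02, .04, .30, .60)   # hypothetical raw p values
p.adjust(p.raw, method = "holm")            # the default adjustment used by corr.test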

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)


> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix

          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations print with the short=FALSE option


Correlation tests

Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests

Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests

Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   #steiger Case B

Correlation tests

Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
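A rough sketch of where a chi square of that size comes from (an illustration assuming the Fisher z form of Steiger's statistic, not the exact internals of cortest): sum the squared Fisher z transformed correlations, scale by (n - 3), and compare to χ2 with p(p-1)/2 degrees of freedom.

R <- lowerCor(sat.act)            # full correlation matrix, returned invisibly
n <- nrow(sat.act)                # total sample size used here as an approximation
z <- fisherz(R[lower.tri(R)])     # Fisher z of the unique correlations
chi2 <- (n - 3) * sum(z^2)
df <- ncol(R) * (ncol(R) - 1)/2
c(chi2 = chi2, df = df, p = pchisq(chi2, df, lower.tail = FALSE))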


3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
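A brief sketch of that repair (an illustration added here; burt and cor.smooth are the objects named above):

data(burt)
min(eigen(burt)$values)      # slightly negative: the matrix is not positive semi-definite
burt.s <- cor.smooth(burt)   # adjust and rescale the eigenvalues
min(eigen(burt.s)$values)    # now non-negative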

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where rxy is the normal correlation, which may be decomposed into within group and between group correlations rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
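A minimal sketch of the first of those analyses (the choice of grouping variable and the cors option follow the statsBy help page; treat this as an illustration rather than the vignette's own code):

sb.ed <- statsBy(sat.act, group = "education", cors = TRUE)
sb.ed$rwg    # pooled within group correlations
sb.ed$rbg    # correlations of the group means (between groups)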

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.


sb <- statsBy(bfi[c(1:25, 27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)    #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388


SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out: Sentences Vocabulary
and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression


R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

> round(sc$residual, 2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079


5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, 2, ... i) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
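A conceptual sketch of that bootstrap test (the variable names x, m, y and the data frame df are invented for the illustration; mediate does this, and more, internally):

# Bootstrap the indirect (a*b) effect of x on y through m
ab.boot <- replicate(5000, {
  i <- sample(nrow(df), replace = TRUE)
  a <- coef(lm(m ~ x, data = df[i, ]))["x"]       # path a
  b <- coef(lm(y ~ x + m, data = df[i, ]))["m"]   # path b
  a * b
})
quantile(ab.boot, c(.025, .975))                  # bootstrap CI for the indirect effect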

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 28  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03   Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.00272

To see the longer output specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
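For reference, the call that produced the moderated mediation output and figure shown below (reconstructed from the Call line of that output) is:

mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia, n.iter = 50,
        main = "Moderated mediation (mean centered)")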


> mediate.diagram(preacher)


Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)


Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.


Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 126  with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04   Upper CI = 0.03

Total effect (c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 126  with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07   Upper CI = 0.07

R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.91e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary


Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.


5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max 
-25.2458  -3.2133   0.7769   3.5921   9.2630 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110    
education    0.47890    0.15235   3.143  0.00174 ** 
age          0.01623    0.02278   0.712  0.47650    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,	Adjusted R-squared:  0.02301 
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562


Various estimates of between set correlations

Squared Canonical Correlations

[1] 1.000 0.988 0.013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 0.67

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
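A minimal sketch of the kind of call that produces Table 2 (the three factor solution is an assumption based on the table shown):

f3 <- fa(Thurstone, nfactors = 3)   # factor the Thurstone correlation matrix
fa2latex(f3)                        # write the APA style LaTeX table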


Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1   MR2   MR3   h2   u2  com
Sentences         0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary        0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion   0.83  0.04  0.00 0.73 0.27 1.00
First.Letters     0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words   -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes          0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series     0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees         0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group     -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings       2.64  1.86  1.50

      MR1  MR2  MR3
MR1  1.00 0.59 0.54
MR2  0.59 1.00 0.52
MR3  0.54 0.52 1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimated effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
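A few of these helpers in action (the input values are invented for the illustration):

fisherz(c(.2, .5, .8))          # correlations to Fisher z
fisherz2r(fisherz(.5))          # and back again
geometric.mean(c(1, 10, 100))   # 10
harmonic.mean(c(1, 10, 100))    # about 2.7
rangeCorrection(.5, 2, 1)       # correct r = .5 for restriction of range (sd 2 vs. 1)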

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see faextension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
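All of these objects ship with psych and psychTools (note that the printed names above lose their periods; in R the objects are, e.g., sat.act and epi.bfi), so a minimal sketch of getting oriented is simply:

dim(bfi)              # 2800 rows by 28 columns (25 items plus gender, education, age)
describe(sat.act)     # descriptive statistics for the SAT/ACT example
lowerMat(Thurstone)   # the 9 x 9 Thurstone correlation matrix
headTail(peas)        # first and last few rows of Galton's peas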

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.
Condon, D. M. & Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.
Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.
Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., & Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In N. G. Fielding, R. M. Lee, & G. Blank (Eds.), SAGE Handbook of Online Research Methods (2nd ed.), chapter 37 (pp. 578–595). Sage Publications, Inc.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.
Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components--an alternative to "mathematical factors." Psychological Review, 42(5):425–454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



(quantile-quantile) plot with the n most extreme data points labeled (Figure 1). The outlier values are in the vector d2.

332 Basic data cleaning using scrub

If, after describing the data, it is apparent that there were data entry errors that need to be globally replaced with NA, or only certain ranges of data will be analyzed, the data can be "cleaned" using the scrub function.

Consider a data set of 12 rows of 10 columns with values from 1 - 120. All values of columns 3 - 5 that are less than 30, 40, or 50 respectively, or greater than 70 in any of the three columns, will be replaced with NA. In addition, any value exactly equal to 45 will be set to NA. (max and isvalue are set to one value here, but they could be a different value for every column.)

> x <- matrix(1:120, ncol = 10, byrow = TRUE)
> colnames(x) <- paste("V", 1:10, sep = "")
> new.x <- scrub(x, 3:5, min = c(30, 40, 50), max = 70, isvalue = 45, newvalue = NA)
> new.x

V1 V2 V3 V4 V5 V6 V7 V8 V9 V10

[1] 1 2 NA NA NA 6 7 8 9 10

[2] 11 12 NA NA NA 16 17 18 19 20

[3] 21 22 NA NA NA 26 27 28 29 30

[4] 31 32 33 NA NA 36 37 38 39 40

[5] 41 42 43 44 NA 46 47 48 49 50

[6] 51 52 53 54 55 56 57 58 59 60

[7] 61 62 63 64 65 66 67 68 69 70

[8] 71 72 NA NA NA 76 77 78 79 80

[9] 81 82 NA NA NA 86 87 88 89 90

[10] 91 92 NA NA NA 96 97 98 99 100

[11] 101 102 NA NA NA 106 107 108 109 110

[12] 111 112 NA NA NA 116 117 118 119 120

Note that the number of subjects for those columns has decreased, and the minimums have gone up but the maximums down. Data cleaning and examination for outliers should be a routine part of any data analysis.

333 Recoding categorical variables into dummy coded variables

Sometimes categorical variables (e.g., college major, occupation, ethnicity) are to be analyzed using correlation or regression. To do this, one can form "dummy codes" which are merely binary variables for each category. This may be done using dummycode. Subsequent analyses using these dummy coded variables may be using biserial or point biserial (regular Pearson r) correlations to show effect sizes, and may be plotted in, e.g., spider plots.

Alternatively, sometimes data were coded originally as categorical (Male/Female, High School, some College, in college, etc.) and you want to convert these columns of data to numeric. This is done by char2numeric.
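A minimal sketch of both helpers (the major vector below is invented for illustration, and my.data stands for whatever data frame holds your character columns):

major <- c("psych", "biology", "psych", "econ", "biology")   # a hypothetical categorical variable
dummy.code(major)   # one 0/1 column per category
# numeric.data <- char2numeric(my.data)   # convert character columns of a data frame to numeric codes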


> png('outlier.png')
> d2 <- outlier(sat.act, cex = .8)
> dev.off()

null device

1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairspanels (see Figure 2) and may be found by sorting d2.


34 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairspanels function are useful ways to look for strange effects involving outliers and non-linearities. errorbarsby will show group means with 95% confidence boundaries. By default, errorbarsby and errorbars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)

341 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairspanels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.' (See Figure 2 for an example.)

pairspanels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the satact data) it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairspanels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairspanels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the makekeys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairspanels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

> keys <- make.keys(msq[1:75], list(
+   EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",


> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg = c("yellow", "blue")[(d2 > 25) + 1], pch = 21, stars = TRUE)
> dev.off()

null device

1

Figure 2: Using the pairspanels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.


> png('affect.png')
> pairs.panels(affect[14:17], bg = c("red", "black", "white", "blue")[affect$Film], pch = 21,
+   main = "Affect varies by movies")
> dev.off()

null device

1

Figure 3: Using the pairspanels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.


+     "lively", "-sleepy", "-tired", "-drowsy"),
+   TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+     "-placid", "-calm", "-at.rest"),
+   PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+     "interested", "enthusiastic", "proud", "alert"),
+   NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+     "upset", "hostile", "irritable")))

> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother = TRUE,
+   main = "Density distributions of four measures of affect")
> dev.off()

Using the pairspanels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

342 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).


> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name = c("M", "F"), main = "Density Plot by gender for SAT V and Q")
> dev.off()

null device

1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.


343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions errorbars, errorbarsby, errorbarstab, and errorcrosses (a minimal call to the first of these is sketched after the list).

errorbars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

errorbarsby does the same but grouping the data by some condition

errorbarstab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σp = √(pq/N))

errorcrosses draws the confidence intervals for an x set and a y set of the same size.
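For the simplest of these, a minimal sketch (the axis labels are arbitrary choices):

error.bars(sat.act, eyes = TRUE, ylab = "Mean score", xlab = "Variable",
           main = "95% confidence intervals with cats eyes")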

The use of the errorbarsby function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflect any skewing in the data.

Although not recommended, it is possible to use the errorbars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

344 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the errorbarstab function.


> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 5: Using the errorbarsby function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence interval.


> error.bars.by(sat.act[5:6], sat.act$gender, bars = TRUE,
+   labels = c("Male", "Female"), ylab = "SAT score", xlab = "")


Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way = "both", ylab = "Proportion of Education Level", xlab = "Level of Education",
+   main = "Proportion of sample by education level")


Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.


345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.


> op <- par(mfrow = c(1, 2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data = affect[-c(1, 20)], group = "Film", labels = films,
+   xlab = "Energetic Arousal", ylab = "Tense Arousal", ylim = c(10, 22), xlim = c(8, 20), pch = 16,
+   cex = 2, colors = colors, main = "Movies effect on arousal")
> errorCircles("PA2", "NA2", data = affect.stats, labels = films, xlab = "Positive Affect",
+   ylab = "Negative Affect", pch = 16, cex = 2, colors = colors, main = "Movies effect on affect")
> op <- par(mfrow = c(1, 1))


Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


346 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png('bibars.png')
> bibars(bfi, "age", "gender", ylab = "Age", main = "Age by males and females")
> dev.off()

null device

1

Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is the densityBy plot in the next figure.


> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp = "gender")
> dev.off()

null device

1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.


347 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn   age   ACT  SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this.

> female <- subset(sat.act, sat.act$gender == 2)
> male <- subset(sat.act, sat.act$gender == 1)
> lower <- lowerCor(male[-1])

          edctn   age   ACT  SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn   age   ACT  SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal.

> diffs <- lowerUpper(lower, upper, diff = TRUE)
> round(diffs, 2)

          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cortest function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corrtest function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corrtest and corrp the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the padjust function for a discussion of Holm (1979) and other corrections.)
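A minimal sketch of the call behind Table 1, naming the (default) Holm adjustment explicitly:

ct <- corr.test(sat.act, adjust = "holm")   # raw p values below the diagonal, adjusted above
print(ct, short = FALSE)                    # also prints the confidence intervals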

Testing the difference between any two correlations can be done using the rtest function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)


> png('corplot.png')
> corPlot(Thurstone, numbers = TRUE, upper = FALSE, diag = FALSE, main = "9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main = "24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow = c(2, 2))
> spider(y = c(1, 6, 12, 18), x = 1:24, data = r.circ, fill = TRUE, main = "Spider plot of 24 circumplex variables")
> op <- par(mfrow = c(1, 1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corrtest function reports correlations, cell sizes, and raw and adjusted probability values. corrp reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.


Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the satact data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
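The same function can also compare two correlation matrices; a sketch reusing the male and female matrices computed earlier (the group sizes are passed so that the χ2 test has the appropriate degrees of freedom):

cortest(lower, upper, n1 = nrow(male), n2 = nrow(female))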


36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The drawtetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the drawcor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by

draw.cor(expand = 20, cuts = c(0, 0))

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation

If the data are a mix of continuous, polytomous and dichotomous variables, the mixedcor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The corsmooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
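A minimal sketch of that sequence, using the dichotomous ability items (the smoothing step matters only if the resulting matrix is improper):

r.tet <- tetrachoric(ability)$rho   # tetrachoric correlations of the 0/1 ability items
r.ok  <- cor.smooth(r.tet)          # adjust the eigenvalues if the matrix is not positive semi-definite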

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

41 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where rxy is the normal correlation, which may be decomposed into a within group and between group correlation, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

simmultilevel will generate simulated data with a multilevel structure

The statsByboot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBybootsummary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(satact)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
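A minimal sketch of the first of these analyses, keeping the pooled within-group and between-group correlation matrices:

sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)   # pooled within-education-level correlations
round(sb$rbg, 2)   # correlations of the education-level means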

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.


sb <- statsBy(bfi[c(1:25, 27)], group = "education", cors = TRUE)
faBy(sb, nfactors = 5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388


SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression


R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

> round(sc$residual, 2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079


52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediatediagram. The data are found in the example for mediate.

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 28   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary.

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.


> mediate.diagram(preacher)


Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCordiagram.

> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: Regression Models - THERAPY and ATTRIB predicting SATIS, with path coefficients 0.43 and 0.4 and 0.21 connecting the two predictors]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 126  with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04  Upper CI = 0.03
Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 126  with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07  Upper CI = 0.07
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary.

[Figure: Moderated mediation (mean centered) - path diagram with prot2 and prot2*sexism predicting respappr, with sexism as the mediator; c = 1.46 and c' = 1.46 for prot2, c = 0.81 and c' = 0.81 for prot2*sexism]

Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,    Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

 DV = ACT
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender      -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education    0.56 0.07  7.72 0.016     0.25     0.87 1.22
age         -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error = 0.15 with 2 degrees of freedom

 Multiple Regression
     R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT  1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

 DV = SATV
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender       0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education    0.72 0.08  9.21 0.012     0.39     1.06 1.22
age         -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error = 0.16 with 2 degrees of freedom

 Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

 DV = SATQ
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender      -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education    0.40 0.05  8.32 0.014     0.19     0.60 1.22
age         -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error = 0.1 with 2 degrees of freedom

 Multiple Regression
     R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ 1  1   1 0.99        0.99        0    177.11   3   2 0.00562

Various estimates of between set correlations

Squared Canonical Correlations
[1] 1.000 0.988 0.013

Chisq of canonical correlations
[1] 325.17  44.33   0.013

 Average squared canonical correlation = 0.67
 Cohen's Set Correlation R2 = 1
 Shrunken Set Correlation R2 = 1
 F and df of Cohen's Set Correlation  -Inf 12 -12.94
Unweighted correlation between the two sets = 0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
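A rough sketch of how such a table might be generated (assuming the fa2latex function, which in recent releases is supplied by psychTools, and a three factor solution of the 9 Thurstone ability variables as shown in Table 2):

f3 <- fa(Thurstone, 3)   # three factor solution of the Thurstone correlation matrix
fa2latex(f3, heading = "A factor analysis table from the psych package in R")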

Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

                  MR1    MR2    MR3
MR1              1.00   0.59   0.54
MR2              0.59   1.00   0.52
MR3              0.54   0.52   1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
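A minimal sketch of the superMatrix idea (the two small matrices here are arbitrary illustrations, not taken from the package examples):

A <- diag(2)        # any 2 x 2 matrix
B <- diag(3)        # any 3 x 3 matrix
superMatrix(A, B)   # a 5 x 5 matrix with A in the upper left, B in the lower right, 0s elsewhere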

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. & Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., & Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In N. G. Fielding, R. M. Lee, & G. Blank (Eds.), SAGE Handbook of Online Research Methods (2nd ed.), chapter 37 (pp. 578–595). Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


> png('outlier.png')
> d2 <- outlier(sat.act, cex=.8)
> dev.off()

null device

1

Figure 1: Using the outlier function to graphically show outliers. The y axis is the Mahalanobis D2, the X axis is the distribution of χ2 for the same number of degrees of freedom. The outliers detected here may be shown graphically using pairs.panels (see Figure 2), and may be found by sorting d2.

3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean, with the axis length reflecting one standard deviation of the x and y variables, is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks', we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: a depressing movie, a frightening movie, a neutral movie, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

> keys <- make.keys(msq[1:75], list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",

> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()
null device
          1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.

> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+    main="Affect varies by movies")
> dev.off()
null device
          1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

+       "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+       "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+       "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+       "upset", "hostile", "irritable")))

> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+    main="Density distributions of four measures of affect")
> dev.off()

Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).

> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()
null device
          1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians, the 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (\sigma_p = \sqrt{pq/N}).

error.crosses draw the confidence intervals for an x set and a y set of the same size.

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflect any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although that is out of the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)

[Figure: 95% confidence limits for the Big Five Inventory scales (bfagree, bfcon, bfext, bfneur, bfopen) plotted separately for the two lie-scale groups]

Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+    labels=c("Male","Female"), ylab="SAT score", xlab="")

[Figure: bar graph of SATV and SATQ means with 95% confidence limits; y axis (SAT score) runs from 200 to 800]

Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M","F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+    main="Proportion of sample by education level")

[Figure: Proportion of sample by education level - bars for males and females at each level of education (0-5) with error bars; y axis (Proportion of Education Level) runs from 0.00 to 0.30]

Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2", data=affect[-c(1,20)], group="Film", labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+    cex=2, colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main = "Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Figure: two panels, "Movies effect on arousal" (Energetic Arousal vs Tense Arousal) and "Movies effect on affect" (Positive Affect vs Negative Affect), showing means and error crosses for the Sad, Horror, Neutral, and Happy film conditions]

Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png('bibars.png')
> bi.bars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()
null device
          1

Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is the densityBy plot in the next figure.

> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()
null device
          1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

 To see confidence intervals of the correlations, print with the short=FALSE option

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval 0.02   0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df = 15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()

[Figure: a bivariate normal distribution with rho = 0.5, cut at thresholds τ (for X) and Τ (for Y) to form the four quadrants X > τ/Y > Τ, X < τ/Y > Τ, X > τ/Y < Τ, X < τ/Y < Τ, yielding phi = 0.28; the lower panels show the univariate normal densities with the cut points]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
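A rough sketch of such a call (the column choices are an assumption for illustration: treating the bfi age variable as continuous, the 25 personality items as polytomous, and gender as dichotomous):

R.mixed <- mixedCor(data = bfi, c = 28, p = 1:25, d = 26)   # mixed Pearson/polychoric/polyserial matrix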

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
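A minimal sketch (assuming the burt correlation matrix is available, from psych or psychTools depending on the version installed):

burt.smoothed <- cor.smooth(burt)   # adjust the offending eigenvalues and rescale the matrix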

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_xy = η_xwg * η_ywg * r_xywg + η_xbg * η_ybg * r_xybg        (1)

where r_xy is the normal correlation, which may be decomposed into a within group and between group correlations r_xywg and r_xybg, and η (eta) is the correlation of the data with the within group values or the group means.
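A short sketch of this decomposition for the sat.act data grouped by education (the element names rwg, rbg and etawg follow the statsBy help page; treat the exact names as assumptions):

> sb <- statsBy(sat.act, group="education", cors=TRUE)
> round(sb$rwg, 2)    # pooled within group correlations (r_xywg)
> round(sb$rbg, 2)    # correlations of the group means (r_xybg)
> round(sb$etawg, 2)  # the eta terms: correlation of the data with the within group values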

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.
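A brief sketch of how these simulated data can be decomposed (the grouping column name "Group" is taken from the withinBetween help page and should be treated as an assumption):

> data(withinBetween)
> wb <- statsBy(withinBetween, group="Group", cors=TRUE)
> round(wb$rwg, 2)   # recovers the 1, 0, -1 within group pattern
> round(wb$rbg, 2)   # recovers the 1, 0, -1 between group pattern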

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.


sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)

faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
SentCompletion   0.02 3.00
FirstLetters     0.58 1.35

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
SentCompletion   0.05 3.00
FirstLetters     0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = LetterSeries
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
SentCompletion   0.04 3.00
FirstLetters     0.21 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterSeries 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
SentCompletion   0.21 3.00
FirstLetters     0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = LetterGroup
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
SentCompletion   0.08 3.00
FirstLetters     0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
LetterGroup  0.48 0.23 0.45  0.2

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out: Sentences Vocabulary
and are included in the calculation of df1 and df2

DV = FourLetterWords
                slope  VIF
SentCompletion   0.02 1.02
FirstLetters     0.58 1.02

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.58 0.33 0.64 0.41

DV = Suffixes
                slope  VIF
SentCompletion   0.05 1.02
FirstLetters     0.45 1.02

Multiple Regression
            R   R2 Ruw R2uw
Suffixes 0.46 0.21 0.6 0.37

DV = LetterSeries
                slope  VIF
SentCompletion   0.04 1.02
FirstLetters     0.21 1.02

Multiple Regression
                R   R2  Ruw R2uw
LetterSeries 0.21 0.04 0.46 0.21

DV = Pedigrees
                slope  VIF
SentCompletion   0.21 1.02
FirstLetters     0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.52 0.27

DV = LetterGroup
                slope  VIF
SentCompletion   0.08 1.02
FirstLetters     0.31 1.02

Multiple Regression
              R   R2  Ruw R2uw
LetterGroup 0.3 0.09 0.46 0.21

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.73

> round(sc$residual, 2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.53     0.12         0.11      0.08        0.14
Suffixes                   0.12     0.61         0.01      0.03        0.04
LetterSeries               0.11     0.01         0.79      0.31        0.39
Pedigrees                  0.08     0.03         0.31      0.70        0.23
LetterGroup                0.14     0.04         0.39      0.23        0.79


5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5   df = 28   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03   Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary.

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis. A sketch of the call appears below.
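The moderated mediation output and diagram shown next (Figure 17) come from a call of this form, mirroring the Call line echoed in that output:

> mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia,
+         n.iter = 50, main = "Moderated mediation (mean centered)")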


> mediate.diagram(preacher)

(Figure 15 diagram "Mediation model": THERAPY → SATIS with c = 0.76 and c' = 0.43; THERAPY → ATTRIB = 0.82 and ATTRIB → SATIS = 0.4.)

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std=FALSE)
> setCor.diagram(preacher)

(Figure 16 diagram "Regression Models": THERAPY → SATIS = 0.43, ATTRIB → SATIS = 0.4, with a correlation of 0.21 between THERAPY and ATTRIB.)

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.


Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 126   with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04   Upper CI = 0.03
Total effect (c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 126   with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07   Upper CI = 0.07
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary.

(Figure 17 diagram "Moderated mediation (mean centered)": prot2, prot2*sexism and sexism predicting respappr, with c = 1.46 and c' = 1.46 for prot2, c = 0.81 and c' = 0.81 for the product term, and near-zero paths through sexism.)

Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.


5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R² = 1 − ∏_{i=1}^{n} (1 − λ_i)

where λ_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_xx^{-1} R_xy R_yy^{-1} R_yx.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.
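A small numeric sketch of the difference between these two summaries (the λ values here are made up for illustration, not taken from any data set):

> lambda <- c(.63, .15, .01, .005)   # hypothetical squared canonical correlations
> 1 - prod(1 - lambda)               # Cohen's set correlation R2, dominated by the largest lambda
> mean(lambda)                       # average squared canonical correlation, much smaller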

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender      -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education    0.56 0.07  7.72 0.016     0.25     0.87 1.22
age         -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error = 0.15 with 2 degrees of freedom

Multiple Regression
      R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT   1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV = SATV
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender       0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education    0.72 0.08  9.21 0.012     0.39     1.06 1.22
age         -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error = 0.16 with 2 degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV = SATQ
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender      -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education    0.40 0.05  8.32 0.014     0.19     0.60 1.22
age         -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error = 0.1 with 2 degrees of freedom

Multiple Regression
      R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ  1  1   1 0.99        0.99        0    177.11   3   2 0.00562


Various estimates of between set correlations

Squared Canonical Correlations
[1] 1.000 0.988 0.013

Chisq of canonical correlations
[1] 325.17  44.33   0.013

 Average squared canonical correlation =  0.67
 Cohen's Set Correlation R2  =  1
 Shrunken Set Correlation R2  =  1
 F and df of Cohen's Set Correlation  -Inf 12 -12.94
Unweighted correlation between the two sets =  0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
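A sketch of the kinds of calls that produce such tables (the heading argument to fa2latex follows its help page; treat the argument details as assumptions):

> f3 <- fa(Thurstone, 3)       # the three factor solution shown in Table 2
> fa2latex(f3, heading="A factor analysis table from the psych package in R")
> cor2latex(Thurstone)         # lower diagonal of the correlation matrix, APA style
> df2latex(describe(sat.act))  # descriptive statistics as a LaTeX table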


Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3   h2   u2   com
Sentences        0.91 -0.04  0.04 0.82 0.18  1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16  1.01
SentCompletion   0.83  0.04  0.00 0.73 0.27  1.00
FirstLetters     0.00  0.86  0.00 0.73 0.27  1.00
4LetterWords    -0.01  0.74  0.10 0.63 0.37  1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50  1.20
LetterSeries     0.03 -0.01  0.84 0.72 0.28  1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50  1.93
LetterGroup     -0.06  0.21  0.64 0.53 0.47  1.23

SS loadings      2.64  1.86  1.5

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score


geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
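A brief sketch of a few of these helpers in use (the values noted in the comments are easily verified; this example is added for illustration and is not from the original text):

> fisherz(c(.2, .5, .8))          # Fisher r to z transformation of three correlations
> geometric.mean(c(2, 4, 8))      # 4
> harmonic.mean(c(2, 4, 8))       # 3.43
> superMatrix(diag(2), diag(3))   # a 5 x 5 block "super matrix" with 0s off the blocks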

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment (https://sapa-project.org) project (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)


Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached)

[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



3.4 Simple descriptive graphics

Graphic descriptions of data are very helpful both for understanding the data as well as communicating important results. Scatter Plot Matrices (SPLOMS) using the pairs.panels function are useful ways to look for strange effects involving outliers and non-linearities. error.bars.by will show group means with 95% confidence boundaries. By default, error.bars.by and error.bars will show "cats eyes" to graphically show the confidence limits (Figure 5). This may be turned off by specifying eyes=FALSE. densityBy or violinBy may be used to show the distribution of the data in "violin" plots (Figure 4). (These are sometimes called "lava-lamp" plots.)

3.4.1 Scatter Plot Matrices

Scatter Plot Matrices (SPLOMS) are very useful for describing the data. The pairs.panels function, adapted from the help menu for the pairs function, produces xy scatter plots of each pair of variables below the diagonal, shows the histogram of each variable on the diagonal, and shows the lowess locally fit regression line as well. An ellipse around the mean with the axis length reflecting one standard deviation of the x and y variables is also drawn. The x axis in each scatter plot represents the column variable, the y axis the row variable (Figure 2). When plotting many subjects, it is both faster and cleaner to set the plot character (pch) to be '.'. (See Figure 2 for an example.)

pairs.panels will show the pairwise scatter plots of all the variables as well as histograms, locally smoothed regressions, and the Pearson correlation. When plotting many data points (as in the case of the sat.act data), it is possible to specify that the plot character is a period to get a somewhat cleaner graphic. However, in this figure, to show the outliers, we use colors and a larger plot character. If we want to indicate 'significance' of the correlations by the conventional use of 'magic asterisks' we can set the stars=TRUE option.

Another example of pairs.panels is to show differences between experimental groups. Consider the data in the affect data set. The scores reflect post test scores on positive and negative affect and energetic and tense arousal. The colors show the results for four movie conditions: depressing, frightening movie, neutral, and a comedy.

Yet another demonstration of pairs.panels is useful when you have many subjects and want to show the density of the distributions. To do this, we will use the make.keys and scoreItems functions (discussed in the second vignette) to create scales measuring Energetic Arousal, Tense Arousal, Positive Affect, and Negative Affect (see the msq help file). We then show a pairs.panels scatter plot matrix where we smooth the data points and show the density of the distribution by color.

> keys <- make.keys(msq[1:75], list(
+    EA = c("active", "energetic", "vigorous", "wakeful", "wide.awake", "full.of.pep",


> png('pairspanels.png')
> sat.d2 <- data.frame(sat.act, d2)   # combine the d2 statistics from before with the sat.act data.frame
> pairs.panels(sat.d2, bg=c("yellow","blue")[(d2 > 25)+1], pch=21, stars=TRUE)
> dev.off()

null device

1

Figure 2: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. Note the extreme outlier for the ACT. If the plot character were set to a period (pch='.') it would make a cleaner graphic, but in order to show the outliers in color we use the plot characters 21 and 22.


> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+    main="Affect varies by movies")
> dev.off()

null device

1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.


+         "lively", "-sleepy", "-tired", "-drowsy"),
+    TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+         "-placid", "-calm", "-at.rest"),
+    PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+         "interested", "enthusiastic", "proud", "alert"),
+    NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+         "upset", "hostile", "irritable")))

> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+    main="Density distributions of four measures of affect")
> dev.off()

Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this with a plot with smoother set to FALSE.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).


> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")
> dev.off()

null device

1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.


3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars show the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion (σ_p = √(pq/N)).

error.crosses draw the confidence intervals for an x set and a y set of the same size.

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflect any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. (This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite.) In the example shown, note that the graph starts at 0, although that is out of the range. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.


> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)

(Figure 5 plot: "95% confidence limits" by Grouping Variable, showing bfagree, bfcon, bfext, bfneur and bfopen means for the two lie-scale groups, with values roughly 50 to 150.)

Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.


> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+     labels=c("Male","Female"), ylab="SAT score", xlab="")

(Figure 6 plot: bar graph of SATV and SATQ means with 95% confidence limits, SAT score axis running from 200 to 800.)

Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.


> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+     main="Proportion of sample by education level")

(Figure 7 plot: "Proportion of sample by education level", bars for M 0 through M 5 and F 0 through F 5, proportions from 0.00 to 0.30.)

Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages, and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.


3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.


> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+    cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))

(Figure 8 plots: "Movies effect on arousal" (Energetic Arousal vs. Tense Arousal) and "Movies effect on affect" (Positive Affect vs. Negative Affect) for the Sad, Horror, Neutral and Happy conditions.)

Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represent the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).


3.4.6 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png('bibars.png')
> bi.bars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()

null device

1

Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.


> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()

null device

1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.


3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation and the rcorr function in the Hmisc package will do this for many correlations. In the psych package the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
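For example, a minimal sketch of these calls (the choice of data set and options here is ours, not from the original text):

ct <- corr.test(sat.act)     #Pearson correlations, pairwise deletion, Holm adjustment by default
print(ct, short=FALSE)       #also prints the confidence intervals of the correlations
corr.p(ct$r, n=700)          #adjusted probabilities from an already computed correlation matrix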

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50,.3)


> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option the values are displayed as well. By default the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.


> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main='Spider plot of 24 circumplex variables')
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.


Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103,.5,.6,.7,.5,.5,.8)   #Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability < 1.8e-273
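For example, to compare the male and female correlation matrices found above, cortest may be given both matrices and the two sample sizes. A sketch (the group sizes are computed from the data rather than assumed):

n.male <- sum(sat.act$gender==1)                 #sample sizes for the two groups
n.female <- sum(sat.act$gender==2)
cortest(lower, upper, n1=n.male, n2=n.female)    #chi square test that the two correlation matrices are equal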


36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
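A minimal sketch of the two functions, using the dichotomous ability items and a few polytomous bfi items (our choice of variables, shown only for illustration):

tets <- tetrachoric(ability[,1:4])    #tetrachoric correlations and thresholds for 0/1 items
tets$rho                              #the estimated latent correlations
polys <- polychoric(bfi[,1:5])        #polychoric correlations for 1-6 response items
polys$tau                             #the estimated thresholds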

> draw.tetra()

[Figure 14 shows a bivariate normal distribution (rho = 0.5, phi = 0.28) cut at thresholds tau for X and Tau for Y, labeling the four resulting quadrants.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20,cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
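A sketch of such a call, using the newer mixedCor name; the c/p/d arguments (continuous, polytomous, and dichotomous variables) are our reading of the current help page and should be checked there:

mc <- mixedCor(data=bfi, c="age", p=1:25, d="gender")  #argument names assumed; see ?mixedCor
mc$rho                                                 #the resulting mixed correlation matrix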

The correlation matrix resulting from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
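A minimal sketch, assuming the burt correlation matrix is available (it is supplied with the package):

round(eigen(burt)$values,3)     #the smallest eigenvalue should be (slightly) negative here
burt.s <- cor.smooth(burt)      #adjust the eigenvalues and rescale
round(eigen(burt.s)$values,3)   #all eigenvalues are now positive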

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

41 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1 while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
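A minimal sketch of the first of these, pulling out the pooled within group and between group correlations (the element names rwg and rbg are our reading of the statsBy output):

sb <- statsBy(sat.act, group="education", cors=TRUE)
round(sb$rwg,2)    #pooled within group correlations
round(sb$rbg,2)    #correlations of the group means (between groups)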

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.


sb <- statsBy(bfi[c(1:25,27)], group="education",cors=TRUE)
faBy(sb,nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression mediation moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388


SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
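For example, a sketch of supplying the sample size along with the correlation matrix (the value of n.obs here is a placeholder, not taken from the text):

setCor(y = 5:9, x = 1:4, data = Thurstone, n.obs = 213)   #n.obs is assumed; use the real sample size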

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression


R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.53     0.12         0.11      0.08        0.14
Suffixes                   0.12     0.61         0.01      0.03        0.04
LetterSeries               0.11     0.01         0.79      0.31        0.39
Pedigrees                  0.08     0.03         0.31      0.70        0.23
LetterGroup                0.14     0.04         0.39      0.23        0.79


52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
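The call producing the output below (reconstructed from the Call line of that output, and assigned to preacher so that it can be passed to mediate.diagram in Figure 15) is:

preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)   #bootstrapped mediation; n.iter defaults to 5000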

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t = 2.5  df = 28  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56  R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis. (A sketch of the call used for this example appears below.)
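The call for this moderated mediation example, reconstructed from the Call line of the output that follows, is:

mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia, n.iter = 50,
        main = "Moderated mediation (mean centered)")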


> mediate.diagram(preacher)

[Figure 15 shows the mediation model: THERAPY -> ATTRIB = 0.82, ATTRIB -> SATIS = 0.4, total effect c = 0.76, direct effect c' = 0.43.]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std=FALSE)
> setCor.diagram(preacher)

[Figure 16 shows the regression model: THERAPY -> SATIS = 0.43, ATTRIB -> SATIS = 0.4, with a THERAPY-ATTRIB correlation of 0.21.]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.


Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46  S.E. = 0.22  t = 6.77  df = 126  with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04  Upper CI = 0.03
Total effect(c) of prot2*sexism on respappr = 0.81  S.E. = 0.28  t = 2.89  df = 126  with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81  S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07  Upper CI = 0.07
R = 0.54  R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary

[Figure 17 shows the moderated mediation diagram for the Garcia data: total effects c = 1.46 (prot2) and c = 0.81 (prot2*sexism), with near zero paths through sexism.]

Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.


53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{xx}^{-1} R_{xy}^{-1}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act,use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562


Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
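A sketch of this symmetry, simply exchanging the two sets from the previous call (the numeric output is omitted here):

setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   #Cohen's Set Correlation R2 is unchanged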

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
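A sketch of these conversions (the factor analysis call mirrors the one summarized in Table 2; file names and further formatting options are left at their defaults):

f3 <- fa(Thurstone, 3)            #the three factor solution shown in Table 2
fa2latex(f3)                      #write an APA style LaTeX factor table
cor2latex(sat.act)                #a lower diagonal correlation table
df2latex(describe(sat.act))       #any data frame, e.g., descriptive statistics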


Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3    h2    u2   com
Sentences        0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary       0.89  0.06 -0.03  0.84  0.16  1.01
Sent.Completion  0.83  0.04  0.00  0.73  0.27  1.00
First.Letters    0.00  0.86  0.00  0.73  0.27  1.00
4.Letter.Words  -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes         0.18  0.63 -0.08  0.50  0.50  1.20
Letter.Series    0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees        0.37 -0.05  0.47  0.50  0.50  1.93
Letter.Group    -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings      2.64  1.86  1.50

Factor correlations
      MR1  MR2  MR3
MR1  1.00 0.59 0.54
MR2  0.59 1.00 0.52
MR3  0.54 0.52 1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
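A few of these in use (a short sketch; the particular values are arbitrary):

fisherz(.5)                    #r to Fisher z
fisherz2r(fisherz(.5))         #and back again
geometric.mean(c(2,4,8))       #means appropriate for ratio data
harmonic.mean(c(2,4,8))
headTail(sat.act)              #first and last few lines of the data frame
mardia(sat.act[1:4])           #multivariate skew and kurtosis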

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/ An introduction to Psychometric Theory with Applications in R (Revelle, in prep)). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance


of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686


McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of


Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133


Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144


Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49


hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48


corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51


psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9


outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49


table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47


Page 16: An introduction to the psych package: Part I: data entry ... · myData

gt png( pairspanelspng )

gt satd2 lt- dataframe(satactd2) combine the d2 statistics from before with the satact dataframe

gt pairspanels(satd2bg=c(yellowblue)[(d2 gt 25)+1]pch=21stars=TRUE)

gt devoff()

null device

1

Figure 2 Using the pairspanels function to graphically show relationships The x axisin each scatter plot represents the column variable the y axis the row variable Note theextreme outlier for the ACT If the plot character were set to a period (pch=rsquorsquo) it wouldmake a cleaner graphic but in to show the outliers in color we use the plot characters 21and 22

16

gt png(affectpng)gt pairspanels(affect[1417]bg=c(redblackwhiteblue)[affect$Film]pch=21

+ main=Affect varies by movies )

gt devoff()

null device

1

Figure 3 Using the pairspanels function to graphically show relationships The x axis ineach scatter plot represents the column variable the y axis the row variable The coloringrepresent four different movie conditions

17

+ lively -sleepy -tired -drowsy)

+ TA =c(intense jittery fearful tense clutchedup -quiet -still

+ -placid -calm -atrest)

+ PA =c(active excited strong inspired determined attentive

+ interested enthusiastic proud alert)

+ NAf =c(jittery nervous scared afraid guilty ashamed distressed

+ upset hostile irritable )) )

gt scores lt- scoreItems(keysmsq[175])

gt png(msqpng)gt pairspanels(scores$scoressmoother=TRUE

gt main =Density distributions of four measures of affect )

gt

gt devoff()

Using the pairspanels function to graphically show relationships (Not shown in theinterests of space) The x axis in each scatter plot represents the column variable they axis the row variable The variables are four measures of motivational state for 3896participants Each scale is the average score of 10 items measuring motivational stateCompare this a plot with smoother set to FALSE

342 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25thand 75th percentiles A powerful alternative is to show the density distribution using theviolinBy function (Figure 4) or the more conventional density plot for multiple groups(Figure 10

18

gt png(violinpng)gt data(satact)

gt violinBy(satact56gendergrpname=c(M F)main=Density Plot by gender for SAT V and Q)

gt devoff()

null device

1

Figure 4 Using the violinBy function to show the distribution of SAT V and Q for malesand females The plot shows the medians and 25th and 75th percentiles as well as theentire range and the density distribution

19

343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data aswell as to draw error bars in both the x and y directions for paired data These are thefunctions errorbars errorbarsby errorbarstab and errorcrosses

errorbars show the 95 confidence intervals for each variable in a data frame or ma-trix These errors are based upon normal theory and the standard errors of the meanAlternative options include +- one standard deviation or 1 standard error If thedata are repeated measures the error bars will be reflect the between variable cor-relations By default the confidence intervals are displayed using a ldquocats eyesrdquo plotwhich emphasizes the distribution of confidence within the confidence interval

errorbarsby does the same but grouping the data by some condition

errorbarstab draws bar graphs from tabular data with error bars based upon thestandard error of proportion (σp =

radicpqN)

errorcrosses draw the confidence intervals for an x set and a y set of the same size

The use of the errorbarsby function allows for graphic comparisons of different groups(see Figure 5) Five personality measures are shown as a function of high versus low scoreson a ldquolierdquo scale People with higher lie scores tend to report being more agreeable consci-entious and less neurotic than people with lower lie scores The error bars are based uponnormal theory and thus are symmetric rather than reflect any skewing in the data

Although not recommended it is possible to use the errorbars function to draw bargraphs with associated error bars (This kind of dynamite plot (Figure 7) can be verymisleading in that the scale is arbitrary Go to a discussion of the problems in present-ing data this way at httpsemdbolkerwikidotcomblogdynamite In the exampleshown note that the graph starts at 0 although is out of the range This is a functionof using bars which always are assumed to start at zero Consider other ways of showingyour data

344 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence interval.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+    labels=c("Male", "Female"), ylab="SAT score", xlab="")


Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+    main="Proportion of sample by education level")


Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds row wise percentages. The data can be converted to percentages (as shown) or shown as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+    cex=2, colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main = "Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png('bibars.png')
> bibars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()

null device

1

Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.

> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()

null device

1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values, and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age  ACT  SATV  SATQ
education        NA  0.52 0.16  0.07  0.03
age            0.61    NA 0.08 -0.03 -0.09
ACT            0.16  0.15   NA  0.53  0.58
SATV           0.02 -0.06 0.61    NA  0.63
SATQ           0.08  0.04 0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age  ACT  SATV SATQ
education        NA  0.09 0.00 -0.05 0.05
age            0.61    NA 0.07 -0.03 0.13
ACT            0.16  0.15   NA  0.08 0.02
SATV           0.02 -0.06 0.61    NA 0.05
SATQ           0.08  0.04 0.60  0.68   NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
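As a small illustration (an assumed variation, not shown in the vignette), the adjustment method can be changed from the Holm default and the correlation and probability matrices extracted from the returned object:

> ct <- corr.test(sat.act, adjust="bonferroni")
> round(ct$r, 2)   #the correlations
> round(ct$p, 3)   #raw p below the diagonal, adjusted p above it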

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures, it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   #Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
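cortest will also compare two correlation matrices. A hedged sketch, reusing the male and female correlation matrices formed earlier with lowerCor (the sample sizes are supplied explicitly):

> cortest(lower, upper, n1=nrow(male), n2=nrow(female))   #do the male and female matrices differ?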


3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space, we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
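A brief sketch (not from the vignette; the particular items chosen are arbitrary) of the underlying functions applied directly to raw item data:

> tetrachoric(ability[, 1:4])   #dichotomous (0/1) items: tetrachoric correlations and thresholds
> polychoric(bfi[, 1:5])        #polytomous (1-6) items: polychoric correlations and thresholds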

The correlation matrix resulting from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
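For example (a hedged sketch), the burt matrix can be smoothed directly:

> burt.smoothed <- cor.smooth(burt)   #adjusts the offending eigen values and rescales them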

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \ast \eta_{y_{wg}} \ast r_{xy_{wg}} + \eta_{x_{bg}} \ast \eta_{y_{bg}} \ast r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
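A minimal sketch of the first of these (not the vignette's own code); cors=TRUE asks statsBy to also return the decomposed correlation matrices:

> sb <- statsBy(sat.act, group="education", cors=TRUE)
> round(sb$rwg, 2)   #pooled within-group correlations
> round(sb$rbg, 2)   #between-group correlations (of the group means)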

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4:

> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV =  Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV =  Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV =  Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV =  Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV =  Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohen's Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out: Sentences Vocabulary
and are included in the calculation of df1 and df2

DV =  Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.64 0.41

DV =  Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2 Ruw R2uw
Suffixes 0.46 0.21 0.6 0.37

DV =  Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.46 0.21

DV =  Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.52 0.27

DV =  Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.46 0.21

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohen's Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.73

> round(sc$residual, 2)
                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.53     0.12          0.11      0.08         0.14
Suffixes                       0.12     0.61          0.01      0.03         0.04
Letter.Series                  0.11     0.01          0.79      0.31         0.39
Pedigrees                      0.08     0.03          0.31      0.70         0.23
Letter.Group                   0.14     0.04          0.39      0.23         0.79

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   SE = 0.31  t = 2.5  df = 28   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   SE = 0.32  t = 1.35  df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03   Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

  setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

  mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis. (A call of the form used for this example is sketched below.)
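The moderated mediation output shown below (and plotted in Figure 17) comes from a call of roughly this form, reconstructed here from the Call line of the output rather than copied from the original source:

mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia,
        n.iter = 50, main = "Moderated mediation (mean centered)")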

> mediate.diagram(preacher)


Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std=FALSE)
> setCor.diagram(preacher)


Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   SE = 0.22  t = 6.77  df = 126   with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   SE = 0.22  t = 6.73  df = 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04   Upper CI = 0.03
Total effect(c) of prot2*sexism on respappr = 0.81   SE = 0.28  t = 2.89  df = 126   with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   SE = 0.28  t = 2.87  df = 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07   Upper CI = 0.07
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary


Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{xx}^{-1} R_{xy}^{-1}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV =  ACT
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender      -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education    0.56 0.07  7.72 0.016     0.25     0.87 1.22
age         -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error =  0.15  with  2  degrees of freedom

Multiple Regression
    R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT 1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV =  SATV
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender       0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education    0.72 0.08  9.21 0.012     0.39     1.06 1.22
age         -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error =  0.16  with  2  degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV =  SATQ
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender      -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education    0.40 0.05  8.32 0.014     0.19     0.60 1.22
age         -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error =  0.1  with  2  degrees of freedom

Multiple Regression
     R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ 1  1   1 0.99        0.99        0    177.11   3   2 0.00562

Various estimates of between set correlations
Squared Canonical Correlations
[1] 1.000 0.988 0.013
Chisq of canonical correlations
[1] 32517 4433 0013
Average squared canonical correlation =  0.67
Cohen's Set Correlation R2  =  1
Shrunken Set Correlation R2  =  1
F and df of Cohen's Set Correlation   -Inf 12 -1294
Unweighted correlation between the two sets =  0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
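A quick sketch of that symmetry (an assumed check, not part of the original analysis), using the covariance matrix C computed above:

> setCor(y = 1:3, x = 4:6, data = C, n.obs = 700)   #the reversed direction gives the same set correlation R2 as above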

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
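The sort of calls involved look roughly like this (a hedged sketch; the heading and caption strings are arbitrary):

> fa2latex(fa(Thurstone, 3), heading="A factor analysis table from the psych package in R")
> cor2latex(sat.act, caption="Correlations from the sat.act data set")
> df2latex(describe(sat.act))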

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Converts a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between them.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
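A few one-line illustrations of some of these helpers (a hedged sketch, not taken from the vignette):

> headTail(sat.act)               #first and last few rows, separated by ellipsis
> topBottom(sat.act)              #the same, without the ellipsis
> fisherz(.5)                     #Fisher z transform of a correlation of .5
> geometric.mean(c(1, 2, 4, 8))   #an appropriate mean for ratio-scaled data
> mardia(sat.act[1:4])            #multivariate skew and kurtosis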

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org; Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 17: An introduction to the psych package: Part I: data entry ... · myData

> png('affect.png')
> pairs.panels(affect[14:17], bg=c("red","black","white","blue")[affect$Film], pch=21,
+    main="Affect varies by movies ")
> dev.off()

null device

1

Figure 3: Using the pairs.panels function to graphically show relationships. The x axis in each scatter plot represents the column variable, the y axis the row variable. The coloring represents four different movie conditions.

+    "lively", "-sleepy", "-tired", "-drowsy"),
+  TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+    "-placid", "-calm", "-at.rest"),
+  PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+    "interested", "enthusiastic", "proud", "alert"),
+  NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+    "upset", "hostile", "irritable")))

> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+    main="Density distributions of four measures of affect")
> dev.off()

Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this to a plot with smoother set to FALSE.
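That comparison can be drawn with one more call (a sketch, not part of the original transcript; it reuses the scores object created above):

pairs.panels(scores$scores, smoother=FALSE,
   main="Density distributions of four measures of affect")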

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).

> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()

null device

1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses (a minimal call to error.bars is sketched after this list).

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
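As a minimal sketch (not in the original text), error.bars can be called directly on a data frame; here columns 5 and 6 of sat.act (the SAT Verbal and Quantitative scores) are assumed:

error.bars(sat.act[5:6], ylab="SAT score",
   main="95% confidence intervals for SAT V and Q")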

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric, rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, even though that is outside the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+    labels=c("Male","Female"), ylab="SAT score", xlab="")


Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+    main="Proportion of sample by education level")


Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or left as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2", data=affect[-c(1,20)], group="Film", labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+    cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2","NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+    ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png('bibars.png')
> bibars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()

null device

1

Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure, and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.

> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()

null device

1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.

Correlation tests

Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
 t value 2.18    with probability < 0.034
 and confidence interval  0.02  0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
 z value 0.99    with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
 t value -0.89    with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   #Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
 z value -1.2    with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
 Chi Square value 1325.42  with df =  15   with probability <  1.8e-273

3.6 Polychoric, tetrachoric, polyserial, and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation
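A small sketch (not in the original text) of the φ versus tetrachoric contrast, using a hypothetical 2 x 2 table of counts and the dichotomous ability items:

library(psych)
library(psychTools)                            #the ability items live here in recent versions
twobytwo <- matrix(c(40, 10, 20, 30), 2, 2)    #hypothetical cell counts
phi(twobytwo)                                  #Pearson (phi) on the observed dichotomies
tetrachoric(twobytwo)                          #latent correlation assuming bivariate normality
tetrachoric(ability[, 1:4])                    #the same idea for four dichotomous (0/1) IQ items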

If the data are a mix of continuous, polytomous and dichotomous variables, the mixedCor function (mixed.cor in older releases) will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
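A hedged sketch of such a call (not in the original text); the column numbers refer to the bfi data set, where items 1-25 are polytomous, column 26 (gender) is dichotomous, and column 28 (age) is continuous:

library(psych)
library(psychTools)                       #bfi is in psychTools in recent versions
mc <- mixedCor(data=bfi, p=1:25, c=28, d=26)
lowerMat(mc$rho[1:6, 1:6])                #part of the resulting correlation matrix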

The correlation matrix formed from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
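A sketch of that smoothing step (not in the original text; burt is one of the example correlation matrices, found in psychTools in recent versions):

library(psych)
library(psychTools)
round(eigen(burt)$values, 3)          #note the (slightly) negative eigenvalue
burt.s <- cor.smooth(burt)            #adjust the offending eigenvalues and rescale
round(min(eigen(burt.s)$values), 3)   #now non-negative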

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}$    (1)

where $r_{xy}$ is the normal correlation, which may be decomposed into within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
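A minimal sketch of the first of these calls (not in the original text), showing where the pooled within-group and between-group correlations are returned:

sb.ed <- statsBy(sat.act, group="education", cors=TRUE)
round(sb.ed$rwg, 2)    #pooled within-education-level correlations
round(sb.ed$rbg, 2)    #correlations of the education-level means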

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)    #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV =  Four.Letter.Words
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.02 3.00
First.Letters    0.58 1.35

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.69 0.48 0.59 0.34

DV =  Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
Sent.Completion  0.05 3.00
First.Letters    0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV =  Letter.Series
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
Sent.Completion  0.04 3.00
First.Letters    0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Series 0.5 0.25 0.49 0.24

DV =  Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
Sent.Completion  0.21 3.00
First.Letters    0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV =  Letter.Group
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
Sent.Completion  0.08 3.00
First.Letters    0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
Letter.Group 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohens Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data=Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out: Sentences Vocabulary
 and are included in the calculation of df1 and df2

DV =  Four.Letter.Words
                slope  VIF
Sent.Completion  0.02 1.02
First.Letters    0.58 1.02

Multiple Regression
                     R   R2  Ruw R2uw
Four.Letter.Words 0.58 0.33 0.64 0.41

DV =  Suffixes
                slope  VIF
Sent.Completion  0.05 1.02
First.Letters    0.45 1.02

Multiple Regression
            R   R2 Ruw R2uw
Suffixes 0.46 0.21 0.6 0.37

DV =  Letter.Series
                slope  VIF
Sent.Completion  0.04 1.02
First.Letters    0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
Letter.Series 0.21 0.04 0.46 0.21

DV =  Pedigrees
                slope  VIF
Sent.Completion  0.21 1.02
First.Letters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.52 0.27

DV =  Letter.Group
                slope  VIF
Sent.Completion  0.08 1.02
First.Letters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
Letter.Group 0.3 0.09 0.46 0.21

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohens Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.73

> round(sc$residual, 2)

                  Four.Letter.Words Suffixes Letter.Series Pedigrees Letter.Group
Four.Letter.Words              0.53     0.12          0.11      0.08         0.14
Suffixes                       0.12     0.61          0.01      0.03         0.04
Letter.Series                  0.11     0.01          0.79      0.31         0.39
Pedigrees                      0.08     0.03          0.31      0.70         0.23
Letter.Group                   0.14     0.04          0.39      0.23         0.79

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, 2, ... i) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
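The call that produced the output shown next is reconstructed here from the Call line of that output:

preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data=sobel)
preacher    #print the mediation results shown below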

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was  SATIS . The IV (X) was  THERAPY . The mediating variable(s) =  ATTRIB .

Total effect(c) of  THERAPY  on  SATIS  =  0.76   S.E. =  0.31  t  =  2.5  df=  28   with p =  0.019
Direct effect (c') of  THERAPY  on  SATIS  removing  ATTRIB  =  0.43   S.E. =  0.32  t  =  1.35  df=  27   with p =  0.19
Indirect effect (ab) of  THERAPY  on  SATIS  through  ATTRIB   =  0.33
Mean bootstrapped indirect effect =  0.32  with standard error =  0.17  Lower CI =  0.03    Upper CI =  0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value:  0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) the regressions of multiple y variables on multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data=sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. See the help page for the mediate function for more details. The call used here is shown just before the output below. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.

> mediate.diagram(preacher)


Figure 15: A mediated model taken from Preacher and Hayes, 2004, and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data=sobel, std=FALSE)
> setCor.diagram(preacher)


Figure 16: The conventional regression model for the Preacher and Hayes, 2004, data set, solved using the setCor function. Compare this to the previous figure.
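The output below comes from the moderated (mean centered) mediation model described in the last bullet above; the call is reconstructed from the Call line of the output:

mod.med <- mediate(respappr ~ prot2 * sexism + (sexism), data=Garcia,
     n.iter=50, main="Moderated mediation (mean centered)")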

Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was  respappr . The IV (X) was  prot2 prot2*sexism . The mediating variable(s) =  sexism .

Total effect(c) of  prot2  on  respappr  =  1.46   S.E. =  0.22  t  =  6.77  df=  126   with p =  4.4e-10
Direct effect (c') of  prot2  on  respappr  removing  sexism  =  1.46   S.E. =  0.22  t  =  6.73  df=  125   with p =  5.5e-10
Indirect effect (ab) of  prot2  on  respappr  through  sexism   =  0
Mean bootstrapped indirect effect =  0  with standard error =  0.02  Lower CI =  -0.04    Upper CI =  0.03
Total effect(c) of  prot2*sexism  on  respappr  =  0.81   S.E. =  0.28  t  =  2.89  df=  126   with p =  0.0045
Direct effect (c') of  prot2*sexism  on  respappr  removing  sexism  =  0.81   S.E. =  0.28  t  =  2.87  df=  125   with p =  0.0048
Indirect effect (ab) of  prot2*sexism  on  respappr  through  sexism   =  0
Mean bootstrapped indirect effect =  0  with standard error =  0.03  Lower CI =  -0.07    Upper CI =  0.07
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value:  1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary


Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,   Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV =  ACT
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender      -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education    0.56 0.07  7.72 0.016     0.25     0.87 1.22
age         -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error =  0.15  with  2  degrees of freedom

Multiple Regression
      R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT   1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV =  SATV
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender       0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education    0.72 0.08  9.21 0.012     0.39     1.06 1.22
age         -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error =  0.16  with  2  degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV =  SATQ
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender      -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education    0.40 0.05  8.32 0.014     0.19     0.60 1.22
age         -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error =  0.1  with  2  degrees of freedom

Multiple Regression
      R  R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ  1   1   1 0.99        0.99        0    177.11   3   2 0.00562

Various estimates of between set correlations
Squared Canonical Correlations
[1] 1.000 0.988 0.013
Chisq of canonical correlations
[1] 325.17  44.33   0.013
Average squared canonical correlation =  0.67
Cohens Set Correlation R2  =  1
Shrunken Set Correlation R2  =  1
F and df of Cohens Set Correlation  -Inf  12  -12.94
Unweighted correlation between the two sets =  0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.
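A minimal sketch of these calls (not in the original text; in recent versions of psych the LaTeX helpers live in the companion psychTools package):

library(psych)
library(psychTools)
f3 <- fa(Thurstone, 3)         #the factor analysis shown in Table 2
fa2latex(f3)                   #write an APA style LaTeX table to the console
cor2latex(Thurstone)           #lower triangle of a correlation matrix
df2latex(describe(sat.act))    #any data frame, e.g., descriptive statistics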

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex
A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3    h2    u2   com
Sentences        0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary       0.89  0.06 -0.03  0.84  0.16  1.01
Sent.Completion  0.83  0.04  0.00  0.73  0.27  1.00
First.Letters    0.00  0.86  0.00  0.73  0.27  1.00
4.Letter.Words  -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes         0.18  0.63 -0.08  0.50  0.50  1.20
Letter.Series    0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees        0.37 -0.05  0.47  0.50  0.50  1.93
Letter.Group    -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings      2.64  1.86  1.50

      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions. A few of the helpers named here are sketched in a short example after this list.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
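A few of these helpers in use (a sketch, not from the original text):

library(psych)
library(psychTools)
fisherz(0.5)                          #r to Fisher z
fisherz2r(fisherz(0.5))               #and back again
geometric.mean(c(1, 2, 4, 8))
harmonic.mean(c(1, 2, 4, 8))
headTail(sat.act)                     #first and last few rows of the data frame
rangeCorrection(0.33, sdu=12, sdr=5)  #correct r for restriction of range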

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtholdt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. & Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis, 122 pp. Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):pp. 65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., & Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In N. G. Fielding, R. M. Lee, & G. Blank (Eds.), SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, (pp. 578–595). Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors." Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



+ "lively", "-sleepy", "-tired", "-drowsy"),
+ TA = c("intense", "jittery", "fearful", "tense", "clutched.up", "-quiet", "-still",
+ "-placid", "-calm", "-at.rest"),
+ PA = c("active", "excited", "strong", "inspired", "determined", "attentive",
+ "interested", "enthusiastic", "proud", "alert"),
+ NAf = c("jittery", "nervous", "scared", "afraid", "guilty", "ashamed", "distressed",
+ "upset", "hostile", "irritable")) )
> scores <- scoreItems(keys, msq[1:75])
> png('msq.png')
> pairs.panels(scores$scores, smoother=TRUE,
+ main = "Density distributions of four measures of affect")
> dev.off()

Using the pairs.panels function to graphically show relationships. (Not shown in the interests of space.) The x axis in each scatter plot represents the column variable, the y axis the row variable. The variables are four measures of motivational state for 3896 participants. Each scale is the average score of 10 items measuring motivational state. Compare this with a plot with smoother set to FALSE.

3.4.2 Density or violin plots

Graphical presentation of data may be shown using box plots to show the median and the 25th and 75th percentiles. A powerful alternative is to show the density distribution using the violinBy function (Figure 4) or the more conventional density plot for multiple groups (Figure 10).

> png('violin.png')
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M", "F"), main="Density Plot by gender for SAT V and Q")
> dev.off()

null device

1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and the 25th and 75th percentiles, as well as the entire range and the density distribution.

3.4.3 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval.

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of proportion ($\sigma_p = \sqrt{pq/N}$).

error.crosses draws the confidence intervals for an x set and a y set of the same size.

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 6) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, even though 0 is far outside the range of the data. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.
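These options can be tried directly. A minimal sketch (the sd and eyes arguments are taken from the error.bars help page and are assumptions here, not calls shown elsewhere in this tutorial):

error.bars(sat.act[5:6])                       # 95% confidence intervals with "cats eyes"
error.bars(sat.act[5:6], sd=TRUE, eyes=FALSE)  # +/- one standard deviation, no cats eyes
error.bars.by(sat.act[5:6], sat.act$gender)    # the same intervals, grouped by gender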

3.4.4 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)

Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence.

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+   labels=c("Male","Female"), ylab="SAT score", xlab="")

Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+   main="Proportion of sample by education level")

Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be shown as percentages (as here) or as raw counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

3.4.5 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+   xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+   cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+   ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))

Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

3.4.6 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png('bibars.png')
> bibars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()

null device

1

Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.

> png('histo.png')
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()

null device

1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities we can see that the males are slightly over represented in the younger ranges.

3.4.7 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
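For example, the Holm adjustment can also be applied directly to a correlation matrix that has already been computed. A minimal sketch (the corr.p arguments shown are taken from its help page and are assumptions here):

ct <- corr.test(sat.act)                  # correlations, sample sizes, raw and adjusted ps
cp <- corr.p(ct$r, n=700, adjust="holm")  # the same Holm adjustment applied to the r matrix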

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

> png('corplot.png')
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations print with the short=FALSE option


Correlation tests

Call:r.test(n = 50, r12 = 0.3)

Test of significance of a correlation

t value 2.18 with probability < 0.034

and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests

Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)

Test of difference between two independent correlations

z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests

Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"

Test of difference between two correlated correlations

t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests

Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
     r24 = 0.8)

Test of difference between two dependent correlations

z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices

Call:cortest(R1 = sat.act)

Chi Square value 1325.42 with df = 15 with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, then φ will underestimate the value of the Pearson correlation applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
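A minimal sketch of such a call, using mixedCor (the newer name for mixed.cor) and assuming a few bfi variables: the Agreeableness items are polytomous, gender is dichotomous, and age is continuous. Treat the argument names as coming from the mixedCor help page:

data(bfi)
mc <- mixedCor(data=bfi, c="age", p=c("A1","A2","A3","A4","A5"), d="gender")
mc$rho   # the resulting matrix of mixed Pearson/polychoric/polyserial/biserial correlations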

The correlation matrix resulting from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
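A minimal sketch of the repair, assuming the burt correlation matrix supplied with the package (if the matrix is indeed not positive semi-definite, its smallest eigen value will be negative):

data(burt)
round(eigen(burt)$values, 3)   # inspect the eigen values; the smallest should be negative
burt.s <- cor.smooth(burt)     # rescale the eigen values and return a smoothed matrix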

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \ast \eta_{y_{wg}} \ast r_{xy_{wg}} + \eta_{x_{bg}} \ast \eta_{y_{bg}} \ast r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
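Written out with the grouping variable made explicit, a minimal sketch (the rwg and rbg element names are taken from the statsBy help page and are assumptions here):

sb <- statsBy(sat.act, group="education", cors=TRUE)
sb$rwg   # pooled within group correlations
sb$rbg   # between group correlations (correlations of the group means)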

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4:

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388


SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression


R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

> round(sc$residual, 2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079


5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
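The call that produces the output below is, roughly, the following sketch (sobel is the small data set constructed in the mediate help-page example, not a data set lazy-loaded with the package):

preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data=sobel)   # bootstrap test of the ab path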

Mediation/Moderation Analysis

Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   SE = 0.31  t = 2.5  df = 28  with p = 0.019

Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   SE = 0.32  t = 1.35  df = 27  with p = 0.19

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33

Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7

R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.00272

To see the longer output specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data=sat.act, std=TRUE, niter=50)

• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed; the default number of boot straps is 5000. A sketch of the call appears below. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
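A sketch of the call behind the output and figure that follow (the Garcia data set and the product term in the formula are taken from the mediate help page and the output below; treat the exact arguments as an assumption):

mediate(respappr ~ prot2 * sexism + (sexism), data=Garcia, niter=50,
        main="Moderated mediation (mean centered)")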


> mediate.diagram(preacher)

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data=sobel, std=FALSE)
> setCor.diagram(preacher)

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.

Mediation/Moderation Analysis

Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
     niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   SE = 0.22  t = 6.77  df = 126  with p = 4.4e-10

Direct effect (c') of prot2 on respappr removing sexism = 1.46   SE = 0.22  t = 6.73  df = 125  with p = 5.5e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04  Upper CI = 0.03

Total effect(c) of prot2*sexism on respappr = 0.81   SE = 0.28  t = 2.89  df = 126  with p = 0.0045

Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   SE = 0.28  t = 2.87  df = 125  with p = 0.0048

Indirect effect (ab) of prot2*sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07  Upper CI = 0.07

R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.91e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$
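To make the formula concrete, a minimal sketch of the computation (not the setCor code itself), assuming the sat.act correlations with the first three variables as set x and the last three as set y:

R <- cor(sat.act, use="pairwise")
x <- 1:3                        # gender, education, age
y <- 4:6                        # ACT, SATV, SATQ
M <- solve(R[x,x]) %*% R[x,y] %*% solve(R[y,y]) %*% R[y,x]
lambda <- Re(eigen(M)$values)   # the squared canonical correlations
1 - prod(1 - lambda)            # Cohen's set correlation R2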

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562


Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.
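For instance, a minimal sketch of such conversions (the heading argument and the use of raw data by cor2latex are assumptions based on the respective help pages):

f3 <- fa(Thurstone, 3)       # the factor analysis reported in Table 2
fa2latex(f3, heading="A factor analysis table from the psych package in R")
cor2latex(sat.act)           # lower diagonal correlation table in APA style
df2latex(describe(sat.act))  # descriptive statistics as a LaTeX table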

An example of converting the output from fa to LaTeX appears in Table 2.

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3   h2    u2    com
Sentences         0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary        0.89  0.06 -0.03  0.84  0.16  1.01
Sent.Completion   0.83  0.04  0.00  0.73  0.27  1.00
First.Letters     0.00  0.86  0.00  0.73  0.27  1.00
4.Letter.Words   -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes          0.18  0.63 -0.08  0.50  0.50  1.20
Letter.Series     0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees         0.37 -0.05  0.47  0.50  0.50  1.93
Letter.Group     -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings       2.64  1.86  1.5

      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list (a brief sketch of a few of these calls appears after the list); look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score

geometric.mean and harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add an ellipsis between them.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
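A few of these helpers in action, as a minimal sketch (the ratings object is hypothetical and only serves to illustrate cohen.kappa):

fisherz(0.5)                       # the Fisher r to z transformation
headTail(sat.act)                  # first and last few rows of the data frame
geometric.mean(c(1, 2, 4, 8))      # an appropriate mean for ratio-scaled data
mardia(sat.act[1:4], plot=FALSE)   # multivariate skew and kurtosis
ratings <- cbind(rater1 = c(1,2,2,3,3), rater2 = c(1,2,3,3,3))   # hypothetical ratings
cohen.kappa(ratings)               # agreement between the two raters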

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR), Condon & Revelle (2014), and were collected as part of the SAPA web based assessment project (https://sapa-project.org), Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R version 4.0.2 (2020-06-22)

Platform: x86_64-apple-darwin17.0 (64-bit)

Running under: macOS Catalina 10.15.6

Matrix products: default

BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached)

[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148

[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. & Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52-64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733-745.

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

54

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of

55

Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

56

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

57



> png("violin.png")
> data(sat.act)
> violinBy(sat.act, 5:6, "gender", grp.name=c("M","F"), main="Density Plot by gender for SAT V and Q")
> dev.off()

null device

1

Figure 4: Using the violinBy function to show the distribution of SAT V and Q for males and females. The plot shows the medians and the 25th and 75th percentiles, as well as the entire range and the density distribution.

19

343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data, as well as to draw error bars in both the x and y directions for paired data. These are the functions error.bars, error.bars.by, error.bars.tab, and error.crosses.

error.bars shows the 95% confidence intervals for each variable in a data frame or matrix. These errors are based upon normal theory and the standard errors of the mean. Alternative options include +/- one standard deviation or 1 standard error. If the data are repeated measures, the error bars will reflect the between-variable correlations. By default, the confidence intervals are displayed using a "cats eyes" plot which emphasizes the distribution of confidence within the confidence interval. (A short sketch of these calls appears after this list.)

error.bars.by does the same, but grouping the data by some condition.

error.bars.tab draws bar graphs from tabular data with error bars based upon the standard error of a proportion (σ_p = √(pq/N)).

error.crosses draws the confidence intervals for an x set and a y set of the same size.
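As a minimal sketch (not part of the original text) of how these functions might be called on the sat.act data set; the sd and eyes arguments are assumed here to switch between standard-deviation bars and the cats-eyes display:

> error.bars(sat.act[5:6])                            # 95% confidence intervals with "cats eyes"
> error.bars(sat.act[5:6], sd=TRUE, eyes=FALSE)       # +/- one standard deviation, no cats eyes
> error.bars.by(sat.act[5:6], group=sat.act$gender)   # the same intervals, drawn separately by group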

The use of the error.bars.by function allows for graphic comparisons of different groups (see Figure 5). Five personality measures are shown as a function of high versus low scores on a "lie" scale. People with higher lie scores tend to report being more agreeable, conscientious, and less neurotic than people with lower lie scores. The error bars are based upon normal theory and thus are symmetric rather than reflecting any skewing in the data.

Although not recommended, it is possible to use the error.bars function to draw bar graphs with associated error bars. This kind of dynamite plot (Figure 7) can be very misleading in that the scale is arbitrary. Go to a discussion of the problems in presenting data this way at https://emdbolker.wikidot.com/blog:dynamite. In the example shown, note that the graph starts at 0, although 0 is well out of the range of the observed scores. This is a function of using bars, which always are assumed to start at zero. Consider other ways of showing your data.

344 Error bars for tabular data

However, it is sometimes useful to show error bars for tabular data, either found by the table function or just directly input. These may be found using the error.bars.tab function.

20

> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 5: Using the error.bars.by function shows that self-reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cats eyes" show the distribution of the confidence interval.

21

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+     labels=c("Male","Female"), ylab="SAT score", xlab="")


Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

22

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+     main="Proportion of sample by education level")


Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds row wise percentages. The data can be converted to percentages (as shown) or shown as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data frame.

23

345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases only following the Happy movie.

24

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+     cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+     ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

25

346 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png("bibars.png")
> bi.bars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()

null device

1

Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy, shown in the next figure.

26

> png("histo.png")
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()

null device

1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.

27

347 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off-diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix while displaying the lower off-diagonal matrix.

> lowerCor(sat.act)

          gendr edctn age   ACT   SATV  SATQ
gender     1.00
education  0.09  1.00
age       -0.02  0.55  1.00
ACT       -0.04  0.15  0.11  1.00
SATV      -0.02  0.05 -0.04  0.56  1.00
SATQ      -0.17  0.03 -0.03  0.59  0.64  1.00

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.61  1.00
ACT        0.16  0.15  1.00
SATV       0.02 -0.06  0.61  1.00
SATQ       0.08  0.04  0.60  0.68  1.00

> upper <- lowerCor(female[-1])

          edctn age   ACT   SATV  SATQ
education  1.00
age        0.52  1.00
ACT        0.16  0.08  1.00
SATV       0.07 -0.03  0.53  1.00
SATQ       0.03 -0.09  0.58  0.63  1.00

> both <- lowerUpper(lower, upper)
> round(both, 2)

          education   age   ACT  SATV  SATQ
education        NA  0.52  0.16  0.07  0.03
age            0.61    NA  0.08 -0.03 -0.09
ACT            0.16  0.15    NA  0.53  0.58
SATV           0.02 -0.06  0.61    NA  0.63
SATQ           0.08  0.04  0.60  0.68    NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first (above the diagonal):

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

28

          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal, and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
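As a quick sketch (not shown in the original output), the adjustment method is controlled by the adjust argument of corr.test, and the confidence intervals may be seen by printing with short=FALSE:

> ct <- corr.test(sat.act)                      # Holm adjustment by default
> ct.raw <- corr.test(sat.act, adjust="none")   # raw (unadjusted) probabilities only
> print(ct, short=FALSE)                        # also shows the confidence intervals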

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

29

> png("corplot.png")
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

30

> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

31

> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

32

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687

Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.

33

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273

34

36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson correlation applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.
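As a small numerical sketch (not in the original text), φ and the tetrachoric correlation can be compared directly on a 2 x 2 table of cell frequencies; the table below is made up purely for illustration:

> Y <- matrix(c(40, 10, 10, 40), 2, 2)   # a hypothetical 2 x 2 table of frequencies
> phi(Y)                                 # the Pearson (φ) correlation of the dichotomous data
> tetrachoric(Y)                         # larger in absolute value, estimating the latent correlation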

> draw.tetra()


Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

35

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix formed from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will also happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix; smoothing the matrix corrects this problem.
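A brief sketch (not part of the original text) of these functions; the ability item data (0/1) and the bfi items (scored 1 to 6) are used, and the burt matrix is assumed to be available (in recent versions it may live in psychTools):

> tc <- tetrachoric(ability[, 1:4])   # tetrachoric correlations of four dichotomous items
> tc$rho                              # the estimated correlations
> tc$tau                              # the estimated thresholds (cut points)
> pc <- polychoric(bfi[, 1:5])        # polychoric correlations of five polytomous items
> smoothed <- cor.smooth(burt)        # adjust a non-positive-definite correlation matrix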

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

41 Decomposing data into within and between level correlations usingstatsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function that gives some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_xy = η_x_wg * η_y_wg * r_xy_wg + η_x_bg * η_y_bg * r_xy_bg     (1)

where r_xy is the normal correlation, which may be decomposed into the within group and between group correlations r_xy_wg and r_xy_bg, and η (eta) is the correlation of the data with the within group values or with the group means.

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2 and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

simmultilevel will generate simulated data with a multilevel structure

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
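For example, a minimal sketch (not from the original text) of the first of these analyses; the rwg, rbg, etawg, and etabg elements named here are assumed to hold the pooled within group correlations, the between group correlations, and the corresponding etas of equation (1):

> sb <- statsBy(sat.act, group="education", cors=TRUE)
> round(sb$rwg, 2)     # pooled within group correlations
> round(sb$rbg, 2)     # between group correlations (of the group means)
> round(sb$etawg, 2)   # the within group etas
> round(sb$etabg, 2)   # the between group etas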

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

37

sb <- statsBy(bfi[c(1:25, 27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression mediation moderation and set cor-relations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388

38

SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression

39

R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

> round(sc$residual, 2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079

40

52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Mediation/Moderation Analysis

Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total effect(c) of THERAPY on SATIS = 076 SE = 031 t = 25 df= 28 with p = 0019

Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t = 135 df= 27 with p = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 003 Upper CI = 07

R = 056 R2 = 031 F = 606 on 2 and 27 DF p-value 000272

To see the longer output specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.

41

> mediate.diagram(preacher)


Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

42

> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std=FALSE)
> setCor.diagram(preacher)


Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

43

Mediation/Moderation Analysis

Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary


Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.

44

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - ∏_{i=1}^{n} (1 - λ_i)

where λ_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_xx^{-1} R_xy R_yy^{-1} R_yx.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)

45

(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562

46

Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LATEX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

47

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

       MR1   MR2   MR3
MR1   1.00  0.59  0.54
MR2   0.59  1.00  0.52
MR3   0.54  0.52  1.00
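As a sketch (not part of the original text) of how such tables might be produced; the caption here is illustrative:

> f3 <- fa(Thurstone, 3)       # the three factor solution shown in Table 2
> fa2latex(f3, caption="A factor analysis table from the psych package in R")
> cor2latex(sat.act)           # a correlation matrix as an APA style LaTeX table
> df2latex(describe(sat.act))  # a generic data frame (here, descriptive statistics)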

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor, and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score

48

geometric.mean and harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output (a short sketch appears after this list).

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add an ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
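A brief sketch (not in the original text) of a few of these helpers:

> headTail(bfi)                                 # first and last few lines, with an ellipsis between
> fisherz(0.5)                                  # the Fisher z transform of r = .5 (about 0.55)
> mardia(sat.act[5:6])                          # univariate and multivariate skew and kurtosis
> bfi.rev <- reverse.code(c(-1, 1), bfi[1:2])   # reverse the first item, leave the second alone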

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
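The sketch below shows one way of loading and taking a first look at several of these data sets. It is illustrative only: depending on the version, some of the data sets (e.g., cities, galton, epi.bfi) are distributed in psychTools rather than psych, and the choice of two factors and of which variable to treat as the extension variable in the fa.extension example is arbitrary.

library(psych)
library(psychTools)

describe(bfi[1:5])    # item statistics for the first five bfi items
lowerCor(sat.act)     # correlations among the SAT and ACT variables
irt.fa(ability)       # IRT (tetrachoric based) analysis of the 16 dichotomous ICAR items
pairs.panels(galton)  # Galton's mid-parent and child heights

# classical multidimensional scaling of the airline distance matrix
city.location <- cmdscale(cities, k = 2)
plot(city.location, type = "n", xlab = "", ylab = "", main = "MDS of 11 US cities")
text(city.location, labels = rownames(cities))

# factor extension in the spirit of Dwyer (1937): factor all but the last variable,
# then find the loadings of the remaining (extended) variable
n <- ncol(Dwyer)
fo <- fa(Dwyer[-n, -n], nfactors = 2, rotate = "none")
fa.extension(Dwyer[-n, n], fo)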

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
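The same documentation can also be reached from within R; the calls below are standard R commands rather than anything specific to psych.

?psych                       # overview help page for the package
help(package = "psych")      # index of every help page in the package
vignette(package = "psych")  # list the vignettes installed with the package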

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.




343 Means and error bars

Additional descriptive graphics include the ability to draw error bars on sets of data aswell as to draw error bars in both the x and y directions for paired data These are thefunctions errorbars errorbarsby errorbarstab and errorcrosses

errorbars show the 95 confidence intervals for each variable in a data frame or ma-trix These errors are based upon normal theory and the standard errors of the meanAlternative options include +- one standard deviation or 1 standard error If thedata are repeated measures the error bars will be reflect the between variable cor-relations By default the confidence intervals are displayed using a ldquocats eyesrdquo plotwhich emphasizes the distribution of confidence within the confidence interval

errorbarsby does the same but grouping the data by some condition

errorbarstab draws bar graphs from tabular data with error bars based upon thestandard error of proportion (σp =

radicpqN)

errorcrosses draw the confidence intervals for an x set and a y set of the same size

The use of the errorbarsby function allows for graphic comparisons of different groups(see Figure 5) Five personality measures are shown as a function of high versus low scoreson a ldquolierdquo scale People with higher lie scores tend to report being more agreeable consci-entious and less neurotic than people with lower lie scores The error bars are based uponnormal theory and thus are symmetric rather than reflect any skewing in the data

Although not recommended it is possible to use the errorbars function to draw bargraphs with associated error bars (This kind of dynamite plot (Figure 7) can be verymisleading in that the scale is arbitrary Go to a discussion of the problems in present-ing data this way at httpsemdbolkerwikidotcomblogdynamite In the exampleshown note that the graph starts at 0 although is out of the range This is a functionof using bars which always are assumed to start at zero Consider other ways of showingyour data

344 Error bars for tabular data

However it is sometimes useful to show error bars for tabular data either found by thetable function or just directly input These may be found using the errorbarstab

function

20

gt data(epibfi)

gt errorbarsby(epibfi[610]epibfi$epilielt4)

95 confidence limits

Grouping Variable

bfag

ree

bfco

nbf

ext

bfne

urbf

open

group1 group2

050

100

150

Figure 5 Using the errorbarsby function shows that self reported personality scales onthe Big Five Inventory vary as a function of the Lie scale on the EPI The ldquocats eyesrdquo showthe distribution of the confidence

21

gt errorbarsby(satact[56]satact$genderbars=TRUE

+ labels=c(MaleFemale)ylab=SAT scorexlab=)

SATV SATQ

95 confidence limits

SAT

sco

re

200

300

400

500

600

700

800

200

300

400

500

600

700

800

Figure 6 A ldquoDynamite plotrdquo of SAT scores as a function of gender is one way of misleadingthe reader By using a bar graph the range of scores is ignored Bar graphs start from 0

22

gt T lt- with(satacttable(gendereducation))

gt rownames(T) lt- c(MF)

gt errorbarstab(Tway=bothylab=Proportion of Education Levelxlab=Level of Education

+ main=Proportion of sample by education level)

Proportion of sample by education level

Level of Education

Pro

port

ion

of E

duca

tion

Leve

l

000

005

010

015

020

025

030

M 0 M 1 M 2 M 3 M 4 M 5

000

005

010

015

020

025

030

Figure 7 The proportion of each education level that is Male or Female By using theway=rdquobothrdquo option the percentages and errors are based upon the grand total Alterna-tively way=rdquocolumnsrdquo finds column wise percentages way=rdquorowsrdquo finds rowwise percent-ages The data can be converted to percentages (as shown) or by total count (raw=TRUE)The function invisibly returns the probabilities and standard errors See the help menu foran example of entering the data as a dataframe

23

345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses func-tion For instance the effect of various movies on both ldquoEnergetic Arousalrdquo and ldquoTenseArousalrdquo can be seen in one graph and compared to the same movie manipulations onldquoPositive Affectrdquo and ldquoNegative Affectrdquo Note how Energetic Arousal is increased by threeof the movie manipulations but that Positive Affect increases following the Happy movieonly

24

gt op lt- par(mfrow=c(12))

gt data(affect)

gt colors lt- c(blackredwhiteblue)

gt films lt- c(SadHorrorNeutralHappy)

gt affectstats lt- errorCircles(EA2TA2data=affect[-c(120)]group=Filmlabels=films

+ xlab=Energetic Arousal ylab=Tense Arousalylim=c(1022)xlim=c(820)pch=16

+ cex=2colors=colors main = Movies effect on arousal)gt errorCircles(PA2NA2data=affectstatslabels=filmsxlab=Positive Affect

+ ylab=Negative Affect pch=16cex=2colors=colors main =Movies effect on affect)

gt op lt- par(mfrow=c(11))

8 12 16 20

1012

1416

1820

22

Movies effect on arousal

Energetic Arousal

Tens

e A

rous

al

SadHorror

NeutralHappy

6 8 10 12

24

68

10

Movies effect on affect

Positive Affect

Neg

ativ

e A

ffect

Sad

Horror

NeutralHappy

Figure 8 The use of the errorCircles function allows for two dimensional displays ofmeans and error bars The first call to errorCircles finds descriptive statistics for theaffect dataframe based upon the grouping variable of Film These data are returned andthen used by the second call which examines the effect of the same grouping variable upondifferent measures The size of the circles represent the relative sample sizes for each groupThe data are from the PMC lab and reported in Smillie et al (2012)

25

346 Back to back histograms

The bibars function summarize the characteristics of two groups (eg males and females)on a second variable (eg age) by drawing back to back histograms (see Figure 9)

data(bfi)gt png( bibarspng )

gt bibars(bfiagegenderylab=Agemain=Age by males and females)

gt devoff()

null device

1

Figure 9 A bar plot of the age distribution for males and females shows the use of bibarsThe data are males and females from 2800 cases collected using the SAPA procedure andare available as part of the bfi data set An alternative way of displaying these data is inthe densityBy in the next figure

26

gt png(histopng)gt data(satact)

gt densityBy(bfiagegrp=gender)

gt devoff()

null device

1

Figure 10 Using the densitynBy function to show the age distribution for males andfemales The plot is a conventional density diagram for two two groups Compare this tothe bibars plot in the previous figure By plotting densities we can see that the malesare slightly over represented in the younger ranges

27

347 Correlational structure

There are many ways to display correlations Tabular displays are probably the mostcommon The output from the cor function in core R is a rectangular matrix lowerMat

will round this to (2) digits and then display as a lower off diagonal matrix lowerCor

calls cor with use=lsquopairwisersquo method=lsquopearsonrsquo as default values and returns (invisibly)the full correlation matrix and displays the lower off diagonal matrix

gt lowerCor(satact)

gendr edctn age ACT SATV SATQ

gender 100

education 009 100

age -002 055 100

ACT -004 015 011 100

SATV -002 005 -004 056 100

SATQ -017 003 -003 059 064 100

When comparing results from two different groups it is convenient to display them as onematrix with the results from one group below the diagonal and the other group above thediagonal Use lowerUpper to do this

gt female lt- subset(satactsatact$gender==2)

gt male lt- subset(satactsatact$gender==1)

gt lower lt- lowerCor(male[-1])

edctn age ACT SATV SATQ

education 100

age 061 100

ACT 016 015 100

SATV 002 -006 061 100

SATQ 008 004 060 068 100

gt upper lt- lowerCor(female[-1])

edctn age ACT SATV SATQ

education 100

age 052 100

ACT 016 008 100

SATV 007 -003 053 100

SATQ 003 -009 058 063 100

gt both lt- lowerUpper(lowerupper)

gt round(both2)

education age ACT SATV SATQ

education NA 052 016 007 003

age 061 NA 008 -003 -009

ACT 016 015 NA 053 058

SATV 002 -006 061 NA 063

SATQ 008 004 060 068 NA

It is also possible to compare two matrices by taking their differences and displaying one (be-low the diagonal) and the difference of the second from the first above the diagonal

gt diffs lt- lowerUpper(lowerupperdiff=TRUE)

gt round(diffs2)

28

education age ACT SATV SATQ

education NA 009 000 -005 005

age 061 NA 007 -003 013

ACT 016 015 NA 008 002

SATV 002 -006 061 NA 005

SATQ 008 004 060 068 NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat mapof the correlations This is just a matrix color coded to represent the magnitude of thecorrelation This is useful when considering the number of factors in a data set Considerthe Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated dataset of 24 variables with a circumplex structure (Figure 12) The color coding representsa ldquoheat maprdquo of the correlations with darker shades of red representing stronger negativeand darker shades of blue stronger positive correlations As an option the value of thecorrelation can be shown

Yet another way to show structure is to use ldquospiderrdquo plots Particularly if variables areordered in some meaningful way (eg in a circumplex) a spider plot will show this structureeasily This is just a plot of the magnitude of the correlation as a radial line with lengthranging from 0 (for a correlation of -1) to 1 (for a correlation of 1) (See Figure 13)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data but some people like to testwhether these correlations differ from zero or differ from each other The cortest func-tion (in the stats package) will test the significance of a single correlation and the rcorr

function in the Hmisc package will do this for many correlations In the psych packagethe corrtest function reports the correlation (Pearson Spearman or Kendall) betweenall variables in either one or two data frames or matrices as well as the number of obser-vations for each case and the (two-tailed) probability for each correlation Unfortunatelythese probability values have not been corrected for multiple comparisons and so shouldbe taken with a great deal of salt Thus in corrtest and corrp the raw probabilitiesare reported below the diagonal and the probabilities adjusted for multiple comparisonsusing (by default) the Holm correction are reported above the diagonal (Table 1) (See thepadjust function for a discussion of Holm (1979) and other corrections)

Testing the difference between any two correlations can be done using the rtest functionThe function actually does four different tests (based upon an article by Steiger (1980)depending upon the input

1) For a sample size n find the t and p value for a single correlation as well as the confidenceinterval

gt rtest(503)

29

gt png(corplotpng)gt corPlot(Thurstonenumbers=TRUEupper=FALSEdiag=FALSEmain=9 cognitive variables from Thurstone)

gt devoff()

null device

1

Figure 11 The structure of correlation matrix can be seen more clearly if the variables aregrouped by factor and then the correlations are shown by color By using the rsquonumbersrsquooption the values are displayed as well By default the complete matrix is shown Settingupper=FALSE and diag=FALSE shows a cleaner figure

30

gt png(circplotpng)gt circ lt- simcirc(24)

gt rcirc lt- cor(circ)

gt corPlot(rcircmain=24 variables in a circumplex)gt devoff()

null device

1

Figure 12 Using the corPlot function to show the correlations in a circumplex Correlationsare highest near the diagonal diminish to zero further from the diagonal and the increaseagain towards the corners of the matrix Circumplex structures are common in the studyof affect For circumplex structures it is perhaps useful to show the complete matrix

31

gt png(spiderpng)gt oplt- par(mfrow=c(22))

gt spider(y=c(161218)x=124data=rcircfill=TRUEmain=Spider plot of 24 circumplex variables)

gt op lt- par(mfrow=c(11))

gt devoff()

null device

1

Figure 13 A spider plot can show circumplex structure very clearly Circumplex structuresare common in the study of affect

32

Table 1 The corrtest function reports correlations cell sizes and raw and adjustedprobability values corrp reports the probability values for a correlation matrix Bydefault the adjustment used is that of Holm (1979)gt corrtest(satact)

Callcorrtest(x = satact)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations print with the short=FALSE option

33

Correlation tests

Callrtest(n = 50 r12 = 03)

Test of significance of a correlation

t value 218 with probability lt 0034

and confidence interval 002 053

2) For sample sizes of n and n2 (n2 = n if not specified) find the z of the difference betweenthe z transformed correlations divided by the standard error of the difference of two zscores

gt rtest(3046)

Correlation tests

Callrtest(n = 30 r12 = 04 r34 = 06)

Test of difference between two independent correlations

z value 099 with probability 032

3) For sample size n and correlations ra= r12 rb= r23 and r13 specified test for thedifference of two dependent correlations (Steiger case A)

gt rtest(103451)

Correlation tests

Call[1] rtest(n = 103 r12 = 04 r23 = 01 r13 = 05 )

Test of difference between two correlated correlations

t value -089 with probability lt 037

4) For sample size n test for the difference between two dependent correlations involvingdifferent variables (Steiger case B)

gt rtest(103567558) steiger Case B

Correlation tests

Callrtest(n = 103 r12 = 05 r34 = 06 r23 = 07 r13 = 05 r14 = 05

r24 = 08)

Test of difference between two dependent correlations

z value -12 with probability 023

To test whether a matrix of correlations differs from what would be expected if the popu-lation correlations were all zero the function cortest follows Steiger (1980) who pointedout that the sum of the squared elements of a correlation matrix or the Fisher z scoreequivalents is distributed as chi square under the null hypothesis that the values are zero(ie elements of the identity matrix) This is particularly useful for examining whethercorrelations in a single matrix differ from zero or for comparing two matrices Althoughobvious cortest can be used to test whether the satact data matrix produces non-zerocorrelations (it does) This is a much more appropriate test when testing whether a residualmatrix differs from zero

gt cortest(satact)

Tests of correlation matrices

Callcortest(R1 = satact)

Chi Square value 132542 with df = 15 with probability lt 18e-273

34

36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient If thedata eg ability items are thought to represent an underlying continuous although latentvariable the φ will underestimate the value of the Pearson applied to these latent variablesOne solution to this problem is to use the tetrachoric correlation which is based uponthe assumption of a bivariate normal distribution that has been cut at certain points Thedrawtetra function demonstrates the process (Figure 14) This is also shown in terms ofdichotomizing the bivariate normal density function using the drawcor function A simplegeneralization of this to the case of the multiple cuts is the polychoric correlation

gt drawtetra()

minus3 minus2 minus1 0 1 2 3

minus3

minus2

minus1

01

23

Y rho = 05phi = 028

X gt τY gt Τ

X lt τY gt Τ

X gt τY lt Τ

X lt τY lt Τ

x

dnor

m(x

)

X gt τ

τ

x1

Y gt Τ

Τ

Figure 14 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values

35

The tetrachoric correlation estimates what a Pearson correlation would be given a two bytwo table of observed values assumed to be sampled from a bivariate normal distributionThe φ correlation is just a Pearson r performed on the observed values It is found (labo-riously) by optimizing the fit of the bivariate normal for various values of the correlationto the observed cell frequencies In the interests of space we do not show the next figurebut it can be created by

drawcor(expand=20cuts=c(00))

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation

If the data are a mix of continuous polytomous and dichotomous variables the mixedcor

function will calculate the appropriate mixture of Pearson polychoric tetrachoric biserialand polyserial correlations

The correlation matrix resulting from a number of tetrachoric or polychoric correlationmatrix sometimes will not be positive semi-definite This will sometimes happen if thecorrelation matrix is formed by using pair-wise deletion of cases The corsmooth functionwill adjust the smallest eigen values of the correlation matrix to make them positive rescaleall of them to sum to the number of variables and produce aldquosmoothedrdquocorrelation matrixAn example of this problem is a data set of burt which probably had a typo in the originalcorrelation matrix Smoothing the matrix corrects this problem

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon egethnicity age gender college major or country) reflect an unknown mixture of the pooledcorrelation within each group as well as the correlation of the means of these groupsThese two correlations are independent and do not allow inferences from one level (thegroup) to the other level (the individual) When examining data at two levels (eg theindividual and by some grouping variable) it is useful to find basic descriptive statistics(means sds ns per group within group correlations) as well as between group statistics(over all descriptive statistics and overall between group correlations) Of particular useis the ability to decompose a matrix of correlations at the individual level into correlationswithin group and correlations between groups

41 Decomposing data into within and between level correlations usingstatsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complexanalysis of hierarchical (multilevel) data structures statsBy is a much simpler function

36

to give some of the basic descriptive statistics for two level models (nlme and multilevelallow for statistical inference but the descriptives of statsBy are useful)

This follows the decomposition of an observed correlation into the pooled correlation withingroups (rwg) and the weighted correlation of the means between groups which is discussedby Pedhazur (1997) and by Bliese (2009) in the multilevel package

rxy = ηxwg lowastηywg lowast rxywg + ηxbg lowastηybg lowast rxybg (1)

where rxy is the normal correlation which may be decomposed into a within group andbetween group correlations rxywg and rxybg and η (eta) is the correlation of the data withthe within group values or the group means

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group cor-relations The within group correlations between 9 variables are set to be 1 0 and -1while those between groups are also set to be 1 0 -1 These two sets of correlations arecrossed such that V1 V4 and V7 have within group correlations of 1 as do V2 V5 andV8 and V3 V6 and V9 V1 has a within group correlation of 0 with V2 V5 and V8and a -1 within group correlation with V3 V6 and V9 V1 V2 and V3 share a betweengroup correlation of 1 as do V4 V5 and V6 and V7 V8 and V9 The first group has a 0between group correlation with the second and a -1 with the third group See the help filefor withinBetween to display these data

simmultilevel will generate simulated data with a multilevel structure

The statsByboot function will randomize the grouping variable ntrials times and find thestatsBy output This can take a long time and will produce a great deal of output Thisoutput can then be summarized for relevant variables using the statsBybootsummary

function specifying the variable of interest

Consider the case of the relationship between various tests of ability when the data aregrouped by level of education (statsBy(satact)) or when affect data are analyzed withinand between an affect manipulation (statsBy(affect) )

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be donein the lavaan package However for exploratory analyses of the structure within each ofmultiple groups the faBy function may be used in combination with the statsBy functionFirst run pfunstatsBy with the correlation option set to TRUE and then run faBy on theresulting output

37

sb lt- statsBy(bfi[c(12527)] group=educationcors=TRUE)

faBy(sbnfactors=5) find the 5 factor solution for each education level

5 Multiple Regression mediation moderation and set cor-relations

The typical application of the lm function is to do a linear model of one Y variable as afunction of multiple X variables Because lm is designed to analyze complex interactions itrequires raw data as input It is however sometimes convenient to do multiple regressionfrom a correlation or covariance matrix This is done using the setCor which will workwith either raw data covariance matrices or correlation matrices

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variablesperhaps with a set of z covariates removed from both x and y Consider the Thurstonecorrelation matrix and find the multiple correlation of the last five variables as a functionof the first 4

gt setCor(y = 59x=14data=Thurstone)

Call setCor(y = 59 x = 14 data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388

38

SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in correlation matrix appropriate estimates of stan-dard errors t-values and probabilities are also found The next example finds the regres-sions with variables 1 and 2 used as covariates The β weights for variables 3 and 4 do notchange but the multiple correlation is much less It also shows how to find the residualcorrelations between variables 5-9 with variables 1-4 removed

gt sc lt- setCor(y = 59x=34data=Thurstonez=12)

Call setCor(y = 59 x = 34 data = Thurstone z = 12)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression

39

R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

gt round(sc$residual2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079

40

52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect ofmultiple predictors (x12i) on a criterion variable y some prefer to think of the effect ofone predictor x as mediated by another variable m (Preacher and Hayes 2004) Thuswe we may find the indirect path from x to m and then from m to y as well as the directpath from x to y Call these paths a b and c respectively Then the indirect effect of xon y through m is just ab and the direct effect is c Statistical tests of the ab effect arebest done by bootstrapping This is discussed in detail in the ldquoHow To use mediate andsetCor to do mediation moderation and regression analysis tutorial

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate

function and the subsequent graphic from mediatediagram The data are found in theexample for mediate

MediationModeration Analysis

Call mediate(y = SATIS ~ THERAPY + (ATTRIB) data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total effect(c) of THERAPY on SATIS = 076 SE = 031 t = 25 df= 28 with p = 0019

Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t = 135 df= 27 with p = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 003 Upper CI = 07

R = 056 R2 = 031 F = 606 on 2 and 27 DF p-value 000272

To see the longer output specify short = FALSE in the print statement or ask for the summary

bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables

setCor(SATV + SATQ ~ education + age data = satact std=TRUE)

bull mediate will take raw data or a correlation matrix and find (and graph the path dia-gram) for multiple y variables depending upon multiple x variables mediated througha mediation variable It then tests the mediation effect using a boot strap

mediate( SATV ~ education+ age + (ACT) data =satactstd=TRUEniter=50)

bull mediate will also take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple x variablesmediated through a mediation variable It will form the product term either from themean centered data or from the raw data It then tests the mediation effect using aboot strap The data set is taken from Garcia et al (2010) The number of iterationsfor the boot strap was set to 50 for speed The default number of boot straps is5000 See the help page for the mediate function for more details For a much longerdiscussion of how to use the mediate function see the ldquoHowTordquo Using mediate andsetCor to do mediation moderation and regression analysis

41

gt mediatediagram(preacher)

Mediation model

THERAPY SATIS

ATTRIB

c = 076

c = 043

082 04

Figure 15 A mediated model taken from Preacher and Hayes 2004 and solved using themediate function The direct path from Therapy to Satisfaction has a an effect of 76 whilethe indirect path through Attribution has an effect of 33 Compare this to the normalregression graphic created by setCordiagram

42

gt preacher lt- setCor(SATIS ~ THERAPY + ATTRIBdata =sobelstd=FALSE)

gt setCordiagram(preacher)

Regression Models

THERAPY

ATTRIB

SATIS

043

04

021

Figure 16 The conventional regression model for the Preacher and Hayes 2004 data setsolved using the sector function Compare this to the previous figure

43

MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

Moderated mediation (mean centered)

minus001

prot2

prot2sexism

respapprsexism

c = 146

c = 146

c = 081

c = 081

007

009

002

Figure 17 Moderated multiple regression requires the raw data By default the data aremean centered before find the product term

44

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

R2 = 1minusn

prodi=1

(1minusλi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rminus1xx RxyRminus1

xx Rminus1xy

Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate

setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data

Consider the correlations of the 6 variables in the satact data set First do the normalmultiple regression and then compare it with the results using setCor Two things tonotice setCor works on the correlation or covariance or raw data matrix and thus ifusing the correlation matrix will report standardized or raw β weights Secondly it ispossible to do several multiple regressions simultaneously If the number of observationsis specified or if the analysis is done on raw data statistical tests of significance areapplied

For this example the analysis is done on the correlation matrix rather than the rawdata

gt C lt- cov(satactuse=pairwise)

gt model1 lt- lm(ACT~ gender + education + age data=satact)

gt summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)

45

(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor

gt compare with sector

gt setCor(c(46)c(13)C nobs=700)

Call setCor(y = c(46) x = c(13) data = C nobs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562

46

Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between thepredictor set and the criterion (dependent) set This set correlation is symmetric That isthe R2 is the same independent of the direction of the relationship

6 Converting output to APA style tables using LATEX

Although for most purposes using the Sweave or KnitR packages produces clean outputsome prefer output pre formatted for APA style tables This can be done using the xtablepackage for almost anything but there are a few simple functions in psych for the mostcommon tables fa2latex will convert a factor analysis or components analysis output toa LATEXtable cor2latex will take a correlation matrix and show the lower (or upper diag-onal) irt2latex converts the item statistics from the irtfa function to more convenientLATEXoutput and finally df2latex converts a generic data frame to LATEX

An example of converting the output from fa to LATEXappears in Table 2

47

Table 2 fa2latexA factor analysis table from the psych package in R

Variable MR1 MR2 MR3 h2 u2 com

Sentences 091 -004 004 082 018 101Vocabulary 089 006 -003 084 016 101SentCompletion 083 004 000 073 027 100FirstLetters 000 086 000 073 027 1004LetterWords -001 074 010 063 037 104Suffixes 018 063 -008 050 050 120LetterSeries 003 -001 084 072 028 100Pedigrees 037 -005 047 050 050 193LetterGroup -006 021 064 053 047 123

SS loadings 264 186 15

MR1 100 059 054MR2 059 100 052MR3 054 052 100

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

48

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add an ellipsis between them.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
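A few of these helpers in action, as a quick sketch using the built-in sat.act data:

fisherz(.5)                     # Fisher z transformation of r = .5
geometric.mean(c(1, 2, 4, 8))   # geometric mean
harmonic.mean(c(1, 2, 4, 8))    # harmonic mean
headTail(sat.act)               # first and last lines of the data set
mardia(sat.act[1:4])            # multivariate skew and kurtosis for the first four variables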

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
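Any of these data sets may be loaded and examined in the usual way; a brief sketch:

data(bfi)                       # 25 personality items plus gender, education, and age
describe(bfi[1:5])              # descriptive statistics for the first five items
data(Thurstone)
lowerMat(Thurstone[1:5, 1:5])   # part of the 9 x 9 correlation matrix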

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41

52

References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

54

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of

55

Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

56

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

57



> data(epi.bfi)
> error.bars.by(epi.bfi[6:10], epi.bfi$epilie < 4)


Figure 5: Using the error.bars.by function shows that self reported personality scales on the Big Five Inventory vary as a function of the Lie scale on the EPI. The "cat's eyes" show the distribution of the confidence interval.

21

> error.bars.by(sat.act[5:6], sat.act$gender, bars=TRUE,
+     labels=c("Male", "Female"), ylab="SAT score", xlab="")


Figure 6: A "Dynamite plot" of SAT scores as a function of gender is one way of misleading the reader. By using a bar graph, the range of scores is ignored. Bar graphs start from 0.

22

> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+     main="Proportion of sample by education level")


Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or kept as total counts (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a data.frame.

23

345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCircles function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but Positive Affect increases following the Happy movie only.

24

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black", "red", "white", "blue")
> films <- c("Sad", "Horror", "Neutral", "Happy")
> affect.stats <- errorCircles("EA2", "TA2", data=affect[-c(1,20)], group="Film", labels=films,
+     xlab="Energetic Arousal", ylab="Tense Arousal", ylim=c(10,22), xlim=c(8,20), pch=16,
+     cex=2, colors=colors, main="Movies effect on arousal")
> errorCircles("PA2", "NA2", data=affect.stats, labels=films, xlab="Positive Affect",
+     ylab="Negative Affect", pch=16, cex=2, colors=colors, main="Movies effect on affect")
> op <- par(mfrow=c(1,1))


Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect data.frame based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

25

346 Back to back histograms

The bi.bars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png("bibars.png")
> bi.bars(bfi, "age", "gender", ylab="Age", main="Age by males and females")
> dev.off()

null device

1

Figure 9: A bar plot of the age distribution for males and females shows the use of bi.bars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.

26

> png("histo.png")
> data(sat.act)
> densityBy(bfi, "age", grp="gender")
> dev.off()

null device

1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bi.bars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.

27

347 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to 2 digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use="pairwise", method="pearson" as default values and returns (invisibly) the full correlation matrix while displaying the lower off diagonal matrix.

> lowerCor(sat.act)

gendr edctn age ACT SATV SATQ

gender 100

education 009 100

age -002 055 100

ACT -004 015 011 100

SATV -002 005 -004 056 100

SATQ -017 003 -003 059 064 100

When comparing results from two different groups it is convenient to display them as onematrix with the results from one group below the diagonal and the other group above thediagonal Use lowerUpper to do this

> female <- subset(sat.act, sat.act$gender == 2)
> male <- subset(sat.act, sat.act$gender == 1)
> lower <- lowerCor(male[-1])

edctn age ACT SATV SATQ

education 100

age 061 100

ACT 016 015 100

SATV 002 -006 061 100

SATQ 008 004 060 068 100

> upper <- lowerCor(female[-1])

edctn age ACT SATV SATQ

education 100

age 052 100

ACT 016 008 100

SATV 007 -003 053 100

SATQ 003 -009 058 063 100

> both <- lowerUpper(lower, upper)
> round(both, 2)

education age ACT SATV SATQ

education NA 052 016 007 003

age 061 NA 008 -003 -009

ACT 016 015 NA 053 058

SATV 002 -006 061 NA 063

SATQ 008 004 060 068 NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

28

education age ACT SATV SATQ

education NA 009 000 -005 005

age 061 NA 007 -003 013

ACT 016 015 NA 008 002

SATV 002 -006 061 NA 005

SATQ 008 004 060 068 NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)
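A minimal sketch of these calls (sat.act as before; corr.p here is applied to the correlation matrix returned by corr.test):

ct <- corr.test(sat.act)     # correlations, sample sizes, and raw/adjusted p values
print(ct, short = FALSE)     # also show the confidence intervals
corr.p(ct$r, n = 700)        # Holm adjusted probabilities from an existing correlation matrix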

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

29

> png("corplot.png")
> corPlot(Thurstone, numbers=TRUE, upper=FALSE, diag=FALSE, main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

30

> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12 Using the corPlot function to show the correlations in a circumplex Correlationsare highest near the diagonal diminish to zero further from the diagonal and the increaseagain towards the corners of the matrix Circumplex structures are common in the studyof affect For circumplex structures it is perhaps useful to show the complete matrix

31

> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13 A spider plot can show circumplex structure very clearly Circumplex structuresare common in the study of affect

32

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call: corr.test(x = sat.act)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations print with the short=FALSE option

33

Correlation tests

Callrtest(n = 50 r12 = 03)

Test of significance of a correlation

t value 218 with probability lt 0034

and confidence interval 002 053

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests

Callrtest(n = 30 r12 = 04 r34 = 06)

Test of difference between two independent correlations

z value 099 with probability 032

3) For sample size n and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests

Call[1] rtest(n = 103 r12 = 04 r23 = 01 r13 = 05 )

Test of difference between two correlated correlations

t value -089 with probability lt 037

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests

Callrtest(n = 103 r12 = 05 r34 = 06 r23 = 07 r13 = 05 r14 = 05

r24 = 08)

Test of difference between two dependent correlations

z value -12 with probability 023

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

gt cortest(satact)

Tests of correlation matrices

Callcortest(R1 = satact)

Chi Square value 132542 with df = 15 with probability lt 18e-273

34

36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()

(Figure 14 shows a bivariate normal distribution with ρ = .5 cut at thresholds τ on X and Y into four quadrants; the φ for the resulting dichotomized values is .28.)

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

35

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
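As a brief sketch (assuming the bfi and ability data sets described earlier), the polychoric and tetrachoric functions take the raw item responses directly:

pc <- polychoric(bfi[1:25])   # polychoric correlations for the 25 six-point bfi items
tc <- tetrachoric(ability)    # tetrachoric correlations for the 0/1 scored ability items
pc$rho[1:3, 1:3]              # the estimated latent correlations
pc$tau                        # the estimated thresholds (cut points)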

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
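A minimal sketch of the smoothing step, using the burt matrix just mentioned (which is not quite positive definite):

eigen(burt)$values              # note the small negative eigenvalue
burt.smoothed <- cor.smooth(burt)   # adjust the offending eigenvalues and rescale
eigen(burt.smoothed)$values     # now all positive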

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

41 Decomposing data into within and between level correlations usingstatsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

rxy = ηxwg · ηywg · rxywg + ηxbg · ηybg · rxybg   (1)

where rxy is the normal correlation, which may be decomposed into a within group and a between group correlation, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
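A sketch of those two calls (the grouping variable is given by name; rwg and rbg hold the pooled within group and the between group correlations):

sb.edu <- statsBy(sat.act, group = "education")   # ability and SAT scores grouped by education
sb.film <- statsBy(affect, group = "Film")        # affect measures within and between film conditions
sb.edu$rwg                                        # pooled within group correlations
sb.edu$rbg                                        # between group correlations (of the group means)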

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

37

sb <- statsBy(bfi[c(1:25, 27)], group = "education", cors = TRUE)
faBy(sb, nfactors = 5)   # find the 5 factor solution for each education level

5 Multiple Regression mediation moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4:

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388

38

SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression

39

R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

> round(sc$residual, 2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079

40

52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
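A sketch of such a call, reproducing the call whose output is shown below (the mediator, ATTRIB, goes in parentheses; the bootstrap is run with its default number of iterations):

mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)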

MediationModeration Analysis

Call mediate(y = SATIS ~ THERAPY + (ATTRIB) data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total effect(c) of THERAPY on SATIS = 076 SE = 031 t = 25 df= 28 with p = 0019

Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t = 135 df= 27 with p = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 003 Upper CI = 07

R = 056 R2 = 031 F = 606 on 2 and 27 DF p-value 000272

To see the longer output specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, niter = 50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis. A sketch of such a call appears after this list.
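A sketch of the moderated mediation call echoed in the output below (the product term is written with *; niter was set to 50 for speed):

mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia, niter = 50,
        main = "Moderated mediation (mean centered)")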

41

> mediate.diagram(preacher)


Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

42

> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)


Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.

43

MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary


Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.

44

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R2 = 1 − ∏(1 − λi)   (product over i = 1 to n)

where λi is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = Rxx⁻¹ Rxy Ryy⁻¹ Ryx.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)

45

(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor:

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562

46

Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric: the R2 is the same independent of the direction of the relationship.
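The symmetry claim is easy to check by rerunning the analysis with the two sets exchanged (a minimal sketch, reusing the covariance matrix C created above; compare the Set Correlation lines of the two outputs):

> setCor(c(4:6), c(1:3), C, n.obs=700)   # ACT, SATV, SATQ predicted from gender, education, age
> setCor(c(1:3), c(4:6), C, n.obs=700)   # the same variables with the roles of the two sets reversed

The regression slopes of course differ between the two runs, but the reported set correlation R2 does not.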

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
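The call that produces a table such as Table 2 looks roughly like the following (a sketch rather than the exact vignette code; in recent versions these LaTeX helpers live in the psychTools package, and the heading and caption strings shown here are illustrative):

> f3 <- fa(Thurstone, 3)      # a three factor solution for the Thurstone correlations
> fa2latex(f3, heading="A factor analysis table from the psych package in R",
+          caption="fa2latex")  # writes the LaTeX source for an APA style table
> cor2latex(Thurstone)          # similarly, a lower-diagonal correlation table in LaTeX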


Table 2: fa2latex. A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.5

MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the index of the psych manual for a list of all of the functions. A short sketch showing a few of these helpers in use follows the list.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
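A minimal sketch (not from the original vignette) exercising a few of these helpers:

> fisherz(.5)                    # Fisher r to z transformation of a correlation of .5
> geometric.mean(c(1, 2, 4, 8))  # the appropriate mean for multiplicative data
> headTail(sat.act)              # first and last few rows of the sat.act data frame
> A <- diag(2)
> B <- diag(3)
> superMatrix(A, B)              # A and B on the diagonal, 0s in the off-diagonal blocks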

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability) are also included. The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project Revelle et al. (2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) Condon & Revelle (2014) and were collected as part of the SAPA web based assessment (https://sapa-project.org) project Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
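These data sets can be examined directly once psych (and psychTools) is loaded; a small sketch (not part of the original vignette, with dimensions as described above):

> dim(bfi)                        # 2800 participants, 25 items plus 3 demographic variables
> describe(bfi[1:5])              # descriptive statistics for the first five bfi items
> round(Thurstone[1:3, 1:3], 2)   # a corner of the 9 x 9 Thurstone correlation matrix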

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.
Condon, D. M. & Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis. 122 pp. Oxford, England.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.
Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.
Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., & Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In N. G. Fielding, R. M. Lee, & G. Blank (Eds.), SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, (pp. 578–595). Sage Publications, Inc.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.
Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

R2 = 1minusn

prodi=1

(1minusλi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rminus1xx RxyRminus1

xx Rminus1xy

Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate

setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data

Consider the correlations of the 6 variables in the satact data set First do the normalmultiple regression and then compare it with the results using setCor Two things tonotice setCor works on the correlation or covariance or raw data matrix and thus ifusing the correlation matrix will report standardized or raw β weights Secondly it ispossible to do several multiple regressions simultaneously If the number of observationsis specified or if the analysis is done on raw data statistical tests of significance areapplied

For this example the analysis is done on the correlation matrix rather than the rawdata

gt C lt- cov(satactuse=pairwise)

gt model1 lt- lm(ACT~ gender + education + age data=satact)

gt summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)

45

(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562

46

Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
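Because the statistic is symmetric, reversing the roles of the two sets should reproduce the same set correlation. A quick check (a sketch, reusing the covariance matrix C defined above):

setCor(c(1:3), c(4:6), C, n.obs = 700)   # same set correlation R2 as above, with x and y exchanged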

6 Converting output to APA style tables using LATEX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

47

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1    MR2    MR3    h2    u2    com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

Factor correlations
      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
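A sketch of the kind of call that produces such a table (the heading text is illustrative; see ?fa2latex for the full set of arguments):

f3 <- fa(Thurstone, 3)       # the three factor solution reported in Table 2
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
cor2latex(Thurstone)         # an APA style lower-diagonal correlation table
df2latex(describe(sat.act))  # any data frame, e.g. descriptive statistics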

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score

48

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
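A few of these helpers in use (a minimal sketch; the numeric comments are rounded):

x <- c(1, 2, 4, 8)
geometric.mean(x)     # 2.83
harmonic.mean(x)      # 2.13
fisherz(0.5)          # 0.55, the Fisher z transform of r = .5
reverse.code(c(1, -1), attitude[1:2])                    # reverse score the second column
block.random(n = 16, ncond = c(gender = 2, drug = 2))    # a block randomized design for 16 subjects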

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

49

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from theSynthetic Aperture Personality Assessment (SAPA) web based personality assessmentproject Revelle et al (2016) These 135 items form part of the SAPA PersonalityInventory

satact Self reported scores on the SAT Verbal SAT Quantitative and ACT were col-lected as part of the Synthetic Aperture Personality Assessment (SAPA) web basedpersonality assessment project Age gender and education are also reported Thedata from 700 subjects are included here as a demonstration set for correlation andanalysis

epibfi A small data set of 5 scales from the Eysenck Personality Inventory 5 from a Big 5inventory a Beck Depression Inventory and State and Trait Anxiety measures Usedfor demonstrations of correlations regressions graphic displays

iqitems 16 multiple choice ability items were included as part of the Synthetic AperturePersonality Assessment (SAPA) web based personality assessment project The datafrom 1525 subjects are included here as a demonstration set for scoring multiplechoice inventories and doing basic item statistics

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

50

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R version 402 (2020-06-22)

Platform x86_64-apple-darwin170 (64-bit)

51

Running under macOS Catalina 10156

Matrix products default

BLAS LibraryFrameworksRframeworkVersions40ResourcesliblibRblasdylib

LAPACK LibraryFrameworksRframeworkVersions40ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_209 psych_209

loaded via a namespace (and not attached)

[1] compiler_402 parallel_402 tools_402 foreign_08-80 tmvnsim_10-2 nlme_31-148

[7] mnormt_201 grid_402 lattice_020-41

52

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, 122 pp., Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

53

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. doi: 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

54

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

55

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of Books in Biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of Books in Biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

56

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

57

Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49

58

hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48

59

corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51

60

psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9

61

outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49

62

table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63


> T <- with(sat.act, table(gender, education))
> rownames(T) <- c("M", "F")
> error.bars.tab(T, way="both", ylab="Proportion of Education Level", xlab="Level of Education",
+      main="Proportion of sample by education level")

[Bar plot: "Proportion of sample by education level" — the proportion of each gender (M, F) at education levels 0–5, with error bars; y axis: Proportion of Education Level, x axis: Level of Education.]

Figure 7: The proportion of each education level that is Male or Female. By using the way="both" option, the percentages and errors are based upon the grand total. Alternatively, way="columns" finds column wise percentages and way="rows" finds rowwise percentages. The data can be converted to percentages (as shown) or by total count (raw=TRUE). The function invisibly returns the probabilities and standard errors. See the help menu for an example of entering the data as a dataframe.

23

345 Two dimensional displays of means and errors

Yet another way to display data for different conditions is to use the errorCrosses function. For instance, the effect of various movies on both "Energetic Arousal" and "Tense Arousal" can be seen in one graph and compared to the same movie manipulations on "Positive Affect" and "Negative Affect". Note how Energetic Arousal is increased by three of the movie manipulations, but that Positive Affect increases following the Happy movie only.

24

> op <- par(mfrow=c(1,2))
> data(affect)
> colors <- c("black","red","white","blue")
> films <- c("Sad","Horror","Neutral","Happy")
> affect.stats <- errorCircles("EA2","TA2",data=affect[-c(1,20)],group="Film",labels=films,
+    xlab="Energetic Arousal", ylab="Tense Arousal",ylim=c(10,22),xlim=c(8,20),pch=16,
+    cex=2,colors=colors, main = "Movies effect on arousal")
> errorCircles("PA2","NA2",data=affect.stats,labels=films,xlab="Positive Affect",
+    ylab="Negative Affect", pch=16,cex=2,colors=colors, main = "Movies effect on affect")
> op <- par(mfrow=c(1,1))

[Two panel figure: "Movies effect on arousal" (Energetic Arousal vs. Tense Arousal) and "Movies effect on affect" (Positive Affect vs. Negative Affect), with the Sad, Horror, Neutral, and Happy film groups plotted as circles with error bars.]

Figure 8: The use of the errorCircles function allows for two dimensional displays of means and error bars. The first call to errorCircles finds descriptive statistics for the affect dataframe based upon the grouping variable of Film. These data are returned and then used by the second call, which examines the effect of the same grouping variable upon different measures. The size of the circles represents the relative sample sizes for each group. The data are from the PMC lab and reported in Smillie et al. (2012).

25

346 Back to back histograms

The bibars function summarizes the characteristics of two groups (e.g., males and females) on a second variable (e.g., age) by drawing back to back histograms (see Figure 9).

> data(bfi)
> png("bibars.png")
> bibars(bfi,"age","gender",ylab="Age",main="Age by males and females")
> dev.off()

null device

1

Figure 9: A bar plot of the age distribution for males and females shows the use of bibars. The data are males and females from 2800 cases collected using the SAPA procedure and are available as part of the bfi data set. An alternative way of displaying these data is with densityBy in the next figure.

26

> png("histo.png")
> data(sat.act)
> densityBy(bfi,"age",grp="gender")
> dev.off()

null device

1

Figure 10: Using the densityBy function to show the age distribution for males and females. The plot is a conventional density diagram for the two groups. Compare this to the bibars plot in the previous figure. By plotting densities, we can see that the males are slightly over represented in the younger ranges.

27

347 Correlational structure

There are many ways to display correlations. Tabular displays are probably the most common. The output from the cor function in core R is a rectangular matrix. lowerMat will round this to (2) digits and then display it as a lower off diagonal matrix. lowerCor calls cor with use='pairwise', method='pearson' as default values and returns (invisibly) the full correlation matrix and displays the lower off diagonal matrix.
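For example, a minimal sketch of lowerMat applied to the output of cor:

R <- cor(sat.act, use = "pairwise")   # the rectangular matrix from core R
lowerMat(R)                           # rounded, lower off-diagonal display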

> lowerCor(sat.act)

gendr edctn age ACT SATV SATQ

gender 100

education 009 100

age -002 055 100

ACT -004 015 011 100

SATV -002 005 -004 056 100

SATQ -017 003 -003 059 064 100

When comparing results from two different groups, it is convenient to display them as one matrix, with the results from one group below the diagonal and the other group above the diagonal. Use lowerUpper to do this:

> female <- subset(sat.act, sat.act$gender==2)
> male <- subset(sat.act, sat.act$gender==1)
> lower <- lowerCor(male[-1])

edctn age ACT SATV SATQ

education 100

age 061 100

ACT 016 015 100

SATV 002 -006 061 100

SATQ 008 004 060 068 100

> upper <- lowerCor(female[-1])

edctn age ACT SATV SATQ

education 100

age 052 100

ACT 016 008 100

SATV 007 -003 053 100

SATQ 003 -009 058 063 100

> both <- lowerUpper(lower, upper)
> round(both, 2)

education age ACT SATV SATQ

education NA 052 016 007 003

age 061 NA 008 -003 -009

ACT 016 015 NA 053 058

SATV 002 -006 061 NA 063

SATQ 008 004 060 068 NA

It is also possible to compare two matrices by taking their differences and displaying one (below the diagonal) and the difference of the second from the first above the diagonal:

> diffs <- lowerUpper(lower, upper, diff=TRUE)
> round(diffs, 2)

28

education age ACT SATV SATQ

education NA 009 000 -005 005

age 061 NA 007 -003 013

ACT 016 015 NA 008 002

SATV 002 -006 061 NA 005

SATQ 008 004 060 068 NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set, which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cor.test function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corr.test function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corr.test and corr.p the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the p.adjust function for a discussion of Holm (1979) and other corrections.)

Testing the difference between any two correlations can be done using the r.test function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval.

> r.test(50, .3)

29

> png("corplot.png")
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()

null device

1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default, the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

30

> png("circplot.png")
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()

null device

1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

31

> png("spider.png")
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

32

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations print with the short=FALSE option

33

Correlation tests

Call:r.test(n = 50, r12 = 0.3)

Test of significance of a correlation

t value 2.18 with probability < 0.034

and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests

Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)

Test of difference between two independent correlations

z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23, and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests

Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"

Test of difference between two correlated correlations

t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests

Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)

Test of difference between two dependent correlations

z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices

Call:cortest(R1 = sat.act)

Chi Square value 1325.42 with df = 15  with probability < 1.8e-273
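cortest can also compare two matrices, for example the male and female correlation matrices formed earlier (a sketch; check ?cortest for the exact argument names):

R.male   <- cor(male[-1], use = "pairwise")
R.female <- cor(female[-1], use = "pairwise")
cortest(R.male, R.female, n1 = nrow(male), n2 = nrow(female))   # do the two matrices differ?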

34

36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()

[Figure: a bivariate normal distribution with ρ = 0.5 (φ = 0.28) cut at thresholds τ (for X) and Τ (for Y), dividing the plane into the four cells X > τ, Y > Τ; X < τ, Y > Τ; X > τ, Y < Τ; X < τ, Y < Τ, together with the univariate normal density showing the X > τ cut.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

35

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
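A brief sketch of these functions applied to data sets used in this vignette (the column selections are purely illustrative):

tetrachoric(ability[, 1:4])    # ability items are already scored 0/1
polychoric(bfi[, 1:5])         # polychoric correlations for five 6-point bfi items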

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set of burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
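A minimal sketch of smoothing such a matrix (burt is supplied with the psychTools companion package in recent releases):

eigen(burt)$values           # eigenvalues of the original matrix
burt.s <- cor.smooth(burt)   # adjust the smallest eigenvalues and rescale
eigen(burt.s)$values         # eigenvalues after smoothing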

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

41 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

36

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}$    (1)

where $r_{xy}$ is the normal correlation, which may be decomposed into the within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5, and V8, and V3, V6, and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6, and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5, and V6, and V7, V8, and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

simmultilevel will generate simulated data with a multilevel structure

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
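A minimal sketch of the first of these analyses, pulling out the pooled within-group and between-group correlations (the rwg and rbg components discussed above):

sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)    # pooled within-education-level correlations
round(sb$rbg, 2)    # correlations of the education group means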

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

37

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)    # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4:

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call setCor(y = 59 x = 14 data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388

38

SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression

39

R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

> round(sc$residual, 2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079

40

52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, 2, ... i) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

MediationModeration Analysis

Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t = 2.5  df = 28  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7

R = 0.56  R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.00272

To see the longer output specify short = FALSE in the print statement or ask for the summary
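Written out as runnable code (simply restoring the punctuation of the Call line above; the default of 5000 bootstrap iterations applies unless niter is lowered):

preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)   # the sobel data are given in the example for mediate
mediate.diagram(preacher)    # draws the path diagram of Figure 15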

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, niter=50)

• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.

41

> mediate.diagram(preacher)

[Path diagram: "Mediation model" — THERAPY predicting SATIS directly (c = 0.76, c' = 0.43) and indirectly through ATTRIB (paths 0.82 and 0.4).]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

42

> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Path diagram: "Regression Models" — THERAPY (0.43) and ATTRIB (0.4) predicting SATIS, with a correlation of 0.21 between the predictors.]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.

43

MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

[Path diagram: "Moderated mediation (mean centered)" — prot2 and prot2*sexism predicting respappr, with sexism as the mediator; the total (c = 1.46, c = 0.81) and direct (c' = 1.46, c' = 0.81) effects are printed on the arrows, and the paths through sexism are near zero.]

Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.

44

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$
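The same quantities can be computed directly from a correlation matrix; the following sketch simply writes the formula above out in R (it is not the internal setCor code, and the choice of the sat.act columns for the x and y sets is only for illustration):

R  <- cor(sat.act, use = "pairwise")   # correlations of the six sat.act variables
x  <- 1:3                              # gender, education, age as set x
y  <- 4:6                              # ACT, SATV, SATQ as set y
Rm <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% R[y, x]
lambda <- Re(eigen(Rm)$values)         # the squared canonical correlations
R2.set <- 1 - prod(1 - lambda)         # Cohen's set correlation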

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In that case, although the set correlation can be very high, the degree of relationship between the sets is not as high, and an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")

> model1 <- lm(ACT ~ gender + education + age, data = sat.act)

> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476

Compare this with the output from setCor:

> # compare with setCor

> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT

            slope    se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00  0.07  0.00 1.000    -0.28     0.28 1.00
gender      -0.26  0.08 -3.17 0.087    -0.61     0.09 1.55
education    0.56  0.07  7.72 0.016     0.25     0.87 1.22
age         -0.64  0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error = 0.15 with 2 degrees of freedom

Multiple Regression
      R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT   1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV = SATV

            slope    se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00  0.07  0.00 1.000    -0.31     0.31 1.00
gender       0.10  0.09  1.11 0.380    -0.28     0.48 1.55
education    0.72  0.08  9.21 0.012     0.39     1.06 1.22
age         -0.82  0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error = 0.16 with 2 degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV = SATQ

            slope    se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00  0.04  0.00 1.000    -0.19     0.19 1.00
gender      -0.52  0.05 -9.73 0.010    -0.76    -0.29 1.55
education    0.40  0.05  8.32 0.014     0.19     0.60 1.22
age         -0.47  0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error = 0.1 with 2 degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ    1    1    1 0.99        0.99        0    177.11   3   2 0.00562

46

Various estimates of between set correlations

Squared Canonical Correlations
[1] 1.000 0.988 0.013

Chisq of canonical correlations
[1] 325.17  44.33   0.013

 Average squared canonical correlation = 0.67
 Cohen's Set Correlation R2 = 1
 Shrunken Set Correlation R2 = 1
 F and df of Cohen's Set Correlation -Inf 12 -12.94
Unweighted correlation between the two sets = 0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
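Because the statistic is symmetric, reversing which variables are treated as the x set and which as the y set should give the same set correlation, and running the analysis on the raw data (rather than on C) adds the significance tests mentioned above. A brief sketch, reusing the objects from this example:

setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)                  # sets reversed, same set R2
setCor(ACT + SATV + SATQ ~ gender + education + age, data = sat.act)   # raw data input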

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

47

Table 2: fa2latex — A factor analysis table from the psych package in R

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
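A table such as Table 2 can be generated along these lines (a sketch: the three factor solution of the Thurstone correlations is the one tabled above, and fa2latex, cor2latex, and df2latex are assumed to be available once psych/psychTools are loaded):

f3 <- fa(Thurstone, nfactors = 3)   # three factor solution of the 9 ability variables
fa2latex(f3)                        # APA style LaTeX factor table
cor2latex(Thurstone)                # lower diagonal correlation table
df2latex(describe(sat.act))         # any data frame, e.g., describe() output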

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimated effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
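A few of these helpers in action (a small sketch; the input values are arbitrary):

fisherz(0.5)                     # Fisher r to z transform: 0.55
geometric.mean(c(1, 2, 4, 8))    # 2.83
harmonic.mean(c(1, 2, 4, 8))     # 2.13
superMatrix(diag(2), diag(3))    # a 5 x 5 matrix with the two blocks on the diagonal, 0s elsewhere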

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

50

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
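All of these data sets are loaded with the package and may be examined with the descriptive functions discussed earlier, for example:

dim(bfi)            # 2800 observations on 25 items plus gender, education, and age
describe(sat.act)   # the 700 case SAT/ACT demonstration set
headTail(epi.bfi)   # first and last rows of the EPI/Big 5 scale scores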

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R version 4.0.2 (2020-06-22)

Platform: x86_64-apple-darwin17.0 (64-bit)

Running under: macOS Catalina 10.15.6

Matrix products: default

BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached)

[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2  foreign_0.8-80  tmvnsim_1.0-2  nlme_3.1-148

[7] mnormt_2.0.1  grid_4.0.2  lattice_0.20-41

52

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4), 405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4), 439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2), 245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9, 78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3), 173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5), 733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4), 430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4), 255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1), 121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2), 65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1), 41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3), 283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3), 231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4), 309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7), 1676–1686.

54

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2), 481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4), 717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1), 57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4), 403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1), 145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1), 83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2), 306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1), 72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2), 245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6), 801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27, 345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5), 425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3), 321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1), 123–133.

56

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2), 121–144.

57

Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49

58

hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48

59

corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51

60

psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9

61

outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49

62

table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63

52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect ofmultiple predictors (x12i) on a criterion variable y some prefer to think of the effect ofone predictor x as mediated by another variable m (Preacher and Hayes 2004) Thuswe we may find the indirect path from x to m and then from m to y as well as the directpath from x to y Call these paths a b and c respectively Then the indirect effect of xon y through m is just ab and the direct effect is c Statistical tests of the ab effect arebest done by bootstrapping This is discussed in detail in the ldquoHow To use mediate andsetCor to do mediation moderation and regression analysis tutorial

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate

function and the subsequent graphic from mediatediagram The data are found in theexample for mediate

MediationModeration Analysis

Call mediate(y = SATIS ~ THERAPY + (ATTRIB) data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total effect(c) of THERAPY on SATIS = 076 SE = 031 t = 25 df= 28 with p = 0019

Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t = 135 df= 27 with p = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 003 Upper CI = 07

R = 056 R2 = 031 F = 606 on 2 and 27 DF p-value 000272

To see the longer output specify short = FALSE in the print statement or ask for the summary

bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables

setCor(SATV + SATQ ~ education + age data = satact std=TRUE)

bull mediate will take raw data or a correlation matrix and find (and graph the path dia-gram) for multiple y variables depending upon multiple x variables mediated througha mediation variable It then tests the mediation effect using a boot strap

mediate( SATV ~ education+ age + (ACT) data =satactstd=TRUEniter=50)

bull mediate will also take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple x variablesmediated through a mediation variable It will form the product term either from themean centered data or from the raw data It then tests the mediation effect using aboot strap The data set is taken from Garcia et al (2010) The number of iterationsfor the boot strap was set to 50 for speed The default number of boot straps is5000 See the help page for the mediate function for more details For a much longerdiscussion of how to use the mediate function see the ldquoHowTordquo Using mediate andsetCor to do mediation moderation and regression analysis

41

gt mediatediagram(preacher)

Mediation model

THERAPY SATIS

ATTRIB

c = 076

c = 043

082 04

Figure 15 A mediated model taken from Preacher and Hayes 2004 and solved using themediate function The direct path from Therapy to Satisfaction has a an effect of 76 whilethe indirect path through Attribution has an effect of 33 Compare this to the normalregression graphic created by setCordiagram

42

gt preacher lt- setCor(SATIS ~ THERAPY + ATTRIBdata =sobelstd=FALSE)

gt setCordiagram(preacher)

Regression Models

THERAPY

ATTRIB

SATIS

043

04

021

Figure 16 The conventional regression model for the Preacher and Hayes 2004 data setsolved using the sector function Compare this to the previous figure

43

MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

[Figure 17 (path diagram), "Moderated mediation (mean centered)": prot2 and the product term prot2*sexism predict respappr (c = 1.46, c' = 1.46 and c = 0.81, c' = 0.81 respectively), with sexism as the mediator; the remaining paths are near zero (-0.01, 0.07, 0.09, 0.02).]

Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.
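To make that centering step concrete, the product term that mediate forms by default corresponds to something like the following (illustrative only; mediate does this internally):

> x  <- Garcia$prot2  - mean(Garcia$prot2)    # mean center the predictor
> m  <- Garcia$sexism - mean(Garcia$sexism)   # mean center the moderator
> xm <- x * m                                 # the product (moderation) term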


5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
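A minimal sketch of this computation (not the psych implementation; Rxx, Ryy, and Rxy here stand for the within- and between-set correlation sub-matrices):

> setCorR2 <- function(Rxx, Ryy, Rxy) {
+    M <- solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy)  # Rxx^-1 Rxy Ryy^-1 Ryx
+    lambda <- Re(eigen(M)$values)   # the squared canonical correlations
+    1 - prod(1 - lambda)            # Cohen's set correlation R^2
+ }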

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus will report standardized or raw β weights depending on the input. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,   Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender      -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education    0.56 0.07  7.72 0.016     0.25     0.87 1.22
age         -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error = 0.15 with 2 degrees of freedom

Multiple Regression
    R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT 1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV = SATV
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender       0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education    0.72 0.08  9.21 0.012     0.39     1.06 1.22
age         -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error = 0.16 with 2 degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV = SATQ
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender      -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education    0.40 0.05  8.32 0.014     0.19     0.60 1.22
age         -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error = 0.1 with 2 degrees of freedom

Multiple Regression
     R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ 1  1   1 0.99        0.99        0    177.11   3   2 0.00562


Various estimates of between set correlations

Squared Canonical Correlations
[1] 1.000 0.988 0.013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 0.67

Cohen's Set Correlation R2 = 1
Shrunken Set Correlation R2 = 1
F and df of Cohen's Set Correlation  -Inf 12 -1294

Unweighted correlation between the two sets = 0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
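Because the set correlation is symmetric, reversing the roles of the two sets reports the same Set Correlation R2; this can be checked directly (a sketch, reusing the covariance matrix C and sample size from above):

> setCor(c(1:3), c(4:6), C, n.obs = 700)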

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
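For example, a call along the following lines will generate the LaTeX source for a factor analysis table (a sketch; see the fa2latex help page for the file and formatting arguments):

> f3 <- fa(Thurstone, 3)
> fa2latex(f3)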

An example of converting the output from fa to LaTeX appears in Table 2.


Table 2: fa2latex. A factor analysis table from the psych package in R

Variable            MR1    MR2    MR3    h2    u2   com
Sentences          0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary         0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion    0.83   0.04   0.00  0.73  0.27  1.00
First.Letters      0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes           0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series      0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees          0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group      -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings        2.64   1.86   1.5

Factor correlations
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random  Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex  is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex  Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor  One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz  Convert a correlation to the corresponding Fisher z score.

geometric.mean  also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC  and cohen.kappa are typically used to find the reliability for raters.

headTail  combines the head and tail functions to show the first and last lines of a data set or output.

topBottom  Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia  calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep  finds the probability of replication for an F, t, or r and estimates effect size.

partial.r  partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection  will correct correlations for restriction of range.

reverse.code  will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix  Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
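A brief illustration of a few of these helpers (a sketch; the inputs are arbitrary):

> fisherz(0.5)                   # Fisher r to z transformation
> harmonic.mean(c(1, 2, 4))      # the appropriate mean for rates
> superMatrix(diag(2), diag(3))  # combine two matrices into a block "super matrix"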

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
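These data sets are loaded in the usual way once psych and psychTools are attached; for example (a sketch; in recent versions some data sets, such as bfi, live in psychTools):

> data(Thurstone)   # a 9 x 9 correlation matrix of ability variables
> data(bfi)         # 2800 participants on 25 personality items plus demographics
> dim(bfi)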

Thurstone  Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi  25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi  135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project, Revelle et al. (2016). These 135 items form part of the SAPA Personality Inventory.

sat.act  Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi  A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems  16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability  The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR), Condon & Revelle (2014), and were collected as part of the SAPA web based assessment project (https://sapa-project.org), Revelle et al. (2016).

galton  Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer  Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous  cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405-432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439-458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447-473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245-276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. & Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52-64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78-98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297-334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173-178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733-745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430-450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255-282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121-132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65-70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41-54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179-185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283-300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1-13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231-258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309-317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153-175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676-1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481-495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717-731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57-74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39-73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., & Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In N. G. Fielding, R. M. Lee, & G. Blank (Eds.), SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578-595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403-414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27-49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145-154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83-90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420-428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306-326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72-101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245-251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801-824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345-353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425-454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321-327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123-133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121-144.



Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t = 135 df= 27 with p = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 003 Upper CI = 07

R = 056 R2 = 031 F = 606 on 2 and 27 DF p-value 000272

To see the longer output specify short = FALSE in the print statement or ask for the summary

bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables

setCor(SATV + SATQ ~ education + age data = satact std=TRUE)

bull mediate will take raw data or a correlation matrix and find (and graph the path dia-gram) for multiple y variables depending upon multiple x variables mediated througha mediation variable It then tests the mediation effect using a boot strap

mediate( SATV ~ education+ age + (ACT) data =satactstd=TRUEniter=50)

bull mediate will also take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple x variablesmediated through a mediation variable It will form the product term either from themean centered data or from the raw data It then tests the mediation effect using aboot strap The data set is taken from Garcia et al (2010) The number of iterationsfor the boot strap was set to 50 for speed The default number of boot straps is5000 See the help page for the mediate function for more details For a much longerdiscussion of how to use the mediate function see the ldquoHowTordquo Using mediate andsetCor to do mediation moderation and regression analysis

41

gt mediatediagram(preacher)

Mediation model

THERAPY SATIS

ATTRIB

c = 076

c = 043

082 04

Figure 15 A mediated model taken from Preacher and Hayes 2004 and solved using themediate function The direct path from Therapy to Satisfaction has a an effect of 76 whilethe indirect path through Attribution has an effect of 33 Compare this to the normalregression graphic created by setCordiagram

42

gt preacher lt- setCor(SATIS ~ THERAPY + ATTRIBdata =sobelstd=FALSE)

gt setCordiagram(preacher)

Regression Models

THERAPY

ATTRIB

SATIS

043

04

021

Figure 16 The conventional regression model for the Preacher and Hayes 2004 data setsolved using the sector function Compare this to the previous figure

43

MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

[Figure: path diagram for the moderated mediation (mean centered) model. prot2 and prot2*sexism predict respappr through the mediator sexism; c = 1.46 and 0.81, c' = 1.46 and 0.81, with near-zero paths through sexism (-0.01, 0.07, 0.09, 0.02).]

Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.

44

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R² = 1 − ∏_{i=1}^{n} (1 − λᵢ)

where λᵢ is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = Rxx⁻¹ Rxy Ryy⁻¹ Ryx.
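
The same quantity can be computed directly from the squared canonical correlations (the λᵢ above). The following lines are a minimal sketch rather than the setCor internals, using base R's cancor on the sat.act data; the choice of variable sets is purely illustrative.

sat <- na.omit(sat.act)                      # drop rows with missing SATQ
cc <- cancor(sat[, c("gender", "education", "age")],
             sat[, c("ACT", "SATV", "SATQ")])
lambda <- cc$cor^2                           # squared canonical correlations
1 - prod(1 - lambda)                         # Cohen's set correlation R2
mean(lambda)                                 # average squared canonical correlation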

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")

> model1 <- lm(ACT ~ gender + education + age, data = sat.act)

> summary(model1)

Call:

lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max 
-25.2458  -3.2133   0.7769   3.5921   9.2630 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110    
education    0.47890    0.15235   3.143  0.00174 ** 
age          0.01623    0.02278   0.712  0.47650    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom

Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301 

F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT

             slope    se     t     p lower.ci upper.ci  VIF
(Intercept)   0.00  0.07  0.00 1.000    -0.28     0.28 1.00
gender       -0.26  0.08 -3.17 0.087    -0.61     0.09 1.55
education     0.56  0.07  7.72 0.016     0.25     0.87 1.22
age          -0.64  0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error = 0.15 with 2 degrees of freedom

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT    1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV = SATV

             slope    se     t     p lower.ci upper.ci  VIF
(Intercept)   0.00  0.07  0.00 1.000    -0.31     0.31 1.00
gender        0.10  0.09  1.11 0.380    -0.28     0.48 1.55
education     0.72  0.08  9.21 0.012     0.39     1.06 1.22
age          -0.82  0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error = 0.16 with 2 degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV = SATQ

             slope    se     t     p lower.ci upper.ci  VIF
(Intercept)   0.00  0.04  0.00 1.000    -0.19     0.19 1.00
gender       -0.52  0.05 -9.73 0.010    -0.76    -0.29 1.55
education     0.40  0.05  8.32 0.014     0.19     0.60 1.22
age          -0.47  0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error = 0.1 with 2 degrees of freedom

Multiple Regression
        R   R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ    1    1   1 0.99        0.99        0    177.11   3   2 0.00562

46

Various estimates of between set correlations

Squared Canonical Correlations

[1] 1.000 0.988 0.013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 0.67

Cohen's Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohen's Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
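
Because the statistic is symmetric, reversing the roles of the two variable sets should leave the set correlation unchanged. A quick check (a sketch only, reusing the covariance matrix C and n.obs from above):

> setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # same set correlation, roles reversed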

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
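
The calls are of the following form. This is only a sketch; the optional arguments (digits, captions, cut values, and so on) are described on the help pages and may differ across package versions.

f3 <- fa(Thurstone, nfactors = 3)   # the analysis shown in Table 2
fa2latex(f3, digits = 2)            # factor loading table as LaTeX
cor2latex(Thurstone)                # lower diagonal correlation table
df2latex(describe(sat.act))         # any data frame, e.g., describe output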

47

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable           MR1    MR2    MR3    h2    u2   com
Sentences         0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary        0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion   0.83   0.04   0.00  0.73  0.27  1.00
First.Letters     0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words   -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes          0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees         0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings       2.64   1.86   1.50

      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score

48

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
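
A few of these helpers in action (the values are purely illustrative; the restriction-of-range line assumes unrestricted and restricted standard deviations of 100.32 and 48.19, an SAT-like example):

fisherz(0.5)                          # Fisher r to z transformation
geometric.mean(c(1, 2, 4, 8))         # geometric mean
harmonic.mean(c(1, 2, 4, 8))          # harmonic mean
mardia(sat.act[, 4:6])                # multivariate skew and kurtosis
rangeCorrection(0.33, 100.32, 48.19)  # correct r for restriction of range
superMatrix(diag(2), diag(3))         # block "super matrix" of two matrices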

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project, Revelle et al. (2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR), Condon & Revelle (2014), and were collected as part of the SAPA web based assessment (https://sapa-project.org) project, Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

50

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
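
All of these data sets are available once the packages are loaded (or via data()) and can be explored with the descriptive tools discussed earlier. A brief sketch:

data(Thurstone)      # 9 x 9 correlation matrix of ability tests
lowerMat(Thurstone)  # display it as a lower triangular matrix
describe(bfi[1:5])   # descriptive statistics for the first five bfi items
dim(spi)             # the spi items plus demographics for 4000 participants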

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research Many of thefunctions were developed to supplement a book (httpspersonality-projectorgrbook An introduction to Psychometric Theory with Applications in R (Revelle prep) Moreinformation about the use of some of the functions may be found in the book

For more extensive discussion of the use of psych in particular and R in general consulthttpspersonality-projectorgrrguidehtml A short guide to R

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()

R version 4.0.2 (2020-06-22)

Platform: x86_64-apple-darwin17.0 (64-bit)

Running under: macOS Catalina 10.15.6

Matrix products: default

BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached)

[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148

[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41

52

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.
Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.
Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.
Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.
Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors". Psychological Review, 42(5):425–454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

57

Index

ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7
Bechtoldt.1, 49; Bechtoldt.2, 49; bfi, 26, 49; bi.bars, 7, 26, 27; bifactor, 7; biserial, 13, 36; block.random, 48; burt, 36
char2numeric, 13; circadian.cor, 48; circadian.linear.cor, 48; circadian.mean, 48; circular statistics, 48; cities, 51; cohen.kappa, 49; cor, 28; cor.smooth, 36; cor.test, 29; cor2latex, 47, 48; corPlot, 8; corr.p, 29, 33; corr.test, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49
densityBy, 15, 26, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9; draw.cor, 35; draw.tetra, 35; dummy.code, 13; dynamite plot, 20
edit, 3; epi.bfi, 49; error bars, 20; error.bars, 7, 15, 20; error.bars.by, 12, 15, 20, 21; error.bars.tab, 20; error.crosses, 20; errorCircles, 25; errorCrosses, 24
fa, 7, 8, 47; fa.diagram, 8; fa.extension, 50; fa.multi, 7; fa.parallel, 5, 7; fa2latex, 47, 48; faBy, 37; factor analysis, 7; factor.minres, 8; factor.pa, 8; factor.wls, 8; file.choose, 10; fisherz, 48
galton, 50; generalized least squares, 7; geometric.mean, 49; GPArotation, 9; guttman, 7
harmonic.mean, 49; head, 49; headtail, 49; heights, 49; het.diagram, 8; Hmisc, 29; Holzinger, 49
ICC, 7, 49; iclust, 7; iclust.diagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iqitems, 49; irt.fa, 7, 47; irt2latex, 47, 48
KnitR, 47
lavaan, 37; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15
make.keys, 15; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42; mediate.diagram, 41; minimum residual, 7; mixed.cor, 36; mlArrange, 7; mlPlot, 7; mlr, 7; msq, 15; multi.hist, 7; multilevel, 36, 37; multilevel.reliability, 7; multiple regression, 38
nfactors, 7; nlme, 36, 37
omega, 7, 9; outlier, 3, 12, 14
p.adjust, 29; p.rep, 49; pairs, 15; pairs.panels, 3, 7, 8, 14–18; partial.r, 49; pca, 7; peas, 49, 50; plot.irt, 8; plot.poly, 8; polychoric, 7, 35; polyserial, 36; principal, 5, 7, 9; principal axis, 7; psych, 3, 5–9, 29, 47–49, 51; psychTools, 3
r.test, 29; rangeCorrection, 49; rcorr, 29; read.clipboard, 3, 7, 9, 10; read.clipboard.csv, 10; read.clipboard.fwf, 11; read.clipboard.lower, 10; read.clipboard.tab, 3, 10, 11; read.clipboard.upper, 11; read.file, 3, 7, 10; read.table, 10; Reise, 49; reverse.code, 49; Rgraphviz, 9
SAPA, 26, 50; sat.act, 11, 34, 45; scatter.hist, 7; schmid, 7, 9; score.multiple.choice, 7; scoreItems, 5, 7, 8, 15; scrub, 3, 13; sem, 8, 49; set correlation, 45; set.cor, 49; setCor, 4, 8, 38, 41, 43, 45–47; sim.multilevel, 37; spi, 49; spider, 13; stars, 15; stats, 29; statsBy, 7, 36, 37; statsBy.boot, 37; statsBy.boot.summary, 37; structure.diagram, 8; superMatrix, 49; Sweave, 47
table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone.33, 49; topBottom, 49
vegetables, 49, 51; violinBy, 15, 18, 19; vss, 5, 7
weighted least squares, 7; withinBetween, 37
xtable, 47

63

346 Back to back histograms

The bibars function summarize the characteristics of two groups (eg males and females)on a second variable (eg age) by drawing back to back histograms (see Figure 9)

data(bfi)gt png( bibarspng )

gt bibars(bfiagegenderylab=Agemain=Age by males and females)

gt devoff()

null device

1

Figure 9 A bar plot of the age distribution for males and females shows the use of bibarsThe data are males and females from 2800 cases collected using the SAPA procedure andare available as part of the bfi data set An alternative way of displaying these data is inthe densityBy in the next figure

26

gt png(histopng)gt data(satact)

gt densityBy(bfiagegrp=gender)

gt devoff()

null device

1

Figure 10 Using the densitynBy function to show the age distribution for males andfemales The plot is a conventional density diagram for two two groups Compare this tothe bibars plot in the previous figure By plotting densities we can see that the malesare slightly over represented in the younger ranges

27

347 Correlational structure

There are many ways to display correlations Tabular displays are probably the mostcommon The output from the cor function in core R is a rectangular matrix lowerMat

will round this to (2) digits and then display as a lower off diagonal matrix lowerCor

calls cor with use=lsquopairwisersquo method=lsquopearsonrsquo as default values and returns (invisibly)the full correlation matrix and displays the lower off diagonal matrix

gt lowerCor(satact)

gendr edctn age ACT SATV SATQ

gender 100

education 009 100

age -002 055 100

ACT -004 015 011 100

SATV -002 005 -004 056 100

SATQ -017 003 -003 059 064 100

When comparing results from two different groups it is convenient to display them as onematrix with the results from one group below the diagonal and the other group above thediagonal Use lowerUpper to do this

gt female lt- subset(satactsatact$gender==2)

gt male lt- subset(satactsatact$gender==1)

gt lower lt- lowerCor(male[-1])

edctn age ACT SATV SATQ

education 100

age 061 100

ACT 016 015 100

SATV 002 -006 061 100

SATQ 008 004 060 068 100

gt upper lt- lowerCor(female[-1])

edctn age ACT SATV SATQ

education 100

age 052 100

ACT 016 008 100

SATV 007 -003 053 100

SATQ 003 -009 058 063 100

gt both lt- lowerUpper(lowerupper)

gt round(both2)

education age ACT SATV SATQ

education NA 052 016 007 003

age 061 NA 008 -003 -009

ACT 016 015 NA 053 058

SATV 002 -006 061 NA 063

SATQ 008 004 060 068 NA

It is also possible to compare two matrices by taking their differences and displaying one (be-low the diagonal) and the difference of the second from the first above the diagonal

gt diffs lt- lowerUpper(lowerupperdiff=TRUE)

gt round(diffs2)

28

education age ACT SATV SATQ

education NA 009 000 -005 005

age 061 NA 007 -003 013

ACT 016 015 NA 008 002

SATV 002 -006 061 NA 005

SATQ 008 004 060 068 NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat mapof the correlations This is just a matrix color coded to represent the magnitude of thecorrelation This is useful when considering the number of factors in a data set Considerthe Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated dataset of 24 variables with a circumplex structure (Figure 12) The color coding representsa ldquoheat maprdquo of the correlations with darker shades of red representing stronger negativeand darker shades of blue stronger positive correlations As an option the value of thecorrelation can be shown

Yet another way to show structure is to use ldquospiderrdquo plots Particularly if variables areordered in some meaningful way (eg in a circumplex) a spider plot will show this structureeasily This is just a plot of the magnitude of the correlation as a radial line with lengthranging from 0 (for a correlation of -1) to 1 (for a correlation of 1) (See Figure 13)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data but some people like to testwhether these correlations differ from zero or differ from each other The cortest func-tion (in the stats package) will test the significance of a single correlation and the rcorr

function in the Hmisc package will do this for many correlations In the psych packagethe corrtest function reports the correlation (Pearson Spearman or Kendall) betweenall variables in either one or two data frames or matrices as well as the number of obser-vations for each case and the (two-tailed) probability for each correlation Unfortunatelythese probability values have not been corrected for multiple comparisons and so shouldbe taken with a great deal of salt Thus in corrtest and corrp the raw probabilitiesare reported below the diagonal and the probabilities adjusted for multiple comparisonsusing (by default) the Holm correction are reported above the diagonal (Table 1) (See thepadjust function for a discussion of Holm (1979) and other corrections)

Testing the difference between any two correlations can be done using the rtest functionThe function actually does four different tests (based upon an article by Steiger (1980)depending upon the input

1) For a sample size n find the t and p value for a single correlation as well as the confidenceinterval

gt rtest(503)

29

gt png(corplotpng)gt corPlot(Thurstonenumbers=TRUEupper=FALSEdiag=FALSEmain=9 cognitive variables from Thurstone)

gt devoff()

null device

1

Figure 11 The structure of correlation matrix can be seen more clearly if the variables aregrouped by factor and then the correlations are shown by color By using the rsquonumbersrsquooption the values are displayed as well By default the complete matrix is shown Settingupper=FALSE and diag=FALSE shows a cleaner figure

30

gt png(circplotpng)gt circ lt- simcirc(24)

gt rcirc lt- cor(circ)

gt corPlot(rcircmain=24 variables in a circumplex)gt devoff()

null device

1

Figure 12 Using the corPlot function to show the correlations in a circumplex Correlationsare highest near the diagonal diminish to zero further from the diagonal and the increaseagain towards the corners of the matrix Circumplex structures are common in the studyof affect For circumplex structures it is perhaps useful to show the complete matrix

31

gt png(spiderpng)gt oplt- par(mfrow=c(22))

gt spider(y=c(161218)x=124data=rcircfill=TRUEmain=Spider plot of 24 circumplex variables)

gt op lt- par(mfrow=c(11))

gt devoff()

null device

1

Figure 13 A spider plot can show circumplex structure very clearly Circumplex structuresare common in the study of affect

32

Table 1 The corrtest function reports correlations cell sizes and raw and adjustedprobability values corrp reports the probability values for a correlation matrix Bydefault the adjustment used is that of Holm (1979)gt corrtest(satact)

Callcorrtest(x = satact)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations print with the short=FALSE option

33

Correlation tests

Callrtest(n = 50 r12 = 03)

Test of significance of a correlation

t value 218 with probability lt 0034

and confidence interval 002 053

2) For sample sizes of n and n2 (n2 = n if not specified) find the z of the difference betweenthe z transformed correlations divided by the standard error of the difference of two zscores

gt rtest(3046)

Correlation tests

Callrtest(n = 30 r12 = 04 r34 = 06)

Test of difference between two independent correlations

z value 099 with probability 032

3) For sample size n and correlations ra= r12 rb= r23 and r13 specified test for thedifference of two dependent correlations (Steiger case A)

gt rtest(103451)

Correlation tests

Call[1] rtest(n = 103 r12 = 04 r23 = 01 r13 = 05 )

Test of difference between two correlated correlations

t value -089 with probability lt 037

4) For sample size n test for the difference between two dependent correlations involvingdifferent variables (Steiger case B)

gt rtest(103567558) steiger Case B

Correlation tests

Callrtest(n = 103 r12 = 05 r34 = 06 r23 = 07 r13 = 05 r14 = 05

r24 = 08)

Test of difference between two dependent correlations

z value -12 with probability 023

To test whether a matrix of correlations differs from what would be expected if the popu-lation correlations were all zero the function cortest follows Steiger (1980) who pointedout that the sum of the squared elements of a correlation matrix or the Fisher z scoreequivalents is distributed as chi square under the null hypothesis that the values are zero(ie elements of the identity matrix) This is particularly useful for examining whethercorrelations in a single matrix differ from zero or for comparing two matrices Althoughobvious cortest can be used to test whether the satact data matrix produces non-zerocorrelations (it does) This is a much more appropriate test when testing whether a residualmatrix differs from zero

gt cortest(satact)

Tests of correlation matrices

Callcortest(R1 = satact)

Chi Square value 132542 with df = 15 with probability lt 18e-273

34

36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient If thedata eg ability items are thought to represent an underlying continuous although latentvariable the φ will underestimate the value of the Pearson applied to these latent variablesOne solution to this problem is to use the tetrachoric correlation which is based uponthe assumption of a bivariate normal distribution that has been cut at certain points Thedrawtetra function demonstrates the process (Figure 14) This is also shown in terms ofdichotomizing the bivariate normal density function using the drawcor function A simplegeneralization of this to the case of the multiple cuts is the polychoric correlation

gt drawtetra()

minus3 minus2 minus1 0 1 2 3

minus3

minus2

minus1

01

23

Y rho = 05phi = 028

X gt τY gt Τ

X lt τY gt Τ

X gt τY lt Τ

X lt τY lt Τ

x

dnor

m(x

)

X gt τ

τ

x1

Y gt Τ

Τ

Figure 14 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values

35

The tetrachoric correlation estimates what a Pearson correlation would be given a two bytwo table of observed values assumed to be sampled from a bivariate normal distributionThe φ correlation is just a Pearson r performed on the observed values It is found (labo-riously) by optimizing the fit of the bivariate normal for various values of the correlationto the observed cell frequencies In the interests of space we do not show the next figurebut it can be created by

drawcor(expand=20cuts=c(00))

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation

If the data are a mix of continuous polytomous and dichotomous variables the mixedcor

function will calculate the appropriate mixture of Pearson polychoric tetrachoric biserialand polyserial correlations

The correlation matrix resulting from a number of tetrachoric or polychoric correlationmatrix sometimes will not be positive semi-definite This will sometimes happen if thecorrelation matrix is formed by using pair-wise deletion of cases The corsmooth functionwill adjust the smallest eigen values of the correlation matrix to make them positive rescaleall of them to sum to the number of variables and produce aldquosmoothedrdquocorrelation matrixAn example of this problem is a data set of burt which probably had a typo in the originalcorrelation matrix Smoothing the matrix corrects this problem

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon egethnicity age gender college major or country) reflect an unknown mixture of the pooledcorrelation within each group as well as the correlation of the means of these groupsThese two correlations are independent and do not allow inferences from one level (thegroup) to the other level (the individual) When examining data at two levels (eg theindividual and by some grouping variable) it is useful to find basic descriptive statistics(means sds ns per group within group correlations) as well as between group statistics(over all descriptive statistics and overall between group correlations) Of particular useis the ability to decompose a matrix of correlations at the individual level into correlationswithin group and correlations between groups

41 Decomposing data into within and between level correlations usingstatsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complexanalysis of hierarchical (multilevel) data structures statsBy is a much simpler function

36

to give some of the basic descriptive statistics for two level models (nlme and multilevelallow for statistical inference but the descriptives of statsBy are useful)

This follows the decomposition of an observed correlation into the pooled correlation withingroups (rwg) and the weighted correlation of the means between groups which is discussedby Pedhazur (1997) and by Bliese (2009) in the multilevel package

rxy = ηxwg lowastηywg lowast rxywg + ηxbg lowastηybg lowast rxybg (1)

where rxy is the normal correlation which may be decomposed into a within group andbetween group correlations rxywg and rxybg and η (eta) is the correlation of the data withthe within group values or the group means

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group cor-relations The within group correlations between 9 variables are set to be 1 0 and -1while those between groups are also set to be 1 0 -1 These two sets of correlations arecrossed such that V1 V4 and V7 have within group correlations of 1 as do V2 V5 andV8 and V3 V6 and V9 V1 has a within group correlation of 0 with V2 V5 and V8and a -1 within group correlation with V3 V6 and V9 V1 V2 and V3 share a betweengroup correlation of 1 as do V4 V5 and V6 and V7 V8 and V9 The first group has a 0between group correlation with the second and a -1 with the third group See the help filefor withinBetween to display these data

simmultilevel will generate simulated data with a multilevel structure

The statsByboot function will randomize the grouping variable ntrials times and find thestatsBy output This can take a long time and will produce a great deal of output Thisoutput can then be summarized for relevant variables using the statsBybootsummary

function specifying the variable of interest

Consider the case of the relationship between various tests of ability when the data aregrouped by level of education (statsBy(satact)) or when affect data are analyzed withinand between an affect manipulation (statsBy(affect) )

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be donein the lavaan package However for exploratory analyses of the structure within each ofmultiple groups the faBy function may be used in combination with the statsBy functionFirst run pfunstatsBy with the correlation option set to TRUE and then run faBy on theresulting output

37

sb lt- statsBy(bfi[c(12527)] group=educationcors=TRUE)

faBy(sbnfactors=5) find the 5 factor solution for each education level

5 Multiple Regression mediation moderation and set cor-relations

The typical application of the lm function is to do a linear model of one Y variable as afunction of multiple X variables Because lm is designed to analyze complex interactions itrequires raw data as input It is however sometimes convenient to do multiple regressionfrom a correlation or covariance matrix This is done using the setCor which will workwith either raw data covariance matrices or correlation matrices

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variablesperhaps with a set of z covariates removed from both x and y Consider the Thurstonecorrelation matrix and find the multiple correlation of the last five variables as a functionof the first 4

gt setCor(y = 59x=14data=Thurstone)

Call setCor(y = 59 x = 14 data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388

38

SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in correlation matrix appropriate estimates of stan-dard errors t-values and probabilities are also found The next example finds the regres-sions with variables 1 and 2 used as covariates The β weights for variables 3 and 4 do notchange but the multiple correlation is much less It also shows how to find the residualcorrelations between variables 5-9 with variables 1-4 removed

gt sc lt- setCor(y = 59x=34data=Thurstonez=12)

Call setCor(y = 59 x = 34 data = Thurstone z = 12)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression

39

R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

gt round(sc$residual2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079

40

52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect ofmultiple predictors (x12i) on a criterion variable y some prefer to think of the effect ofone predictor x as mediated by another variable m (Preacher and Hayes 2004) Thuswe we may find the indirect path from x to m and then from m to y as well as the directpath from x to y Call these paths a b and c respectively Then the indirect effect of xon y through m is just ab and the direct effect is c Statistical tests of the ab effect arebest done by bootstrapping This is discussed in detail in the ldquoHow To use mediate andsetCor to do mediation moderation and regression analysis tutorial

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate

function and the subsequent graphic from mediatediagram The data are found in theexample for mediate

MediationModeration Analysis

Call mediate(y = SATIS ~ THERAPY + (ATTRIB) data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total effect(c) of THERAPY on SATIS = 076 SE = 031 t = 25 df= 28 with p = 0019

Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t = 135 df= 27 with p = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 003 Upper CI = 07

R = 056 R2 = 031 F = 606 on 2 and 27 DF p-value 000272

To see the longer output specify short = FALSE in the print statement or ask for the summary

bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables

setCor(SATV + SATQ ~ education + age data = satact std=TRUE)

bull mediate will take raw data or a correlation matrix and find (and graph the path dia-gram) for multiple y variables depending upon multiple x variables mediated througha mediation variable It then tests the mediation effect using a boot strap

mediate( SATV ~ education+ age + (ACT) data =satactstd=TRUEniter=50)

bull mediate will also take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple x variablesmediated through a mediation variable It will form the product term either from themean centered data or from the raw data It then tests the mediation effect using aboot strap The data set is taken from Garcia et al (2010) The number of iterationsfor the boot strap was set to 50 for speed The default number of boot straps is5000 See the help page for the mediate function for more details For a much longerdiscussion of how to use the mediate function see the ldquoHowTordquo Using mediate andsetCor to do mediation moderation and regression analysis

41

gt mediatediagram(preacher)

Mediation model

THERAPY SATIS

ATTRIB

c = 076

c = 043

082 04

Figure 15 A mediated model taken from Preacher and Hayes 2004 and solved using themediate function The direct path from Therapy to Satisfaction has a an effect of 76 whilethe indirect path through Attribution has an effect of 33 Compare this to the normalregression graphic created by setCordiagram

42

gt preacher lt- setCor(SATIS ~ THERAPY + ATTRIBdata =sobelstd=FALSE)

gt setCordiagram(preacher)

Regression Models

THERAPY

ATTRIB

SATIS

043

04

021

Figure 16 The conventional regression model for the Preacher and Hayes 2004 data setsolved using the sector function Compare this to the previous figure

43

MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

Moderated mediation (mean centered)

minus001

prot2

prot2sexism

respapprsexism

c = 146

c = 146

c = 081

c = 081

007

009

002

Figure 17 Moderated multiple regression requires the raw data By default the data aremean centered before find the product term

44

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

R2 = 1minusn

prodi=1

(1minusλi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rminus1xx RxyRminus1

xx Rminus1xy

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272, Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF, p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender      -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education    0.56 0.07  7.72 0.016     0.25     0.87 1.22
age         -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error = 0.15 with 2 degrees of freedom

Multiple Regression
      R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
ACT   1 0.99 0.98 0.96        0.98        0     76.34   3   2  0.013

DV = SATV
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender       0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education    0.72 0.08  9.21 0.012     0.39     1.06 1.22
age         -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error = 0.16 with 2 degrees of freedom

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV = SATQ
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender      -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education    0.40 0.05  8.32 0.014     0.19     0.60 1.22
age         -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error = 0.1 with 2 degrees of freedom

Multiple Regression
      R  R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ  1   1   1 0.99        0.99        0    177.11   3   2 0.00562


Various estimates of between set correlations

Squared Canonical Correlations
[1] 1.000 0.988 0.013

Chisq of canonical correlations
[1] 32.517  4.433  0.013

Average squared canonical correlation = 0.67
Cohen's Set Correlation R2 = 1
Shrunken Set Correlation R2 = 1
F and df of Cohen's Set Correlation -Inf 12 -12.94
Unweighted correlation between the two sets = 0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
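This symmetry can be checked by simply exchanging the roles of the two sets in the call used above (a sketch, with the same covariance matrix C):

> setCor(c(1:3), c(4:6), C, n.obs=700)  # predict the first set from the second; the set R2 is unchanged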

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.
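For instance, a table such as Table 2 can be produced along the following lines. This is a sketch: the heading text is an assumption, the factor object is the three factor solution of the Thurstone correlation matrix shown in the table, and cor2latex and df2latex follow the same pattern.

f3 <- fa(Thurstone, 3)        # the 3 factor solution displayed in Table 2
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
cor2latex(Thurstone)          # a correlation matrix, lower diagonal, as a LaTeX table
df2latex(describe(sat.act))   # any data frame, e.g., the output of describe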

An example of converting the output from fa to LaTeX appears in Table 2.


Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.5

     MR1  MR2  MR3
MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.


geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
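A few of these helpers in action (a minimal sketch; the input values are only illustrative):

fisherz(.5)                  # the Fisher r to z transformation of a correlation of .5
geometric.mean(c(1, 2, 4))   # an appropriate mean for ratio scaled data
headTail(sat.act)            # the first and last few rows of the sat.act data set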

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iq.items, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project, Revelle et al. (2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iq.items 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR), Condon & Revelle (2014), and were collected as part of the SAPA web based assessment project, https://sapa-project.org, Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
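All of these data sets are available once psych and psychTools are loaded, so they may be examined directly; a minimal sketch:

dim(bfi)           # 2800 observations of the 25 items plus gender, education, and age
describe(sat.act)  # descriptive statistics for the sat.act demonstration data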

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source"

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g. ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, 122 pp., Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.


McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of Books in Biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of Books in Biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49


hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48


corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51


psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9


outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49


table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47


bull mediate will also take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple x variablesmediated through a mediation variable It will form the product term either from themean centered data or from the raw data It then tests the mediation effect using aboot strap The data set is taken from Garcia et al (2010) The number of iterationsfor the boot strap was set to 50 for speed The default number of boot straps is5000 See the help page for the mediate function for more details For a much longerdiscussion of how to use the mediate function see the ldquoHowTordquo Using mediate andsetCor to do mediation moderation and regression analysis

41

gt mediatediagram(preacher)

Mediation model

THERAPY SATIS

ATTRIB

c = 076

c = 043

082 04

Figure 15 A mediated model taken from Preacher and Hayes 2004 and solved using themediate function The direct path from Therapy to Satisfaction has a an effect of 76 whilethe indirect path through Attribution has an effect of 33 Compare this to the normalregression graphic created by setCordiagram

42

gt preacher lt- setCor(SATIS ~ THERAPY + ATTRIBdata =sobelstd=FALSE)

gt setCordiagram(preacher)

Regression Models

THERAPY

ATTRIB

SATIS

043

04

021

Figure 16 The conventional regression model for the Preacher and Hayes 2004 data setsolved using the sector function Compare this to the previous figure

43

MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

Moderated mediation (mean centered)

minus001

prot2

prot2sexism

respapprsexism

c = 146

c = 146

c = 081

c = 081

007

009

002

Figure 17 Moderated multiple regression requires the raw data By default the data aremean centered before find the product term

44

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

R2 = 1minusn

prodi=1

(1minusλi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rminus1xx RxyRminus1

xx Rminus1xy

Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate

setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data

Consider the correlations of the 6 variables in the satact data set First do the normalmultiple regression and then compare it with the results using setCor Two things tonotice setCor works on the correlation or covariance or raw data matrix and thus ifusing the correlation matrix will report standardized or raw β weights Secondly it ispossible to do several multiple regressions simultaneously If the number of observationsis specified or if the analysis is done on raw data statistical tests of significance areapplied

For this example the analysis is done on the correlation matrix rather than the rawdata

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV =  ACT
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender      -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education    0.56 0.07  7.72 0.016     0.25     0.87 1.22
age         -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error =  0.15  with  2  degrees of freedom

 Multiple Regression
      R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT   1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV =  SATV
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender       0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education    0.72 0.08  9.21 0.012     0.39     1.06 1.22
age         -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error =  0.16  with  2  degrees of freedom

 Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV =  SATQ
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender      -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education    0.40 0.05  8.32 0.014     0.19     0.60 1.22
age         -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error =  0.1  with  2  degrees of freedom

 Multiple Regression
       R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ   1  1   1 0.99        0.99        0    177.11   3   2 0.00562

Various estimates of between set correlations

Squared Canonical Correlations
[1] 1.000 0.988 0.013

Chisq of canonical correlations
[1] 32517 4433 0013

Average squared canonical correlation =  0.67

Cohen's Set Correlation R2  =  1
Shrunken Set Correlation R2  =  1
F and df of Cohen's Set Correlation   -Inf 12 -1294

Unweighted correlation between the two sets =  0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
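One way to see this symmetry is a quick sketch that reuses the covariance matrix C from the example above and simply reverses the roles of the two sets (the object names sc.xy and sc.yx are arbitrary):

sc.xy <- setCor(c(4:6), c(1:3), C, n.obs = 700)   # SATs and ACT from the demographics
sc.yx <- setCor(c(1:3), c(4:6), C, n.obs = 700)   # demographics from the SATs and ACT
# the individual regressions differ, but the reported set correlation is the same
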

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
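Table 2 can be generated with a call along these lines (a sketch; in recent versions of the package the conversion functions are distributed in psychTools):

f3 <- fa(Thurstone, 3)   # the three factor solution reported in Table 2
fa2latex(f3)             # loadings, communalities, and factor correlations as a LaTeX table
cor2latex(sat.act)       # an APA style lower-diagonal correlation table from raw data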


Table 2: fa2latex. A factor analysis table from the psych package in R

Variable          MR1   MR2   MR3    h2    u2   com
Sentences        0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary       0.89  0.06 -0.03  0.84  0.16  1.01
Sent.Completion  0.83  0.04  0.00  0.73  0.27  1.00
First.Letters    0.00  0.86  0.00  0.73  0.27  1.00
4.Letter.Words  -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes         0.18  0.63 -0.08  0.50  0.50  1.20
Letter.Series    0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees        0.37 -0.05  0.47  0.50  0.50  1.93
Letter.Group    -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings      2.64  1.86  1.50

Factor correlations
     MR1   MR2   MR3
MR1 1.00  0.59  0.54
MR2 0.59  1.00  0.52
MR3 0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the Index for psych for a list of all of the functions. A few of these helpers are illustrated in the short sketch at the end of this list.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score


geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
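A few quick illustrations of these helpers (a sketch; the values in the comments are approximate):

fisherz(0.50)                     # Fisher r to z, about 0.55
geometric.mean(c(1, 10, 100))     # 10
harmonic.mean(c(1, 10, 100))      # about 2.7
headTail(sat.act)                 # first and last few rows, separated by ellipses
superMatrix(diag(2), diag(3))     # a 5 x 5 block diagonal "super" matrix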

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability) are also included. The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project, Revelle et al. (2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR), Condon and Revelle (2014), and were collected as part of the SAPA web based assessment project (https://sapa-project.org), Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
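The data sets are loaded with the package (some of the larger ones are distributed in psychTools, depending on the version), so they can be examined directly. A brief sketch:

data(package = "psych")      # list the data sets supplied with psych
dim(Thurstone)               # a 9 x 9 correlation matrix of ability tests
describe(sat.act)            # descriptive statistics for the sat.act example data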

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R version 4.0.2 (2020-06-22)

Platform: x86_64-apple-darwin17.0 (64-bit)

Running under: macOS Catalina 10.15.6

Matrix products default

BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached)

[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148

[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.
Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.
Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.
Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.
Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components--an alternative to "mathematical factors." Psychological Review, 42(5):425–454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


Page 28: An introduction to the psych package: Part I: data entry ... · myData

347 Correlational structure

There are many ways to display correlations Tabular displays are probably the mostcommon The output from the cor function in core R is a rectangular matrix lowerMat

will round this to (2) digits and then display as a lower off diagonal matrix lowerCor

calls cor with use=lsquopairwisersquo method=lsquopearsonrsquo as default values and returns (invisibly)the full correlation matrix and displays the lower off diagonal matrix

gt lowerCor(satact)

gendr edctn age ACT SATV SATQ

gender 100

education 009 100

age -002 055 100

ACT -004 015 011 100

SATV -002 005 -004 056 100

SATQ -017 003 -003 059 064 100

When comparing results from two different groups it is convenient to display them as onematrix with the results from one group below the diagonal and the other group above thediagonal Use lowerUpper to do this

gt female lt- subset(satactsatact$gender==2)

gt male lt- subset(satactsatact$gender==1)

gt lower lt- lowerCor(male[-1])

edctn age ACT SATV SATQ

education 100

age 061 100

ACT 016 015 100

SATV 002 -006 061 100

SATQ 008 004 060 068 100

gt upper lt- lowerCor(female[-1])

edctn age ACT SATV SATQ

education 100

age 052 100

ACT 016 008 100

SATV 007 -003 053 100

SATQ 003 -009 058 063 100

gt both lt- lowerUpper(lowerupper)

gt round(both2)

education age ACT SATV SATQ

education NA 052 016 007 003

age 061 NA 008 -003 -009

ACT 016 015 NA 053 058

SATV 002 -006 061 NA 063

SATQ 008 004 060 068 NA

It is also possible to compare two matrices by taking their differences and displaying one (be-low the diagonal) and the difference of the second from the first above the diagonal

gt diffs lt- lowerUpper(lowerupperdiff=TRUE)

gt round(diffs2)

28

education age ACT SATV SATQ

education NA 009 000 -005 005

age 061 NA 007 -003 013

ACT 016 015 NA 008 002

SATV 002 -006 061 NA 005

SATQ 008 004 060 068 NA

348 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat mapof the correlations This is just a matrix color coded to represent the magnitude of thecorrelation This is useful when considering the number of factors in a data set Considerthe Thurstone data set which has a clear 3 factor solution (Figure 11) or a simulated dataset of 24 variables with a circumplex structure (Figure 12) The color coding representsa ldquoheat maprdquo of the correlations with darker shades of red representing stronger negativeand darker shades of blue stronger positive correlations As an option the value of thecorrelation can be shown

Yet another way to show structure is to use ldquospiderrdquo plots Particularly if variables areordered in some meaningful way (eg in a circumplex) a spider plot will show this structureeasily This is just a plot of the magnitude of the correlation as a radial line with lengthranging from 0 (for a correlation of -1) to 1 (for a correlation of 1) (See Figure 13)

35 Testing correlations

Correlations are wonderful descriptive statistics of the data but some people like to testwhether these correlations differ from zero or differ from each other The cortest func-tion (in the stats package) will test the significance of a single correlation and the rcorr

function in the Hmisc package will do this for many correlations In the psych packagethe corrtest function reports the correlation (Pearson Spearman or Kendall) betweenall variables in either one or two data frames or matrices as well as the number of obser-vations for each case and the (two-tailed) probability for each correlation Unfortunatelythese probability values have not been corrected for multiple comparisons and so shouldbe taken with a great deal of salt Thus in corrtest and corrp the raw probabilitiesare reported below the diagonal and the probabilities adjusted for multiple comparisonsusing (by default) the Holm correction are reported above the diagonal (Table 1) (See thepadjust function for a discussion of Holm (1979) and other corrections)

Testing the difference between any two correlations can be done using the rtest functionThe function actually does four different tests (based upon an article by Steiger (1980)depending upon the input

1) For a sample size n find the t and p value for a single correlation as well as the confidenceinterval

gt rtest(503)

29

gt png(corplotpng)gt corPlot(Thurstonenumbers=TRUEupper=FALSEdiag=FALSEmain=9 cognitive variables from Thurstone)

gt devoff()

null device

1

Figure 11 The structure of correlation matrix can be seen more clearly if the variables aregrouped by factor and then the correlations are shown by color By using the rsquonumbersrsquooption the values are displayed as well By default the complete matrix is shown Settingupper=FALSE and diag=FALSE shows a cleaner figure

30

gt png(circplotpng)gt circ lt- simcirc(24)

gt rcirc lt- cor(circ)

gt corPlot(rcircmain=24 variables in a circumplex)gt devoff()

null device

1

Figure 12 Using the corPlot function to show the correlations in a circumplex Correlationsare highest near the diagonal diminish to zero further from the diagonal and the increaseagain towards the corners of the matrix Circumplex structures are common in the studyof affect For circumplex structures it is perhaps useful to show the complete matrix

31

gt png(spiderpng)gt oplt- par(mfrow=c(22))

gt spider(y=c(161218)x=124data=rcircfill=TRUEmain=Spider plot of 24 circumplex variables)

gt op lt- par(mfrow=c(11))

gt devoff()

null device

1

Figure 13 A spider plot can show circumplex structure very clearly Circumplex structuresare common in the study of affect

32

Table 1 The corrtest function reports correlations cell sizes and raw and adjustedprobability values corrp reports the probability values for a correlation matrix Bydefault the adjustment used is that of Holm (1979)gt corrtest(satact)

Callcorrtest(x = satact)

Correlation matrix

gender education age ACT SATV SATQ

gender 100 009 -002 -004 -002 -017

education 009 100 055 015 005 003

age -002 055 100 011 -004 -003

ACT -004 015 011 100 056 059

SATV -002 005 -004 056 100 064

SATQ -017 003 -003 059 064 100

Sample Size

gender education age ACT SATV SATQ

gender 700 700 700 700 700 687

education 700 700 700 700 700 687

age 700 700 700 700 700 687

ACT 700 700 700 700 700 687

SATV 700 700 700 700 700 687

SATQ 687 687 687 687 687 687

Probability values (Entries above the diagonal are adjusted for multiple tests)

gender education age ACT SATV SATQ

gender 000 017 100 100 1 0

education 002 000 000 000 1 1

age 058 000 000 003 1 1

ACT 033 000 000 000 0 0

SATV 062 022 026 000 0 0

SATQ 000 036 037 000 0 0

To see confidence intervals of the correlations print with the short=FALSE option

33

Correlation tests

Callrtest(n = 50 r12 = 03)

Test of significance of a correlation

t value 218 with probability lt 0034

and confidence interval 002 053

2) For sample sizes of n and n2 (n2 = n if not specified) find the z of the difference betweenthe z transformed correlations divided by the standard error of the difference of two zscores

gt rtest(3046)

Correlation tests

Callrtest(n = 30 r12 = 04 r34 = 06)

Test of difference between two independent correlations

z value 099 with probability 032

3) For sample size n and correlations ra= r12 rb= r23 and r13 specified test for thedifference of two dependent correlations (Steiger case A)

gt rtest(103451)

Correlation tests

Call[1] rtest(n = 103 r12 = 04 r23 = 01 r13 = 05 )

Test of difference between two correlated correlations

t value -089 with probability lt 037

4) For sample size n test for the difference between two dependent correlations involvingdifferent variables (Steiger case B)

gt rtest(103567558) steiger Case B

Correlation tests

Callrtest(n = 103 r12 = 05 r34 = 06 r23 = 07 r13 = 05 r14 = 05

r24 = 08)

Test of difference between two dependent correlations

z value -12 with probability 023

To test whether a matrix of correlations differs from what would be expected if the popu-lation correlations were all zero the function cortest follows Steiger (1980) who pointedout that the sum of the squared elements of a correlation matrix or the Fisher z scoreequivalents is distributed as chi square under the null hypothesis that the values are zero(ie elements of the identity matrix) This is particularly useful for examining whethercorrelations in a single matrix differ from zero or for comparing two matrices Althoughobvious cortest can be used to test whether the satact data matrix produces non-zerocorrelations (it does) This is a much more appropriate test when testing whether a residualmatrix differs from zero

gt cortest(satact)

Tests of correlation matrices

Callcortest(R1 = satact)

Chi Square value 132542 with df = 15 with probability lt 18e-273

34

36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient If thedata eg ability items are thought to represent an underlying continuous although latentvariable the φ will underestimate the value of the Pearson applied to these latent variablesOne solution to this problem is to use the tetrachoric correlation which is based uponthe assumption of a bivariate normal distribution that has been cut at certain points Thedrawtetra function demonstrates the process (Figure 14) This is also shown in terms ofdichotomizing the bivariate normal density function using the drawcor function A simplegeneralization of this to the case of the multiple cuts is the polychoric correlation

gt drawtetra()

minus3 minus2 minus1 0 1 2 3

minus3

minus2

minus1

01

23

Y rho = 05phi = 028

X gt τY gt Τ

X lt τY gt Τ

X gt τY lt Τ

X lt τY lt Τ

x

dnor

m(x

)

X gt τ

τ

x1

Y gt Τ

Τ

Figure 14 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values

35

The tetrachoric correlation estimates what a Pearson correlation would be given a two bytwo table of observed values assumed to be sampled from a bivariate normal distributionThe φ correlation is just a Pearson r performed on the observed values It is found (labo-riously) by optimizing the fit of the bivariate normal for various values of the correlationto the observed cell frequencies In the interests of space we do not show the next figurebut it can be created by

drawcor(expand=20cuts=c(00))

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation

If the data are a mix of continuous polytomous and dichotomous variables the mixedcor

function will calculate the appropriate mixture of Pearson polychoric tetrachoric biserialand polyserial correlations

The correlation matrix resulting from a number of tetrachoric or polychoric correlationmatrix sometimes will not be positive semi-definite This will sometimes happen if thecorrelation matrix is formed by using pair-wise deletion of cases The corsmooth functionwill adjust the smallest eigen values of the correlation matrix to make them positive rescaleall of them to sum to the number of variables and produce aldquosmoothedrdquocorrelation matrixAn example of this problem is a data set of burt which probably had a typo in the originalcorrelation matrix Smoothing the matrix corrects this problem

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon egethnicity age gender college major or country) reflect an unknown mixture of the pooledcorrelation within each group as well as the correlation of the means of these groupsThese two correlations are independent and do not allow inferences from one level (thegroup) to the other level (the individual) When examining data at two levels (eg theindividual and by some grouping variable) it is useful to find basic descriptive statistics(means sds ns per group within group correlations) as well as between group statistics(over all descriptive statistics and overall between group correlations) Of particular useis the ability to decompose a matrix of correlations at the individual level into correlationswithin group and correlations between groups

41 Decomposing data into within and between level correlations usingstatsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complexanalysis of hierarchical (multilevel) data structures statsBy is a much simpler function

36

to give some of the basic descriptive statistics for two level models (nlme and multilevelallow for statistical inference but the descriptives of statsBy are useful)

This follows the decomposition of an observed correlation into the pooled correlation withingroups (rwg) and the weighted correlation of the means between groups which is discussedby Pedhazur (1997) and by Bliese (2009) in the multilevel package

rxy = ηxwg lowastηywg lowast rxywg + ηxbg lowastηybg lowast rxybg (1)

where rxy is the normal correlation which may be decomposed into a within group andbetween group correlations rxywg and rxybg and η (eta) is the correlation of the data withthe within group values or the group means

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group cor-relations The within group correlations between 9 variables are set to be 1 0 and -1while those between groups are also set to be 1 0 -1 These two sets of correlations arecrossed such that V1 V4 and V7 have within group correlations of 1 as do V2 V5 andV8 and V3 V6 and V9 V1 has a within group correlation of 0 with V2 V5 and V8and a -1 within group correlation with V3 V6 and V9 V1 V2 and V3 share a betweengroup correlation of 1 as do V4 V5 and V6 and V7 V8 and V9 The first group has a 0between group correlation with the second and a -1 with the third group See the help filefor withinBetween to display these data

simmultilevel will generate simulated data with a multilevel structure

The statsByboot function will randomize the grouping variable ntrials times and find thestatsBy output This can take a long time and will produce a great deal of output Thisoutput can then be summarized for relevant variables using the statsBybootsummary

function specifying the variable of interest

Consider the case of the relationship between various tests of ability when the data aregrouped by level of education (statsBy(satact)) or when affect data are analyzed withinand between an affect manipulation (statsBy(affect) )

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be donein the lavaan package However for exploratory analyses of the structure within each ofmultiple groups the faBy function may be used in combination with the statsBy functionFirst run pfunstatsBy with the correlation option set to TRUE and then run faBy on theresulting output

37

sb lt- statsBy(bfi[c(12527)] group=educationcors=TRUE)

faBy(sbnfactors=5) find the 5 factor solution for each education level

5 Multiple Regression mediation moderation and set cor-relations

The typical application of the lm function is to do a linear model of one Y variable as afunction of multiple X variables Because lm is designed to analyze complex interactions itrequires raw data as input It is however sometimes convenient to do multiple regressionfrom a correlation or covariance matrix This is done using the setCor which will workwith either raw data covariance matrices or correlation matrices

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variablesperhaps with a set of z covariates removed from both x and y Consider the Thurstonecorrelation matrix and find the multiple correlation of the last five variables as a functionof the first 4

gt setCor(y = 59x=14data=Thurstone)

Call setCor(y = 59 x = 14 data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388

38

SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in correlation matrix appropriate estimates of stan-dard errors t-values and probabilities are also found The next example finds the regres-sions with variables 1 and 2 used as covariates The β weights for variables 3 and 4 do notchange but the multiple correlation is much less It also shows how to find the residualcorrelations between variables 5-9 with variables 1-4 removed

gt sc lt- setCor(y = 59x=34data=Thurstonez=12)

Call setCor(y = 59 x = 34 data = Thurstone z = 12)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression

39

R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

gt round(sc$residual2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079

40

52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect ofmultiple predictors (x12i) on a criterion variable y some prefer to think of the effect ofone predictor x as mediated by another variable m (Preacher and Hayes 2004) Thuswe we may find the indirect path from x to m and then from m to y as well as the directpath from x to y Call these paths a b and c respectively Then the indirect effect of xon y through m is just ab and the direct effect is c Statistical tests of the ab effect arebest done by bootstrapping This is discussed in detail in the ldquoHow To use mediate andsetCor to do mediation moderation and regression analysis tutorial

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate

function and the subsequent graphic from mediatediagram The data are found in theexample for mediate

MediationModeration Analysis

Call mediate(y = SATIS ~ THERAPY + (ATTRIB) data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total effect(c) of THERAPY on SATIS = 076 SE = 031 t = 25 df= 28 with p = 0019

Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t = 135 df= 27 with p = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 003 Upper CI = 07

R = 056 R2 = 031 F = 606 on 2 and 27 DF p-value 000272

To see the longer output specify short = FALSE in the print statement or ask for the summary

bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables

setCor(SATV + SATQ ~ education + age data = satact std=TRUE)

bull mediate will take raw data or a correlation matrix and find (and graph the path dia-gram) for multiple y variables depending upon multiple x variables mediated througha mediation variable It then tests the mediation effect using a boot strap

mediate( SATV ~ education+ age + (ACT) data =satactstd=TRUEniter=50)

bull mediate will also take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple x variablesmediated through a mediation variable It will form the product term either from themean centered data or from the raw data It then tests the mediation effect using aboot strap The data set is taken from Garcia et al (2010) The number of iterationsfor the boot strap was set to 50 for speed The default number of boot straps is5000 See the help page for the mediate function for more details For a much longerdiscussion of how to use the mediate function see the ldquoHowTordquo Using mediate andsetCor to do mediation moderation and regression analysis

41

gt mediatediagram(preacher)

Mediation model

THERAPY SATIS

ATTRIB

c = 076

c = 043

082 04

Figure 15 A mediated model taken from Preacher and Hayes 2004 and solved using themediate function The direct path from Therapy to Satisfaction has a an effect of 76 whilethe indirect path through Attribution has an effect of 33 Compare this to the normalregression graphic created by setCordiagram

42

gt preacher lt- setCor(SATIS ~ THERAPY + ATTRIBdata =sobelstd=FALSE)

gt setCordiagram(preacher)

Regression Models

THERAPY

ATTRIB

SATIS

043

04

021

Figure 16 The conventional regression model for the Preacher and Hayes 2004 data setsolved using the sector function Compare this to the previous figure

43

MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

Moderated mediation (mean centered)

minus001

prot2

prot2sexism

respapprsexism

c = 146

c = 146

c = 081

c = 081

007

009

002

Figure 17 Moderated multiple regression requires the raw data By default the data aremean centered before find the product term

44

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

R2 = 1minusn

prodi=1

(1minusλi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rminus1xx RxyRminus1

xx Rminus1xy

Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate

setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data

Consider the correlations of the 6 variables in the satact data set First do the normalmultiple regression and then compare it with the results using setCor Two things tonotice setCor works on the correlation or covariance or raw data matrix and thus ifusing the correlation matrix will report standardized or raw β weights Secondly it ispossible to do several multiple regressions simultaneously If the number of observationsis specified or if the analysis is done on raw data statistical tests of significance areapplied

For this example the analysis is done on the correlation matrix rather than the rawdata

gt C lt- cov(satactuse=pairwise)

gt model1 lt- lm(ACT~ gender + education + age data=satact)

gt summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)

45

(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor

gt compare with sector

gt setCor(c(46)c(13)C nobs=700)

Call setCor(y = c(46) x = c(13) data = C nobs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562

46

Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between thepredictor set and the criterion (dependent) set This set correlation is symmetric That isthe R2 is the same independent of the direction of the relationship

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irtfa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
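A minimal sketch of these helpers (the heading text and the ability of cor2latex to accept raw data follow the current help pages; by default the LaTeX code is written to the console):

f3 <- fa(Thurstone, 3)              # the factor analysis shown in Table 2
fa2latex(f3, heading = "A factor analysis table from the psych package in R")
cor2latex(sat.act)                  # lower diagonal correlation matrix as a LaTeX table
df2latex(describe(sat.act))         # any data frame, here the describe output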

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

blockrandom Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadianmean, circadiancor and circadianlinearcor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometricmean also harmonicmean find the appropriate mean for working with different kinds of data.

ICC and cohenkappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or dataframe.

prep finds the probability of replication for an F, t, or r and estimates the effect size.

partialr partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setcor.)

rangeCorrection will correct correlations for restriction of range.

reversecode will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
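A few of these helpers in action (a minimal sketch; the numeric comments are approximate):

x <- c(1, 2, 4, 8)
geometric.mean(x)    # 2.83, the nth root of the product
harmonic.mean(x)     # 2.13, the reciprocal of the mean reciprocal
fisherz(.5)          # 0.55, the Fisher z transformation of r = .5
fisherz2r(.55)       # and back again
headTail(sat.act)    # the first and last few lines, separated by an ellipsis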

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone33, Holzinger, Bechtoldt1, Bechtoldt2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epibfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas and heights from Galton.
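All of these are available once psych and psychTools are loaded; a brief sketch of how to look at a few of them:

library(psych)
library(psychTools)
dim(bfi)                       # 2800 observations of 25 items plus 3 demographic variables
describe(sat.act)              # the SAT and ACT demonstration data
lowerMat(Thurstone[1:4, 1:4])  # part of the 9 x 9 Thurstone correlation matrix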

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project, Revelle et al. (2016). These 135 items form part of the SAPA Personality Inventory.

satact Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epibfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR), Condon & Revelle (2014), and were collected as part of the SAPA web based assessment project (https://sapa-project.org), Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see faextension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
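For example, the airline distances in cities can be turned into a rough two dimensional map with classical multidimensional scaling. A sketch using cmdscale from the base stats package, assuming the cities matrix (from psychTools) carries the city names as row names:

city.location <- cmdscale(cities, k = 2)    # find a two dimensional solution
plot(city.location, type = "n", xlab = "", ylab = "", main = "cmdscale of 11 US cities")
text(city.location, labels = rownames(cities))   # the recovered map may be reflected or rotated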

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41

References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psychometrika 26(4):405–432

Blashfield R K (1980) The growth of cluster analysis: Tryon, Ward, and Johnson Multivariate Behavioral Research 15(4):439–458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of cluster analysis In Nesselroade J R and Cattell R B editors Handbook of multivariate experimental psychology (2nd ed) pages 447–473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate Behavioral Research 1(2):245–276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multivariate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regression/correlation analysis for the behavioral sciences L Erlbaum Associates Mahwah NJ 3rd ed edition

Condon D M & Revelle W (2014) The International Cognitive Ability Resource: Development and initial validation of a public-domain measure Intelligence 43 52–64

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales Organizational Research Methods 9:78–98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika 16:297–334

Dwyer P S (1937) The determination of the factor loadings of a given test from the known factor loadings of other tests Psychometrika 2(3):173–178

Everitt B (1974) Cluster analysis John Wiley & Sons, 122 pp, Oxford England

Fox J Nie Z and Byrnes J (2012) sem: Structural Equation Models

Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness European Journal of Social Psychology 40(5):733–745

Grice J W (2001) Computing and evaluating factor scores Psychological Methods 6(4):430–450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4):255–282

Hartigan J A (1975) Clustering Algorithms John Wiley & Sons Inc New York NY USA

Hayes A F (2013) Introduction to mediation, moderation and conditional process analysis: A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in family psychology research Journal of Family Psychology 19(1):121–132

Holm S (1979) A simple sequentially rejective multiple test procedure Scandinavian Journal of Statistics 6(2):65–70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1):41–54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psychometrika 30(2):179–185

Horn J L and Engstrom R (1979) Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem Multivariate Behavioral Research 14(3):283–300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages 1–13 10.1007/s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g? Intelligence 18(3):231–258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power of a multiple-score test Psychometrika 18(4):309–317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as approximations In Cudeck R and MacCallum R C editors Factor analysis at 100: Historical developments and future directions pages 153–175 Lawrence Erlbaum Associates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety Personality and Individual Differences 43(7):1676–1686

McDonald R P (1999) Test theory: A unified treatment L Erlbaum Associates Mahwah NJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk Developmental Psychology 44(2):481–495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New York 3rd edition

Pedhazur E (1997) Multiple regression in behavioral research: explanation and prediction Harcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating indirect effects in simple mediation models Behavior Research Methods, Instruments, & Computers 36(4):717–731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Multivariate Behavioral Research 14(1):57–74

Revelle W (2018) psych: Procedures for Personality and Psychological Research Northwestern University Evanston R package version 1.8.6

Revelle W (in prep) An introduction to psychometric theory with applications in R Springer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and Hughes D editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (in press)

Revelle W Condon D and Wilt J (2011) Methodological advances in differential psychology In Chamorro-Premuzic T Furnham A and von Stumm S editors Handbook of Individual Differences chapter 2 pages 39–73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A & Elleman L G (2016) Web and phone based data collection using planned missing designs In N G Fielding R M Lee & G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed) chapter 37 (pp 578–595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors Multivariate Behavioral Research 14(4):403–414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition: The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control chapter 2 pages 27–49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha, beta, omega and the glb: comments on Sijtsma Psychometrika 74(1):145–154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutions Psychometrika 22(1):83–90

Shrout P E and Fleiss J L (1979) Intraclass correlations: Uses in assessing rater reliability Psychological Bulletin 86(2):420–428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion Journal of Personality and Social Psychology 103(2):306–326

Sneath P H A and Sokal R R (1973) Numerical taxonomy: the principles and practice of numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series of books in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things The American Journal of Psychology 15(1):72–101

Steiger J H (1980) Tests for comparing elements of a correlation matrix Psychological Bulletin 87(2):245–251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction in the influence of presumed media influence Communication Research 37(6):801–824

Thorburn W M (1918) The myth of Occam's razor Mind 27:345–353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence The University of Chicago Press Chicago Ill

Tryon R C (1935) A theory of psychological components - an alternative to "mathematical factors" Psychological Review 42(5):425–454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partial correlations Psychometrika 41(3):321–327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability Psychometrika 70(1):123–133


Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49


hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48


corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51


psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9


outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49


table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47



          education   age   ACT  SATV  SATQ
education        NA  0.09  0.00 -0.05  0.05
age            0.61    NA  0.07 -0.03  0.13
ACT            0.16  0.15    NA  0.08  0.02
SATV           0.02 -0.06  0.61    NA  0.05
SATQ           0.08  0.04  0.60  0.68    NA

3.4.8 Heatmap displays of correlational structure

Perhaps a better way to see the structure in a correlation matrix is to display a heat map of the correlations. This is just a matrix color coded to represent the magnitude of the correlation. This is useful when considering the number of factors in a data set. Consider the Thurstone data set which has a clear 3 factor solution (Figure 11), or a simulated data set of 24 variables with a circumplex structure (Figure 12). The color coding represents a "heat map" of the correlations, with darker shades of red representing stronger negative and darker shades of blue stronger positive correlations. As an option, the value of the correlation can be shown.

Yet another way to show structure is to use "spider" plots. Particularly if variables are ordered in some meaningful way (e.g., in a circumplex), a spider plot will show this structure easily. This is just a plot of the magnitude of the correlation as a radial line, with length ranging from 0 (for a correlation of -1) to 1 (for a correlation of 1). (See Figure 13.)

3.5 Testing correlations

Correlations are wonderful descriptive statistics of the data, but some people like to test whether these correlations differ from zero or differ from each other. The cortest function (in the stats package) will test the significance of a single correlation, and the rcorr function in the Hmisc package will do this for many correlations. In the psych package, the corrtest function reports the correlation (Pearson, Spearman, or Kendall) between all variables in either one or two data frames or matrices, as well as the number of observations for each case, and the (two-tailed) probability for each correlation. Unfortunately, these probability values have not been corrected for multiple comparisons and so should be taken with a great deal of salt. Thus, in corrtest and corrp the raw probabilities are reported below the diagonal and the probabilities adjusted for multiple comparisons using (by default) the Holm correction are reported above the diagonal (Table 1). (See the padjust function for a discussion of Holm (1979) and other corrections.)
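The Holm correction itself is easy to see with the base R p.adjust function on a few illustrative p values (a sketch, not psych output):

p <- c(.01, .02, .03, .04)
p.adjust(p, method = "holm")        # 0.04 0.06 0.06 0.06 -- each p is scaled by the number of remaining tests
p.adjust(p, method = "bonferroni")  # 0.04 0.08 0.12 0.16, for comparison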

Testing the difference between any two correlations can be done using the rtest function. The function actually does four different tests (based upon an article by Steiger (1980)), depending upon the input:

1) For a sample size n, find the t and p value for a single correlation as well as the confidence interval:

> r.test(50, .3)

> png('corplot.png')
> corPlot(Thurstone, numbers = TRUE, upper = FALSE, diag = FALSE,
    main = "9 cognitive variables from Thurstone")
> dev.off()

null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option, the values are displayed as well. By default the complete matrix is shown. Setting upper = FALSE and diag = FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main = "24 variables in a circumplex")
> dev.off()

null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow = c(2, 2))
> spider(y = c(1, 6, 12, 18), x = 1:24, data = r.circ, fill = TRUE,
    main = "Spider plot of 24 circumplex variables")
> op <- par(mfrow = c(1, 1))
> dev.off()

null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corrtest function reports correlations, cell sizes, and raw and adjusted probability values. corrp reports the probability values for a correlation matrix. By default, the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call: corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687

Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations print with the short=FALSE option
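For example (a sketch; the ci element of the returned object holds the same information):

ct <- corr.test(sat.act)
print(ct, short = FALSE)   # adds the confidence intervals for each correlation
ct$ci                      # or extract them directly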


Correlation tests

Call: r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores:

> r.test(30, .4, .6)

Correlation tests

Call: r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A):

> r.test(103, .4, .5, .1)

Correlation tests

Call: r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B):

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests

Call: r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5, r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the satact data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call: cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273


3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The drawtetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the drawcor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()

[Figure 14 shows the bivariate normal density cut at a threshold τ on X and Τ on Y, dividing the plane into the four cells of the 2 x 2 table; in this example ρ = 0.5 while the resulting φ = 0.28.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand = 20, cuts = c(0, 0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous, and dichotomous variables, the mixedcor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The corsmooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set of burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
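A minimal sketch (the 2 x 2 cell frequencies are made up for illustration):

Y <- matrix(c(40, 10, 20, 30), 2, 2)   # a 2 x 2 table of observed frequencies
tetrachoric(Y)                          # the estimated latent correlation and the thresholds (taus)
phi(Y)                                  # the phi coefficient on the same table, for comparison
# polychoric(bfi[1:5])                  # for polytomous items, polychoric takes the raw responses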

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable) it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}}    (1)

where r_{xy} is the normal correlation, which may be decomposed into a within group and a between group correlation, r_{xy_{wg}} and r_{xy_{bg}}, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

simmultilevel will generate simulated data with a multilevel structure.

The statsByboot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBybootsummary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
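A minimal sketch of the first of these (the element names rwg and rbg follow the statsBy help page):

sb <- statsBy(sat.act, group = "education", cors = TRUE)
sb$rwg    # the pooled within group correlations
sb$rbg    # the between group correlations (of the group means)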

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25, 27)], group = "education", cors = TRUE)

faBy(sb, nfactors = 5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388


SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression


R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

> round(sc$residual, 2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.53     0.12         0.11      0.08        0.14
Suffixes                   0.12     0.61         0.01      0.03        0.04
LetterSeries               0.11     0.01         0.79      0.31        0.39
Pedigrees                  0.08     0.03         0.31      0.70        0.23
LetterGroup                0.14     0.04         0.39      0.23        0.79

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
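The call that produces this output (and Figure 15) is of the form sketched here; sobel is the example data set from the mediate help page, and the default number of bootstrap iterations is used:

preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)
preacher                     # prints the total (c), direct (c') and indirect (ab) effects
mediate.diagram(preacher)    # draws the path diagram of Figure 15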

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76   SE = 0.31  t = 2.5  df = 28  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   SE = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)

• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. See the help page for the mediate function for more details, and for a much longer discussion of how to use the mediate function see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis. (A runnable sketch of this call follows below.)
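A sketch of the moderated mediation call whose output appears below; Garcia is the data set from Garcia et al. (2010) distributed with psych, and n.iter is kept at 50 only for speed:

mod <- mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia,
               n.iter = 50, main = "Moderated mediation (mean centered)")
print(mod, short = FALSE)   # the longer output, as suggested in the printed note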

> mediate.diagram(preacher)

[Figure 15 shows the mediation model THERAPY → ATTRIB → SATIS, with a path of 0.82 from THERAPY to ATTRIB, a path of 0.4 from ATTRIB to SATIS, total effect c = 0.76 and direct effect c' = 0.43.]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure 16 shows the regression model: THERAPY and ATTRIB predicting SATIS, with paths of 0.43 (THERAPY) and 0.4 (ATTRIB) and a path of 0.21 between the two predictors.]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia, n.iter = 50,
    main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46   SE = 0.22  t = 6.77  df = 126  with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   SE = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04  Upper CI = 0.03

Total effect (c) of prot2*sexism on respappr = 0.81   SE = 0.28  t = 2.89  df = 126  with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   SE = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07  Upper CI = 0.07

R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary

[Figure 17 shows the moderated mediation model, with prot2, prot2*sexism and the mediator sexism predicting respappr: total effects c = 1.46 (prot2) and c = 0.81 (prot2*sexism), direct effects c' = 1.46 and c' = 0.81, and small remaining paths through the mediator.]

Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In that case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.


gt installpackages(psych repos=httpspersonality-projectorgr type=source)

Although the individual help pages for the psych package are available as part of R andmay be accessed directly (eg psych) the full manual for the psych package is alsoavailable as a pdf at httpspersonality-projectorgrpsych_manualpdf

News and a history of changes are available in the NEWS and CHANGES files in the sourcefiles To view the most recent news

gt news(Version gt= 184package=psych)

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research Many of thefunctions were developed to supplement a book (httpspersonality-projectorgrbook An introduction to Psychometric Theory with Applications in R (Revelle prep) Moreinformation about the use of some of the functions may be found in the book

For more extensive discussion of the use of psych in particular and R in general consulthttpspersonality-projectorgrrguidehtml A short guide to R

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R version 402 (2020-06-22)

Platform x86_64-apple-darwin170 (64-bit)

51

Running under macOS Catalina 10156

Matrix products default

BLAS LibraryFrameworksRframeworkVersions40ResourcesliblibRblasdylib

LAPACK LibraryFrameworksRframeworkVersions40ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_209 psych_209

loaded via a namespace (and not attached)

[1] compiler_402 parallel_402 tools_402 foreign_08-80 tmvnsim_10-2 nlme_31-148

[7] mnormt_201 grid_402 lattice_020-41

52

References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance

53

of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

54

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731




> png('corplot.png')
> corPlot(Thurstone,numbers=TRUE,upper=FALSE,diag=FALSE,main="9 cognitive variables from Thurstone")
> dev.off()
null device
          1

Figure 11: The structure of a correlation matrix can be seen more clearly if the variables are grouped by factor and then the correlations are shown by color. By using the 'numbers' option the values are displayed as well. By default the complete matrix is shown. Setting upper=FALSE and diag=FALSE shows a cleaner figure.

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ,main="24 variables in a circumplex")
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18),x=1:24,data=r.circ,fill=TRUE,main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
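
A hedged sketch of two common variations on this call (both the short argument to print and the adjust argument are documented options for corr.test; sat.act is the demonstration data set used above):

ct <- corr.test(sat.act)              # Holm-adjusted probabilities by default
print(ct, short = FALSE)              # also show the confidence intervals
corr.test(sat.act, adjust = "none")   # raw, unadjusted probabilities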

1) For sample size n, test the significance of a single correlation:

> r.test(50,.3)
Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z-transformed correlations divided by the standard error of the difference of two z scores:
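
For reference, the statistic computed here is the standard Fisher r-to-z comparison of two independent correlations (not specific to psych), where $z_i = \tanh^{-1}(r_i)$:

$$z = \frac{z_1 - z_2}{\sqrt{1/(n_1-3) + 1/(n_2-3)}}$$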

> r.test(30,.4,.6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A):

> r.test(103,.4,.5,.1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B):

> r.test(103,.5,.6,.7,.5,.5,.8)   # Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15 with probability < 1.8e-273
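
A hedged sketch of the two-matrix comparison (the random split of sat.act is purely illustrative; n1 and n2 are the sample sizes behind each matrix):

set.seed(17)
half <- sample(nrow(sat.act), nrow(sat.act)/2)
R1 <- cor(sat.act[half, ], use = "pairwise")
R2 <- cor(sat.act[-half, ], use = "pairwise")
cortest(R1, R2, n1 = length(half), n2 = nrow(sat.act) - length(half))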

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, then φ will underestimate the value of the Pearson correlation applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()

[Figure 14 shows the bivariate normal density cut at the thresholds τ for X and Τ for Y, with rho = 0.5 and phi = 0.28; only the caption is reproduced here.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20,cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.
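
A minimal sketch using the dichotomous ability items included in psych (described in the data sets section below); the rho and tau elements hold the estimated correlations and thresholds:

tet <- tetrachoric(ability[, 1:4])   # tetrachoric correlations of four 0/1 scored items
tet$rho                              # the estimated latent correlations
tet$tau                              # the estimated thresholds (cut points)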

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
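
A minimal sketch, assuming the burt correlation matrix from psych is the problematic matrix described above (inspecting the eigenvalues is the usual way to see the problem):

min(eigen(burt)$values)     # a negative smallest eigenvalue flags a non-positive-definite matrix
burt.s <- cor.smooth(burt)  # adjust and rescale the eigenvalues, then rebuild the matrix
min(eigen(burt.s)$values)   # after smoothing this should no longer be negative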

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2 and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
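
A minimal sketch of the first case (education as the grouping variable; the rwg and rbg elements of the result are assumed to hold the pooled within-group and between-group correlations):

sb <- statsBy(sat.act, group = "education", cors = TRUE)
round(sb$rwg, 2)   # pooled within-group correlations
round(sb$rbg, 2)   # between-group correlations (correlations of the group means)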

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first four.

> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)
Multiple Regression from matrix input

DV = FourLetterWords
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
SentCompletion   0.02 3.00
FirstLetters     0.58 1.35

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
SentCompletion   0.05 3.00
FirstLetters     0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = LetterSeries
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
SentCompletion   0.04 3.00
FirstLetters     0.21 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterSeries 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
SentCompletion   0.21 3.00
FirstLetters     0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = LetterGroup
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
SentCompletion   0.08 3.00
FirstLetters     0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
LetterGroup  0.48 0.23 0.45  0.2

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
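
For example (a hedged sketch; 213 is the sample size usually cited for the Thurstone correlation matrix, so substitute the correct value if it differs):

setCor(y = 5:9, x = 1:4, data = Thurstone, n.obs = 213)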

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Multiple Regression from matrix input

The following variables were partialed out: Sentences Vocabulary
and are included in the calculation of df1 and df2

DV = FourLetterWords
               slope  VIF
SentCompletion  0.02 1.02
FirstLetters    0.58 1.02

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.58 0.33 0.64 0.41

DV = Suffixes
               slope  VIF
SentCompletion  0.05 1.02
FirstLetters    0.45 1.02

Multiple Regression
            R   R2 Ruw R2uw
Suffixes 0.46 0.21 0.6 0.37

DV = LetterSeries
               slope  VIF
SentCompletion  0.04 1.02
FirstLetters    0.21 1.02

Multiple Regression
                R   R2  Ruw R2uw
LetterSeries 0.21 0.04 0.46 0.21

DV = Pedigrees
               slope  VIF
SentCompletion  0.21 1.02
FirstLetters    0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.52 0.27

DV = LetterGroup
               slope  VIF
SentCompletion  0.08 1.02
FirstLetters    0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
LetterGroup  0.3 0.09 0.46 0.21

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.73

> round(sc$residual,2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.53     0.12         0.11      0.08        0.14
Suffixes                   0.12     0.61         0.01      0.03        0.04
LetterSeries               0.11     0.01         0.79      0.31        0.39
Pedigrees                  0.08     0.03         0.31      0.70        0.23
LetterGroup                0.14     0.04         0.39      0.23        0.79

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. (In the output below the total effect is labeled c and the direct effect, with the mediator removed, c'; the two are related by c = c' + ab.) Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76   SE = 0.31  t = 2.5  df = 28  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   SE = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

  setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

  mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, niter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.

> mediate.diagram(preacher)

[Figure 15 shows the mediation path diagram (THERAPY -> ATTRIB -> SATIS); only the caption is reproduced here.]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure 16 shows the regression path diagram (THERAPY and ATTRIB predicting SATIS); only the caption is reproduced here.]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
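
The moderated mediation output below was produced by a call of the following form (reconstructed from the Call line in the output; niter is kept small only for speed):

mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia,
        niter = 50, main = "Moderated mediation (mean centered)")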

Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46   SE = 0.22  t = 6.77  df = 126  with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   SE = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04  Upper CI = 0.03
Total effect (c) of prot2*sexism on respappr = 0.81   SE = 0.28  t = 2.89  df = 126  with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   SE = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07  Upper CI = 0.07
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary

[Figure 17 shows the path diagram for the moderated mediation model (prot2, prot2*sexism, and sexism predicting respappr); only the caption is reproduced here.]

Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigenvalue of the eigenvalue decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$
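
As a quick check, the squared canonical correlations reported for the first Thurstone example above reproduce the set correlation printed there:

rho2 <- c(0.6280, 0.1478, 0.0076, 0.0049)  # squared canonical correlations from the setCor output above
1 - prod(1 - rho2)                         # approximately 0.69, the reported set correlation R2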

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor

> setCor(c(4:6), c(1:3), C, n.obs = 700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)
Multiple Regression from raw data

DV = ACT
             slope   se     t     p lower.ci upper.ci  VIF
(Intercept)   0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender       -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education     0.56 0.07  7.72 0.016     0.25     0.87 1.22
age          -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error = 0.15 with 2 degrees of freedom

Multiple Regression
      R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT   1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV = SATV
             slope   se     t     p lower.ci upper.ci  VIF
(Intercept)   0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender        0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education     0.72 0.08  9.21 0.012     0.39     1.06 1.22
age          -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error = 0.16 with 2 degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV = SATQ
             slope   se     t     p lower.ci upper.ci  VIF
(Intercept)   0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender       -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education     0.40 0.05  8.32 0.014     0.19     0.60 1.22
age          -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error = 0.1 with 2 degrees of freedom

Multiple Regression
      R  R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ  1   1   1 0.99        0.99        0    177.11   3   2 0.00562

Various estimates of between set correlations

Squared Canonical Correlations
[1] 1.000 0.988 0.013
Chisq of canonical correlations
[1] 32517 4433 0013
Average squared canonical correlation = 0.67
Cohen's Set Correlation R2 = 1
Shrunken Set Correlation R2 = 1
F and df of Cohen's Set Correlation: -Inf 12 -1294
Unweighted correlation between the two sets = 0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
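
A hedged sketch of the kind of call that produces a table like Table 2 (a three factor solution of the Thurstone correlations; the heading argument is documented for fa2latex):

f3 <- fa(Thurstone, 3)
fa2latex(f3, heading = "A factor analysis table from the psych package in R")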

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable         MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

     MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the Index for psych for a list of all of the functions. A short sketch illustrating a few of these helpers follows the list.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean and harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add an ellipsis between them.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
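
As promised above, a brief sketch of a few of these helpers (sat.act is the example data set described in the next section; in recent versions headTail lives in psychTools):

library(psych)
library(psychTools)
fisherz(0.5)                        # Fisher r to z transformation of a correlation
geometric.mean(c(1, 2, 4, 8))       # geometric mean
headTail(sat.act)                   # first and last few lines of a data frame
mardia(sat.act[1:4], plot = FALSE)  # univariate and multivariate skew and kurtosis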

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generations of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book: An introduction to Psychometric Theory with Applications in R (Revelle, prep)). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance

53

of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

54

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of

55

Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

56

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

57

Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49

58

hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48

59

corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51

60

psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9

61

outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49

62

table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 31: An introduction to the psych package: Part I: data entry ... · myData

> png('circplot.png')
> circ <- sim.circ(24)
> r.circ <- cor(circ)
> corPlot(r.circ, main='24 variables in a circumplex')
> dev.off()
null device
          1

Figure 12: Using the corPlot function to show the correlations in a circumplex. Correlations are highest near the diagonal, diminish to zero further from the diagonal, and then increase again towards the corners of the matrix. Circumplex structures are common in the study of affect. For circumplex structures it is perhaps useful to show the complete matrix.

31

> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE, main='Spider plot of 24 circumplex variables')
> op <- par(mfrow=c(1,1))
> dev.off()
null device
          1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.

32

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default the adjustment used is that of Holm (1979).
> corr.test(sat.act)

Call:corr.test(x = sat.act)
Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations print with the short=FALSE option

33

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18  with probability < 0.034
and confidence interval 0.02   0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.
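In equation form (a standard identity for the Fisher r-to-z transformation, stated here for completeness rather than quoted from the r.test help page), the statistic is

$$z = \frac{z_1 - z_2}{\sqrt{\frac{1}{n_1-3}+\frac{1}{n_2-3}}}, \qquad z_i = \tfrac{1}{2}\ln\frac{1+r_i}{1-r_i}.$$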

> r.test(30, .4, .6)
Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99  with probability  0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)
Correlation tests
Call:[1] "r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)"
Test of difference between two correlated correlations
t value -0.89  with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   #Steiger Case B
Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2  with probability  0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., the elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.
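Written out (one standard way of expressing Steiger's statistic; the exact implementation details are in the cortest help page), for p variables with pairwise sample sizes $n_{ij}$ and Fisher z transformed correlations $z_{ij}$,

$$\chi^2 = \sum_{i<j} (n_{ij} - 3)\, z_{ij}^2 \qquad \text{with } df = p(p-1)/2.$$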

> cortest(sat.act)
Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42  with df =  15   with probability < 1.8e-273

34

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()

[Figure: a bivariate normal distribution (x and y from -3 to 3) with rho = 0.5, cut at x = τ and y = Τ into the four cells X < τ or X > τ by Y < Τ or Y > Τ, giving φ = 0.28, together with the marginal normal density dnorm(x) cut at τ.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

35

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlation.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixedCor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

The correlation matrix resulting from a number of tetrachoric or polychoric correlations sometimes will not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigen values of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
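A minimal sketch of that workflow, using the dichotomous ability items described in the data sets section (the object names here are ours, and the $rho element name is taken from the tetrachoric help page):

tet <- tetrachoric(ability)     #tetrachoric correlations of the 0/1 ability items
r.ok <- cor.smooth(tet$rho)     #adjust the eigen values so the matrix is positive (semi) definite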

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable), it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

$$r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \qquad (1)$$

where $r_{xy}$ is the normal correlation, which may be decomposed into a within group and a between group correlation, $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4 and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5 and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2 and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)), as in the sketch below.
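A minimal sketch of those two analyses (the grouping variable names, education and Film, are assumptions taken from the sat.act and affect help pages rather than from the text above):

sb.edu  <- statsBy(sat.act, group="education", cors=TRUE)  #ability and SAT scores grouped by education
sb.film <- statsBy(affect, group="Film", cors=TRUE)        #affect scales grouped by the film manipulation
sb.edu   #printing shows the pooled within group and the between group statistics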

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

37

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   #find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)
Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV =  FourLetterWords
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
SentCompletion   0.02 3.00
FirstLetters     0.58 1.35

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.69 0.48 0.59 0.34

DV =  Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
SentCompletion   0.05 3.00
FirstLetters     0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV =  LetterSeries
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
SentCompletion   0.04 3.00
FirstLetters     0.21 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterSeries 0.5 0.25 0.49 0.24

DV =  Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
SentCompletion   0.21 3.00
FirstLetters     0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV =  LetterGroup
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
SentCompletion   0.08 3.00
FirstLetters     0.31 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterGroup 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation =  0.2
Cohens Set Correlation R2  =  0.69
Unweighted correlation between the two sets =  0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)
Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out: Sentences Vocabulary
 and are included in the calculation of df1 and df2

DV =  FourLetterWords
                slope  VIF
SentCompletion   0.02 1.02
FirstLetters     0.58 1.02

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.58 0.33 0.64 0.41

DV =  Suffixes
                slope  VIF
SentCompletion   0.05 1.02
FirstLetters     0.45 1.02

Multiple Regression
            R   R2 Ruw R2uw
Suffixes 0.46 0.21 0.6 0.37

DV =  LetterSeries
                slope  VIF
SentCompletion   0.04 1.02
FirstLetters     0.21 1.02

Multiple Regression
                R   R2  Ruw R2uw
LetterSeries 0.21 0.04 0.46 0.21

DV =  Pedigrees
                slope  VIF
SentCompletion   0.21 1.02
FirstLetters     0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.52 0.27

DV =  LetterGroup
                slope  VIF
SentCompletion   0.08 1.02
FirstLetters     0.31 1.02

Multiple Regression
              R   R2  Ruw R2uw
LetterGroup 0.3 0.09 0.46 0.21

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation =  0.21
Cohens Set Correlation R2  =  0.42
Unweighted correlation between the two sets =  0.73

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.53     0.12         0.11      0.08        0.14
Suffixes                   0.12     0.61         0.01      0.03        0.04
LetterSeries               0.11     0.01         0.79      0.31        0.39
Pedigrees                  0.08     0.03         0.31      0.70        0.23
LetterGroup                0.14     0.04         0.39      0.23        0.79

40

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.
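In this notation (a standard identity for linear mediation models rather than anything specific to the psych output), the total effect of x on y, labeled c in the mediate output, is the sum of the direct effect c' and the indirect effect:

$$c = c' + ab.$$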

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was  SATIS . The IV (X) was  THERAPY . The mediating variable(s) =  ATTRIB .

Total effect(c) of  THERAPY  on  SATIS  =  0.76   S.E. =  0.31  t  =  2.5  df=  28   with p =  0.019
Direct effect (c') of  THERAPY  on  SATIS  removing  ATTRIB  =  0.43   S.E. =  0.32  t  =  1.35  df=  27   with p =  0.19
Indirect effect (ab) of  THERAPY  on  SATIS  through  ATTRIB   =  0.33
Mean bootstrapped indirect effect =  0.32  with standard error =  0.17  Lower CI =  0.03    Upper CI =  0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value:  0.00272

To see the longer output specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, niter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.

41

> mediate.diagram(preacher)

[Figure: "Mediation model" path diagram with THERAPY, ATTRIB, and SATIS; paths THERAPY to ATTRIB = 0.82, ATTRIB to SATIS = 0.4, total effect c = 0.76, and direct effect c' = 0.43.]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

42

> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure: "Regression Models" diagram with THERAPY and ATTRIB predicting SATIS; path values 0.43, 0.4, and 0.21.]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

43

Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was  respappr . The IV (X) was  prot2 prot2*sexism . The mediating variable(s) =  sexism .

Total effect(c) of  prot2  on  respappr  =  1.46   S.E. =  0.22  t  =  6.77  df=  126   with p =  4.4e-10
Direct effect (c') of  prot2  on  respappr  removing  sexism  =  1.46   S.E. =  0.22  t  =  6.73  df=  125   with p =  5.5e-10
Indirect effect (ab) of  prot2  on  respappr  through  sexism   =  0
Mean bootstrapped indirect effect =  0  with standard error =  0.02  Lower CI =  -0.04    Upper CI =  0.03
Total effect(c) of  prot2*sexism  on  respappr  =  0.81   S.E. =  0.28  t  =  2.89  df=  126   with p =  0.0045
Direct effect (c') of  prot2*sexism  on  respappr  removing  sexism  =  0.81   S.E. =  0.28  t  =  2.87  df=  125   with p =  0.0048
Indirect effect (ab) of  prot2*sexism  on  respappr  through  sexism   =  0
Mean bootstrapped indirect effect =  0  with standard error =  0.03  Lower CI =  -0.07    Upper CI =  0.07
R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value:  1.91e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

[Figure: "Moderated mediation (mean centered)" path diagram with prot2 and prot2*sexism predicting respappr through sexism; c = 1.46 and c' = 1.46 for prot2, c = 0.81 and c' = 0.81 for prot2*sexism, with the remaining small path values -0.01, 0.07, 0.09, and 0.02.]

Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.

44

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

$$R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)$$

where $\lambda_i$ is the ith eigen value of the eigen value decomposition of the matrix

$$R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.$$

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,   Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> #compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)
Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV =  ACT
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender      -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education    0.56 0.07  7.72 0.016     0.25     0.87 1.22
age         -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error =  0.15  with  2  degrees of freedom

Multiple Regression
      R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT   1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV =  SATV
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender       0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education    0.72 0.08  9.21 0.012     0.39     1.06 1.22
age         -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error =  0.16  with  2  degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV =  SATQ
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender      -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education    0.40 0.05  8.32 0.014     0.19     0.60 1.22
age         -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error =  0.1  with  2  degrees of freedom

Multiple Regression
      R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ  1  1   1 0.99        0.99        0    177.11   3   2 0.00562

Various estimates of between set correlations
Squared Canonical Correlations
[1] 1.000 0.988 0.013
Chisq of canonical correlations
[1] 32517 4433 0013
Average squared canonical correlation =  0.67
Cohens Set Correlation R2  =  1
Shrunken Set Correlation R2  =  1
F and df of Cohens Set Correlation  -Inf 12 -1294
Unweighted correlation between the two sets =  0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
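A quick way to see that symmetry is to reverse the roles of the two sets in the call above (a sketch; the variable sets are simply those used in this example):

> setCor(c(1:3), c(4:6), C, n.obs=700)   #now predict gender, education and age; the set correlation R2 is unchanged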

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.

47

Table 2: fa2latex: A factor analysis table from the psych package in R

Variable         MR1   MR2   MR3    h2    u2   com
Sentences       0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary      0.89  0.06 -0.03  0.84  0.16  1.01
SentCompletion  0.83  0.04  0.00  0.73  0.27  1.00
FirstLetters    0.00  0.86  0.00  0.73  0.27  1.00
4LetterWords   -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes        0.18  0.63 -0.08  0.50  0.50  1.20
LetterSeries    0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees       0.37 -0.05  0.47  0.50  0.50  1.93
LetterGroup    -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings     2.64  1.86  1.5

Factor intercorrelations
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00
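In outline, a table like Table 2 can be produced with a call such as the following (a sketch; the choice of 3 factors and the n.obs value are assumptions for this example, and the heading shown above is the fa2latex default):

f3 <- fa(Thurstone, nfactors=3, n.obs=213)   #factor the Thurstone correlation matrix
fa2latex(f3)   #write the loadings, communalities and factor correlations as a LaTeX table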

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

48

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between them.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
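A few of these helpers in use (a minimal sketch):

fisherz(.5)                    #the Fisher z equivalent of r = .5
geometric.mean(c(1, 10, 100))  #10: the appropriate mean for ratio-like data
headTail(sat.act)              #the first and last few rows of the sat.act data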

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR), Condon & Revelle (2014), and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

50

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book, An introduction to Psychometric Theory with Applications in R (Revelle, prep)). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41

52

References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance

53

of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

54

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers


> png('spider.png')
> op <- par(mfrow=c(2,2))
> spider(y=c(1,6,12,18), x=1:24, data=r.circ, fill=TRUE,
+        main="Spider plot of 24 circumplex variables")
> op <- par(mfrow=c(1,1))
> dev.off()

null device

1

Figure 13: A spider plot can show circumplex structure very clearly. Circumplex structures are common in the study of affect.


Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default the adjustment used is that of Holm (1979).

> corr.test(sat.act)
Call:corr.test(x = sat.act)
Correlation matrix 
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00
Sample Size 
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687
Probability values (Entries above the diagonal are adjusted for multiple tests.) 
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option


Correlation tests

Call:r.test(n = 50, r12 = 0.3)

Test of significance of a correlation 
t value 2.18 with probability < 0.034
and confidence interval 0.02  0.53

2) For sample sizes of n and n2 (n2 = n if not specified) find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores.

> r.test(30, .4, .6)

Correlation tests

Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)

Test of difference between two independent correlations 
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A).

> r.test(103, .4, .5, .1)

Correlation tests

Call:[1] r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)

Test of difference between two correlated correlations 
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B).

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests

Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
     r24 = 0.8)

Test of difference between two dependent correlations 
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices 
Call:cortest(R1 = sat.act)
Chi Square value 1325.42  with df = 15   with probability < 1.8e-273


3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, the φ will underestimate the value of the Pearson applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()

[Figure 14 shows the bivariate normal distribution (ρ = 0.5, φ = 0.28) cut at τ for X and Τ for Y, together with the univariate normal density dnorm(x) cut at τ.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.


The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.

A correlation matrix formed from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the data set of burt, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
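A minimal sketch of these steps (the choice of items is illustrative, and in recent versions of psych the burt matrix is distributed in psychTools):

tet <- tetrachoric(ability)        # tetrachoric correlations of the 0/1 ability items
poly <- polychoric(bfi[1:5])       # polychoric correlations of five polytomous bfi items
burt.smoothed <- cor.smooth(burt)  # smooth the non-positive-definite burt matrix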

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable) it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (over all descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful).

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package.

\[ r_{xy} = \eta_{x_{wg}} \cdot \eta_{y_{wg}} \cdot r_{xy_{wg}} + \eta_{x_{bg}} \cdot \eta_{y_{bg}} \cdot r_{xy_{bg}} \tag{1} \]

where $r_{xy}$ is the normal correlation, which may be decomposed into the within group and between group correlations $r_{xy_{wg}}$ and $r_{xy_{bg}}$, and $\eta$ (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1 while those between groups are also set to be 1, 0, -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
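A minimal sketch of the first of these analyses (the rwg and rbg component names are taken from the statsBy help page and should be treated as assumptions):

sb <- statsBy(sat.act, group="education", cors=TRUE)  # describe and correlate by education level
round(sb$rwg, 2)   # pooled within-group correlations
round(sb$rbg, 2)   # between-group correlations of the group means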

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.


sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data=Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388


SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2

Cohen's Set Correlation R2 = 0.69

Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data=Thurstone, z=1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression


R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0.405 0.023

Average squared canonical correlation = 0.21

Cohen's Set Correlation R2 = 0.42

Unweighted correlation between the two sets = 0.73

> round(sc$residual, 2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079


5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, 2, ... i) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Mediation/Moderation Analysis 
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5   df = 28   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data=sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. See the help page for the mediate function for more details, and the call sketched below. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.
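The call that produces the moderated mediation output and Figure 17 below can be reconstructed from the printed Call; treat the exact argument spelling as an assumption:

mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia,
        n.iter = 50, main = "Moderated mediation (mean centered)")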


> mediate.diagram(preacher)

[Figure 15 shows the mediation model: THERAPY → SATIS directly and through ATTRIB, with c = 0.76, c' = 0.43, and paths of 0.82 and 0.4 through ATTRIB.]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data=sobel, std=FALSE)
> setCor.diagram(preacher)

[Figure 16 shows the regression model: THERAPY and ATTRIB predicting SATIS, with coefficients 0.43 and 0.4, and 0.21 between the predictors.]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.


Mediation/Moderation Analysis 
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 126   with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04  Upper CI = 0.03

Total effect(c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 126   with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07  Upper CI = 0.07

R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary

[Figure 17, "Moderated mediation (mean centered)", shows prot2 and prot2*sexism predicting respappr directly (c = 1.46 and c = 0.81) and through the mediator sexism (remaining path coefficients 0.07, 0.09, 0.02, and -0.01).]

Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.


5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

\[ R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i) \]

where $\lambda_i$ is the $i$th eigenvalue of the eigenvalue decomposition of the matrix

\[ R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}. \]

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In that case, although the set correlation can be very high, the degree of relationship between the sets is not as high. An alternative statistic based upon the average canonical correlation might then be more appropriate.
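A minimal sketch of the computation for the Thurstone example above (an illustration of the formula, not the internal setCor code; Re() guards against tiny imaginary parts from the non-symmetric matrix):

Rm <- as.matrix(Thurstone)            # the 9 x 9 Thurstone correlation matrix
x <- 1:4; y <- 5:9
Rxx <- Rm[x, x]; Ryy <- Rm[y, y]; Rxy <- Rm[x, y]
lambda <- Re(eigen(solve(Rxx) %*% Rxy %*% solve(Ryy) %*% t(Rxy))$values)
1 - prod(1 - lambda)                  # Cohen's set correlation R2 (about .69 here)
mean(lambda)                          # average squared canonical correlation (about .2)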

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max 
-25.2458  -3.2133   0.7769   3.5921   9.2630 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110    
education    0.47890    0.15235   3.143  0.00174 ** 
age          0.01623    0.02278   0.712  0.47650    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301 
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562


Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
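This symmetry can be checked by reversing the roles of the two sets in the call above (a sketch; the reported Cohen's Set Correlation R2 should be the same in both directions):

setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)   # SATs and ACT from demographics
setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # demographics from SATs and ACT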

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.
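A minimal sketch of these conversions (the caption and the choice of a three-factor solution are illustrative; in recent versions of psych these functions are distributed in psychTools):

f3 <- fa(Thurstone, 3)       # the analysis shown in Table 2 (supply n.obs for fit statistics)
fa2latex(f3, caption="A factor analysis table from the psych package in R")
cor2latex(sat.act)           # lower diagonal correlation table in LaTeX
df2latex(describe(sat.act))  # any data frame or describe output as a LaTeX table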

An example of converting the output from fa to LaTeX appears in Table 2.


Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3   h2   u2  com
Sentences        0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary       0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion  0.83  0.04  0.00 0.73 0.27 1.00
First.Letters    0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words  -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes         0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series    0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees        0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group    -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings      2.64  1.86  1.5

     MR1  MR2  MR3
MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score


geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimated effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
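A few of these helpers in use (a sketch; the input values are arbitrary):

fisherz(0.5)                    # Fisher z transform of r = .5
geometric.mean(c(1, 2, 4, 8))   # geometric mean
harmonic.mean(c(1, 2, 4, 8))    # harmonic mean
headTail(sat.act)               # first and last lines of the data frame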

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone33, Holzinger, Bechtoldt1, Bechtoldt2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability) are also included. The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project, Revelle et al. (2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR), Condon & Revelle (2014), and were collected as part of the SAPA web based assessment project (https://sapa-project.org), Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R version 402 (2020-06-22)

Platform x86_64-apple-darwin170 (64-bit)


Running under macOS Catalina 10156

Matrix products default

BLAS LibraryFrameworksRframeworkVersions40ResourcesliblibRblasdylib

LAPACK LibraryFrameworksRframeworkVersions40ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_209 psych_209

loaded via a namespace (and not attached)

[1] compiler_402 parallel_402 tools_402 foreign_08-80 tmvnsim_10-2 nlme_31-148

[7] mnormt_201 grid_402 lattice_020-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. & Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England, 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., & Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In N. G. Fielding, R. M. Lee, & G. Blank (Eds.), SAGE Handbook of Online Research Methods (2nd ed.), chapter 37 (pp. 578–595). Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



Page 33: An introduction to the psych package: Part I: data entry ... · myData

Table 1: The corr.test function reports correlations, cell sizes, and raw and adjusted probability values. corr.p reports the probability values for a correlation matrix. By default the adjustment used is that of Holm (1979).

> corr.test(sat.act)

Call:corr.test(x = sat.act)

Correlation matrix
          gender education   age   ACT  SATV  SATQ
gender      1.00      0.09 -0.02 -0.04 -0.02 -0.17
education   0.09      1.00  0.55  0.15  0.05  0.03
age        -0.02      0.55  1.00  0.11 -0.04 -0.03
ACT        -0.04      0.15  0.11  1.00  0.56  0.59
SATV       -0.02      0.05 -0.04  0.56  1.00  0.64
SATQ       -0.17      0.03 -0.03  0.59  0.64  1.00

Sample Size
          gender education age ACT SATV SATQ
gender       700       700 700 700  700  687
education    700       700 700 700  700  687
age          700       700 700 700  700  687
ACT          700       700 700 700  700  687
SATV         700       700 700 700  700  687
SATQ         687       687 687 687  687  687

Probability values (Entries above the diagonal are adjusted for multiple tests.)
          gender education  age  ACT SATV SATQ
gender      0.00      0.17 1.00 1.00    1    0
education   0.02      0.00 0.00 0.00    1    1
age         0.58      0.00 0.00 0.03    1    1
ACT         0.33      0.00 0.00 0.00    0    0
SATV        0.62      0.22 0.26 0.00    0    0
SATQ        0.00      0.36 0.37 0.00    0    0

To see confidence intervals of the correlations, print with the short=FALSE option.
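For instance (a minimal sketch using the sat.act data; short = FALSE is the print option mentioned above):

print(corr.test(sat.act), short = FALSE)   # adds confidence intervals for each correlation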

The r.test function tests the significance of single correlations and of differences between correlations. There are four cases.

1) For a sample size n, test the significance of a single correlation:

> r.test(50, .3)

Correlation tests
Call:r.test(n = 50, r12 = 0.3)
Test of significance of a correlation
t value 2.18 with probability < 0.034
and confidence interval 0.02 0.53

2) For sample sizes of n and n2 (n2 = n if not specified), find the z of the difference between the z transformed correlations divided by the standard error of the difference of two z scores:

> r.test(30, .4, .6)

Correlation tests
Call:r.test(n = 30, r12 = 0.4, r34 = 0.6)
Test of difference between two independent correlations
z value 0.99 with probability 0.32

3) For sample size n, and correlations ra = r12, rb = r23 and r13 specified, test for the difference of two dependent correlations (Steiger case A):

> r.test(103, .4, .5, .1)

Correlation tests
Call:[1] r.test(n = 103, r12 = 0.4, r23 = 0.1, r13 = 0.5)
Test of difference between two correlated correlations
t value -0.89 with probability < 0.37

4) For sample size n, test for the difference between two dependent correlations involving different variables (Steiger case B):

> r.test(103, .5, .6, .7, .5, .5, .8)   # Steiger Case B

Correlation tests
Call:r.test(n = 103, r12 = 0.5, r34 = 0.6, r23 = 0.7, r13 = 0.5, r14 = 0.5,
    r24 = 0.8)
Test of difference between two dependent correlations
z value -1.2 with probability 0.23

To test whether a matrix of correlations differs from what would be expected if the population correlations were all zero, the function cortest follows Steiger (1980), who pointed out that the sum of the squared elements of a correlation matrix, or the Fisher z score equivalents, is distributed as chi square under the null hypothesis that the values are zero (i.e., elements of the identity matrix). This is particularly useful for examining whether correlations in a single matrix differ from zero or for comparing two matrices. Although obvious, cortest can be used to test whether the sat.act data matrix produces non-zero correlations (it does). This is a much more appropriate test when testing whether a residual matrix differs from zero.

> cortest(sat.act)

Tests of correlation matrices
Call:cortest(R1 = sat.act)
Chi Square value 1325.42 with df = 15   with probability < 1.8e-273

3.6 Polychoric, tetrachoric, polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient. If the data, e.g., ability items, are thought to represent an underlying continuous although latent variable, φ will underestimate the value of the Pearson correlation applied to these latent variables. One solution to this problem is to use the tetrachoric correlation, which is based upon the assumption of a bivariate normal distribution that has been cut at certain points. The draw.tetra function demonstrates the process (Figure 14). This is also shown in terms of dichotomizing the bivariate normal density function using the draw.cor function. A simple generalization of this to the case of multiple cuts is the polychoric correlation.

> draw.tetra()

[Figure 14 shows the bivariate normal distribution cut at τ (for X) and Τ (for Y), with ρ = 0.5 and the resulting φ = 0.28, together with the corresponding marginal normal density.]

Figure 14: The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values.

The tetrachoric correlation estimates what a Pearson correlation would be given a two by two table of observed values assumed to be sampled from a bivariate normal distribution. The φ correlation is just a Pearson r performed on the observed values. It is found (laboriously) by optimizing the fit of the bivariate normal for various values of the correlation to the observed cell frequencies. In the interests of space we do not show the next figure, but it can be created by

draw.cor(expand=20, cuts=c(0,0))

Other estimated correlations based upon the assumption of bivariate normality with cut points include the biserial and polyserial correlations.

If the data are a mix of continuous, polytomous and dichotomous variables, the mixed.cor function will calculate the appropriate mixture of Pearson, polychoric, tetrachoric, biserial, and polyserial correlations.
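A minimal sketch of the underlying functions (the ability and bfi data sets ship with the package; the object names here are arbitrary):

tet  <- tetrachoric(ability)     # dichotomous (0/1) ability items
poly <- polychoric(bfi[, 1:5])   # polytomous (1-6) personality items
tet$rho                          # estimated latent correlations
poly$tau                         # estimated thresholds (the cut points)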

The correlation matrix resulting from a number of tetrachoric or polychoric correlations will sometimes not be positive semi-definite. This will sometimes happen if the correlation matrix is formed by using pair-wise deletion of cases. The cor.smooth function will adjust the smallest eigenvalues of the correlation matrix to make them positive, rescale all of them to sum to the number of variables, and produce a "smoothed" correlation matrix. An example of this problem is the burt data set, which probably had a typo in the original correlation matrix. Smoothing the matrix corrects this problem.
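A sketch of the smoothing step (assuming the burt correlation matrix, which is distributed with psych/psychTools; object names are arbitrary):

min(eigen(burt)$values)           # the smallest eigenvalue is slightly negative
burt.smoothed <- cor.smooth(burt) # rescales the eigenvalues and rebuilds the matrix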

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon, e.g., ethnicity, age, gender, college major, or country) reflect an unknown mixture of the pooled correlation within each group as well as the correlation of the means of these groups. These two correlations are independent and do not allow inferences from one level (the group) to the other level (the individual). When examining data at two levels (e.g., the individual and by some grouping variable) it is useful to find basic descriptive statistics (means, sds, ns per group, within group correlations) as well as between group statistics (overall descriptive statistics and overall between group correlations). Of particular use is the ability to decompose a matrix of correlations at the individual level into correlations within group and correlations between groups.

4.1 Decomposing data into within and between level correlations using statsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complex analysis of hierarchical (multilevel) data structures. statsBy is a much simpler function to give some of the basic descriptive statistics for two level models. (nlme and multilevel allow for statistical inference, but the descriptives of statsBy are useful.)

This follows the decomposition of an observed correlation into the pooled correlation within groups (rwg) and the weighted correlation of the means between groups, which is discussed by Pedhazur (1997) and by Bliese (2009) in the multilevel package:

rxy = ηxwg * ηywg * rxywg + ηxbg * ηybg * rxybg                    (1)

where rxy is the normal correlation which may be decomposed into a within group and between group correlation, rxywg and rxybg, and η (eta) is the correlation of the data with the within group values or the group means.

4.2 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group correlations. The within group correlations between 9 variables are set to be 1, 0, and -1, while those between groups are also set to be 1, 0, and -1. These two sets of correlations are crossed such that V1, V4, and V7 have within group correlations of 1, as do V2, V5 and V8, and V3, V6 and V9. V1 has a within group correlation of 0 with V2, V5, and V8, and a -1 within group correlation with V3, V6 and V9. V1, V2, and V3 share a between group correlation of 1, as do V4, V5 and V6, and V7, V8 and V9. The first group has a 0 between group correlation with the second and a -1 with the third group. See the help file for withinBetween to display these data.

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.
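A sketch of this bootstrap (the argument names other than ntrials are assumptions based on the description above; even 100 trials can be slow):

boots <- statsBy.boot(sat.act, group = "education", ntrials = 100)
statsBy.boot.summary(boots, var = "ACT")   # summarize the bootstrapped results for ACT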

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)) or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
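A minimal sketch of those two calls (the grouping variable names are assumptions about the data sets; cors = TRUE requests the pooled within and between group correlation matrices):

sb.ed  <- statsBy(sat.act, group = "education", cors = TRUE)
sb.aff <- statsBy(affect, group = "Film", cors = TRUE)
sb.ed$rwg    # pooled within group correlations
sb.ed$rbg    # (weighted) between group correlations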

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.

sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)
faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation, and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
SentCompletion   0.02 3.00
FirstLetters     0.58 1.35

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
SentCompletion   0.05 3.00
FirstLetters     0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = LetterSeries
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
SentCompletion   0.04 3.00
FirstLetters     0.21 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterSeries 0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
SentCompletion   0.21 3.00
FirstLetters     0.08 1.35

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.58 0.34 0.58 0.33

DV = LetterGroup
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
SentCompletion   0.08 3.00
FirstLetters     0.31 1.35

Multiple Regression
               R   R2  Ruw R2uw
LetterGroup 0.48 0.23 0.45  0.2

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049
Average squared canonical correlation = 0.2
Cohen's Set Correlation R2 = 0.69
Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out: Sentences Vocabulary
and are included in the calculation of df1 and df2.

DV = FourLetterWords
                slope  VIF
SentCompletion   0.02 1.02
FirstLetters     0.58 1.02

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.58 0.33 0.64 0.41

DV = Suffixes
                slope  VIF
SentCompletion   0.05 1.02
FirstLetters     0.45 1.02

Multiple Regression
            R   R2 Ruw R2uw
Suffixes 0.46 0.21 0.6 0.37

DV = LetterSeries
                slope  VIF
SentCompletion   0.04 1.02
FirstLetters     0.21 1.02

Multiple Regression
                R   R2  Ruw R2uw
LetterSeries 0.21 0.04 0.46 0.21

DV = Pedigrees
                slope  VIF
SentCompletion   0.21 1.02
FirstLetters     0.08 1.02

Multiple Regression
             R   R2  Ruw R2uw
Pedigrees 0.18 0.03 0.52 0.27

DV = LetterGroup
                slope  VIF
SentCompletion   0.08 1.02
FirstLetters     0.31 1.02

Multiple Regression
              R   R2  Ruw R2uw
LetterGroup 0.3 0.09 0.46 0.21

Various estimates of between set correlations
Squared Canonical Correlations
[1] 0.405 0.023
Average squared canonical correlation = 0.21
Cohen's Set Correlation R2 = 0.42
Unweighted correlation between the two sets = 0.73

> round(sc$residual, 2)

                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.53     0.12         0.11      0.08        0.14
Suffixes                   0.12     0.61         0.01      0.03        0.04
LetterSeries               0.11     0.01         0.79      0.31        0.39
Pedigrees                  0.08     0.03         0.31      0.70        0.23
LetterGroup                0.14     0.04         0.39      0.23        0.79

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m, and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
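As a sketch, consistent with the Call line in the output that follows (and assuming the sobel data frame built in the mediate help example), the call is:

preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)   # preacher is graphed in Figure 15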

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31  t = 2.5  df = 28   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32  t = 1.35  df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03   Upper CI = 0.7
R = 0.56 R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary.

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

  setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

  mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis. A sketch of such a call appears below.
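A sketch of the moderated mediation call for the Garcia data, consistent with the Call line in the output below (the Garcia data set is distributed with the package; the object name is arbitrary):

mod.med <- mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia,
                   n.iter = 50, main = "Moderated mediation (mean centered)")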


> mediate.diagram(preacher)

[Figure 15 shows the mediation model: THERAPY predicting SATIS directly (c = 0.76, c' = 0.43) and indirectly through ATTRIB (paths 0.82 and 0.4).]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.

> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure 16 shows the regression model: THERAPY and ATTRIB predicting SATIS, with path coefficients 0.43, 0.4, and 0.21.]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.

Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46   S.E. = 0.22  t = 6.77  df = 126   with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22  t = 6.73  df = 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04   Upper CI = 0.03

Total effect (c) of prot2*sexism on respappr = 0.81   S.E. = 0.28  t = 2.89  df = 126   with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   S.E. = 0.28  t = 2.87  df = 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07   Upper CI = 0.07

R = 0.54 R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary.

[Figure 17, titled "Moderated mediation (mean centered)", shows the path diagram: prot2 and prot2*sexism predicting respappr directly (c = 1.46, c' = 1.46; c = 0.81, c' = 0.81) and through sexism, with small paths (0.07, 0.09, 0.02, -0.01) involving the mediator.]

Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.

5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.
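In R terms, the formula simply combines the squared canonical correlations; a small check using the values reported in the first setCor example above:

rho2 <- c(0.6280, 0.1478, 0.0076, 0.0049)   # squared canonical correlations
1 - prod(1 - rho2)                          # Cohen's set correlation R2, about 0.69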

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic, based upon the average canonical correlation, might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")
> model1 <- lm(ACT ~ gender + education + age, data=sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT
             slope    se     t     p lower.ci upper.ci  VIF
(Intercept)   0.00  0.07  0.00 1.000    -0.28     0.28 1.00
gender       -0.26  0.08 -3.17 0.087    -0.61     0.09 1.55
education     0.56  0.07  7.72 0.016     0.25     0.87 1.22
age          -0.64  0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error = 0.15 with 2 degrees of freedom

Multiple Regression
       R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT    1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV = SATV
             slope    se     t     p lower.ci upper.ci  VIF
(Intercept)   0.00  0.07  0.00 1.000    -0.31     0.31 1.00
gender        0.10  0.09  1.11 0.380    -0.28     0.48 1.55
education     0.72  0.08  9.21 0.012     0.39     1.06 1.22
age          -0.82  0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error = 0.16 with 2 degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV = SATQ
             slope    se     t     p lower.ci upper.ci  VIF
(Intercept)   0.00  0.04  0.00 1.000    -0.19     0.19 1.00
gender       -0.52  0.05 -9.73 0.010    -0.76    -0.29 1.55
education     0.40  0.05  8.32 0.014     0.19     0.60 1.22
age          -0.47  0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error = 0.1 with 2 degrees of freedom

Multiple Regression
        R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ    1  1   1 0.99        0.99        0    177.11   3   2 0.00562

Various estimates of between set correlations
Squared Canonical Correlations
[1] 1.000 0.988 0.013
Chisq of canonical correlations
[1] 32.517  4.433  0.013
Average squared canonical correlation = 0.67
Cohen's Set Correlation R2 = 1
Shrunken Set Correlation R2 = 1
F and df of Cohen's Set Correlation  -Inf 12 -12.94
Unweighted correlation between the two sets = 0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric; that is, the R2 is the same independent of the direction of the relationship.

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally, df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
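A sketch of calls that produce such tables (Table 2 is a three factor solution of the Thurstone correlation matrix; the object name is arbitrary):

f3 <- fa(Thurstone, nfactors = 3)
fa2latex(f3)            # loadings, communalities and factor correlations as a LaTeX table
cor2latex(Thurstone)    # the lower diagonal of the correlation matrix as a LaTeX table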

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1   MR2   MR3    h2    u2   com
Sentences        0.91 -0.04  0.04  0.82  0.18  1.01
Vocabulary       0.89  0.06 -0.03  0.84  0.16  1.01
SentCompletion   0.83  0.04  0.00  0.73  0.27  1.00
FirstLetters     0.00  0.86  0.00  0.73  0.27  1.00
4LetterWords    -0.01  0.74  0.10  0.63  0.37  1.04
Suffixes         0.18  0.63 -0.08  0.50  0.50  1.20
LetterSeries     0.03 -0.01  0.84  0.72  0.28  1.00
Pedigrees        0.37 -0.05  0.47  0.50  0.50  1.93
LetterGroup     -0.06  0.21  0.64  0.53  0.47  1.23

SS loadings      2.64  1.86  1.50

     MR1  MR2  MR3
MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
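A few of these helpers in action (a minimal sketch; the input values are arbitrary):

fisherz(0.5)                    # Fisher r to z transformation
geometric.mean(c(1, 2, 4, 8))   # appropriate mean for ratio-like data
harmonic.mean(c(1, 2, 4, 8))    # appropriate mean for rates
headtail(sat.act)               # first and last few rows of the data set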

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of Books in Biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of Books in Biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 34: An introduction to the psych package: Part I: data entry ... · myData

Correlation tests

Callrtest(n = 50 r12 = 03)

Test of significance of a correlation

t value 218 with probability lt 0034

and confidence interval 002 053

2) For sample sizes of n and n2 (n2 = n if not specified) find the z of the difference betweenthe z transformed correlations divided by the standard error of the difference of two zscores

gt rtest(3046)

Correlation tests

Callrtest(n = 30 r12 = 04 r34 = 06)

Test of difference between two independent correlations

z value 099 with probability 032

3) For sample size n and correlations ra= r12 rb= r23 and r13 specified test for thedifference of two dependent correlations (Steiger case A)

gt rtest(103451)

Correlation tests

Call[1] rtest(n = 103 r12 = 04 r23 = 01 r13 = 05 )

Test of difference between two correlated correlations

t value -089 with probability lt 037

4) For sample size n test for the difference between two dependent correlations involvingdifferent variables (Steiger case B)

gt rtest(103567558) steiger Case B

Correlation tests

Callrtest(n = 103 r12 = 05 r34 = 06 r23 = 07 r13 = 05 r14 = 05

r24 = 08)

Test of difference between two dependent correlations

z value -12 with probability 023

To test whether a matrix of correlations differs from what would be expected if the popu-lation correlations were all zero the function cortest follows Steiger (1980) who pointedout that the sum of the squared elements of a correlation matrix or the Fisher z scoreequivalents is distributed as chi square under the null hypothesis that the values are zero(ie elements of the identity matrix) This is particularly useful for examining whethercorrelations in a single matrix differ from zero or for comparing two matrices Althoughobvious cortest can be used to test whether the satact data matrix produces non-zerocorrelations (it does) This is a much more appropriate test when testing whether a residualmatrix differs from zero

gt cortest(satact)

Tests of correlation matrices

Callcortest(R1 = satact)

Chi Square value 132542 with df = 15 with probability lt 18e-273

34

36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient If thedata eg ability items are thought to represent an underlying continuous although latentvariable the φ will underestimate the value of the Pearson applied to these latent variablesOne solution to this problem is to use the tetrachoric correlation which is based uponthe assumption of a bivariate normal distribution that has been cut at certain points Thedrawtetra function demonstrates the process (Figure 14) This is also shown in terms ofdichotomizing the bivariate normal density function using the drawcor function A simplegeneralization of this to the case of the multiple cuts is the polychoric correlation

gt drawtetra()

minus3 minus2 minus1 0 1 2 3

minus3

minus2

minus1

01

23

Y rho = 05phi = 028

X gt τY gt Τ

X lt τY gt Τ

X gt τY lt Τ

X lt τY lt Τ

x

dnor

m(x

)

X gt τ

τ

x1

Y gt Τ

Τ

Figure 14 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values

35

The tetrachoric correlation estimates what a Pearson correlation would be given a two bytwo table of observed values assumed to be sampled from a bivariate normal distributionThe φ correlation is just a Pearson r performed on the observed values It is found (labo-riously) by optimizing the fit of the bivariate normal for various values of the correlationto the observed cell frequencies In the interests of space we do not show the next figurebut it can be created by

drawcor(expand=20cuts=c(00))

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation

If the data are a mix of continuous polytomous and dichotomous variables the mixedcor

function will calculate the appropriate mixture of Pearson polychoric tetrachoric biserialand polyserial correlations

The correlation matrix resulting from a number of tetrachoric or polychoric correlationmatrix sometimes will not be positive semi-definite This will sometimes happen if thecorrelation matrix is formed by using pair-wise deletion of cases The corsmooth functionwill adjust the smallest eigen values of the correlation matrix to make them positive rescaleall of them to sum to the number of variables and produce aldquosmoothedrdquocorrelation matrixAn example of this problem is a data set of burt which probably had a typo in the originalcorrelation matrix Smoothing the matrix corrects this problem

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon egethnicity age gender college major or country) reflect an unknown mixture of the pooledcorrelation within each group as well as the correlation of the means of these groupsThese two correlations are independent and do not allow inferences from one level (thegroup) to the other level (the individual) When examining data at two levels (eg theindividual and by some grouping variable) it is useful to find basic descriptive statistics(means sds ns per group within group correlations) as well as between group statistics(over all descriptive statistics and overall between group correlations) Of particular useis the ability to decompose a matrix of correlations at the individual level into correlationswithin group and correlations between groups

41 Decomposing data into within and between level correlations usingstatsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complexanalysis of hierarchical (multilevel) data structures statsBy is a much simpler function

36

to give some of the basic descriptive statistics for two level models (nlme and multilevelallow for statistical inference but the descriptives of statsBy are useful)

This follows the decomposition of an observed correlation into the pooled correlation withingroups (rwg) and the weighted correlation of the means between groups which is discussedby Pedhazur (1997) and by Bliese (2009) in the multilevel package

rxy = ηxwg lowastηywg lowast rxywg + ηxbg lowastηybg lowast rxybg (1)

where rxy is the normal correlation which may be decomposed into a within group andbetween group correlations rxywg and rxybg and η (eta) is the correlation of the data withthe within group values or the group means

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group cor-relations The within group correlations between 9 variables are set to be 1 0 and -1while those between groups are also set to be 1 0 -1 These two sets of correlations arecrossed such that V1 V4 and V7 have within group correlations of 1 as do V2 V5 andV8 and V3 V6 and V9 V1 has a within group correlation of 0 with V2 V5 and V8and a -1 within group correlation with V3 V6 and V9 V1 V2 and V3 share a betweengroup correlation of 1 as do V4 V5 and V6 and V7 V8 and V9 The first group has a 0between group correlation with the second and a -1 with the third group See the help filefor withinBetween to display these data

sim.multilevel will generate simulated data with a multilevel structure.

The statsBy.boot function will randomize the grouping variable ntrials times and find the statsBy output. This can take a long time and will produce a great deal of output. This output can then be summarized for relevant variables using the statsBy.boot.summary function, specifying the variable of interest.

Consider the case of the relationship between various tests of ability when the data are grouped by level of education (statsBy(sat.act)), or when affect data are analyzed within and between an affect manipulation (statsBy(affect)).
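Both analyses can be run directly; a minimal sketch follows (the grouping column names and the rwg/rbg component names follow the data sets and the statsBy help page, but check str(sb) in your version):

sb <- statsBy(sat.act, group="education", cors=TRUE)
sb$rwg                          # pooled within group correlations
sb$rbg                          # between group correlations (of the group means)
statsBy(affect, group="Film")   # affect data grouped by the film manipulation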

4.3 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be done in the lavaan package. However, for exploratory analyses of the structure within each of multiple groups, the faBy function may be used in combination with the statsBy function. First run statsBy with the correlation option set to TRUE, and then run faBy on the resulting output.


sb <- statsBy(bfi[c(1:25,27)], group="education", cors=TRUE)

faBy(sb, nfactors=5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using the setCor function, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388


SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found. The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression


R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

> round(sc$residual, 2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079


5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
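A sketch of the call that produces the output below (sobel is the small data set constructed in the mediate help page examples, and the default of 5000 bootstrap iterations is used):

mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)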

MediationModeration Analysis

Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total effect(c) of THERAPY on SATIS = 076 SE = 031 t = 25 df= 28 with p = 0019

Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t = 135 df= 27 with p = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 003 Upper CI = 07

R = 056 R2 = 031 F = 606 on 2 and 27 DF p-value 000272

To see the longer output specify short = FALSE in the print statement or ask for the summary

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std=TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std=TRUE, n.iter=50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed. The default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis.


> mediate.diagram(preacher)

[Path diagram: mediation model with THERAPY predicting SATIS directly and through ATTRIB]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of 0.76, while the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std=FALSE)

> setCor.diagram(preacher)

[Path diagram: regression model with THERAPY and ATTRIB predicting SATIS]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.
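A sketch of the moderated mediation call described in the third bullet above, which produces the output and diagram that follow (Garcia is the example data set from Garcia et al., 2010; argument names follow the mediate help page):

mediate(respappr ~ prot2 * sexism + (sexism), data=Garcia, n.iter=50,
        main="Moderated mediation (mean centered)")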


MediationModeration Analysis

Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2*sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2*sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2*sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

[Path diagram: moderated mediation (mean centered) with prot2 and prot2*sexism predicting respappr, mediated by sexism]

Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.


5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R² = 1 − ∏(1 − λi)   (product over i = 1, ..., n)

where λi is the ith eigenvalue of the eigenvalue decomposition of the matrix R = Rxx⁻¹ Rxy Ryy⁻¹ Ryx.
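This can be computed directly in a few lines of base R. The following minimal sketch (not the setCor implementation) reproduces the set correlation of about .69 reported for the Thurstone example in section 5.1 above:

R <- as.matrix(Thurstone)
x <- 1:4                               # the predictor set
y <- 5:9                               # the criterion set
M <- solve(R[x,x]) %*% R[x,y] %*% solve(R[y,y]) %*% R[y,x]
lambda <- Re(eigen(M)$values)          # the squared canonical correlations
1 - prod(1 - lambda)                   # Cohen's set correlation R2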

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use="pairwise")

> model1 <- lm(ACT ~ gender + education + age, data=sat.act)

> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272, Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor

> setCor(y = c(4:6), x = c(1:3), data = C, n.obs=700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562


Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
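The symmetry can be checked by exchanging the roles of the two sets (a sketch, reusing the covariance matrix C and the n.obs from above); the reported set correlation R2 is unchanged:

setCor(y = c(1:3), x = c(4:6), data = C, n.obs=700)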

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
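A sketch of the kind of call that generates such a table (in recent versions these LaTeX helpers live in psychTools rather than psych; the heading argument and supplying n.obs to fa are optional):

f3 <- fa(Thurstone, nfactors=3)        # 3 factor solution of the Thurstone matrix
fa2latex(f3, heading="fa2latex: A factor analysis table from the psych package in R")
cor2latex(Thurstone)                   # a lower diagonal correlation table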


Table 2: fa2latex: A factor analysis table from the psych package in R

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list (a few of them are sketched in the short example after the list); look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score


geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a dataset or output

topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys, or when forming example problems.
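A brief sketch of a few of these helpers (all take ordinary vectors or data frames; see the individual help pages for the full argument lists):

fisherz(.3)                  # Fisher r to z transformation
fisherz2r(fisherz(.3))       # and back again
x <- c(1, 2, 4, 8, 16)
geometric.mean(x)            # 4
harmonic.mean(x)             # about 2.58
mardia(sat.act[1:4])         # multivariate skew and kurtosis
headTail(sat.act)            # first and last few rows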

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and there are 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height data; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos="https://personality-project.org/r", type="source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package="psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book/), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For a more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R version 402 (2020-06-22)

Platform x86_64-apple-darwin170 (64-bit)


Running under macOS Catalina 10156

Matrix products default

BLAS LibraryFrameworksRframeworkVersions40ResourcesliblibRblasdylib

LAPACK LibraryFrameworksRframeworkVersions40ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_209 psych_209

loaded via a namespace (and not attached)

[1] compiler_402 parallel_402 tools_402 foreign_08-80 tmvnsim_10-2 nlme_31-148

[7] mnormt_201 grid_402 lattice_020-41


References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness European Journal of Social Psychology 40(5)733–745

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem Multivariate Behavioral Research 14(3)283–300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686


McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New York 3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of


Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of Occam's razor Mind 27 345–353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological components – an alternative to "mathematical factors" Psychological Review 42(5)425–454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability Psychometrika 70(1)123–133


Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144


Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49

58

hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48

59

corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51

60

psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9

61

outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49

62

table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63

Page 35: An introduction to the psych package: Part I: data entry ... · myData

36 Polychoric tetrachoric polyserial and biserial correlations

The Pearson correlation of dichotomous data is also known as the φ coefficient If thedata eg ability items are thought to represent an underlying continuous although latentvariable the φ will underestimate the value of the Pearson applied to these latent variablesOne solution to this problem is to use the tetrachoric correlation which is based uponthe assumption of a bivariate normal distribution that has been cut at certain points Thedrawtetra function demonstrates the process (Figure 14) This is also shown in terms ofdichotomizing the bivariate normal density function using the drawcor function A simplegeneralization of this to the case of the multiple cuts is the polychoric correlation

gt drawtetra()

minus3 minus2 minus1 0 1 2 3

minus3

minus2

minus1

01

23

Y rho = 05phi = 028

X gt τY gt Τ

X lt τY gt Τ

X gt τY lt Τ

X lt τY lt Τ

x

dnor

m(x

)

X gt τ

τ

x1

Y gt Τ

Τ

Figure 14 The tetrachoric correlation estimates what a Pearson correlation would be givena two by two table of observed values assumed to be sampled from a bivariate normaldistribution The φ correlation is just a Pearson r performed on the observed values

35

The tetrachoric correlation estimates what a Pearson correlation would be given a two bytwo table of observed values assumed to be sampled from a bivariate normal distributionThe φ correlation is just a Pearson r performed on the observed values It is found (labo-riously) by optimizing the fit of the bivariate normal for various values of the correlationto the observed cell frequencies In the interests of space we do not show the next figurebut it can be created by

drawcor(expand=20cuts=c(00))

Other estimated correlations based upon the assumption of bivariate normality with cutpoints include the biserial and polyserial correlation

If the data are a mix of continuous polytomous and dichotomous variables the mixedcor

function will calculate the appropriate mixture of Pearson polychoric tetrachoric biserialand polyserial correlations

The correlation matrix resulting from a number of tetrachoric or polychoric correlationmatrix sometimes will not be positive semi-definite This will sometimes happen if thecorrelation matrix is formed by using pair-wise deletion of cases The corsmooth functionwill adjust the smallest eigen values of the correlation matrix to make them positive rescaleall of them to sum to the number of variables and produce aldquosmoothedrdquocorrelation matrixAn example of this problem is a data set of burt which probably had a typo in the originalcorrelation matrix Smoothing the matrix corrects this problem

4 Multilevel modeling

Correlations between individuals who belong to different natural groups (based upon egethnicity age gender college major or country) reflect an unknown mixture of the pooledcorrelation within each group as well as the correlation of the means of these groupsThese two correlations are independent and do not allow inferences from one level (thegroup) to the other level (the individual) When examining data at two levels (eg theindividual and by some grouping variable) it is useful to find basic descriptive statistics(means sds ns per group within group correlations) as well as between group statistics(over all descriptive statistics and overall between group correlations) Of particular useis the ability to decompose a matrix of correlations at the individual level into correlationswithin group and correlations between groups

41 Decomposing data into within and between level correlations usingstatsBy

There are at least two very powerful packages (nlme and multilevel) which allow for complexanalysis of hierarchical (multilevel) data structures statsBy is a much simpler function

36

to give some of the basic descriptive statistics for two level models (nlme and multilevelallow for statistical inference but the descriptives of statsBy are useful)

This follows the decomposition of an observed correlation into the pooled correlation withingroups (rwg) and the weighted correlation of the means between groups which is discussedby Pedhazur (1997) and by Bliese (2009) in the multilevel package

rxy = ηxwg lowastηywg lowast rxywg + ηxbg lowastηybg lowast rxybg (1)

where rxy is the normal correlation which may be decomposed into a within group andbetween group correlations rxywg and rxybg and η (eta) is the correlation of the data withthe within group values or the group means

42 Generating and displaying multilevel data

withinBetween is an example data set of the mixture of within and between group cor-relations The within group correlations between 9 variables are set to be 1 0 and -1while those between groups are also set to be 1 0 -1 These two sets of correlations arecrossed such that V1 V4 and V7 have within group correlations of 1 as do V2 V5 andV8 and V3 V6 and V9 V1 has a within group correlation of 0 with V2 V5 and V8and a -1 within group correlation with V3 V6 and V9 V1 V2 and V3 share a betweengroup correlation of 1 as do V4 V5 and V6 and V7 V8 and V9 The first group has a 0between group correlation with the second and a -1 with the third group See the help filefor withinBetween to display these data

simmultilevel will generate simulated data with a multilevel structure

The statsByboot function will randomize the grouping variable ntrials times and find thestatsBy output This can take a long time and will produce a great deal of output Thisoutput can then be summarized for relevant variables using the statsBybootsummary

function specifying the variable of interest

Consider the case of the relationship between various tests of ability when the data aregrouped by level of education (statsBy(satact)) or when affect data are analyzed withinand between an affect manipulation (statsBy(affect) )

43 Factor analysis by groups

Confirmatory factor analysis comparing the structures in multiple groups can be donein the lavaan package However for exploratory analyses of the structure within each ofmultiple groups the faBy function may be used in combination with the statsBy functionFirst run pfunstatsBy with the correlation option set to TRUE and then run faBy on theresulting output

37

sb lt- statsBy(bfi[c(12527)] group=educationcors=TRUE)

faBy(sbnfactors=5) find the 5 factor solution for each education level

5 Multiple Regression mediation moderation and set cor-relations

The typical application of the lm function is to do a linear model of one Y variable as afunction of multiple X variables Because lm is designed to analyze complex interactions itrequires raw data as input It is however sometimes convenient to do multiple regressionfrom a correlation or covariance matrix This is done using the setCor which will workwith either raw data covariance matrices or correlation matrices

51 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variablesperhaps with a set of z covariates removed from both x and y Consider the Thurstonecorrelation matrix and find the multiple correlation of the last five variables as a functionof the first 4

gt setCor(y = 59x=14data=Thurstone)

Call setCor(y = 59 x = 14 data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords

slope VIF

Sentences 009 369

Vocabulary 009 388

SentCompletion 002 300

FirstLetters 058 135

Multiple Regression

R R2 Ruw R2uw

FourLetterWords 069 048 059 034

DV = Suffixes

slope VIF

Sentences 007 369

Vocabulary 017 388

SentCompletion 005 300

FirstLetters 045 135

Multiple Regression

R R2 Ruw R2uw

Suffixes 063 04 058 034

DV = LetterSeries

slope VIF

Sentences 025 369

Vocabulary 009 388

38

SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in correlation matrix appropriate estimates of stan-dard errors t-values and probabilities are also found The next example finds the regres-sions with variables 1 and 2 used as covariates The β weights for variables 3 and 4 do notchange but the multiple correlation is much less It also shows how to find the residualcorrelations between variables 5-9 with variables 1-4 removed

gt sc lt- setCor(y = 59x=34data=Thurstonez=12)

Call setCor(y = 59 x = 34 data = Thurstone z = 12)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression

39

R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

gt round(sc$residual2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079

40

52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect ofmultiple predictors (x12i) on a criterion variable y some prefer to think of the effect ofone predictor x as mediated by another variable m (Preacher and Hayes 2004) Thuswe we may find the indirect path from x to m and then from m to y as well as the directpath from x to y Call these paths a b and c respectively Then the indirect effect of xon y through m is just ab and the direct effect is c Statistical tests of the ab effect arebest done by bootstrapping This is discussed in detail in the ldquoHow To use mediate andsetCor to do mediation moderation and regression analysis tutorial

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate

function and the subsequent graphic from mediatediagram The data are found in theexample for mediate

MediationModeration Analysis

Call mediate(y = SATIS ~ THERAPY + (ATTRIB) data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total effect(c) of THERAPY on SATIS = 076 SE = 031 t = 25 df= 28 with p = 0019

Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t = 135 df= 27 with p = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 003 Upper CI = 07

R = 056 R2 = 031 F = 606 on 2 and 27 DF p-value 000272

To see the longer output specify short = FALSE in the print statement or ask for the summary

bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables

setCor(SATV + SATQ ~ education + age data = satact std=TRUE)

bull mediate will take raw data or a correlation matrix and find (and graph the path dia-gram) for multiple y variables depending upon multiple x variables mediated througha mediation variable It then tests the mediation effect using a boot strap

mediate( SATV ~ education+ age + (ACT) data =satactstd=TRUEniter=50)

bull mediate will also take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple x variablesmediated through a mediation variable It will form the product term either from themean centered data or from the raw data It then tests the mediation effect using aboot strap The data set is taken from Garcia et al (2010) The number of iterationsfor the boot strap was set to 50 for speed The default number of boot straps is5000 See the help page for the mediate function for more details For a much longerdiscussion of how to use the mediate function see the ldquoHowTordquo Using mediate andsetCor to do mediation moderation and regression analysis

41

gt mediatediagram(preacher)

Mediation model

THERAPY SATIS

ATTRIB

c = 076

c = 043

082 04

Figure 15 A mediated model taken from Preacher and Hayes 2004 and solved using themediate function The direct path from Therapy to Satisfaction has a an effect of 76 whilethe indirect path through Attribution has an effect of 33 Compare this to the normalregression graphic created by setCordiagram

42

gt preacher lt- setCor(SATIS ~ THERAPY + ATTRIBdata =sobelstd=FALSE)

gt setCordiagram(preacher)

Regression Models

THERAPY

ATTRIB

SATIS

043

04

021

Figure 16 The conventional regression model for the Preacher and Hayes 2004 data setsolved using the sector function Compare this to the previous figure

43

MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

Moderated mediation (mean centered)

minus001

prot2

prot2sexism

respapprsexism

c = 146

c = 146

c = 081

c = 081

007

009

002

Figure 17 Moderated multiple regression requires the raw data By default the data aremean centered before find the product term

44

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

R2 = 1minusn

prodi=1

(1minusλi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rminus1xx RxyRminus1

xx Rminus1xy

Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.
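For example, the following sketch (using the Thurstone correlation matrix supplied with the package; variable positions refer to the column order of that matrix) finds the multiple correlations of the last five variables from the first four, and then repeats the analysis with the first two variables partialed out:

setCor(y = 5:9, x = 1:4, data = Thurstone)            # regressions from a correlation matrix
setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)   # the same, partialing out variables 1 and 2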

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,  Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender      -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education    0.56 0.07  7.72 0.016     0.25     0.87 1.22
age         -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error = 0.15 with 2 degrees of freedom

Multiple Regression
      R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT   1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV = SATV
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender       0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education    0.72 0.08  9.21 0.012     0.39     1.06 1.22
age         -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error = 0.16 with 2 degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV = SATQ
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender      -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education    0.40 0.05  8.32 0.014     0.19     0.60 1.22
age         -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error = 0.1 with 2 degrees of freedom

Multiple Regression
     R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ 1  1   1 0.99        0.99        0    177.11   3   2 0.00562

Various estimates of between set correlations

Squared Canonical Correlations
[1] 1.000 0.988 0.013

Chisq of canonical correlations
[1] 32517 4433 0013

Average squared canonical correlation = 0.67
Cohen's Set Correlation R2 = 1
Shrunken Set Correlation R2 = 1
F and df of Cohen's Set Correlation  -Inf 12 -1294
Unweighted correlation between the two sets = 0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
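That symmetry can be checked directly by exchanging the two sets (a sketch, reusing the covariance matrix C computed above):

setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)   # ACT and SATs from the demographics
setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # demographics from ACT and SATs: same set R2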

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
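A sketch of the calls involved (the object name f3 is introduced here for illustration; formatting arguments such as captions and cut offs are left at their defaults and are described on the help pages):

f3 <- fa(Thurstone, nfactors = 3)   # three factor solution of the Thurstone correlations
fa2latex(f3)                        # LaTeX source for a table such as Table 2
cor2latex(Thurstone)                # lower diagonal correlation matrix in LaTeX
df2latex(describe(sat.act))         # any data frame, e.g., describe output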

Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable          MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
Sent.Completion  0.83   0.04   0.00  0.73  0.27  1.00
First.Letters    0.00   0.86   0.00  0.73  0.27  1.00
4.Letter.Words  -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
Letter.Series    0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
Letter.Group    -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.50

     MR1   MR2   MR3
MR1 1.00  0.59  0.54
MR2 0.59  1.00  0.52
MR3 0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; a few of these helpers are illustrated in the short sketch after the list. Look at the index of the psych package for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score


geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
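A few of these helpers in action (a minimal sketch; the input values are arbitrary):

fisherz(0.5)                     # Fisher z transform of r = .5
geometric.mean(c(1, 2, 4, 8))    # geometric mean of a vector
harmonic.mean(c(2, 3, 6))        # harmonic mean of a vector
circadian.mean(c(23, 1, 22, 2))  # circular mean of times of day (in hours)
mardia(sat.act)                  # univariate and multivariate skew and kurtosis
headTail(sat.act)                # first and last few rows, with ellipses between
superMatrix(diag(2), diag(3))    # 5 x 5 super matrix with the two blocks on the diagonal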

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
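A quick look at a few of them (a sketch; the objects are available once psych and psychTools are attached):

library(psych)
library(psychTools)
dim(bfi)             # 2800 x 28: the 25 personality items plus gender, education, and age
describe(sat.act)    # descriptive statistics for the SAT and ACT demonstration data
round(Thurstone, 2)  # the 9 x 9 Thurstone correlation matrix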

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0,1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project, https://sapa-project.org (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example (a short sketch appears after this list).


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
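The factor extension idea can be tried directly on the Dwyer data, following the pattern of the fa.extension help page (a sketch; the last column of the 8 × 8 matrix is treated as the extension variable):

Ro  <- Dwyer[1:7, 1:7]                        # correlations among the original seven variables
Roe <- Dwyer[1:7, 8]                          # their correlations with the additional variable
fo  <- fa(Ro, nfactors = 2, rotate = "none")  # factor the original variables
fa.extension(Roe, fo)                         # loadings of the extended variable on those factors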

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2  foreign_0.8-80  tmvnsim_1.0-2  nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.
Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis, 122 pp. Oxford, England.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.
Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.
Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.
Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



bull mediate will also take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple x variablesmediated through a mediation variable It will form the product term either from themean centered data or from the raw data It then tests the mediation effect using aboot strap The data set is taken from Garcia et al (2010) The number of iterationsfor the boot strap was set to 50 for speed The default number of boot straps is5000 See the help page for the mediate function for more details For a much longerdiscussion of how to use the mediate function see the ldquoHowTordquo Using mediate andsetCor to do mediation moderation and regression analysis

41

gt mediatediagram(preacher)

Mediation model

THERAPY SATIS

ATTRIB

c = 076

c = 043

082 04

Figure 15 A mediated model taken from Preacher and Hayes 2004 and solved using themediate function The direct path from Therapy to Satisfaction has a an effect of 76 whilethe indirect path through Attribution has an effect of 33 Compare this to the normalregression graphic created by setCordiagram

42

gt preacher lt- setCor(SATIS ~ THERAPY + ATTRIBdata =sobelstd=FALSE)

gt setCordiagram(preacher)

Regression Models

THERAPY

ATTRIB

SATIS

043

04

021

Figure 16 The conventional regression model for the Preacher and Hayes 2004 data setsolved using the sector function Compare this to the previous figure

43

MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

Moderated mediation (mean centered)

minus001

prot2

prot2sexism

respapprsexism

c = 146

c = 146

c = 081

c = 081

007

009

002

Figure 17 Moderated multiple regression requires the raw data By default the data aremean centered before find the product term

44

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

R2 = 1minusn

prodi=1

(1minusλi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rminus1xx RxyRminus1

xx Rminus1xy

Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate

setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data

Consider the correlations of the 6 variables in the satact data set First do the normalmultiple regression and then compare it with the results using setCor Two things tonotice setCor works on the correlation or covariance or raw data matrix and thus ifusing the correlation matrix will report standardized or raw β weights Secondly it ispossible to do several multiple regressions simultaneously If the number of observationsis specified or if the analysis is done on raw data statistical tests of significance areapplied

For this example the analysis is done on the correlation matrix rather than the rawdata

gt C lt- cov(satactuse=pairwise)

gt model1 lt- lm(ACT~ gender + education + age data=satact)

gt summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)

45

(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor

gt compare with sector

gt setCor(c(46)c(13)C nobs=700)

Call setCor(y = c(46) x = c(13) data = C nobs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562

46

Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between thepredictor set and the criterion (dependent) set This set correlation is symmetric That isthe R2 is the same independent of the direction of the relationship

6 Converting output to APA style tables using LATEX

Although for most purposes using the Sweave or KnitR packages produces clean outputsome prefer output pre formatted for APA style tables This can be done using the xtablepackage for almost anything but there are a few simple functions in psych for the mostcommon tables fa2latex will convert a factor analysis or components analysis output toa LATEXtable cor2latex will take a correlation matrix and show the lower (or upper diag-onal) irt2latex converts the item statistics from the irtfa function to more convenientLATEXoutput and finally df2latex converts a generic data frame to LATEX

An example of converting the output from fa to LATEXappears in Table 2

47

Table 2 fa2latexA factor analysis table from the psych package in R

Variable MR1 MR2 MR3 h2 u2 com

Sentences 091 -004 004 082 018 101Vocabulary 089 006 -003 084 016 101SentCompletion 083 004 000 073 027 100FirstLetters 000 086 000 073 027 1004LetterWords -001 074 010 063 037 104Suffixes 018 063 -008 050 050 120LetterSeries 003 -001 084 072 028 100Pedigrees 037 -005 047 050 050 193LetterGroup -006 021 064 053 047 123

SS loadings 264 186 15

MR1 100 059 054MR2 059 100 052MR3 054 052 100

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that donrsquot fitinto any other category The following is an incomplete list Look at the Index for psychfor a list of all of the functions

blockrandom Creates a block randomized structure for n independent variables Usefulfor teaching block randomization for experimental design

df2latex is useful for taking tabular output (such as a correlation matrix or that of de-

scribe and converting it to a LATEX table May be used when Sweave is not conve-nient

cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex

cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-

dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data

fisherz Convert a correlation to the corresponding Fisher z score

48

geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data

ICC and cohenkappa are typically used to find the reliability for raters

headtail combines the head and tail functions to show the first and last lines of a dataset or output

topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between

mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe

prep finds the probability of replication for an F t or r and estimate effect size

partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)

rangeCorrection will correct correlations for restriction of range

reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages

superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psychpackage These include six data sets showing a hierarchical factor structure (five cogni-tive examples Thurstone Thurstone33 Holzinger Bechtoldt1 Bechtoldt2 andone from health psychology Reise) One of these (Thurstone) is used as an example inthe sem package as well as McDonald (1999) The original data are from Thurstone andThurstone (1941) and reanalyzed by Bechtoldt (1961) Personality item data representingfive personality factors on 25 items (bfi) 135 items for 4000 participants (spi) or 13 per-sonality inventory scores (epibfi) and 16 multiple choice iq items (iqitems ability)The vegetables example has paired comparison preferences for 9 vegetables This is anexample of Thurstonian scaling used by Guilford (1954) and Nunnally (1967) Other datasets include cubits peas and heights from Galton

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factorand uncorrelated group factors The Holzinger correlation matrix is a 14 14 matrix

49

from their paper The Thurstone correlation matrix is a 9 9 matrix of correlationsof ability items The Reise data set is 16 16 correlation matrix of mental healthitems The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests

bfi 25 personality self report items taken from the International Personality Item Pool(ipiporiorg) were included as part of the Synthetic Aperture Personality Assessment(SAPA) web based personality assessment project The data from 2800 subjects areincluded here as a demonstration set for scale construction factor analysis and ItemResponse Theory analyses

spi 135 personality items and 10 demographic items for 4000 subjects are taken from theSynthetic Aperture Personality Assessment (SAPA) web based personality assessmentproject Revelle et al (2016) These 135 items form part of the SAPA PersonalityInventory

satact Self reported scores on the SAT Verbal SAT Quantitative and ACT were col-lected as part of the Synthetic Aperture Personality Assessment (SAPA) web basedpersonality assessment project Age gender and education are also reported Thedata from 700 subjects are included here as a demonstration set for correlation andanalysis

epibfi A small data set of 5 scales from the Eysenck Personality Inventory 5 from a Big 5inventory a Beck Depression Inventory and State and Trait Anxiety measures Usedfor demonstrations of correlations regressions graphic displays

iqitems 16 multiple choice ability items were included as part of the Synthetic AperturePersonality Assessment (SAPA) web based personality assessment project The datafrom 1525 subjects are included here as a demonstration set for scoring multiplechoice inventories and doing basic item statistics

ability The same 16 items converted to 01 scores are used for examples of various IRTprocedures These data are from the International Cognitive Ability Resource (ICAR)Condon amp Revelle (2014) and were collected as part of the SAPA web based assess-ment httpssapa-projectorg project Revelle et al (2016)

galton Two of the earliest examples of the correlation coefficient were Francis Galtonrsquosdata sets on the relationship between mid parent and child height and the similarity ofparent generation peas with child peas galton is the data set for the Galton heightpeas is the data set Francis Galton used to introduce the correlation coefficient withan analysis of the similarities of the parent and child generation of 700 sweet peas

Dwyer Dwyer (1937) introduced a method for factor extension (see faextension thatfinds loadings on factors from an original data set for additional (extended) variablesThis data set includes his example

50

miscellaneous cities is a matrix of airline distances between 11 US cities and maybe used for demonstrating multiple dimensional scaling vegetables is a classicdata set for demonstrating Thurstonian scaling and is the preference matrix of 9vegetables from Guilford (1954) Used by Guilford (1954) Nunnally (1967) Nunnallyand Bernstein (1984) this data set allows for examples of basic scaling techniques

9 Development version and a users guide

The most recent development version is available as a source file at the repository main-tained at httpspersonality-projectorgr That version will have removed themost recently discovered bugs (but perhaps introduced other yet to be discovered ones)To download that version go to the repository httppersonality-projectorgr

srccontrib and wander around For both Macs and PC this version can be installeddirectly using the ldquoother repositoryrdquo option in the package installer Make sure to specifytype=rdquosourcerdquo

gt installpackages(psych repos=httpspersonality-projectorgr type=source)

Although the individual help pages for the psych package are available as part of R andmay be accessed directly (eg psych) the full manual for the psych package is alsoavailable as a pdf at httpspersonality-projectorgrpsych_manualpdf

News and a history of changes are available in the NEWS and CHANGES files in the sourcefiles To view the most recent news

gt news(Version gt= 184package=psych)

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research Many of thefunctions were developed to supplement a book (httpspersonality-projectorgrbook An introduction to Psychometric Theory with Applications in R (Revelle prep) Moreinformation about the use of some of the functions may be found in the book

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): A brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

satact Self reported scores on the SAT Verbal SAT Quantitative and ACT were col-lected as part of the Synthetic Aperture Personality Assessment (SAPA) web basedpersonality assessment project Age gender and education are also reported Thedata from 700 subjects are included here as a demonstration set for correlation andanalysis

epibfi A small data set of 5 scales from the Eysenck Personality Inventory 5 from a Big 5inventory a Beck Depression Inventory and State and Trait Anxiety measures Usedfor demonstrations of correlations regressions graphic displays

iqitems 16 multiple choice ability items were included as part of the Synthetic AperturePersonality Assessment (SAPA) web based personality assessment project The datafrom 1525 subjects are included here as a demonstration set for scoring multiplechoice inventories and doing basic item statistics

ability The same 16 items converted to 01 scores are used for examples of various IRTprocedures These data are from the International Cognitive Ability Resource (ICAR)Condon amp Revelle (2014) and were collected as part of the SAPA web based assess-ment httpssapa-projectorg project Revelle et al (2016)

galton Two of the earliest examples of the correlation coefficient were Francis Galtonrsquosdata sets on the relationship between mid parent and child height and the similarity ofparent generation peas with child peas galton is the data set for the Galton heightpeas is the data set Francis Galton used to introduce the correlation coefficient withan analysis of the similarities of the parent and child generation of 700 sweet peas

Dwyer Dwyer (1937) introduced a method for factor extension (see faextension thatfinds loadings on factors from an original data set for additional (extended) variablesThis data set includes his example

50

miscellaneous cities is a matrix of airline distances between 11 US cities and maybe used for demonstrating multiple dimensional scaling vegetables is a classicdata set for demonstrating Thurstonian scaling and is the preference matrix of 9vegetables from Guilford (1954) Used by Guilford (1954) Nunnally (1967) Nunnallyand Bernstein (1984) this data set allows for examples of basic scaling techniques

9 Development version and a users guide

The most recent development version is available as a source file at the repository main-tained at httpspersonality-projectorgr That version will have removed themost recently discovered bugs (but perhaps introduced other yet to be discovered ones)To download that version go to the repository httppersonality-projectorgr

srccontrib and wander around For both Macs and PC this version can be installeddirectly using the ldquoother repositoryrdquo option in the package installer Make sure to specifytype=rdquosourcerdquo

gt installpackages(psych repos=httpspersonality-projectorgr type=source)

Although the individual help pages for the psych package are available as part of R andmay be accessed directly (eg psych) the full manual for the psych package is alsoavailable as a pdf at httpspersonality-projectorgrpsych_manualpdf

News and a history of changes are available in the NEWS and CHANGES files in the sourcefiles To view the most recent news

gt news(Version gt= 184package=psych)

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research Many of thefunctions were developed to supplement a book (httpspersonality-projectorgrbook An introduction to Psychometric Theory with Applications in R (Revelle prep) Moreinformation about the use of some of the functions may be found in the book

For more extensive discussion of the use of psych in particular and R in general consulthttpspersonality-projectorgrrguidehtml A short guide to R

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R version 402 (2020-06-22)

Platform x86_64-apple-darwin170 (64-bit)

51

Running under macOS Catalina 10156

Matrix products default

BLAS LibraryFrameworksRframeworkVersions40ResourcesliblibRblasdylib

LAPACK LibraryFrameworksRframeworkVersions40ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_209 psych_209

loaded via a namespace (and not attached)

[1] compiler_402 parallel_402 tools_402 foreign_08-80 tmvnsim_10-2 nlme_31-148

[7] mnormt_201 grid_402 lattice_020-41

52

References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance

53

of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

54

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of

55

Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

56

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

57

Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49

58

hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

sb <- statsBy(bfi[c(1:25,27)], group = "education", cors = TRUE)

faBy(sb, nfactors = 5)   # find the 5 factor solution for each education level

5 Multiple Regression, mediation, moderation and set correlations

The typical application of the lm function is to do a linear model of one Y variable as a function of multiple X variables. Because lm is designed to analyze complex interactions, it requires raw data as input. It is, however, sometimes convenient to do multiple regression from a correlation or covariance matrix. This is done using setCor, which will work with either raw data, covariance matrices, or correlation matrices.

5.1 Multiple regression from data or correlation matrices

The setCor function will take a set of y variables predicted from a set of x variables, perhaps with a set of z covariates removed from both x and y. Consider the Thurstone correlation matrix and find the multiple correlation of the last five variables as a function of the first 4.

> setCor(y = 5:9, x = 1:4, data = Thurstone)

Call: setCor(y = 5:9, x = 1:4, data = Thurstone)

Multiple Regression from matrix input

DV = FourLetterWords
                slope  VIF
Sentences        0.09 3.69
Vocabulary       0.09 3.88
SentCompletion   0.02 3.00
FirstLetters     0.58 1.35

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.69 0.48 0.59 0.34

DV = Suffixes
                slope  VIF
Sentences        0.07 3.69
Vocabulary       0.17 3.88
SentCompletion   0.05 3.00
FirstLetters     0.45 1.35

Multiple Regression
            R  R2  Ruw R2uw
Suffixes 0.63 0.4 0.58 0.34

DV = LetterSeries
                slope  VIF
Sentences        0.25 3.69
Vocabulary       0.09 3.88
SentCompletion   0.04 3.00
FirstLetters     0.21 1.35

Multiple Regression
                R   R2  Ruw R2uw
LetterSeries  0.5 0.25 0.49 0.24

DV = Pedigrees
                slope  VIF
Sentences        0.21 3.69
Vocabulary       0.16 3.88
SentCompletion   0.21 3.00
FirstLetters     0.08 1.35

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.58 0.34 0.58 0.33

DV = LetterGroup
                slope  VIF
Sentences        0.20 3.69
Vocabulary      -0.02 3.88
SentCompletion   0.08 3.00
FirstLetters     0.31 1.35

Multiple Regression
                R   R2  Ruw R2uw
LetterGroup  0.48 0.23 0.45  0.2

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.6280 0.1478 0.0076 0.0049

Average squared canonical correlation = 0.2

Cohen's Set Correlation R2 = 0.69

Unweighted correlation between the two sets = 0.73

By specifying the number of subjects in the correlation matrix, appropriate estimates of standard errors, t-values, and probabilities are also found (a short sketch of this is shown below). The next example finds the regressions with variables 1 and 2 used as covariates. The β weights for variables 3 and 4 do not change, but the multiple correlation is much less. It also shows how to find the residual correlations between variables 5-9 with variables 1-4 removed.
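A minimal sketch of supplying the sample size for the matrix-input case (the value of n.obs is illustrative and not taken from the text; it simply turns on the standard errors and significance tests):

setCor(y = 5:9, x = 1:4, data = Thurstone, n.obs = 213)   # as before, but now with se, t, and p values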

> sc <- setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Call: setCor(y = 5:9, x = 3:4, data = Thurstone, z = 1:2)

Multiple Regression from matrix input

The following variables were partialed out: Sentences Vocabulary
and are included in the calculation of df1 and df2.

DV = FourLetterWords
                slope  VIF
SentCompletion   0.02 1.02
FirstLetters     0.58 1.02

Multiple Regression
                   R   R2  Ruw R2uw
FourLetterWords 0.58 0.33 0.64 0.41

DV = Suffixes
                slope  VIF
SentCompletion   0.05 1.02
FirstLetters     0.45 1.02

Multiple Regression
             R   R2 Ruw R2uw
Suffixes  0.46 0.21 0.6 0.37

DV = LetterSeries
                slope  VIF
SentCompletion   0.04 1.02
FirstLetters     0.21 1.02

Multiple Regression
                 R   R2  Ruw R2uw
LetterSeries  0.21 0.04 0.46 0.21

DV = Pedigrees
                slope  VIF
SentCompletion   0.21 1.02
FirstLetters     0.08 1.02

Multiple Regression
              R   R2  Ruw R2uw
Pedigrees  0.18 0.03 0.52 0.27

DV = LetterGroup
                slope  VIF
SentCompletion   0.08 1.02
FirstLetters     0.31 1.02

Multiple Regression
               R   R2  Ruw R2uw
LetterGroup  0.3 0.09 0.46 0.21

Various estimates of between set correlations

Squared Canonical Correlations
[1] 0.405 0.023

Average squared canonical correlation = 0.21

Cohen's Set Correlation R2 = 0.42

Unweighted correlation between the two sets = 0.73

> round(sc$residual, 2)
                FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup
FourLetterWords            0.53     0.12         0.11      0.08        0.14
Suffixes                   0.12     0.61         0.01      0.03        0.04
LetterSeries               0.11     0.01         0.79      0.31        0.39
Pedigrees                  0.08     0.03         0.31      0.70        0.23
LetterGroup                0.14     0.04         0.39      0.23        0.79

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ... xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus, we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab, and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.
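The call that produces the output below (and Figure 15) uses the formula interface with the mediating variable in parentheses; a minimal sketch, assuming the sobel data frame built in the mediate example:

preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)   # bootstrapped mediation model
mediate.diagram(preacher)                                       # draw the path diagram (Figure 15)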

Mediation/Moderation Analysis

Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect (c) of THERAPY on SATIS = 0.76   S.E. = 0.31   t = 2.5   df = 28   with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43   S.E. = 0.32   t = 1.35   df = 27   with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17   Lower CI = 0.03   Upper CI = 0.7
R = 0.56   R2 = 0.31   F = 6.06 on 2 and 27 DF   p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement or ask for the summary.

• setCor will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram for) multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a bootstrap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, niter = 50)

• mediate will also take raw data and find (and graph the path diagram for) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a bootstrap. The data set is taken from Garcia et al. (2010). The number of iterations for the bootstrap was set to 50 for speed; the default number of bootstraps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis. A matching call is sketched next.
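A call matching the Call: line of the moderated mediation output shown below (Figure 17); the * forms the moderation product term, and Garcia is the Garcia et al. (2010) data set supplied with psych:

mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia, niter = 50,
        main = "Moderated mediation (mean centered)")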


> mediate.diagram(preacher)

[Figure 15 (mediation model): THERAPY → SATIS with c = 0.76 and c' = 0.43; THERAPY → ATTRIB = 0.82 and ATTRIB → SATIS = 0.4.]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The total effect of Therapy on Satisfaction is 0.76, the direct effect is 0.43, and the indirect path through Attribution has an effect of 0.33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Figure 16 (regression models): THERAPY → SATIS = 0.43 and ATTRIB → SATIS = 0.4, with a correlation of 0.21 between the two predictors.]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set, solved using the setCor function. Compare this to the previous figure.


Mediation/Moderation Analysis

Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    niter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect (c) of prot2 on respappr = 1.46   S.E. = 0.22   t = 6.77   df = 126   with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46   S.E. = 0.22   t = 6.73   df = 125   with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02   Lower CI = -0.04   Upper CI = 0.03
Total effect (c) of prot2*sexism on respappr = 0.81   S.E. = 0.28   t = 2.89   df = 126   with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81   S.E. = 0.28   t = 2.87   df = 125   with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03   Lower CI = -0.07   Upper CI = 0.07
R = 0.54   R2 = 0.3   F = 17.53 on 3 and 125 DF   p-value: 1.91e-11

To see the longer output, specify short = FALSE in the print statement or ask for the summary.

[Figure 17 ("Moderated mediation (mean centered)"): prot2 and prot2*sexism predict respappr; for prot2, c = 1.46 and c' = 1.46; for prot2*sexism, c = 0.81 and c' = 0.81; the paths through sexism are near zero (0.07, 0.09, 0.02, and -0.01).]

Figure 17: Moderated multiple regression requires the raw data. By default, the data are mean centered before finding the product term.


5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where \lambda_i is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R_{yx}.

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case, an alternative statistic based upon the average canonical correlation might be more appropriate.

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice: setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared: 0.0272,  Adjusted R-squared: 0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor.

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT
             slope   se     t     p lower.ci upper.ci  VIF
(Intercept)   0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender       -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education     0.56 0.07  7.72 0.016     0.25     0.87 1.22
age          -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error = 0.15 with 2 degrees of freedom

Multiple Regression
     R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT  1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV = SATV
             slope   se     t     p lower.ci upper.ci  VIF
(Intercept)   0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender        0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education     0.72 0.08  9.21 0.012     0.39     1.06 1.22
age          -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error = 0.16 with 2 degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV = SATQ
             slope   se     t     p lower.ci upper.ci  VIF
(Intercept)   0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender       -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education     0.40 0.05  8.32 0.014     0.19     0.60 1.22
age          -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error = 0.1 with 2 degrees of freedom

Multiple Regression
     R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ 1  1   1 0.99        0.99        0    177.11   3   2 0.00562

Various estimates of between set correlations

Squared Canonical Correlations
[1] 1.000 0.988 0.013

Chisq of canonical correlations
[1] 32517 4433 0013

Average squared canonical correlation = 0.67

Cohen's Set Correlation R2 = 1
Shrunken Set Correlation R2 = 1
F and df of Cohen's Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric: that is, the R2 is the same independent of the direction of the relationship (see the sketch below).
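The symmetry can be checked directly by exchanging the two sets; a minimal sketch using the covariance matrix C from above:

setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)   # ACT and SATs from the demographic variables
setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # demographics from ACT and SATs: same set correlation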

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output preformatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables: fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
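A minimal sketch of the sort of call that produces such a table (the heading text simply mirrors Table 2; in recent versions fa2latex is provided by the psychTools package):

f3 <- fa(Thurstone, 3)   # three factor solution of the Thurstone correlations
fa2latex(f3, heading = "A factor analysis table from the psych package in R")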


Table 2: fa2latex: A factor analysis table from the psych package in R

Variable         MR1    MR2    MR3    h2    u2   com
Sentences        0.91  -0.04   0.04  0.82  0.18  1.01
Vocabulary       0.89   0.06  -0.03  0.84  0.16  1.01
SentCompletion   0.83   0.04   0.00  0.73  0.27  1.00
FirstLetters     0.00   0.86   0.00  0.73  0.27  1.00
4LetterWords    -0.01   0.74   0.10  0.63  0.37  1.04
Suffixes         0.18   0.63  -0.08  0.50  0.50  1.20
LetterSeries     0.03  -0.01   0.84  0.72  0.28  1.00
Pedigrees        0.37  -0.05   0.47  0.50  0.50  1.93
LetterGroup     -0.06   0.21   0.64  0.53  0.47  1.23

SS loadings      2.64   1.86   1.5

      MR1   MR2   MR3
MR1  1.00  0.59  0.54
MR2  0.59  1.00  0.52
MR3  0.54  0.52  1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list; look at the Index for psych for a list of all of the functions. A few of them are illustrated immediately after the list.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
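A few of these helpers in action (a minimal sketch; the input values are arbitrary):

fisherz(0.5)                      # Fisher z transformation of r = 0.5
geometric.mean(c(1, 2, 4, 8))     # appropriate mean for ratio-like data
headTail(sat.act)                 # first and last lines of the sat.act data
superMatrix(diag(2), diag(3))     # combine two matrices into a "super matrix"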

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
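Each of these data sets is loaded with the package and documented in its own help page; a minimal sketch of getting at them:

dim(bfi)           # 2800 observations on the 25 items plus gender, education and age
describe(sat.act)  # the demonstration set for correlation and regression
?Thurstone         # help page for the 9 variable ability correlation matrix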

Thurstone Holzinger and Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a user's guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book, An introduction to Psychometric Theory with Applications in R (Revelle, prep)). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.
Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.
Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.
Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.
Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components - an alternative to "mathematical factors". Psychological Review, 42(5):425–454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


Page 39: An introduction to the psych package: Part I: data entry ... · myData

SentCompletion 004 300

FirstLetters 021 135

Multiple Regression

R R2 Ruw R2uw

LetterSeries 05 025 049 024

DV = Pedigrees

slope VIF

Sentences 021 369

Vocabulary 016 388

SentCompletion 021 300

FirstLetters 008 135

Multiple Regression

R R2 Ruw R2uw

Pedigrees 058 034 058 033

DV = LetterGroup

slope VIF

Sentences 020 369

Vocabulary -002 388

SentCompletion 008 300

FirstLetters 031 135

Multiple Regression

R R2 Ruw R2uw

LetterGroup 048 023 045 02

Various estimates of between set correlations

Squared Canonical Correlations

[1] 06280 01478 00076 00049

Average squared canonical correlation = 02

Cohens Set Correlation R2 = 069

Unweighted correlation between the two sets = 073

By specifying the number of subjects in correlation matrix appropriate estimates of stan-dard errors t-values and probabilities are also found The next example finds the regres-sions with variables 1 and 2 used as covariates The β weights for variables 3 and 4 do notchange but the multiple correlation is much less It also shows how to find the residualcorrelations between variables 5-9 with variables 1-4 removed

gt sc lt- setCor(y = 59x=34data=Thurstonez=12)

Call setCor(y = 59 x = 34 data = Thurstone z = 12)

Multiple Regression from matrix input

The following variables were partialed out Sentences Vocabulary

and are included in the calculation of df1 and df2

DV = FourLetterWords

slope VIF

SentCompletion 002 102

FirstLetters 058 102

Multiple Regression

39

R R2 Ruw R2uw

FourLetterWords 058 033 064 041

DV = Suffixes

slope VIF

SentCompletion 005 102

FirstLetters 045 102

Multiple Regression

R R2 Ruw R2uw

Suffixes 046 021 06 037

DV = LetterSeries

slope VIF

SentCompletion 004 102

FirstLetters 021 102

Multiple Regression

R R2 Ruw R2uw

LetterSeries 021 004 046 021

DV = Pedigrees

slope VIF

SentCompletion 021 102

FirstLetters 008 102

Multiple Regression

R R2 Ruw R2uw

Pedigrees 018 003 052 027

DV = LetterGroup

slope VIF

SentCompletion 008 102

FirstLetters 031 102

Multiple Regression

R R2 Ruw R2uw

LetterGroup 03 009 046 021

Various estimates of between set correlations

Squared Canonical Correlations

[1] 0405 0023

Average squared canonical correlation = 021

Cohens Set Correlation R2 = 042

Unweighted correlation between the two sets = 073

gt round(sc$residual2)

FourLetterWords Suffixes LetterSeries Pedigrees LetterGroup

FourLetterWords 053 012 011 008 014

Suffixes 012 061 001 003 004

LetterSeries 011 001 079 031 039

Pedigrees 008 003 031 070 023

LetterGroup 014 004 039 023 079

40

52 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect ofmultiple predictors (x12i) on a criterion variable y some prefer to think of the effect ofone predictor x as mediated by another variable m (Preacher and Hayes 2004) Thuswe we may find the indirect path from x to m and then from m to y as well as the directpath from x to y Call these paths a b and c respectively Then the indirect effect of xon y through m is just ab and the direct effect is c Statistical tests of the ab effect arebest done by bootstrapping This is discussed in detail in the ldquoHow To use mediate andsetCor to do mediation moderation and regression analysis tutorial

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate

function and the subsequent graphic from mediatediagram The data are found in theexample for mediate

MediationModeration Analysis

Call mediate(y = SATIS ~ THERAPY + (ATTRIB) data = sobel)

The DV (Y) was SATIS The IV (X) was THERAPY The mediating variable(s) = ATTRIB

Total effect(c) of THERAPY on SATIS = 076 SE = 031 t = 25 df= 28 with p = 0019

Direct effect (c) of THERAPY on SATIS removing ATTRIB = 043 SE = 032 t = 135 df= 27 with p = 019

Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 033

Mean bootstrapped indirect effect = 032 with standard error = 017 Lower CI = 003 Upper CI = 07

R = 056 R2 = 031 F = 606 on 2 and 27 DF p-value 000272

To see the longer output specify short = FALSE in the print statement or ask for the summary

bull setCor will take raw data or a correlation matrix and find (and graph the pathdiagram) for multiple y variables depending upon multiple x variables

setCor(SATV + SATQ ~ education + age data = satact std=TRUE)

bull mediate will take raw data or a correlation matrix and find (and graph the path dia-gram) for multiple y variables depending upon multiple x variables mediated througha mediation variable It then tests the mediation effect using a boot strap

mediate( SATV ~ education+ age + (ACT) data =satactstd=TRUEniter=50)

bull mediate will also take raw data and find (and graph the path diagram) a moderatedmultiple regression model for multiple y variables depending upon multiple x variablesmediated through a mediation variable It will form the product term either from themean centered data or from the raw data It then tests the mediation effect using aboot strap The data set is taken from Garcia et al (2010) The number of iterationsfor the boot strap was set to 50 for speed The default number of boot straps is5000 See the help page for the mediate function for more details For a much longerdiscussion of how to use the mediate function see the ldquoHowTordquo Using mediate andsetCor to do mediation moderation and regression analysis

41

gt mediatediagram(preacher)

Mediation model

THERAPY SATIS

ATTRIB

c = 076

c = 043

082 04

Figure 15 A mediated model taken from Preacher and Hayes 2004 and solved using themediate function The direct path from Therapy to Satisfaction has a an effect of 76 whilethe indirect path through Attribution has an effect of 33 Compare this to the normalregression graphic created by setCordiagram

42

gt preacher lt- setCor(SATIS ~ THERAPY + ATTRIBdata =sobelstd=FALSE)

gt setCordiagram(preacher)

Regression Models

THERAPY

ATTRIB

SATIS

043

04

021

Figure 16 The conventional regression model for the Preacher and Hayes 2004 data setsolved using the sector function Compare this to the previous figure

43

MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

Moderated mediation (mean centered)

minus001

prot2

prot2sexism

respapprsexism

c = 146

c = 146

c = 081

c = 081

007

009

002

Figure 17 Moderated multiple regression requires the raw data By default the data aremean centered before find the product term

44

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

R2 = 1minusn

prodi=1

(1minusλi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rminus1xx RxyRminus1

xx Rminus1xy

Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate

setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data

Consider the correlations of the 6 variables in the satact data set First do the normalmultiple regression and then compare it with the results using setCor Two things tonotice setCor works on the correlation or covariance or raw data matrix and thus ifusing the correlation matrix will report standardized or raw β weights Secondly it ispossible to do several multiple regressions simultaneously If the number of observationsis specified or if the analysis is done on raw data statistical tests of significance areapplied

For this example the analysis is done on the correlation matrix rather than the rawdata

gt C lt- cov(satactuse=pairwise)

gt model1 lt- lm(ACT~ gender + education + age data=satact)

gt summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)

45

(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor

gt compare with sector

gt setCor(c(46)c(13)C nobs=700)

Call setCor(y = c(46) x = c(13) data = C nobs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562

46

Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between thepredictor set and the criterion (dependent) set This set correlation is symmetric That isthe R2 is the same independent of the direction of the relationship

6 Converting output to APA style tables using LATEX

Although for most purposes using the Sweave or KnitR packages produces clean outputsome prefer output pre formatted for APA style tables This can be done using the xtablepackage for almost anything but there are a few simple functions in psych for the mostcommon tables fa2latex will convert a factor analysis or components analysis output toa LATEXtable cor2latex will take a correlation matrix and show the lower (or upper diag-onal) irt2latex converts the item statistics from the irtfa function to more convenientLATEXoutput and finally df2latex converts a generic data frame to LATEX

An example of converting the output from fa to LATEXappears in Table 2

47

Table 2 fa2latexA factor analysis table from the psych package in R

Variable MR1 MR2 MR3 h2 u2 com

Sentences 091 -004 004 082 018 101Vocabulary 089 006 -003 084 016 101SentCompletion 083 004 000 073 027 100FirstLetters 000 086 000 073 027 1004LetterWords -001 074 010 063 037 104Suffixes 018 063 -008 050 050 120LetterSeries 003 -001 084 072 028 100Pedigrees 037 -005 047 050 050 193LetterGroup -006 021 064 053 047 123

SS loadings 264 186 15

MR1 100 059 054MR2 059 100 052MR3 054 052 100

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that donrsquot fitinto any other category The following is an incomplete list Look at the Index for psychfor a list of all of the functions

blockrandom Creates a block randomized structure for n independent variables Usefulfor teaching block randomization for experimental design

df2latex is useful for taking tabular output (such as a correlation matrix or that of de-

scribe and converting it to a LATEX table May be used when Sweave is not conve-nient

cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex

cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-

dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data

fisherz Convert a correlation to the corresponding Fisher z score

48

geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data

ICC and cohenkappa are typically used to find the reliability for raters

headtail combines the head and tail functions to show the first and last lines of a dataset or output

topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
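
A minimal sketch of a few of these helpers (the input values are purely illustrative):

> fisherz(0.5)                    # convert r = .5 to the corresponding Fisher z
> fisherz2r(fisherz(0.5))         # and convert it back again
> geometric.mean(c(1, 2, 4, 8))   # the geometric mean of a vector
> harmonic.mean(c(1, 2, 4, 8))    # the harmonic mean of the same vector
> headTail(sat.act)               # the first and last lines of the sat.act data set
> mardia(sat.act)                 # univariate and multivariate skew and kurtosis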

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas and heights from Galton.
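
Because these data sets are installed with the packages, they can be examined directly once psych and psychTools have been loaded, as in the earlier examples. A minimal sketch:

> dim(bfi)             # 2800 subjects on the 25 bfi items plus gender, education and age
> describe(sat.act)    # basic descriptive statistics for the sat.act data
> lowerMat(Thurstone)  # the lower triangle of the 9 x 9 Thurstone correlations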

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix

49

from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project Revelle et al. (2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) Condon & Revelle (2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) Revelle et al. (2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

50

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R version 4.0.2 (2020-06-22)

Platform: x86_64-apple-darwin17.0 (64-bit)

51

Running under: macOS Catalina 10.15.6

Matrix products: default

BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib

LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:

[1] C

attached base packages:

[1] stats graphics grDevices utils datasets methods base

other attached packages:

[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):

[1] compiler_4.0.2 parallel_4.0.2 tools_4.0.2 foreign_0.8-80 tmvnsim_1.0-2 nlme_3.1-148

[7] mnormt_2.0.1 grid_4.0.2 lattice_0.20-41

52

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. & Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis. 122 pp. Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance

53

of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):pp. 65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

54

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., & Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In N. G. Fielding, R. M. Lee, & G. Blank (Eds.), SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, (pp. 578–595). Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of

55

Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

56

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

57

Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49

58

hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48

59

corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51

60

psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9

61

outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49

62

table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63

Page 41: An introduction to the psych package: Part I: data entry ... · myData

5.2 Mediation and Moderation analysis

Although multiple regression is a straightforward method for determining the effect of multiple predictors (x1, x2, ..., xi) on a criterion variable y, some prefer to think of the effect of one predictor, x, as mediated by another variable, m (Preacher and Hayes, 2004). Thus we may find the indirect path from x to m and then from m to y, as well as the direct path from x to y. Call these paths a, b, and c, respectively. Then the indirect effect of x on y through m is just ab and the direct effect is c. Statistical tests of the ab effect are best done by bootstrapping. This is discussed in detail in the "How To use mediate and setCor to do mediation, moderation and regression analysis" tutorial.

Consider the example from Preacher and Hayes (2004) as analyzed using the mediate function and the subsequent graphic from mediate.diagram. The data are found in the example for mediate.

Mediation/Moderation Analysis
Call: mediate(y = SATIS ~ THERAPY + (ATTRIB), data = sobel)

The DV (Y) was SATIS. The IV (X) was THERAPY. The mediating variable(s) = ATTRIB.

Total effect(c) of THERAPY on SATIS = 0.76  S.E. = 0.31  t = 2.5  df = 28  with p = 0.019
Direct effect (c') of THERAPY on SATIS removing ATTRIB = 0.43  S.E. = 0.32  t = 1.35  df = 27  with p = 0.19
Indirect effect (ab) of THERAPY on SATIS through ATTRIB = 0.33
Mean bootstrapped indirect effect = 0.32 with standard error = 0.17  Lower CI = 0.03  Upper CI = 0.7
R = 0.56  R2 = 0.31  F = 6.06 on 2 and 27 DF  p-value: 0.00272

To see the longer output, specify short = FALSE in the print statement, or ask for the summary
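Those print and summary calls would look something like this (a sketch only; the object name preacher matches the one used for the figures below, and the mediate call matches the one echoed above):

# Save the mediate result, then ask for the longer output or the summary
preacher <- mediate(SATIS ~ THERAPY + (ATTRIB), data = sobel)
print(preacher, short = FALSE)
summary(preacher)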

• setCor will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables.

setCor(SATV + SATQ ~ education + age, data = sat.act, std = TRUE)

• mediate will take raw data or a correlation matrix and find (and graph the path diagram) for multiple y variables depending upon multiple x variables mediated through a mediation variable. It then tests the mediation effect using a boot strap.

mediate(SATV ~ education + age + (ACT), data = sat.act, std = TRUE, n.iter = 50)

• mediate will also take raw data and find (and graph the path diagram) a moderated multiple regression model for multiple y variables depending upon multiple x variables mediated through a mediation variable. It will form the product term either from the mean centered data or from the raw data. It then tests the mediation effect using a boot strap. The data set is taken from Garcia et al. (2010). The number of iterations for the boot strap was set to 50 for speed. The default number of boot straps is 5000. See the help page for the mediate function for more details. For a much longer discussion of how to use the mediate function, see the "HowTo" Using mediate and setCor to do mediation, moderation and regression analysis. The corresponding call is sketched below.
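A sketch of that moderated mediation call, mirroring the one echoed in Figure 17 (the product term prot2*sexism is formed by mediate itself; n.iter is reduced only for speed):

# Moderated (mean centered) mediation for the Garcia data; the default is n.iter = 5000
mediate(respappr ~ prot2 * sexism + (sexism), data = Garcia,
        n.iter = 50, main = "Moderated mediation (mean centered)")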

41

> mediate.diagram(preacher)

[Path diagram: "Mediation model", showing THERAPY predicting SATIS with c = 0.76 and c' = 0.43, and the indirect path through ATTRIB with paths 0.82 and 0.4.]

Figure 15: A mediated model taken from Preacher and Hayes (2004) and solved using the mediate function. The direct path from Therapy to Satisfaction has an effect of .76, while the indirect path through Attribution has an effect of .33. Compare this to the normal regression graphic created by setCor.diagram.


> preacher <- setCor(SATIS ~ THERAPY + ATTRIB, data = sobel, std = FALSE)
> setCor.diagram(preacher)

[Path diagram: "Regression Models", showing THERAPY and ATTRIB predicting SATIS with slopes 0.43 and 0.4, and a value of 0.21 shown between the two predictors.]

Figure 16: The conventional regression model for the Preacher and Hayes (2004) data set solved using the setCor function. Compare this to the previous figure.


Mediation/Moderation Analysis
Call: mediate(y = respappr ~ prot2 * sexism + (sexism), data = Garcia,
    n.iter = 50, main = "Moderated mediation (mean centered)")

The DV (Y) was respappr. The IV (X) was prot2 prot2*sexism. The mediating variable(s) = sexism.

Total effect(c) of prot2 on respappr = 1.46  S.E. = 0.22  t = 6.77  df = 126  with p = 4.4e-10
Direct effect (c') of prot2 on respappr removing sexism = 1.46  S.E. = 0.22  t = 6.73  df = 125  with p = 5.5e-10
Indirect effect (ab) of prot2 on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.02  Lower CI = -0.04  Upper CI = 0.03
Total effect(c) of prot2*sexism on respappr = 0.81  S.E. = 0.28  t = 2.89  df = 126  with p = 0.0045
Direct effect (c') of prot2*sexism on respappr removing sexism = 0.81  S.E. = 0.28  t = 2.87  df = 125  with p = 0.0048
Indirect effect (ab) of prot2*sexism on respappr through sexism = 0
Mean bootstrapped indirect effect = 0 with standard error = 0.03  Lower CI = -0.07  Upper CI = 0.07
R = 0.54  R2 = 0.3  F = 17.53 on 3 and 125 DF  p-value: 1.91e-11

To see the longer output, specify short = FALSE in the print statement, or ask for the summary

[Path diagram: "Moderated mediation (mean centered)", showing prot2, prot2*sexism and sexism predicting respappr, with total effects c = 1.46 and c = 0.81 and direct effects c' = 1.46 and c' = 0.81 for prot2 and prot2*sexism respectively; the paths through the mediator sexism are all close to zero (-0.01 to 0.09).]

Figure 17: Moderated multiple regression requires the raw data. By default the data are mean centered before finding the product term.


5.3 Set Correlation

An important generalization of multiple regression and multiple correlation is set correlation, developed by Cohen (1982) and discussed by Cohen et al. (2003). Set correlation is a multivariate generalization of multiple regression and estimates the amount of variance shared between two sets of variables. Set correlation also allows for examining the relationship between two sets when controlling for a third set. This is implemented in the setCor function. Set correlation is

R^2 = 1 - \prod_{i=1}^{n} (1 - \lambda_i)

where λi is the ith eigenvalue of the eigenvalue decomposition of the matrix

R = R_{xx}^{-1} R_{xy} R_{yy}^{-1} R'_{xy}

Unfortunately, there are several cases where set correlation will give results that are much too high. This will happen if some variables from the first set are highly related to those in the second set, even though most are not. In this case, although the set correlation can be very high, the degree of relationship between the sets is not as high. In this case an alternative statistic based upon the average canonical correlation might be more appropriate.
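To make the formula concrete, the following sketch (not the internal psych code) computes the quantity directly from those eigenvalues for the sat.act correlations; this is what setCor reports as Cohen's Set Correlation R2.

# Sketch only: Cohen's set correlation from the eigenvalues of Rxx^-1 Rxy Ryy^-1 Rxy'
R <- cor(sat.act, use = "pairwise")
x <- 1:3                                   # gender, education, age
y <- 4:6                                   # ACT, SATV, SATQ
M <- solve(R[x, x]) %*% R[x, y] %*% solve(R[y, y]) %*% t(R[x, y])
lambda <- Re(eigen(M)$values)              # eigenvalues of the product matrix
R2.set <- 1 - prod(1 - lambda)             # R^2 = 1 - prod(1 - lambda_i)
R2.set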

setCor has the additional feature that it will calculate multiple and partial correlations from the correlation or covariance matrix rather than the original data.

Consider the correlations of the 6 variables in the sat.act data set. First do the normal multiple regression, and then compare it with the results using setCor. Two things to notice. setCor works on the correlation or covariance or raw data matrix, and thus, if using the correlation matrix, will report standardized or raw β weights. Secondly, it is possible to do several multiple regressions simultaneously. If the number of observations is specified, or if the analysis is done on raw data, statistical tests of significance are applied.

For this example, the analysis is done on the correlation matrix rather than the raw data.

> C <- cov(sat.act, use = "pairwise")
> model1 <- lm(ACT ~ gender + education + age, data = sat.act)
> summary(model1)

Call:
lm(formula = ACT ~ gender + education + age, data = sat.act)

Residuals:
     Min       1Q   Median       3Q      Max
-25.2458  -3.2133   0.7769   3.5921   9.2630

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.41706    0.82140  33.378  < 2e-16 ***
gender      -0.48606    0.37984  -1.280  0.20110
education    0.47890    0.15235   3.143  0.00174 **
age          0.01623    0.02278   0.712  0.47650
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.768 on 696 degrees of freedom
Multiple R-squared:  0.0272,   Adjusted R-squared:  0.02301
F-statistic: 6.487 on 3 and 696 DF,  p-value: 0.0002476

Compare this with the output from setCor

> # compare with setCor
> setCor(c(4:6), c(1:3), C, n.obs = 700)

Call: setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)

Multiple Regression from raw data

DV = ACT
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.28     0.28 1.00
gender      -0.26 0.08 -3.17 0.087    -0.61     0.09 1.55
education    0.56 0.07  7.72 0.016     0.25     0.87 1.22
age         -0.64 0.08 -8.28 0.014    -0.97    -0.31 1.38

Residual Standard Error = 0.15 with 2 degrees of freedom

Multiple Regression
      R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2     p
ACT   1 0.99 0.98 0.96        0.98        0     76.34   3   2 0.013

DV = SATV
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.07  0.00 1.000    -0.31     0.31 1.00
gender       0.10 0.09  1.11 0.380    -0.28     0.48 1.55
education    0.72 0.08  9.21 0.012     0.39     1.06 1.22
age         -0.82 0.08 -9.76 0.010    -1.18    -0.46 1.38

Residual Standard Error = 0.16 with 2 degrees of freedom

Multiple Regression
        R   R2  Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2      p
SATV 0.99 0.99 0.89 0.79        0.97        0     65.18   3   2 0.0151

DV = SATQ
            slope   se     t     p lower.ci upper.ci  VIF
(Intercept)  0.00 0.04  0.00 1.000    -0.19     0.19 1.00
gender      -0.52 0.05 -9.73 0.010    -0.76    -0.29 1.55
education    0.40 0.05  8.32 0.014     0.19     0.60 1.22
age         -0.47 0.05 -9.24 0.012    -0.69    -0.25 1.38

Residual Standard Error = 0.1 with 2 degrees of freedom

Multiple Regression
      R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2       p
SATQ  1  1   1 0.99        0.99        0    177.11   3   2 0.00562


Various estimates of between set correlations

Squared Canonical Correlations
[1] 1.000 0.988 0.013
Chisq of canonical correlations
[1] 325.17  44.33   0.013

Average squared canonical correlation = 0.67
Cohen's Set Correlation R2 = 1
Shrunken Set Correlation R2 = 1
F and df of Cohen's Set Correlation  -Inf  12  -12.94
Unweighted correlation between the two sets = 0.98

Note that the setCor analysis also reports the amount of shared variance between the predictor set and the criterion (dependent) set. This set correlation is symmetric. That is, the R2 is the same independent of the direction of the relationship.
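As a quick check of that symmetry, the two sets can simply be exchanged (a sketch, reusing the covariance matrix C from above):

> setCor(y = c(4:6), x = c(1:3), data = C, n.obs = 700)   # as above
> setCor(y = c(1:3), x = c(4:6), data = C, n.obs = 700)   # sets exchanged: same set correlation R2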

6 Converting output to APA style tables using LaTeX

Although for most purposes using the Sweave or KnitR packages produces clean output, some prefer output pre-formatted for APA style tables. This can be done using the xtable package for almost anything, but there are a few simple functions in psych for the most common tables. fa2latex will convert a factor analysis or components analysis output to a LaTeX table, cor2latex will take a correlation matrix and show the lower (or upper) diagonal, irt2latex converts the item statistics from the irt.fa function to more convenient LaTeX output, and finally df2latex converts a generic data frame to LaTeX.

An example of converting the output from fa to LaTeX appears in Table 2.
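A sketch of the kind of call involved (a three factor solution of the Thurstone correlations, like the one shown in Table 2; see ?fa2latex for caption, label, and file options):

# Convert a factor analysis of the Thurstone correlations to a LaTeX table
f3 <- fa(Thurstone, 3)
fa2latex(f3)
# cor2latex(Thurstone) and df2latex(describe(sat.act)) work the same way for
# correlation matrices and generic data frames.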


Table 2: fa2latex. A factor analysis table from the psych package in R.

Variable         MR1   MR2   MR3   h2   u2  com
Sentences       0.91 -0.04  0.04 0.82 0.18 1.01
Vocabulary      0.89  0.06 -0.03 0.84 0.16 1.01
Sent.Completion 0.83  0.04  0.00 0.73 0.27 1.00
First.Letters   0.00  0.86  0.00 0.73 0.27 1.00
4.Letter.Words -0.01  0.74  0.10 0.63 0.37 1.04
Suffixes        0.18  0.63 -0.08 0.50 0.50 1.20
Letter.Series   0.03 -0.01  0.84 0.72 0.28 1.00
Pedigrees       0.37 -0.05  0.47 0.50 0.50 1.93
Letter.Group   -0.06  0.21  0.64 0.53 0.47 1.23

SS loadings     2.64  1.86  1.5

     MR1  MR2  MR3
MR1 1.00 0.59 0.54
MR2 0.59 1.00 0.52
MR3 0.54 0.52 1.00

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that don't fit into any other category. The following is an incomplete list. Look at the Index for psych for a list of all of the functions.

block.random Creates a block randomized structure for n independent variables. Useful for teaching block randomization for experimental design.

df2latex is useful for taking tabular output (such as a correlation matrix or that of describe) and converting it to a LaTeX table. May be used when Sweave is not convenient.

cor2latex Will format a correlation matrix in APA style in a LaTeX table. See also fa2latex and irt2latex.

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score.

geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability for raters.

headtail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headtail. Combines the head and tail functions to show the first and last lines of a data set or output, but does not add ellipsis between.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data.frame.

p.rep finds the probability of replication for an F, t, or r, and estimates effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range.

reverse.code will reverse code specified items. Done more conveniently in most psych functions, but supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
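A few of the helpers listed above in action (a sketch with made-up input values, assuming psych and psychTools are loaded):

fisherz(0.50)                   # Fisher z transformation of r = .50
geometric.mean(c(1, 2, 4, 8))   # geometric mean of ratio scaled data
harmonic.mean(c(10, 20, 40))    # harmonic mean, e.g., of rates
headTail(sat.act)               # first and last lines of a data set
mardia(sat.act[1:4])            # multivariate skew and kurtosis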

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package. These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as McDonald (1999). The original data are from Thurstone and Thurstone (1941) and reanalyzed by Bechtoldt (1961). Personality item data representing five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), and 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.
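Because these are ordinary R data sets (in psych or psychTools), they can be inspected directly; a brief sketch:

dim(bfi)            # 2800 subjects, 25 personality items plus demographics
describe(sat.act)   # descriptive statistics for the SAT/ACT example
headTail(epi.bfi)   # first and last lines of the EPI/BFI scale scores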

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR) (Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton height. peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PC, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. & Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43, 52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Cluster analysis. 122 pp. Oxford, England.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.


McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., & Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In N. G. Fielding, R. M. Lee, & G. Blank (Eds.), SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 42: An introduction to the psych package: Part I: data entry ... · myData

gt mediatediagram(preacher)

Mediation model

THERAPY SATIS

ATTRIB

c = 076

c = 043

082 04

Figure 15 A mediated model taken from Preacher and Hayes 2004 and solved using themediate function The direct path from Therapy to Satisfaction has a an effect of 76 whilethe indirect path through Attribution has an effect of 33 Compare this to the normalregression graphic created by setCordiagram

42

gt preacher lt- setCor(SATIS ~ THERAPY + ATTRIBdata =sobelstd=FALSE)

gt setCordiagram(preacher)

Regression Models

THERAPY

ATTRIB

SATIS

043

04

021

Figure 16 The conventional regression model for the Preacher and Hayes 2004 data setsolved using the sector function Compare this to the previous figure

43

MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

Moderated mediation (mean centered)

minus001

prot2

prot2sexism

respapprsexism

c = 146

c = 146

c = 081

c = 081

007

009

002

Figure 17 Moderated multiple regression requires the raw data By default the data aremean centered before find the product term

44

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

R2 = 1minusn

prodi=1

(1minusλi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rminus1xx RxyRminus1

xx Rminus1xy

Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate

setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data

Consider the correlations of the 6 variables in the satact data set First do the normalmultiple regression and then compare it with the results using setCor Two things tonotice setCor works on the correlation or covariance or raw data matrix and thus ifusing the correlation matrix will report standardized or raw β weights Secondly it ispossible to do several multiple regressions simultaneously If the number of observationsis specified or if the analysis is done on raw data statistical tests of significance areapplied

For this example the analysis is done on the correlation matrix rather than the rawdata

gt C lt- cov(satactuse=pairwise)

gt model1 lt- lm(ACT~ gender + education + age data=satact)

gt summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)

45

(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor

gt compare with sector

gt setCor(c(46)c(13)C nobs=700)

Call setCor(y = c(46) x = c(13) data = C nobs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562

46

Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between thepredictor set and the criterion (dependent) set This set correlation is symmetric That isthe R2 is the same independent of the direction of the relationship

6 Converting output to APA style tables using LATEX

Although for most purposes using the Sweave or KnitR packages produces clean outputsome prefer output pre formatted for APA style tables This can be done using the xtablepackage for almost anything but there are a few simple functions in psych for the mostcommon tables fa2latex will convert a factor analysis or components analysis output toa LATEXtable cor2latex will take a correlation matrix and show the lower (or upper diag-onal) irt2latex converts the item statistics from the irtfa function to more convenientLATEXoutput and finally df2latex converts a generic data frame to LATEX

An example of converting the output from fa to LATEXappears in Table 2

47

Table 2 fa2latexA factor analysis table from the psych package in R

Variable MR1 MR2 MR3 h2 u2 com

Sentences 091 -004 004 082 018 101Vocabulary 089 006 -003 084 016 101SentCompletion 083 004 000 073 027 100FirstLetters 000 086 000 073 027 1004LetterWords -001 074 010 063 037 104Suffixes 018 063 -008 050 050 120LetterSeries 003 -001 084 072 028 100Pedigrees 037 -005 047 050 050 193LetterGroup -006 021 064 053 047 123

SS loadings 264 186 15

MR1 100 059 054MR2 059 100 052MR3 054 052 100

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that donrsquot fitinto any other category The following is an incomplete list Look at the Index for psychfor a list of all of the functions

blockrandom Creates a block randomized structure for n independent variables Usefulfor teaching block randomization for experimental design

df2latex is useful for taking tabular output (such as a correlation matrix or that of de-

scribe and converting it to a LATEX table May be used when Sweave is not conve-nient

cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex

cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-

dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data

fisherz Convert a correlation to the corresponding Fisher z score

48

geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data

ICC and cohenkappa are typically used to find the reliability for raters

headtail combines the head and tail functions to show the first and last lines of a dataset or output

topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between

mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe

prep finds the probability of replication for an F t or r and estimate effect size

partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)

rangeCorrection will correct correlations for restriction of range

reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages

superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psychpackage These include six data sets showing a hierarchical factor structure (five cogni-tive examples Thurstone Thurstone33 Holzinger Bechtoldt1 Bechtoldt2 andone from health psychology Reise) One of these (Thurstone) is used as an example inthe sem package as well as McDonald (1999) The original data are from Thurstone andThurstone (1941) and reanalyzed by Bechtoldt (1961) Personality item data representingfive personality factors on 25 items (bfi) 135 items for 4000 participants (spi) or 13 per-sonality inventory scores (epibfi) and 16 multiple choice iq items (iqitems ability)The vegetables example has paired comparison preferences for 9 vegetables This is anexample of Thurstonian scaling used by Guilford (1954) and Nunnally (1967) Other datasets include cubits peas and heights from Galton

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factorand uncorrelated group factors The Holzinger correlation matrix is a 14 14 matrix

49

from their paper The Thurstone correlation matrix is a 9 9 matrix of correlationsof ability items The Reise data set is 16 16 correlation matrix of mental healthitems The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests

bfi 25 personality self report items taken from the International Personality Item Pool(ipiporiorg) were included as part of the Synthetic Aperture Personality Assessment(SAPA) web based personality assessment project The data from 2800 subjects areincluded here as a demonstration set for scale construction factor analysis and ItemResponse Theory analyses

spi 135 personality items and 10 demographic items for 4000 subjects are taken from theSynthetic Aperture Personality Assessment (SAPA) web based personality assessmentproject Revelle et al (2016) These 135 items form part of the SAPA PersonalityInventory

satact Self reported scores on the SAT Verbal SAT Quantitative and ACT were col-lected as part of the Synthetic Aperture Personality Assessment (SAPA) web basedpersonality assessment project Age gender and education are also reported Thedata from 700 subjects are included here as a demonstration set for correlation andanalysis

epibfi A small data set of 5 scales from the Eysenck Personality Inventory 5 from a Big 5inventory a Beck Depression Inventory and State and Trait Anxiety measures Usedfor demonstrations of correlations regressions graphic displays

iqitems 16 multiple choice ability items were included as part of the Synthetic AperturePersonality Assessment (SAPA) web based personality assessment project The datafrom 1525 subjects are included here as a demonstration set for scoring multiplechoice inventories and doing basic item statistics

ability The same 16 items converted to 01 scores are used for examples of various IRTprocedures These data are from the International Cognitive Ability Resource (ICAR)Condon amp Revelle (2014) and were collected as part of the SAPA web based assess-ment httpssapa-projectorg project Revelle et al (2016)

galton Two of the earliest examples of the correlation coefficient were Francis Galtonrsquosdata sets on the relationship between mid parent and child height and the similarity ofparent generation peas with child peas galton is the data set for the Galton heightpeas is the data set Francis Galton used to introduce the correlation coefficient withan analysis of the similarities of the parent and child generation of 700 sweet peas

Dwyer Dwyer (1937) introduced a method for factor extension (see faextension thatfinds loadings on factors from an original data set for additional (extended) variablesThis data set includes his example

50

miscellaneous cities is a matrix of airline distances between 11 US cities and maybe used for demonstrating multiple dimensional scaling vegetables is a classicdata set for demonstrating Thurstonian scaling and is the preference matrix of 9vegetables from Guilford (1954) Used by Guilford (1954) Nunnally (1967) Nunnallyand Bernstein (1984) this data set allows for examples of basic scaling techniques

9 Development version and a users guide

The most recent development version is available as a source file at the repository main-tained at httpspersonality-projectorgr That version will have removed themost recently discovered bugs (but perhaps introduced other yet to be discovered ones)To download that version go to the repository httppersonality-projectorgr

srccontrib and wander around For both Macs and PC this version can be installeddirectly using the ldquoother repositoryrdquo option in the package installer Make sure to specifytype=rdquosourcerdquo

gt installpackages(psych repos=httpspersonality-projectorgr type=source)

Although the individual help pages for the psych package are available as part of R andmay be accessed directly (eg psych) the full manual for the psych package is alsoavailable as a pdf at httpspersonality-projectorgrpsych_manualpdf

News and a history of changes are available in the NEWS and CHANGES files in the sourcefiles To view the most recent news

gt news(Version gt= 184package=psych)

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research Many of thefunctions were developed to supplement a book (httpspersonality-projectorgrbook An introduction to Psychometric Theory with Applications in R (Revelle prep) Moreinformation about the use of some of the functions may be found in the book

For more extensive discussion of the use of psych in particular and R in general consulthttpspersonality-projectorgrrguidehtml A short guide to R

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R version 402 (2020-06-22)

Platform x86_64-apple-darwin170 (64-bit)

51

Running under macOS Catalina 10156

Matrix products default

BLAS LibraryFrameworksRframeworkVersions40ResourcesliblibRblasdylib

LAPACK LibraryFrameworksRframeworkVersions40ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_209 psych_209

loaded via a namespace (and not attached)

[1] compiler_402 parallel_402 tools_402 foreign_08-80 tmvnsim_10-2 nlme_31-148

[7] mnormt_201 grid_402 lattice_020-41

52

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering: an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley and Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley and Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. doi:10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, and Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure: an alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of Books in Biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of Books in Biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.


Page 44: An introduction to the psych package: Part I: data entry ... · myData

MediationModeration Analysis

Call mediate(y = respappr ~ prot2 sexism + (sexism) data = Garcia

niter = 50 main = Moderated mediation (mean centered))

The DV (Y) was respappr The IV (X) was prot2 prot2sexism The mediating variable(s) = sexism

Total effect(c) of prot2 on respappr = 146 SE = 022 t = 677 df= 126 with p = 44e-10

Direct effect (c) of prot2 on respappr removing sexism = 146 SE = 022 t = 673 df= 125 with p = 55e-10

Indirect effect (ab) of prot2 on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 002 Lower CI = -004 Upper CI = 003

Total effect(c) of prot2sexism on respappr = 081 SE = 028 t = 289 df= 126 with p = 00045

Direct effect (c) of prot2sexism on respappr removing sexism = 081 SE = 028 t = 287 df= 125 with p = 00048

Indirect effect (ab) of prot2sexism on respappr through sexism = 0

Mean bootstrapped indirect effect = 0 with standard error = 003 Lower CI = -007 Upper CI = 007

R = 054 R2 = 03 F = 1753 on 3 and 125 DF p-value 191e-11

To see the longer output specify short = FALSE in the print statement or ask for the summary

Moderated mediation (mean centered)

minus001

prot2

prot2sexism

respapprsexism

c = 146

c = 146

c = 081

c = 081

007

009

002

Figure 17 Moderated multiple regression requires the raw data By default the data aremean centered before find the product term

44

53 Set Correlation

An important generalization of multiple regression and multiple correlation is set correla-tion developed by Cohen (1982) and discussed by Cohen et al (2003) Set correlation isa multivariate generalization of multiple regression and estimates the amount of varianceshared between two sets of variables Set correlation also allows for examining the relation-ship between two sets when controlling for a third set This is implemented in the setCor

function Set correlation is

R2 = 1minusn

prodi=1

(1minusλi)

where λi is the ith eigen value of the eigen value decomposition of the matrix

R = Rminus1xx RxyRminus1

xx Rminus1xy

Unfortunately there are several cases where set correlation will give results that are muchtoo high This will happen if some variables from the first set are highly related to thosein the second set even though most are not In this case although the set correlationcan be very high the degree of relationship between the sets is not as high In thiscase an alternative statistic based upon the average canonical correlation might be moreappropriate

setCor has the additional feature that it will calculate multiple and partial correlationsfrom the correlation or covariance matrix rather than the original data

Consider the correlations of the 6 variables in the satact data set First do the normalmultiple regression and then compare it with the results using setCor Two things tonotice setCor works on the correlation or covariance or raw data matrix and thus ifusing the correlation matrix will report standardized or raw β weights Secondly it ispossible to do several multiple regressions simultaneously If the number of observationsis specified or if the analysis is done on raw data statistical tests of significance areapplied

For this example the analysis is done on the correlation matrix rather than the rawdata

gt C lt- cov(satactuse=pairwise)

gt model1 lt- lm(ACT~ gender + education + age data=satact)

gt summary(model1)

Call

lm(formula = ACT ~ gender + education + age data = satact)

Residuals

Min 1Q Median 3Q Max

-252458 -32133 07769 35921 92630

Coefficients

Estimate Std Error t value Pr(gt|t|)

45

(Intercept) 2741706 082140 33378 lt 2e-16

gender -048606 037984 -1280 020110

education 047890 015235 3143 000174

age 001623 002278 0712 047650

---

Signif codes 0 0001 001 005 01 1

Residual standard error 4768 on 696 degrees of freedom

Multiple R-squared 00272 Adjusted R-squared 002301

F-statistic 6487 on 3 and 696 DF p-value 00002476

Compare this with the output from setCor

gt compare with sector

gt setCor(c(46)c(13)C nobs=700)

Call setCor(y = c(46) x = c(13) data = C nobs = 700)

Multiple Regression from raw data

DV = ACT

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -028 028 100

gender -026 008 -317 0087 -061 009 155

education 056 007 772 0016 025 087 122

age -064 008 -828 0014 -097 -031 138

Residual Standard Error = 015 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

ACT 1 099 098 096 098 0 7634 3 2 0013

DV = SATV

slope se t p lowerci upperci VIF

(Intercept) 000 007 000 1000 -031 031 100

gender 010 009 111 0380 -028 048 155

education 072 008 921 0012 039 106 122

age -082 008 -976 0010 -118 -046 138

Residual Standard Error = 016 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATV 099 099 089 079 097 0 6518 3 2 00151

DV = SATQ

slope se t p lowerci upperci VIF

(Intercept) 000 004 000 1000 -019 019 100

gender -052 005 -973 0010 -076 -029 155

education 040 005 832 0014 019 060 122

age -047 005 -924 0012 -069 -025 138

Residual Standard Error = 01 with 2 degrees of freedom

Multiple Regression

R R2 Ruw R2uw Shrunken R2 SE of R2 overall F df1 df2 p

SATQ 1 1 1 099 099 0 17711 3 2 000562

46

Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between thepredictor set and the criterion (dependent) set This set correlation is symmetric That isthe R2 is the same independent of the direction of the relationship

6 Converting output to APA style tables using LATEX

Although for most purposes using the Sweave or KnitR packages produces clean outputsome prefer output pre formatted for APA style tables This can be done using the xtablepackage for almost anything but there are a few simple functions in psych for the mostcommon tables fa2latex will convert a factor analysis or components analysis output toa LATEXtable cor2latex will take a correlation matrix and show the lower (or upper diag-onal) irt2latex converts the item statistics from the irtfa function to more convenientLATEXoutput and finally df2latex converts a generic data frame to LATEX

An example of converting the output from fa to LATEXappears in Table 2

47

Table 2 fa2latexA factor analysis table from the psych package in R

Variable MR1 MR2 MR3 h2 u2 com

Sentences 091 -004 004 082 018 101Vocabulary 089 006 -003 084 016 101SentCompletion 083 004 000 073 027 100FirstLetters 000 086 000 073 027 1004LetterWords -001 074 010 063 037 104Suffixes 018 063 -008 050 050 120LetterSeries 003 -001 084 072 028 100Pedigrees 037 -005 047 050 050 193LetterGroup -006 021 064 053 047 123

SS loadings 264 186 15

MR1 100 059 054MR2 059 100 052MR3 054 052 100

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that donrsquot fitinto any other category The following is an incomplete list Look at the Index for psychfor a list of all of the functions

blockrandom Creates a block randomized structure for n independent variables Usefulfor teaching block randomization for experimental design

df2latex is useful for taking tabular output (such as a correlation matrix or that of de-

scribe and converting it to a LATEX table May be used when Sweave is not conve-nient

cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex

cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-

dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data

fisherz Convert a correlation to the corresponding Fisher z score

48

geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data

ICC and cohenkappa are typically used to find the reliability for raters

headtail combines the head and tail functions to show the first and last lines of a dataset or output

topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between

mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe

prep finds the probability of replication for an F t or r and estimate effect size

partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)

rangeCorrection will correct correlations for restriction of range

reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages

superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psychpackage These include six data sets showing a hierarchical factor structure (five cogni-tive examples Thurstone Thurstone33 Holzinger Bechtoldt1 Bechtoldt2 andone from health psychology Reise) One of these (Thurstone) is used as an example inthe sem package as well as McDonald (1999) The original data are from Thurstone andThurstone (1941) and reanalyzed by Bechtoldt (1961) Personality item data representingfive personality factors on 25 items (bfi) 135 items for 4000 participants (spi) or 13 per-sonality inventory scores (epibfi) and 16 multiple choice iq items (iqitems ability)The vegetables example has paired comparison preferences for 9 vegetables This is anexample of Thurstonian scaling used by Guilford (1954) and Nunnally (1967) Other datasets include cubits peas and heights from Galton

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 × 14 matrix from their paper. The Thurstone correlation matrix is a 9 × 9 matrix of correlations of ability items. The Reise data set is a 16 × 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 × 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org; Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.

miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
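
As a minimal sketch of getting started with these data sets (recent releases of psych distribute several of them, such as bfi and epi.bfi, through the companion psychTools package, which is why both packages are loaded):

library(psych)
library(psychTools)
describe(sat.act)    # SAT Verbal, SAT Quantitative and ACT, plus age, gender and education (N = 700)
dim(bfi)             # 2800 x 28: the 25 personality items plus 3 demographic variables
lowerMat(Thurstone)  # the 9 x 9 ability correlation matrix factored in Table 2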

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib/ and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

6 Converting output to APA style tables using LATEX

Although for most purposes using the Sweave or KnitR packages produces clean outputsome prefer output pre formatted for APA style tables This can be done using the xtablepackage for almost anything but there are a few simple functions in psych for the mostcommon tables fa2latex will convert a factor analysis or components analysis output toa LATEXtable cor2latex will take a correlation matrix and show the lower (or upper diag-onal) irt2latex converts the item statistics from the irtfa function to more convenientLATEXoutput and finally df2latex converts a generic data frame to LATEX

An example of converting the output from fa to LATEXappears in Table 2

47

Table 2 fa2latexA factor analysis table from the psych package in R

Variable MR1 MR2 MR3 h2 u2 com

Sentences 091 -004 004 082 018 101Vocabulary 089 006 -003 084 016 101SentCompletion 083 004 000 073 027 100FirstLetters 000 086 000 073 027 1004LetterWords -001 074 010 063 037 104Suffixes 018 063 -008 050 050 120LetterSeries 003 -001 084 072 028 100Pedigrees 037 -005 047 050 050 193LetterGroup -006 021 064 053 047 123

SS loadings 264 186 15

MR1 100 059 054MR2 059 100 052MR3 054 052 100

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that donrsquot fitinto any other category The following is an incomplete list Look at the Index for psychfor a list of all of the functions

blockrandom Creates a block randomized structure for n independent variables Usefulfor teaching block randomization for experimental design

df2latex is useful for taking tabular output (such as a correlation matrix or that of de-

scribe and converting it to a LATEX table May be used when Sweave is not conve-nient

cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex

cosinor One of several functions for doing circular statistics. This is important when studying mood effects over the day, which show a diurnal pattern. See also circadian.mean, circadian.cor and circadian.linear.cor for finding circular means, circular correlations, and correlations of circular with linear data.

fisherz Convert a correlation to the corresponding Fisher z score


geometric.mean also harmonic.mean find the appropriate mean for working with different kinds of data.

ICC and cohen.kappa are typically used to find the reliability of raters.

headTail combines the head and tail functions to show the first and last lines of a data set or output.

topBottom Same as headTail: combines the head and tail functions to show the first and last lines of a data set or output, but does not add an ellipsis between them.

mardia calculates univariate or multivariate (Mardia's test) skew and kurtosis for a vector, matrix, or data frame.

p.rep finds the probability of replication for an F, t, or r statistic and estimates the effect size.

partial.r partials a y set of variables out of an x set and finds the resulting partial correlations. (See also setCor.)

rangeCorrection will correct correlations for restriction of range

reverse.code will reverse code specified items. This is done more conveniently in most psych functions, but it is supplied here as a helper function when using other packages.

superMatrix Takes two or more matrices, e.g., A and B, and combines them into a "Super matrix" with A on the top left, B on the lower right, and 0s for the other two quadrants. A useful trick when forming complex keys or when forming example problems.
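A brief sketch of a few of these helpers in use (the toy matrices are illustrative only):

fisherz(0.5)                      # Fisher r to z: 0.55
geometric.mean(c(1, 2, 4, 8))     # 2.83, the appropriate mean for ratio-scaled data
harmonic.mean(c(10, 20, 40))      # 17.14, the appropriate mean for rates
headTail(sat.act)                 # first and last few rows, separated by ellipsis dots
A <- diag(3)                      # two toy keys matrices
B <- diag(2)
superMatrix(A, B)                 # a 5 x 5 "Super matrix" with 0s off the block diagonal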

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psych package (a short usage sketch appears at the end of this section). These include six data sets showing a hierarchical factor structure (five cognitive examples, Thurstone, Thurstone.33, Holzinger, Bechtoldt.1, Bechtoldt.2, and one from health psychology, Reise). One of these (Thurstone) is used as an example in the sem package as well as in McDonald (1999). The original data are from Thurstone and Thurstone (1941) and were reanalyzed by Bechtoldt (1961). Personality item data represent five personality factors on 25 items (bfi), 135 items for 4,000 participants (spi), or 13 personality inventory scores (epi.bfi), as well as 16 multiple choice iq items (iqitems, ability). The vegetables example has paired comparison preferences for 9 vegetables. This is an example of Thurstonian scaling used by Guilford (1954) and Nunnally (1967). Other data sets include cubits, peas, and heights from Galton.

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factor and uncorrelated group factors. The Holzinger correlation matrix is a 14 x 14 matrix from their paper. The Thurstone correlation matrix is a 9 x 9 matrix of correlations of ability items. The Reise data set is a 16 x 16 correlation matrix of mental health items. The Bechtoldt data sets are both 17 x 17 correlation matrices of ability tests.

bfi 25 personality self report items taken from the International Personality Item Pool (ipip.ori.org) were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 2800 subjects are included here as a demonstration set for scale construction, factor analysis, and Item Response Theory analyses.

spi 135 personality items and 10 demographic items for 4,000 subjects are taken from the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project (Revelle et al., 2016). These 135 items form part of the SAPA Personality Inventory.

sat.act Self reported scores on the SAT Verbal, SAT Quantitative, and ACT were collected as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. Age, gender, and education are also reported. The data from 700 subjects are included here as a demonstration set for correlation and analysis.

epi.bfi A small data set of 5 scales from the Eysenck Personality Inventory, 5 from a Big 5 inventory, a Beck Depression Inventory, and State and Trait Anxiety measures. Used for demonstrations of correlations, regressions, and graphic displays.

iqitems 16 multiple choice ability items were included as part of the Synthetic Aperture Personality Assessment (SAPA) web based personality assessment project. The data from 1525 subjects are included here as a demonstration set for scoring multiple choice inventories and doing basic item statistics.

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org; Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights. peas is the data set Francis Galton used to introduce the correlation coefficient, with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
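A short sketch of getting started with these data sets, which are lazy-loaded when psych and psychTools are attached (in recent versions some of them, such as bfi, are distributed with psychTools):

data(package = "psych")     # list the data sets supplied with psych
dim(bfi)                    # 2800 x 28: 25 items plus gender, education, and age
describe(sat.act)           # descriptive statistics for the SAT/ACT example
lowerMat(Thurstone)         # the 9 x 9 ability correlation matrix used in Table 2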

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository http://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source":

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, in prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R version 4.0.2 (2020-06-22)

Platform: x86_64-apple-darwin17.0 (64-bit)


Running under: macOS Catalina 10.15.6

Matrix products: default

BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.


McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of Books in Biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of Books in Biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components – an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.



  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 47: An introduction to the psych package: Part I: data entry ... · myData

Various estimates of between set correlations

Squared Canonical Correlations

[1] 1000 0988 0013

Chisq of canonical correlations

[1] 32517 4433 0013

Average squared canonical correlation = 067

Cohens Set Correlation R2 = 1

Shrunken Set Correlation R2 = 1

F and df of Cohens Set Correlation -Inf 12 -1294

Unweighted correlation between the two sets = 098

Note that the setCor analysis also reports the amount of shared variance between thepredictor set and the criterion (dependent) set This set correlation is symmetric That isthe R2 is the same independent of the direction of the relationship

6 Converting output to APA style tables using LATEX

Although for most purposes using the Sweave or KnitR packages produces clean outputsome prefer output pre formatted for APA style tables This can be done using the xtablepackage for almost anything but there are a few simple functions in psych for the mostcommon tables fa2latex will convert a factor analysis or components analysis output toa LATEXtable cor2latex will take a correlation matrix and show the lower (or upper diag-onal) irt2latex converts the item statistics from the irtfa function to more convenientLATEXoutput and finally df2latex converts a generic data frame to LATEX

An example of converting the output from fa to LATEXappears in Table 2

47

Table 2 fa2latexA factor analysis table from the psych package in R

Variable MR1 MR2 MR3 h2 u2 com

Sentences 091 -004 004 082 018 101Vocabulary 089 006 -003 084 016 101SentCompletion 083 004 000 073 027 100FirstLetters 000 086 000 073 027 1004LetterWords -001 074 010 063 037 104Suffixes 018 063 -008 050 050 120LetterSeries 003 -001 084 072 028 100Pedigrees 037 -005 047 050 050 193LetterGroup -006 021 064 053 047 123

SS loadings 264 186 15

MR1 100 059 054MR2 059 100 052MR3 054 052 100

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that donrsquot fitinto any other category The following is an incomplete list Look at the Index for psychfor a list of all of the functions

blockrandom Creates a block randomized structure for n independent variables Usefulfor teaching block randomization for experimental design

df2latex is useful for taking tabular output (such as a correlation matrix or that of de-

scribe and converting it to a LATEX table May be used when Sweave is not conve-nient

cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex

cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-

dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data

fisherz Convert a correlation to the corresponding Fisher z score

48

geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data

ICC and cohenkappa are typically used to find the reliability for raters

headtail combines the head and tail functions to show the first and last lines of a dataset or output

topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between

mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe

prep finds the probability of replication for an F t or r and estimate effect size

partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)

rangeCorrection will correct correlations for restriction of range

reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages

superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psychpackage These include six data sets showing a hierarchical factor structure (five cogni-tive examples Thurstone Thurstone33 Holzinger Bechtoldt1 Bechtoldt2 andone from health psychology Reise) One of these (Thurstone) is used as an example inthe sem package as well as McDonald (1999) The original data are from Thurstone andThurstone (1941) and reanalyzed by Bechtoldt (1961) Personality item data representingfive personality factors on 25 items (bfi) 135 items for 4000 participants (spi) or 13 per-sonality inventory scores (epibfi) and 16 multiple choice iq items (iqitems ability)The vegetables example has paired comparison preferences for 9 vegetables This is anexample of Thurstonian scaling used by Guilford (1954) and Nunnally (1967) Other datasets include cubits peas and heights from Galton

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factorand uncorrelated group factors The Holzinger correlation matrix is a 14 14 matrix

49

from their paper The Thurstone correlation matrix is a 9 9 matrix of correlationsof ability items The Reise data set is 16 16 correlation matrix of mental healthitems The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests

bfi 25 personality self report items taken from the International Personality Item Pool(ipiporiorg) were included as part of the Synthetic Aperture Personality Assessment(SAPA) web based personality assessment project The data from 2800 subjects areincluded here as a demonstration set for scale construction factor analysis and ItemResponse Theory analyses

spi 135 personality items and 10 demographic items for 4000 subjects are taken from theSynthetic Aperture Personality Assessment (SAPA) web based personality assessmentproject Revelle et al (2016) These 135 items form part of the SAPA PersonalityInventory

satact Self reported scores on the SAT Verbal SAT Quantitative and ACT were col-lected as part of the Synthetic Aperture Personality Assessment (SAPA) web basedpersonality assessment project Age gender and education are also reported Thedata from 700 subjects are included here as a demonstration set for correlation andanalysis

epibfi A small data set of 5 scales from the Eysenck Personality Inventory 5 from a Big 5inventory a Beck Depression Inventory and State and Trait Anxiety measures Usedfor demonstrations of correlations regressions graphic displays

iqitems 16 multiple choice ability items were included as part of the Synthetic AperturePersonality Assessment (SAPA) web based personality assessment project The datafrom 1525 subjects are included here as a demonstration set for scoring multiplechoice inventories and doing basic item statistics

ability The same 16 items converted to 01 scores are used for examples of various IRTprocedures These data are from the International Cognitive Ability Resource (ICAR)Condon amp Revelle (2014) and were collected as part of the SAPA web based assess-ment httpssapa-projectorg project Revelle et al (2016)

galton Two of the earliest examples of the correlation coefficient were Francis Galtonrsquosdata sets on the relationship between mid parent and child height and the similarity ofparent generation peas with child peas galton is the data set for the Galton heightpeas is the data set Francis Galton used to introduce the correlation coefficient withan analysis of the similarities of the parent and child generation of 700 sweet peas

Dwyer Dwyer (1937) introduced a method for factor extension (see faextension thatfinds loadings on factors from an original data set for additional (extended) variablesThis data set includes his example

50

miscellaneous cities is a matrix of airline distances between 11 US cities and maybe used for demonstrating multiple dimensional scaling vegetables is a classicdata set for demonstrating Thurstonian scaling and is the preference matrix of 9vegetables from Guilford (1954) Used by Guilford (1954) Nunnally (1967) Nunnallyand Bernstein (1984) this data set allows for examples of basic scaling techniques

9 Development version and a users guide

The most recent development version is available as a source file at the repository main-tained at httpspersonality-projectorgr That version will have removed themost recently discovered bugs (but perhaps introduced other yet to be discovered ones)To download that version go to the repository httppersonality-projectorgr

srccontrib and wander around For both Macs and PC this version can be installeddirectly using the ldquoother repositoryrdquo option in the package installer Make sure to specifytype=rdquosourcerdquo

gt installpackages(psych repos=httpspersonality-projectorgr type=source)

Although the individual help pages for the psych package are available as part of R andmay be accessed directly (eg psych) the full manual for the psych package is alsoavailable as a pdf at httpspersonality-projectorgrpsych_manualpdf

News and a history of changes are available in the NEWS and CHANGES files in the sourcefiles To view the most recent news

gt news(Version gt= 184package=psych)

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research Many of thefunctions were developed to supplement a book (httpspersonality-projectorgrbook An introduction to Psychometric Theory with Applications in R (Revelle prep) Moreinformation about the use of some of the functions may be found in the book

For more extensive discussion of the use of psych in particular and R in general consulthttpspersonality-projectorgrrguidehtml A short guide to R

11 SessionInfo

This document was prepared using the following settings

gt sessionInfo()

R version 402 (2020-06-22)

Platform x86_64-apple-darwin170 (64-bit)

51

Running under macOS Catalina 10156

Matrix products default

BLAS LibraryFrameworksRframeworkVersions40ResourcesliblibRblasdylib

LAPACK LibraryFrameworksRframeworkVersions40ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_209 psych_209

loaded via a namespace (and not attached)

[1] compiler_402 parallel_402 tools_402 foreign_08-80 tmvnsim_10-2 nlme_31-148

[7] mnormt_201 grid_402 lattice_020-41

52

References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance

53

of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

54

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of

55

Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

56

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

57

Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49

58

hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48

59

corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51

60

psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9

61

outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49

62

table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 48: An introduction to the psych package: Part I: data entry ... · myData

Table 2 fa2latexA factor analysis table from the psych package in R

Variable MR1 MR2 MR3 h2 u2 com

Sentences 091 -004 004 082 018 101Vocabulary 089 006 -003 084 016 101SentCompletion 083 004 000 073 027 100FirstLetters 000 086 000 073 027 1004LetterWords -001 074 010 063 037 104Suffixes 018 063 -008 050 050 120LetterSeries 003 -001 084 072 028 100Pedigrees 037 -005 047 050 050 193LetterGroup -006 021 064 053 047 123

SS loadings 264 186 15

MR1 100 059 054MR2 059 100 052MR3 054 052 100

7 Miscellaneous functions

A number of functions have been developed for some very specific problems that donrsquot fitinto any other category The following is an incomplete list Look at the Index for psychfor a list of all of the functions

blockrandom Creates a block randomized structure for n independent variables Usefulfor teaching block randomization for experimental design

df2latex is useful for taking tabular output (such as a correlation matrix or that of de-

scribe and converting it to a LATEX table May be used when Sweave is not conve-nient

cor2latex Will format a correlation matrix in APA style in a LATEX table See alsofa2latex and irt2latex

cosinor One of several functions for doing circular statistics This is important whenstudying mood effects over the day which show a diurnal pattern See also circa-

dianmean circadiancor and circadianlinearcor for finding circular meanscircular correlations and correlations of circular with linear data

fisherz Convert a correlation to the corresponding Fisher z score

48

geometricmean also harmonicmean find the appropriate mean for working with differentkinds of data

ICC and cohenkappa are typically used to find the reliability for raters

headtail combines the head and tail functions to show the first and last lines of a dataset or output

topBottom Same as headtail Combines the head and tail functions to show the first andlast lines of a data set or output but does not add ellipsis between

mardia calculates univariate or multivariate (Mardiarsquos test) skew and kurtosis for a vectormatrix or dataframe

prep finds the probability of replication for an F t or r and estimate effect size

partialr partials a y set of variables out of an x set and finds the resulting partialcorrelations (See also setcor)

rangeCorrection will correct correlations for restriction of range

reversecode will reverse code specified items Done more conveniently in most psychfunctions but supplied here as a helper function when using other packages

superMatrix Takes two or more matrices eg A and B and combines them into a ldquoSupermatrixrdquo with A on the top left B on the lower right and 0s for the other twoquadrants A useful trick when forming complex keys or when forming exampleproblems

8 Data sets

A number of data sets for demonstrating psychometric techniques are included in the psychpackage These include six data sets showing a hierarchical factor structure (five cogni-tive examples Thurstone Thurstone33 Holzinger Bechtoldt1 Bechtoldt2 andone from health psychology Reise) One of these (Thurstone) is used as an example inthe sem package as well as McDonald (1999) The original data are from Thurstone andThurstone (1941) and reanalyzed by Bechtoldt (1961) Personality item data representingfive personality factors on 25 items (bfi) 135 items for 4000 participants (spi) or 13 per-sonality inventory scores (epibfi) and 16 multiple choice iq items (iqitems ability)The vegetables example has paired comparison preferences for 9 vegetables This is anexample of Thurstonian scaling used by Guilford (1954) and Nunnally (1967) Other datasets include cubits peas and heights from Galton

Thurstone Holzinger-Swineford (1937) introduced the bifactor model of a general factorand uncorrelated group factors The Holzinger correlation matrix is a 14 14 matrix

49

from their paper The Thurstone correlation matrix is a 9 9 matrix of correlationsof ability items The Reise data set is 16 16 correlation matrix of mental healthitems The Bechtholdt data sets are both 17 x 17 correlation matrices of ability tests

bfi 25 personality self report items taken from the International Personality Item Pool(ipiporiorg) were included as part of the Synthetic Aperture Personality Assessment(SAPA) web based personality assessment project The data from 2800 subjects areincluded here as a demonstration set for scale construction factor analysis and ItemResponse Theory analyses

spi 135 personality items and 10 demographic items for 4000 subjects are taken from theSynthetic Aperture Personality Assessment (SAPA) web based personality assessmentproject Revelle et al (2016) These 135 items form part of the SAPA PersonalityInventory

satact Self reported scores on the SAT Verbal SAT Quantitative and ACT were col-lected as part of the Synthetic Aperture Personality Assessment (SAPA) web basedpersonality assessment project Age gender and education are also reported Thedata from 700 subjects are included here as a demonstration set for correlation andanalysis

epibfi A small data set of 5 scales from the Eysenck Personality Inventory 5 from a Big 5inventory a Beck Depression Inventory and State and Trait Anxiety measures Usedfor demonstrations of correlations regressions graphic displays

iqitems 16 multiple choice ability items were included as part of the Synthetic AperturePersonality Assessment (SAPA) web based personality assessment project The datafrom 1525 subjects are included here as a demonstration set for scoring multiplechoice inventories and doing basic item statistics

ability The same 16 items, converted to 0/1 scores, are used for examples of various IRT procedures. These data are from the International Cognitive Ability Resource (ICAR; Condon & Revelle, 2014) and were collected as part of the SAPA web based assessment project (https://sapa-project.org) (Revelle et al., 2016).

galton Two of the earliest examples of the correlation coefficient were Francis Galton's data sets on the relationship between mid parent and child height and the similarity of parent generation peas with child peas. galton is the data set for the Galton heights; peas is the data set Francis Galton used to introduce the correlation coefficient with an analysis of the similarities of the parent and child generation of 700 sweet peas.

Dwyer Dwyer (1937) introduced a method for factor extension (see fa.extension) that finds loadings on factors from an original data set for additional (extended) variables. This data set includes his example.


miscellaneous cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multidimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
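
Because these data sets ship with psych (or psychTools), they can be examined directly with the descriptive functions discussed earlier; a minimal sketch using sat.act:

library(psych)
data(sat.act)      # attach the SAT/ACT demonstration data
describe(sat.act)  # basic descriptive statistics for each variable
pairs.panels(sat.act[c("ACT", "SATV", "SATQ")])  # SPLOM of the three ability scores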

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PCs, this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news,

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book), An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html, A short guide to R.

11 SessionInfo

This document was prepared using the following settings

> sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2  parallel_4.0.2  tools_4.0.2     foreign_0.8-80  tmvnsim_1.0-2   nlme_3.1-148
[7] mnormt_2.0.1    grid_4.0.2      lattice_0.20-41


References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.

Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.

Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.

Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.

Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.

Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).

Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.

Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.

Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.

Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.

Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.

Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.

Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.

Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.

Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.

Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.

Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.

Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.

Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.

Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.

Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.

Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.

Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.

Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. 10.1007/s11336-011-9218-4.

Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.

Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.

MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.

Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.

McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.

Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.

Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.

Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.

Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.

Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.

Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.

Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.

Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.

Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).

Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.

Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.

Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.

Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.

Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega, and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.

Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.

Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.

Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.

Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.

Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.

Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.

Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.

Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.

Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.

Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.

Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.

Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.

Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.

Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.

Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

Index

(Alphabetical index of concepts, R functions, psych package functions and data sets, and R packages, with the pages on which each appears.)


60

psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9

61

outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49

62

table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 51: An introduction to the psych package: Part I: data entry ... · myData

miscellaneous. cities is a matrix of airline distances between 11 US cities and may be used for demonstrating multiple dimensional scaling. vegetables is a classic data set for demonstrating Thurstonian scaling and is the preference matrix of 9 vegetables from Guilford (1954). Used by Guilford (1954), Nunnally (1967), and Nunnally and Bernstein (1984), this data set allows for examples of basic scaling techniques.
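A minimal sketch of how these demonstration data might be used follows; it assumes the cities distance matrix from psychTools and uses the base R cmdscale function for the multidimensional scaling step. The Thurstonian scaling call is left as a comment because the name of the vegetable preference object is assumed here.

library(psych)
library(psychTools)
# classical (metric) multidimensional scaling of the airline distances between 11 US cities
city.mds <- cmdscale(cities, k = 2)
plot(city.mds, xlab = "Dimension 1", ylab = "Dimension 2",
     main = "Airline distances between 11 US cities")
# thurstone(veg)   # Thurstone case V scaling of the vegetable preferences (object name assumed)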

9 Development version and a users guide

The most recent development version is available as a source file at the repository maintained at https://personality-project.org/r. That version will have removed the most recently discovered bugs (but perhaps introduced other, yet to be discovered, ones). To download that version, go to the repository https://personality-project.org/r/src/contrib and wander around. For both Macs and PCs this version can be installed directly using the "other repository" option in the package installer. Make sure to specify type="source".

> install.packages("psych", repos = "https://personality-project.org/r", type = "source")
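Once the source install finishes, it can be worth confirming that the development version is the one now in use; a minimal check (the exact version number reported will vary):

> library(psych)
> packageVersion("psych")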

Although the individual help pages for the psych package are available as part of R and may be accessed directly (e.g., ?psych), the full manual for the psych package is also available as a pdf at https://personality-project.org/r/psych_manual.pdf.
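The vignettes installed with the package can also be listed and opened from within R; a small sketch (the vignette names may vary across versions):

> vignette(package = "psych")             # list the available vignettes
> vignette("intro", package = "psych")    # open one of them by name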

News and a history of changes are available in the NEWS and CHANGES files in the source files. To view the most recent news:

> news(Version >= "1.8.4", package = "psych")

10 Psychometric Theory

The psych package has been developed to help psychologists do basic research. Many of the functions were developed to supplement a book (https://personality-project.org/r/book): An introduction to Psychometric Theory with Applications in R (Revelle, prep). More information about the use of some of the functions may be found in the book.

For more extensive discussion of the use of psych in particular and R in general, consult https://personality-project.org/r/r.guide.html: A short guide to R.

11 SessionInfo

This document was prepared using the following settings:

> sessionInfo()

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] psychTools_2.0.9 psych_2.0.9

loaded via a namespace (and not attached):
[1] compiler_4.0.2 parallel_4.0.2 tools_4.0.2 foreign_0.8-80 tmvnsim_1.0-2 nlme_3.1-148
[7] mnormt_2.0.1 grid_4.0.2 lattice_0.20-41

References

Bechtoldt, H. (1961). An empirical study of the factor analysis stability hypothesis. Psychometrika, 26(4):405–432.
Blashfield, R. K. (1980). The growth of cluster analysis: Tryon, Ward, and Johnson. Multivariate Behavioral Research, 15(4):439–458.
Blashfield, R. K. and Aldenderfer, M. S. (1988). The methods and problems of cluster analysis. In Nesselroade, J. R. and Cattell, R. B., editors, Handbook of multivariate experimental psychology (2nd ed.), pages 447–473. Plenum Press, New York, NY.
Bliese, P. D. (2009). Multilevel modeling in R (2.3): a brief introduction to R, the multilevel package and the nlme package.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1(2):245–276.
Cattell, R. B. (1978). The scientific use of factor analysis. Plenum Press, New York.
Cohen, J. (1982). Set correlation as a general multivariate data-analytic method. Multivariate Behavioral Research, 17(3).
Cohen, J., Cohen, P., West, S. G., and Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, NJ, 3rd ed. edition.
Condon, D. M. and Revelle, W. (2014). The International Cognitive Ability Resource: Development and initial validation of a public-domain measure. Intelligence, 43:52–64.
Cooksey, R. and Soutar, G. (2006). Coefficient beta and hierarchical item clustering - an analytical procedure for establishing and displaying the dimensionality and homogeneity of summated scales. Organizational Research Methods, 9:78–98.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16:297–334.
Dwyer, P. S. (1937). The determination of the factor loadings of a given test from the known factor loadings of other tests. Psychometrika, 2(3):173–178.
Everitt, B. (1974). Cluster analysis. John Wiley & Sons, Oxford, England. 122 pp.
Fox, J., Nie, Z., and Byrnes, J. (2012). sem: Structural Equation Models.
Garcia, D. M., Schmitt, M. T., Branscombe, N. R., and Ellemers, N. (2010). Women's reactions to ingroup members who protest discriminatory treatment: The importance of beliefs about inequality and response appropriateness. European Journal of Social Psychology, 40(5):733–745.
Grice, J. W. (2001). Computing and evaluating factor scores. Psychological Methods, 6(4):430–450.
Guilford, J. P. (1954). Psychometric Methods. McGraw-Hill, New York, 2nd edition.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10(4):255–282.
Hartigan, J. A. (1975). Clustering Algorithms. John Wiley & Sons, Inc., New York, NY, USA.
Hayes, A. F. (2013). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach. Guilford Press, New York.
Henry, D. B., Tolan, P. H., and Gorman-Smith, D. (2005). Cluster analysis in family psychology research. Journal of Family Psychology, 19(1):121–132.
Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70.
Holzinger, K. and Swineford, F. (1937). The bi-factor method. Psychometrika, 2(1):41–54.
Horn, J. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2):179–185.
Horn, J. L. and Engstrom, R. (1979). Cattell's scree test in relation to Bartlett's chi-square test and other observations on the number of factors problem. Multivariate Behavioral Research, 14(3):283–300.
Jennrich, R. and Bentler, P. (2011). Exploratory bi-factor analysis. Psychometrika, pages 1–13. doi: 10.1007/s11336-011-9218-4.
Jensen, A. R. and Weng, L.-J. (1994). What is a good g? Intelligence, 18(3):231–258.
Loevinger, J., Gleser, G., and DuBois, P. (1953). Maximizing the discriminating power of a multiple-score test. Psychometrika, 18(4):309–317.
MacCallum, R. C., Browne, M. W., and Cai, L. (2007). Factor analysis models as approximations. In Cudeck, R. and MacCallum, R. C., editors, Factor analysis at 100: Historical developments and future directions, pages 153–175. Lawrence Erlbaum Associates Publishers, Mahwah, NJ.
Martinent, G. and Ferrand, C. (2007). A cluster analysis of precompetitive anxiety: Relationship with perfectionism and trait anxiety. Personality and Individual Differences, 43(7):1676–1686.
McDonald, R. P. (1999). Test theory: A unified treatment. L. Erlbaum Associates, Mahwah, NJ.
Mun, E. Y., von Eye, A., Bates, M. E., and Vaschillo, E. G. (2008). Finding groups using model-based cluster analysis: Heterogeneous emotional self-regulatory processes and heavy alcohol use risk. Developmental Psychology, 44(2):481–495.
Nunnally, J. C. (1967). Psychometric theory. McGraw-Hill, New York.
Nunnally, J. C. and Bernstein, I. H. (1984). Psychometric theory. McGraw-Hill, New York, 3rd edition.
Pedhazur, E. (1997). Multiple regression in behavioral research: explanation and prediction. Harcourt Brace College Publishers.
Preacher, K. J. and Hayes, A. F. (2004). SPSS and SAS procedures for estimating indirect effects in simple mediation models. Behavior Research Methods, Instruments, & Computers, 36(4):717–731.
Revelle, W. (1979). Hierarchical cluster-analysis and the internal structure of tests. Multivariate Behavioral Research, 14(1):57–74.
Revelle, W. (2018). psych: Procedures for Personality and Psychological Research. Northwestern University, Evanston. R package version 1.8.6.
Revelle, W. (in prep). An introduction to psychometric theory with applications in R. Springer.
Revelle, W. and Condon, D. M. (2014). Reliability. In Irwing, P., Booth, T., and Hughes, D., editors, Wiley-Blackwell Handbook of Psychometric Testing. Wiley-Blackwell (in press).
Revelle, W., Condon, D., and Wilt, J. (2011). Methodological advances in differential psychology. In Chamorro-Premuzic, T., Furnham, A., and von Stumm, S., editors, Handbook of Individual Differences, chapter 2, pages 39–73. Wiley-Blackwell.
Revelle, W., Condon, D. M., Wilt, J., French, J. A., Brown, A., and Elleman, L. G. (2016). Web and phone based data collection using planned missing designs. In Fielding, N. G., Lee, R. M., and Blank, G., editors, SAGE Handbook of Online Research Methods (2nd ed.), chapter 37, pages 578–595. Sage Publications, Inc.
Revelle, W. and Rocklin, T. (1979). Very Simple Structure - alternative procedure for estimating the optimal number of interpretable factors. Multivariate Behavioral Research, 14(4):403–414.
Revelle, W., Wilt, J., and Rosenthal, A. (2010). Personality and cognition: The personality-cognition link. In Gruszka, A., Matthews, G., and Szymura, B., editors, Handbook of Individual Differences in Cognition: Attention, Memory and Executive Control, chapter 2, pages 27–49. Springer.
Revelle, W. and Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the glb: comments on Sijtsma. Psychometrika, 74(1):145–154.
Schmid, J. J. and Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1):83–90.
Shrout, P. E. and Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2):420–428.
Smillie, L. D., Cooper, A., Wilt, J., and Revelle, W. (2012). Do extraverts get more bang for the buck? Refining the affective-reactivity hypothesis of extraversion. Journal of Personality and Social Psychology, 103(2):306–326.
Sneath, P. H. A. and Sokal, R. R. (1973). Numerical taxonomy: the principles and practice of numerical classification. A Series of books in biology. W. H. Freeman, San Francisco.
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. A Series of books in biology. W. H. Freeman, San Francisco.
Spearman, C. (1904). The proof and measurement of association between two things. The American Journal of Psychology, 15(1):72–101.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87(2):245–251.
Tal-Or, N., Cohen, J., Tsfati, Y., and Gunther, A. C. (2010). Testing causal direction in the influence of presumed media influence. Communication Research, 37(6):801–824.
Thorburn, W. M. (1918). The myth of Occam's razor. Mind, 27:345–353.
Thurstone, L. L. and Thurstone, T. G. (1941). Factorial studies of intelligence. The University of Chicago Press, Chicago, Ill.
Tryon, R. C. (1935). A theory of psychological components–an alternative to "mathematical factors". Psychological Review, 42(5):425–454.
Tryon, R. C. (1939). Cluster analysis. Edwards Brothers, Ann Arbor, Michigan.
Velicer, W. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3):321–327.
Zinbarg, R. E., Revelle, W., Yovel, I., and Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ωH: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70(1):123–133.
Zinbarg, R. E., Yovel, I., Revelle, W., and McDonald, R. P. (2006). Estimating generalizability to a latent variable common to all of a scale's indicators: A comparison of estimators for ωh. Applied Psychological Measurement, 30(2):121–144.

Index

ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7

Bechtoldt1, 49; Bechtoldt2, 49; bfi, 26, 49; bibars, 7, 26, 27; bifactor, 7; biserial, 13, 36; blockrandom, 48; burt, 36

char2numeric, 13; circadiancor, 48; circadianlinearcor, 48; circadianmean, 48; circular statistics, 48; cities, 51; cohenkappa, 49; cor, 28; corsmooth, 36; cortest, 29; cor2latex, 47, 48; corPlot, 8; corrp, 29, 33; corrtest, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49

densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9

drawcor, 35; drawtetra, 35; dummycode, 13; dynamite plot, 20

edit, 3; epibfi, 49; error bars, 20; errorbars, 7, 15, 20; errorbarsby, 12, 15, 20, 21; errorbarstab, 20; errorcrosses, 20; errorCircles, 25; errorCrosses, 24

fa, 7, 8, 47; fadiagram, 8; faextension, 50; famulti, 7; faparallel, 5, 7; fa2latex, 47, 48; faBy, 37; factor analysis, 7; factorminres, 8; factorpa, 8; factorwls, 8; filechoose, 10; fisherz, 48

galton, 50; generalized least squares, 7; geometricmean, 49; GPArotation, 9; guttman, 7

harmonicmean, 49; head, 49; headtail, 49; heights, 49

hetdiagram, 8; Hmisc, 29; Holzinger, 49

ICC, 7, 49; iclust, 7; iclustdiagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iqitems, 49; irtfa, 7, 47; irt2latex, 47, 48

KnitR, 47

lavaan, 37; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15

makekeys, 15; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42; mediatediagram, 41; minimum residual, 7; mixedcor, 36; mlArrange, 7; mlPlot, 7; mlr, 7; msq, 15; multihist, 7; multilevel, 36, 37; multilevelreliability, 7; multiple regression, 38

nfactors, 7; nlme, 36, 37

omega, 7, 9; outlier, 3, 12, 14

padjust, 29; prep, 49; pairs, 15; pairspanels, 3, 7, 8, 14–18; partialr, 49; pca, 7; peas, 49, 50; plotirt, 8; plotpoly, 8; polychoric, 7, 35; polyserial, 36; principal, 5, 7, 9; principal axis, 7; psych, 3, 5–9, 29, 47–49, 51; psychTools, 3

R function: ability, 49; affect, 15; alpha, 5, 7; Bechtoldt1, 49; Bechtoldt2, 49; bfi, 26, 49; bibars, 7, 26, 27; biserial, 13, 36; blockrandom, 48; burt, 36; char2numeric, 13; circadiancor, 48; circadianlinearcor, 48; circadianmean, 48; cities, 51; cohenkappa, 49; cor, 28; corsmooth, 36; cortest, 29; cor2latex, 47, 48; corPlot, 8; corrp, 29, 33; corrtest, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; drawcor, 35; drawtetra, 35; dummycode, 13; edit, 3; epibfi, 49; errorbars, 7, 15, 20; errorbarsby, 12, 15, 20, 21; errorbarstab, 20; errorcrosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 8, 47; fadiagram, 8; faextension, 50; famulti, 7; faparallel, 5, 7; fa2latex, 47, 48; faBy, 37; factorminres, 8; factorpa, 8; factorwls, 8; filechoose, 10; fisherz, 48; galton, 50; geometricmean, 49; guttman, 7; harmonicmean, 49; head, 49; headtail, 49; heights, 49; hetdiagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclustdiagram, 8; iqitems, 49; irtfa, 7, 47; irt2latex, 47, 48; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; makekeys, 15; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42; mediatediagram, 41; mixedcor, 36; mlArrange, 7; mlPlot, 7; mlr, 7; msq, 15; multihist, 7; multilevelreliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 12, 14; padjust, 29; prep, 49; pairs, 15; pairspanels, 3, 7, 8, 14–18; partialr, 49; pca, 7; peas, 49, 50; plotirt, 8; plotpoly, 8; polychoric, 7, 35; polyserial, 36; principal, 5, 7, 9; psych, 51

psych package: ability, 49; affect, 15; alpha, 5, 7; Bechtoldt1, 49; Bechtoldt2, 49; bfi, 26, 49; bibars, 7, 26, 27; biserial, 13, 36; blockrandom, 48; burt, 36; char2numeric, 13; circadiancor, 48; circadianlinearcor, 48; circadianmean, 48; cities, 51; cohenkappa, 49; corsmooth, 36; cor2latex, 47, 48; corPlot, 8; corrp, 29, 33; corrtest, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; drawcor, 35; drawtetra, 35; dummycode, 13; epibfi, 49; errorbars, 7, 15, 20; errorbarsby, 12, 15, 20, 21; errorbarstab, 20; errorcrosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 8, 47; fadiagram, 8; faextension, 50; famulti, 7; faparallel, 5, 7; fa2latex, 47, 48; faBy, 37; factorminres, 8; factorpa, 8; factorwls, 8; fisherz, 48; galton, 50; geometricmean, 49; guttman, 7; harmonicmean, 49; headtail, 49; heights, 49; hetdiagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclustdiagram, 8; iqitems, 49; irtfa, 7, 47; irt2latex, 47, 48; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; makekeys, 15; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42; mediatediagram, 41; mixedcor, 36; mlArrange, 7; mlPlot, 7; mlr, 7; msq, 15; multihist, 7; multilevelreliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 12, 14; prep, 49; pairspanels, 3, 7, 8, 14–18; partialr, 49; pca, 7; peas, 49, 50; plotirt, 8; plotpoly, 8; polychoric, 7, 35; polyserial, 36; principal, 5, 7, 9; psych, 51; rtest, 29; rangeCorrection, 49; readclipboard, 3, 7, 9, 10; readclipboardcsv, 10; readclipboardfwf, 11; readclipboardlower, 10; readclipboardtab, 3, 10, 11; readclipboardupper, 11; readfile, 3, 7, 10; Reise, 49; reversecode, 49; satact, 11, 34, 45; scatterhist, 7; schmid, 7, 9; scoremultiplechoice, 7; scoreItems, 5, 7, 8, 15; scrub, 3, 13; sector, 43; setcor, 49; setCor, 4, 8, 38, 41, 45–47; simmultilevel, 37; spi, 49; spider, 13; stars, 15; StatsBy, 7; statsBy, 7, 36, 37; statsByboot, 37; statsBybootsummary, 37; structurediagram, 8; superMatrix, 49; tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 18, 19; vss, 5, 7; withinBetween, 37

R package: ctv, 9; GPArotation, 9; Hmisc, 29; KnitR, 47; lavaan, 37; multilevel, 36, 37; nlme, 36, 37; psych, 3, 5–9, 29, 47–49, 51; psychTools, 3; Rgraphviz, 9; sem, 8, 49; stats, 29; Sweave, 47; xtable, 47

rtest, 29; rangeCorrection, 49; rcorr, 29; readclipboard, 3, 7, 9, 10; readclipboardcsv, 10; readclipboardfwf, 11; readclipboardlower, 10; readclipboardtab, 3, 10, 11; readclipboardupper, 11; readfile, 3, 7, 10; readtable, 10; Reise, 49; reversecode, 49; Rgraphviz, 9

SAPA, 26, 50; satact, 11, 34, 45; scatterhist, 7; schmid, 7, 9; scoremultiplechoice, 7; scoreItems, 5, 7, 8, 15; scrub, 3, 13; sector, 43; sem, 8, 49; set correlation, 45; setcor, 49; setCor, 4, 8, 38, 41, 45–47; simmultilevel, 37; spi, 49; spider, 13; stars, 15; stats, 29; StatsBy, 7; statsBy, 7, 36, 37; statsByboot, 37; statsBybootsummary, 37; structurediagram, 8; superMatrix, 49; Sweave, 47

table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone33, 49; topBottom, 49

vegetables, 49, 51; violinBy, 15, 18, 19; vss, 5, 7

weighted least squares, 7; withinBetween, 37

xtable, 47

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 52: An introduction to the psych package: Part I: data entry ... · myData

Running under macOS Catalina 10156

Matrix products default

BLAS LibraryFrameworksRframeworkVersions40ResourcesliblibRblasdylib

LAPACK LibraryFrameworksRframeworkVersions40ResourcesliblibRlapackdylib

locale

[1] C

attached base packages

[1] stats graphics grDevices utils datasets methods base

other attached packages

[1] psychTools_209 psych_209

loaded via a namespace (and not attached)

[1] compiler_402 parallel_402 tools_402 foreign_08-80 tmvnsim_10-2 nlme_31-148

[7] mnormt_201 grid_402 lattice_020-41

52

References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance

53

of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

54

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of

55

Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

56

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

57

Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49

58

hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48

59

corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51

60

psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9

61

outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49

62

table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 53: An introduction to the psych package: Part I: data entry ... · myData

References

Bechtoldt H (1961) An empirical study of the factor analysis stability hypothesis Psy-chometrika 26(4)405ndash432

Blashfield R K (1980) The growth of cluster analysis Tryon Ward and JohnsonMultivariate Behavioral Research 15(4)439 ndash 458

Blashfield R K and Aldenderfer M S (1988) The methods and problems of clusteranalysis In Nesselroade J R and Cattell R B editors Handbook of multivariateexperimental psychology (2nd ed) pages 447ndash473 Plenum Press New York NY

Bliese P D (2009) Multilevel modeling in r (23) a brief introduction to r the multilevelpackage and the nlme package

Cattell R B (1966) The scree test for the number of factors Multivariate BehavioralResearch 1(2)245ndash276

Cattell R B (1978) The scientific use of factor analysis Plenum Press New York

Cohen J (1982) Set correlation as a general multivariate data-analytic method Multi-variate Behavioral Research 17(3)

Cohen J Cohen P West S G and Aiken L S (2003) Applied multiple regres-sioncorrelation analysis for the behavioral sciences L Erlbaum Associates MahwahNJ 3rd ed edition

Condon D M amp Revelle W (2014) The International Cognitive Ability Resource De-velopment and initial validation of a public-domain measure Intelligence 43 52ndash64

Cooksey R and Soutar G (2006) Coefficient beta and hierarchical item clustering - ananalytical procedure for establishing and displaying the dimensionality and homogeneityof summated scales Organizational Research Methods 978ndash98

Cronbach L J (1951) Coefficient alpha and the internal structure of tests Psychometrika16297ndash334

Dwyer P S (1937) The determination of the factor loadings of a given test from theknown factor loadings of other tests Psychometrika 2(3)173ndash178

Everitt B (1974) Cluster analysis John Wiley amp Sons Cluster analysis 122 pp OxfordEngland

Fox J Nie Z and Byrnes J (2012) sem Structural Equation Models

Garcia D M Schmitt M T Branscombe N R and Ellemers N (2010) Womenrsquosreactions to ingroup members who protest discriminatory treatment The importance

53

of beliefs about inequality and response appropriateness European Journal of SocialPsychology 40(5)733ndash745

Grice J W (2001) Computing and evaluating factor scores Psychological Methods6(4)430ndash450

Guilford J P (1954) Psychometric Methods McGraw-Hill New York 2nd edition

Guttman L (1945) A basis for analyzing test-retest reliability Psychometrika 10(4)255ndash282

Hartigan J A (1975) Clustering Algorithms John Wiley amp Sons Inc New York NYUSA

Hayes A F (2013) Introduction to mediation moderation and conditional process anal-ysis A regression-based approach Guilford Press New York

Henry D B Tolan P H and Gorman-Smith D (2005) Cluster analysis in familypsychology research Journal of Family Psychology 19(1)121ndash132

Holm S (1979) A simple sequentially rejective multiple test procedure ScandinavianJournal of Statistics 6(2)pp 65ndash70

Holzinger K and Swineford F (1937) The bi-factor method Psychometrika 2(1)41ndash54

Horn J (1965) A rationale and test for the number of factors in factor analysis Psy-chometrika 30(2)179ndash185

Horn J L and Engstrom R (1979) Cattellrsquos scree test in relation to bartlettrsquos chi-squaretest and other observations on the number of factors problem Multivariate BehavioralResearch 14(3)283ndash300

Jennrich R and Bentler P (2011) Exploratory bi-factor analysis Psychometrika pages1ndash13 101007s11336-011-9218-4

Jensen A R and Weng L-J (1994) What is a good g Intelligence 18(3)231ndash258

Loevinger J Gleser G and DuBois P (1953) Maximizing the discriminating power ofa multiple-score test Psychometrika 18(4)309ndash317

MacCallum R C Browne M W and Cai L (2007) Factor analysis models as ap-proximations In Cudeck R and MacCallum R C editors Factor analysis at 100Historical developments and future directions pages 153ndash175 Lawrence Erlbaum Asso-ciates Publishers Mahwah NJ

Martinent G and Ferrand C (2007) A cluster analysis of precompetitive anxiety Re-lationship with perfectionism and trait anxiety Personality and Individual Differences43(7)1676ndash1686

54

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of

55

Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

56

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

57

Index

ability, 49; affect, 15, 25; alpha, 5, 7; alpha factoring, 7

Bechtoldt1, 49; Bechtoldt2, 49; bfi, 26, 49; bibars, 7, 26, 27; bifactor, 7; biserial, 13, 36; blockrandom, 48; burt, 36

char2numeric, 13; circadiancor, 48; circadianlinearcor, 48; circadianmean, 48; circular statistics, 48; cities, 51; cohenkappa, 49; cor, 28; corsmooth, 36; cortest, 29; cor2latex, 47, 48; corPlot, 8; corrp, 29, 33; corrtest, 29, 33; cortest, 34; cosinor, 48; ctv, 9; cubits, 49

densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; diagram, 9; drawcor, 35; drawtetra, 35; dummycode, 13; dynamite plot, 20

edit, 3; epibfi, 49; error bars, 20; errorbars, 7, 15, 20; errorbarsby, 12, 15, 20, 21; errorbarstab, 20; errorcrosses, 20; errorCircles, 25; errorCrosses, 24

fa, 7, 8, 47; fadiagram, 8; faextension, 50; famulti, 7; faparallel, 5, 7; fa2latex, 47, 48; faBy, 37; factor analysis, 7; factorminres, 8; factorpa, 8; factorwls, 8; filechoose, 10; fisherz, 48

galton, 50; generalized least squares, 7; geometricmean, 49; GPArotation, 9; guttman, 7

harmonicmean, 49; head, 49; headtail, 49; heights, 49; hetdiagram, 8; Hmisc, 29; Holzinger, 49

ICC, 7, 49; iclust, 7; iclustdiagram, 8; Index, 48; introduction to psychometric theory with applications in R, 8; iqitems, 49; irtfa, 7, 47; irt2latex, 47, 48

KnitR, 47

lavaan, 37; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; lowess, 15

makekeys, 15; MAP, 7; mardia, 49; maximum likelihood, 7; mediate, 4, 8, 41, 42; mediatediagram, 41; minimum residual, 7; mixedcor, 36; mlArrange, 7; mlPlot, 7; mlr, 7; msq, 15; multihist, 7; multilevel, 36, 37; multilevelreliability, 7; multiple regression, 38

nfactors, 7; nlme, 36, 37

omega, 7, 9; outlier, 3, 12, 14

padjust, 29; prep, 49; pairs, 15; pairspanels, 3, 7, 8, 14–18; partialr, 49; pca, 7; peas, 49, 50; plotirt, 8; plotpoly, 8; polychoric, 7, 35; polyserial, 36; principal, 5, 7, 9; principal axis, 7; psych, 3, 5–9, 29, 47–49, 51; psychTools, 3

rtest, 29; rangeCorrection, 49; rcorr, 29; readclipboard, 3, 7, 9, 10; readclipboardcsv, 10; readclipboardfwf, 11; readclipboardlower, 10; readclipboardtab, 3, 10, 11; readclipboardupper, 11; readfile, 3, 7, 10; readtable, 10; Reise, 49; reversecode, 49; Rgraphviz, 9

SAPA, 26, 50; satact, 11, 34, 45; scatterhist, 7; schmid, 7, 9; scoremultiplechoice, 7; scoreItems, 5, 7, 8, 15; scrub, 3, 13; sector, 43; sem, 8, 49; set correlation, 45; setcor, 49; setCor, 4, 8, 38, 41, 45–47; simmultilevel, 37; spi, 49; spider, 13; stars, 15; stats, 29; StatsBy, 7; statsBy, 7, 36, 37; statsByboot, 37; statsBybootsummary, 37; structurediagram, 8; superMatrix, 49; Sweave, 47

table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 38, 49; Thurstone33, 49; topBottom, 49

vegetables, 49, 51; violinBy, 15, 18, 19; vss, 5, 7

weighted least squares, 7; withinBetween, 37

xtable, 47

R function
ability, 49; affect, 15; alpha, 5, 7; Bechtoldt1, 49; Bechtoldt2, 49; bfi, 26, 49; bibars, 7, 26, 27; biserial, 13, 36; blockrandom, 48; burt, 36; char2numeric, 13; circadiancor, 48; circadianlinearcor, 48; circadianmean, 48; cities, 51; cohenkappa, 49; cor, 28; corsmooth, 36; cortest, 29; cor2latex, 47, 48; corPlot, 8; corrp, 29, 33; corrtest, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; drawcor, 35; drawtetra, 35; dummycode, 13; edit, 3; epibfi, 49; errorbars, 7, 15, 20; errorbarsby, 12, 15, 20, 21; errorbarstab, 20; errorcrosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 8, 47; fadiagram, 8; faextension, 50; famulti, 7; faparallel, 5, 7; fa2latex, 47, 48; faBy, 37; factorminres, 8; factorpa, 8; factorwls, 8; filechoose, 10; fisherz, 48; galton, 50; geometricmean, 49; guttman, 7; harmonicmean, 49; head, 49; headtail, 49; heights, 49; hetdiagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclustdiagram, 8; iqitems, 49; irtfa, 7, 47; irt2latex, 47, 48; library, 9; lm, 38; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; makekeys, 15; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42; mediatediagram, 41; mixedcor, 36; mlArrange, 7; mlPlot, 7; mlr, 7; msq, 15; multihist, 7; multilevelreliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 12, 14; padjust, 29; prep, 49; pairs, 15; pairspanels, 3, 7, 8, 14–18; partialr, 49; pca, 7; peas, 49, 50; plotirt, 8; plotpoly, 8; polychoric, 7, 35; polyserial, 36; principal, 5, 7, 9; psych, 51; rtest, 29; rangeCorrection, 49; rcorr, 29; readclipboard, 3, 7, 9, 10; readclipboardcsv, 10; readclipboardfwf, 11; readclipboardlower, 10; readclipboardtab, 3, 10, 11; readclipboardupper, 11; readfile, 3, 7, 10; readtable, 10; Reise, 49; reversecode, 49; satact, 11, 34, 45; scatterhist, 7; schmid, 7, 9; scoremultiplechoice, 7; scoreItems, 5, 7, 8, 15; scrub, 3, 13; sector, 43; setcor, 49; setCor, 4, 8, 38, 41, 45–47; simmultilevel, 37; spi, 49; spider, 13; stars, 15; StatsBy, 7; statsBy, 7, 36, 37; statsByboot, 37; statsBybootsummary, 37; structurediagram, 8; superMatrix, 49; table, 20; tail, 49; tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 18, 19; vss, 5, 7; withinBetween, 37

psych package
ability, 49; affect, 15; alpha, 5, 7; Bechtoldt1, 49; Bechtoldt2, 49; bfi, 26, 49; bibars, 7, 26, 27; biserial, 13, 36; blockrandom, 48; burt, 36; char2numeric, 13; circadiancor, 48; circadianlinearcor, 48; circadianmean, 48; cities, 51; cohenkappa, 49; corsmooth, 36; cor2latex, 47, 48; corPlot, 8; corrp, 29, 33; corrtest, 29, 33; cortest, 34; cosinor, 48; cubits, 49; densityBy, 15, 26; densitynBy, 27; describe, 7, 11, 48; describeBy, 3, 7, 11, 12; df2latex, 47, 48; drawcor, 35; drawtetra, 35; dummycode, 13; epibfi, 49; errorbars, 7, 15, 20; errorbarsby, 12, 15, 20, 21; errorbarstab, 20; errorcrosses, 20; errorCircles, 25; errorCrosses, 24; fa, 7, 8, 47; fadiagram, 8; faextension, 50; famulti, 7; faparallel, 5, 7; fa2latex, 47, 48; faBy, 37; factorminres, 8; factorpa, 8; factorwls, 8; fisherz, 48; galton, 50; geometricmean, 49; guttman, 7; harmonicmean, 49; headtail, 49; heights, 49; hetdiagram, 8; Holzinger, 49; ICC, 7, 49; iclust, 7; iclustdiagram, 8; iqitems, 49; irtfa, 7, 47; irt2latex, 47, 48; lowerCor, 4, 28; lowerMat, 28; lowerUpper, 28; makekeys, 15; MAP, 7; mardia, 49; mediate, 4, 8, 41, 42; mediatediagram, 41; mixedcor, 36; mlArrange, 7; mlPlot, 7; mlr, 7; msq, 15; multihist, 7; multilevelreliability, 7; nfactors, 7; omega, 7, 9; outlier, 3, 12, 14; prep, 49; pairspanels, 3, 7, 8, 14–18; partialr, 49; pca, 7; peas, 49, 50; plotirt, 8; plotpoly, 8; polychoric, 7, 35; polyserial, 36; principal, 5, 7, 9; psych, 51; rtest, 29; rangeCorrection, 49; readclipboard, 3, 7, 9, 10; readclipboardcsv, 10; readclipboardfwf, 11; readclipboardlower, 10; readclipboardtab, 3, 10, 11; readclipboardupper, 11; readfile, 3, 7, 10; Reise, 49; reversecode, 49; satact, 11, 34, 45; scatterhist, 7; schmid, 7, 9; scoremultiplechoice, 7; scoreItems, 5, 7, 8, 15; scrub, 3, 13; sector, 43; setcor, 49; setCor, 4, 8, 38, 41, 45–47; simmultilevel, 37; spi, 49; spider, 13; stars, 15; StatsBy, 7; statsBy, 7, 36, 37; statsByboot, 37; statsBybootsummary, 37; structurediagram, 8; superMatrix, 49; tetrachoric, 7, 35; Thurstone, 29, 49; Thurstone33, 49; topBottom, 49; vegetables, 49, 51; violinBy, 15, 18, 19; vss, 5, 7; withinBetween, 37

R package
ctv, 9; GPArotation, 9; Hmisc, 29; KnitR, 47; lavaan, 37; multilevel, 36, 37; nlme, 36, 37; psych, 3, 5–9, 29, 47–49, 51; psychTools, 3; Rgraphviz, 9; sem, 8, 49; stats, 29; Sweave, 47; xtable, 47

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of

55

Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

56

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

57

Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49

58

hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48

59

corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51

60

psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9

61

outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49

62

table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 55: An introduction to the psych package: Part I: data entry ... · myData

McDonald R P (1999) Test theory A unified treatment L Erlbaum Associates MahwahNJ

Mun E Y von Eye A Bates M E and Vaschillo E G (2008) Finding groupsusing model-based cluster analysis Heterogeneous emotional self-regulatory processesand heavy alcohol use risk Developmental Psychology 44(2)481ndash495

Nunnally J C (1967) Psychometric theory McGraw-Hill New York

Nunnally J C and Bernstein I H (1984) Psychometric theory McGraw-Hill New Yorkrdquo

3rd edition

Pedhazur E (1997) Multiple regression in behavioral research explanation and predictionHarcourt Brace College Publishers

Preacher K J and Hayes A F (2004) SPSS and SAS procedures for estimating in-direct effects in simple mediation models Behavior Research Methods Instruments ampComputers 36(4)717ndash731

Revelle W (1979) Hierarchical cluster-analysis and the internal structure of tests Mul-tivariate Behavioral Research 14(1)57ndash74

Revelle W (2018) psych Procedures for Personality and Psychological Research North-western University Evanston R package version 186

Revelle W (in prep) An introduction to psychometric theory with applications in RSpringer

Revelle W and Condon D M (2014) Reliability In Irwing P Booth T and HughesD editors Wiley-Blackwell Handbook of Psychometric Testing Wiley-Blackwell (inpress)

Revelle W Condon D and Wilt J (2011) Methodological advances in differentialpsychology In Chamorro-Premuzic T Furnham A and von Stumm S editorsHandbook of Individual Differences chapter 2 pages 39ndash73 Wiley-Blackwell

Revelle W Condon D M Wilt J French J A Brown A amp Elleman L G (2016)Web and phone based data collection using planned missing designs In N G FieldingR M Lee amp G Blank (Eds) SAGE Handbook of Online Research Methods (2nd ed)chapter 37 (pp 578ndash595) Sage Publications Inc

Revelle W and Rocklin T (1979) Very Simple Structure - alternative procedure forestimating the optimal number of interpretable factors Multivariate Behavioral Research14(4)403ndash414

Revelle W Wilt J and Rosenthal A (2010) Personality and cognition The personality-cognition link In Gruszka A Matthews G and Szymura B editors Handbook of

55

Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

56

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

57

Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49

58

hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48

59

corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51

60

psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9

61

outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49

62

table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 56: An introduction to the psych package: Part I: data entry ... · myData

Individual Differences in Cognition Attention Memory and Executive Control chap-ter 2 pages 27ndash49 Springer

Revelle W and Zinbarg R E (2009) Coefficients alpha beta omega and the glbcomments on Sijtsma Psychometrika 74(1)145ndash154

Schmid J J and Leiman J M (1957) The development of hierarchical factor solutionsPsychometrika 22(1)83ndash90

Shrout P E and Fleiss J L (1979) Intraclass correlations Uses in assessing raterreliability Psychological Bulletin 86(2)420ndash428

Smillie L D Cooper A Wilt J and Revelle W (2012) Do extraverts get more bangfor the buck refining the affective-reactivity hypothesis of extraversion Journal ofPersonality and Social Psychology 103(2)306ndash326

Sneath P H A and Sokal R R (1973) Numerical taxonomy the principles and practiceof numerical classification A Series of books in biology W H Freeman San Francisco

Sokal R R and Sneath P H A (1963) Principles of numerical taxonomy A Series ofbooks in biology W H Freeman San Francisco

Spearman C (1904) The proof and measurement of association between two things TheAmerican Journal of Psychology 15(1)72ndash101

Steiger J H (1980) Tests for comparing elements of a correlation matrix PsychologicalBulletin 87(2)245ndash251

Tal-Or N Cohen J Tsfati Y and Gunther A C (2010) Testing causal direction inthe influence of presumed media influence Communication Research 37(6)801ndash824

Thorburn W M (1918) The myth of occamrsquos razor Mind 27345ndash353

Thurstone L L and Thurstone T G (1941) Factorial studies of intelligence TheUniversity of Chicago press Chicago Ill

Tryon R C (1935) A theory of psychological componentsndashan alternative to rdquomathematicalfactorsrdquo Psychological Review 42(5)425ndash454

Tryon R C (1939) Cluster analysis Edwards Brothers Ann Arbor Michigan

Velicer W (1976) Determining the number of components from the matrix of partialcorrelations Psychometrika 41(3)321ndash327

Zinbarg R E Revelle W Yovel I and Li W (2005) Cronbachrsquos α Revellersquos β andMcDonaldrsquos ωH) Their relations with each other and two alternative conceptualizationsof reliability Psychometrika 70(1)123ndash133

56

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

57

Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49

58

hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48

59

corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51

60

psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9

61

outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49

62

table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 57: An introduction to the psych package: Part I: data entry ... · myData

Zinbarg R E Yovel I Revelle W and McDonald R P (2006) Estimating gener-alizability to a latent variable common to all of a scalersquos indicators A comparison ofestimators for ωh Applied Psychological Measurement 30(2)121ndash144

57

Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49

58

hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48

59

corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51

60

psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9

61

outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49

62

table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63

  • Jump starting the psych packagendasha guide for the impatient
  • Psychometric functions are summarized in the second vignette
  • Overview of this and related documents
  • Getting started
  • Basic data analysis
    • Getting the data by using readfile
    • Data input from the clipboard
    • Basic descriptive statistics
      • Outlier detection using outlier
      • Basic data cleaning using scrub
      • Recoding categorical variables into dummy coded variables
        • Simple descriptive graphics
          • Scatter Plot Matrices
          • Density or violin plots
          • Means and error bars
          • Error bars for tabular data
          • Two dimensional displays of means and errors
          • Back to back histograms
          • Correlational structure
          • Heatmap displays of correlational structure
            • Testing correlations
            • Polychoric tetrachoric polyserial and biserial correlations
              • Multilevel modeling
                • Decomposing data into within and between level correlations using statsBy
                • Generating and displaying multilevel data
                • Factor analysis by groups
                  • Multiple Regression mediation moderation and set correlations
                    • Multiple regression from data or correlation matrices
                    • Mediation and Moderation analysis
                    • Set Correlation
                      • Converting output to APA style tables using LaTeX
                      • Miscellaneous functions
                      • Data sets
                      • Development version and a users guide
                      • Psychometric Theory
                      • SessionInfo
Page 58: An introduction to the psych package: Part I: data entry ... · myData

Index

ability 49affect 15 25alpha 5 7alpha factoring 7

Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27bifactor 7biserial 13 36blockrandom 48burt 36

char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48circular statistics 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48ctv 9cubits 49

densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48diagram 9

drawcor 35drawtetra 35dummycode 13dynamite plot 20

edit 3epibfi 49error bars 20errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24

fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factor analysis 7factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48

galton 50generalized least squares 7geometricmean 49GPArotation 9guttman 7

harmonicmean 49head 49headtail 49heights 49

58

hetdiagram 8Hmisc 29Holzinger 49

ICC 7 49iclust 7iclustdiagram 8Index 48introduction to psychometric theory with ap-

plications in R 8iqitems 49irtfa 7 47irt2latex 47 48

KnitR 47

lavaan 37library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28lowess 15

makekeys 15MAP 7mardia 49maximum likelihood 7mediate 4 8 41 42mediatediagram 41minimum residual 7mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevel 36 37multilevelreliability 7multiple regression 38

nfactors 7

nlme 36 37

omega 7 9outlier 3 12 14

padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9principal axis 7psych 3 5ndash9 29 47ndash49 51psychTools 3

R functionability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49cor 28corsmooth 36cortest 29cor2latex 47 48

59

corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13edit 3epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8filechoose 10fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49head 49headtail 49heights 49

hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48library 9lm 38lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9outlier 3 12 14padjust 29prep 49pairs 15pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51

60

psych packageability 49affect 15alpha 5 7Bechtoldt1 49Bechtoldt2 49bfi 26 49bibars 7 26 27biserial 13 36blockrandom 48burt 36char2numeric 13circadiancor 48circadianlinearcor 48circadianmean 48cities 51cohenkappa 49corsmooth 36cor2latex 47 48corPlot 8corrp 29 33corrtest 29 33cortest 34cosinor 48cubits 49densityBy 15 26densitynBy 27describe 7 11 48describeBy 3 7 11 12df2latex 47 48drawcor 35drawtetra 35dummycode 13epibfi 49errorbars 7 15 20errorbarsby 12 15 20 21errorbarstab 20errorcrosses 20errorCircles 25errorCrosses 24fa 7 8 47

fadiagram 8faextension 50famulti 7faparallel 5 7fa2latex 47 48faBy 37factorminres 8factorpa 8factorwls 8fisherz 48galton 50geometricmean 49guttman 7harmonicmean 49headtail 49heights 49hetdiagram 8Holzinger 49ICC 7 49iclust 7iclustdiagram 8iqitems 49irtfa 7 47irt2latex 47 48lowerCor 4 28lowerMat 28lowerUpper 28makekeys 15MAP 7mardia 49mediate 4 8 41 42mediatediagram 41mixedcor 36mlArrange 7mlPlot 7mlr 7msq 15multihist 7multilevelreliability 7nfactors 7omega 7 9

61

outlier 3 12 14prep 49pairspanels 3 7 8 14ndash18partialr 49pca 7peas 49 50plotirt 8plotpoly 8polychoric 7 35polyserial 36principal 5 7 9psych 51rtest 29rangeCorrection 49readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8

superMatrix 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49

62

table 20tail 49tetrachoric 7 35Thurstone 29 49Thurstone33 49topBottom 49vegetables 49 51violinBy 15 18 19vss 5 7withinBetween 37

R packagectv 9GPArotation 9Hmisc 29KnitR 47lavaan 37multilevel 36 37nlme 36 37psych 3 5ndash9 29 47ndash49 51psychTools 3Rgraphviz 9sem 8 49stats 29Sweave 47xtable 47

rtest 29rangeCorrection 49rcorr 29readclipboard 3 7 9 10readclipboardcsv 10readclipboardfwf 11readclipboardlower 10readclipboardtab 3 10 11readclipboardupper 11readfile 3 7 10readtable 10Reise 49reversecode 49Rgraphviz 9

SAPA 26 50

satact 11 34 45scatterhist 7schmid 7 9scoremultiplechoice 7scoreItems 5 7 8 15scrub 3 13sector 43sem 8 49set correlation 45setcor 49setCor 4 8 38 41 45ndash47simmultilevel 37spi 49spider 13stars 15stats 29StatsBy 7statsBy 7 36 37statsByboot 37statsBybootsummary 37structurediagram 8superMatrix 49Sweave 47

table 20tail 49tetrachoric 7 35Thurstone 29 38 49Thurstone33 49topBottom 49

vegetables 49 51violinBy 15 18 19vss 5 7

weighted least squares 7withinBetween 37

xtable 47

63
