The New Empiricism: Big Data, Network Science, and Psychological Inquiry
Kevin Lanning, Florida Atlantic University; Seth Stephens-Davidowitz, Google
Dustin Wood, Wake Forest University; Tal Yarkoni, University of Texas
Beyond keywords: Network analyses of psychological science
Kevin Lanning, Ryne Sherman, Xingquan Zhu, Jared Hesse, & Daniel Lopez
Florida Atlantic University
Introduction
• Science as a social endeavor
• Networks, citations, and meaning
Four levels of analysis
Level of analysis | Concept or parameter | Interpretation
Network | Giant component | Overall connectedness
Community | Modularity | Topics, subdisciplines, cliques, categories
Path | Diameter, path length | Distance and proximity of papers, scholars
Author/paper | In-degree, PageRank, centrality | Mechanisms of influence, impact, eminence
Datasets
Annual Review articles on personality
• Author as the unit of analysis (Smith,J)
• 33 source papers published 1977-2012
Journal of Social Issues and Analyses of Social Issues and Public Policy (journals of SPSSI, APA Division 9)
• 855 source papers published 2001-2013
• Two units of analysis: by citation (Smith,J 2006) and by author (Smith,J)
Analyses of first authors only.
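The two units of analysis can be illustrated with a minimal sketch. It assumes, hypothetically, that each parsed reference has been reduced to a (citing first author, cited first author, cited year) tuple; `build_edges` and the toy records are illustrative names, not the authors' pipeline. Self-citations are dropped and counted, as in the dataset descriptions.

```python
from collections import Counter

def build_edges(references, unit="author"):
    """Aggregate reference records into weighted citing -> cited edges.

    Hypothetical record format: (citing, cited, year), where citing and
    cited are first-author keys such as "Smith,J" and year is the year
    of the cited work.
      unit="author"   -> cited node is "Smith,J"
      unit="citation" -> cited node is "Smith,J 2006"
    Self-citations (same first author on both ends) are dropped and counted.
    """
    edges = Counter()
    self_cites = 0
    for citing, cited, year in references:
        if citing == cited:
            self_cites += 1
            continue
        target = cited if unit == "author" else f"{cited} {year}"
        edges[(citing, target)] += 1
    return edges, self_cites

# Toy records (hypothetical)
refs = [
    ("Smith,J", "Jones,A", 2006),
    ("Smith,J", "Jones,A", 2009),
    ("Smith,J", "Smith,J", 2004),  # self-citation, excluded
    ("Lee,K", "Jones,A", 2006),
]
by_author, n_self = build_edges(refs, unit="author")
by_citation, _ = build_edges(refs, unit="citation")
print(dict(by_author))    # both Smith -> Jones citations collapse into one weighted edge
print(dict(by_citation))  # "Jones,A 2006" and "Jones,A 2009" remain separate nodes
```

Collapsing to the author level merges a scholar's papers into one node, which is why it yields a denser network than the paper level.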
Big data, small world: Personality in the Annual Review
Scope
• 6,294 references to 2,803 unique authors
• Of these, 219 self-citations (3.5%) are excluded
Connectedness
• All authors are connected, separated by no more than 5 degrees
• Average path length is 3
Nodes and text are colored by community. Node size represents eigenvector centrality.
Layout determined by the ForceAtlas2 algorithm.
All analyses and visualizations were done in Gephi.
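Connectedness statistics of this kind (one giant component, a maximum separation, an average path length) can be computed with breadth-first search. This is a sketch assuming citation ties are treated as undirected and unweighted; the slides do not state which settings Gephi used, and the toy edge list is hypothetical.

```python
from collections import defaultdict, deque

def undirected(edges):
    """Adjacency sets, treating each citation as an undirected tie."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def bfs_distances(adj, source):
    """Shortest-path length (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def giant_component(adj):
    """Largest set of mutually reachable nodes."""
    seen, best = set(), set()
    for start in list(adj):
        if start not in seen:
            comp = set(bfs_distances(adj, start))
            seen |= comp
            if len(comp) > len(best):
                best = comp
    return best

def path_stats(adj, nodes):
    """Diameter and average shortest-path length over pairs in a connected set."""
    total = pairs = diameter = 0
    for s in nodes:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return diameter, total / pairs

# Hypothetical toy network: a chain of four authors plus an isolated pair
adj = undirected([("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")])
giant = giant_component(adj)
diam, avg = path_stats(adj, giant)
share = len(giant) / len(adj)   # fraction of nodes in the giant component
```

Running one BFS per node is quadratic in the number of nodes, which is manageable for a few thousand authors but worth sampling for much larger networks.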
Eminence and network centrality: three interpretations
• Simple measures (counts)
   • In-degree (ID) = number of different sources citing a node
   • Weighted in-degree = total citations
• Recursive measures: PageRank (PR) and eigenvector centrality
   • The importance of a paper depends on the importance of the papers that cite it
   • PageRank can be read as a random walk through the literature
• Betweenness centrality (BC)
   • Ties together regions of scholarship
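A minimal sketch of the first two interpretations, assuming a hypothetical dictionary that maps each (citing, cited) pair to a citation count. In-degree counts distinct citing sources, weighted in-degree counts all citations, and PageRank is computed by power iteration with the conventional damping factor of 0.85 (an assumption; the deck does not report Gephi's settings). This PageRank ignores citation counts, and rank held by nodes that cite nothing in the dataset is spread evenly over all nodes.

```python
from collections import defaultdict

def in_degrees(edges):
    """edges: {(citing, cited): n_citations}. Returns (in-degree, weighted in-degree)."""
    indeg, weighted = defaultdict(int), defaultdict(int)
    for (citing, cited), count in edges.items():
        indeg[cited] += 1          # one per distinct citing source
        weighted[cited] += count   # every citation counts
    return dict(indeg), dict(weighted)

def pagerank(edges, damping=0.85, tol=1e-12, max_iter=500):
    """Unweighted PageRank by power iteration.

    A random reader follows a reference with probability `damping`,
    otherwise jumps to a random node. Rank held by nodes with no
    outgoing references is spread evenly over all nodes.
    """
    nodes = {v for pair in edges for v in pair}
    out = defaultdict(list)
    for citing, cited in edges:
        out[citing].append(cited)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = dict.fromkeys(nodes, (1 - damping) / n + damping * dangling / n)
        for v in nodes:
            for t in out[v]:
                new[t] += damping * rank[v] / len(out[v])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank

# Hypothetical toy edges: (citing, cited) -> number of citations
edges = {("A", "C"): 3, ("B", "C"): 1, ("C", "D"): 1, ("A", "D"): 1}
indeg, weighted = in_degrees(edges)
rank = pagerank(edges)
```

In this toy network C and D have the same in-degree and C has the larger weighted in-degree, yet D has the higher PageRank because one of its citers, C, is itself well cited.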
Personality in the Annual Review: Most cited authors
115 authors with 5 or more cites
Proximity and distance in citation networks
• Proximity may occur for several reasons; distance is less ambiguous
• Closeness of Block and Mischel in the personality space (right)
• Greatest distances among source papers:
   Parke (’83, Social and Personality Development)
   -> Rorer (’83, Personality Structure and Assessment)
   -> Butcher (’96, Personality: Individual Differences and Clinical Assessment)
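Finding the most distant source papers amounts to computing shortest-path distances between every pair of sources and taking the largest. The sketch below assumes undirected, unweighted ties; the node labels (`S1`, `x`, and so on) are hypothetical stand-ins, not the actual Parke, Rorer, and Butcher nodes.

```python
from collections import defaultdict, deque

def bfs_distances(adj, source):
    """Shortest-path length (in links) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def source_distances(edges, sources):
    """Shortest-path distance for every pair of source papers (undirected ties)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    ordered = sorted(sources)
    result = {}
    for i, s in enumerate(ordered):
        dist = bfs_distances(adj, s)
        for t in ordered[i + 1:]:
            if t in dist:            # unreachable pairs are left out
                result[(s, t)] = dist[t]
    return result

# Hypothetical toy network: three source papers linked through cited authors
edges = [("S1", "x"), ("x", "S2"), ("S2", "y"), ("y", "z"), ("z", "S3")]
dists = source_distances(edges, {"S1", "S2", "S3"})
farthest = max(dists, key=dists.get)   # the most separated pair of sources
```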
Personality in the Annual Review: Communities as constellations / The five-factor paradigm
Between 12 and 15 communities are identified. One of the largest is anchored by source papers of Wiggins, Carson, Digman, and Ozer.
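Communities like these are typically found by maximizing modularity: the share of ties falling inside communities minus the share expected by chance given each node's degree. Gephi's modularity tool searches for a high-scoring partition with the Louvain method, whose result can depend on node order, which may be one reason the number of communities varies. The sketch below only scores a partition that is already given, on a hypothetical undirected, unweighted toy graph; it does not search for one.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of an undirected, unweighted graph under a given partition.

    edges: (u, v) pairs, each tie listed once; community: node -> label.
    Q = sum over communities c of  L_c / m - (d_c / (2m))**2,
    where m is the number of ties, L_c the ties inside c,
    and d_c the summed degree of the nodes in c.
    """
    m = 0
    inside = defaultdict(int)
    degree = defaultdict(int)
    for u, v in edges:
        m += 1
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            inside[community[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical toy network: two triangles joined by one tie
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
two_groups = {"A": 1, "B": 1, "C": 1, "D": 2, "E": 2, "F": 2}
one_group = dict.fromkeys(two_groups, 0)
q_two = modularity(edges, two_groups)   # 5/14, about 0.357
q_one = modularity(edges, one_group)    # 0: one big community is no better than chance
```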
Personality in the Annual Review: Minnesota and Berkeley schools
In one analysis, the two largest communities
Personality in the Annual Review: Some thoughts on Harrison Gough (1921-2014)
The direct and indirect influence
• Cited in 5 of the Annual Review papers in the dataset (1 degree), which link to 597 others (22% of scholars within 1-2 degrees) and to 2,329 (83%) within 3 degrees.
The continuing legacy
• “Big Data” approaches can be seen as an extension of the empirical tradition of Binet, Meehl, and Gough.
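Reach figures like Gough's are cumulative counts of the scholars within 1, 2, and 3 links of a focal node. A minimal sketch, assuming undirected ties; the slide does not state its exact counting conventions, and the toy network (focal author `G`, sources `S1`, `S2`) is hypothetical.

```python
from collections import defaultdict

def reach_by_degree(edges, focal, max_degree=3):
    """Cumulative number of other nodes within 1..max_degree links of focal."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen = {focal}
    frontier = [focal]
    reach = {}
    for k in range(1, max_degree + 1):
        next_frontier = []
        for node in frontier:
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
        reach[k] = len(seen) - 1   # exclude the focal node itself
    return reach

# Hypothetical toy network: focal author G is cited by sources S1 and S2
edges = [("S1", "G"), ("S2", "G"), ("S1", "a"), ("S1", "b"),
         ("S2", "b"), ("b", "c"), ("c", "d")]
reach = reach_by_degree(edges, "G")
others = len({v for e in edges for v in e}) - 1
shares = {k: n / others for k, n in reach.items()}   # fraction of all other nodes
```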
Analyses of the SPSSI journal database
• All papers published in JSI and ASAP, 2001-2013
• N sources = 855
• 38,854 references (45.4 per source)
   - 2,042 self-references (5.3%)
   - 3,198 unusable references (8.2%): references to news articles or government institutes, or without a date
• 33,615 usable citations (86.5%)
• 24,263 unique papers
• 14,702 unique first authors
SPSSI citation network: Connectedness
• Of the 24,263 papers, 24,075 (99.2%) are linked in a single giant component
• Papers are separated by an average of 4.2 links
PageRank is high for papers that attract commentary
• King (2011): second-highest PR in the database
• Explanation
   • Papers cited by papers with few references (such as commentaries) can have a disproportionate impact in a sparse network
• Two solutions
   • Omit commentaries and book reviews
   • Treat authors rather than papers as the unit of analysis
• Limitations of citation networks: sparseness, time constraints
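The commentary effect follows from how PageRank divides a citing paper's rank among its references. In this hypothetical toy literature, paper `X` is cited once by a commentary `C` with a single reference, and paper `Y` is cited once by a regular paper `R` with ten references; all names are illustrative. The sketch uses unweighted PageRank with an assumed damping factor of 0.85, then applies the first proposed remedy by dropping the commentary.

```python
from collections import defaultdict

def pagerank(nodes, edges, damping=0.85, iters=100):
    """Unweighted PageRank by power iteration over an explicit node set."""
    out = defaultdict(list)
    for citing, cited in edges:
        out[citing].append(cited)
    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iters):
        # rank of nodes with no references is spread evenly over all nodes
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = dict.fromkeys(nodes, (1 - damping) / n + damping * dangling / n)
        for v in nodes:
            for t in out[v]:
                new[t] += damping * rank[v] / len(out[v])
        rank = new
    return rank

# Hypothetical toy literature
regular_refs = ["Y"] + [f"P{i}" for i in range(9)]   # R cites Y and 9 other papers
edges = [("C", "X")] + [("R", t) for t in regular_refs]
nodes = {"C", "R", "X"} | set(regular_refs)
rank = pagerank(nodes, edges)   # X receives all of C's rank, Y only a tenth of R's

# Remedy: drop the commentary and its outgoing citation
kept_nodes = nodes - {"C"}
kept_edges = [(s, t) for s, t in edges if s != "C"]
rank_kept = pagerank(kept_nodes, kept_edges)
```

With the commentary present, X outranks Y even though each is cited exactly once; once C is removed, X is uncited and falls below Y.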
The SPSSI author network: (almost) no one is an island
• 14,703 unique authors
• All but 6 are linked to the main component
• Average path between nodes = 5.1
• 32-38 communities*
• Average author is linked 1.9 times
Whole network
The SPSSI author network: Most cited
Includes 68 authors with 20 or more citations. Nodes ranked by eigenvector centrality.
The SPSSI author network: Centrality
• Content of rankings
• Betweenness vs. other measures
• On gender effects in citation networks
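Betweenness differs from the other measures because it rewards nodes that lie on many shortest paths between other nodes, such as a scholar who bridges two communities, rather than nodes that are heavily cited. A minimal sketch of Brandes' algorithm for an unweighted, undirected graph; the toy network is hypothetical, and the scores are raw counts, since the slides do not say how betweenness was normalized.

```python
from collections import defaultdict, deque

def betweenness(edges):
    """Brandes betweenness for an unweighted, undirected graph.

    Returns raw scores: for each node, the summed fraction of shortest
    paths through it, over unordered pairs of other nodes.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    bc = dict.fromkeys(adj, 0.0)
    for s in list(adj):
        stack, preds = [], defaultdict(list)
        sigma = defaultdict(int)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:               # BFS phase: count shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)
        while stack:               # accumulation phase, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # each unordered pair was counted once from each endpoint
    return {v: score / 2 for v, score in bc.items()}

# Hypothetical toy network: two triangles bridged by the C-D tie
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
bc = betweenness(edges)
```

Here C and D score highest because every path between the two triangles runs through them, while A, B, E, and F score zero.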
The SPSSI author network: Allport and Lewin communities compared
Lewin community includes authors with 5 or more cites; Allport community includes authors with 13 or more cites. Nodes ranked by eigenvector centrality.
Summary
• Content: influential persons and scholarly works
   • Different measures of centrality have distinct interpretations
   • Beyond this, eminence of communities as well as persons
• Citation networks are small worlds
• The discrete clustering approach to describing the network is not ideal
• In citation networks, distance may be more interpretable than proximity
• The work is primitive
   • Bigger data and more sophisticated methods lie ahead