An approach for evaluating academic research performance using betweenness centrality in authors networks
Keisuke Honda, Research Organization of Information and Systems
Yuji Mizukami, SOTA SYSTEMS CO., LTD.
Shigenori Jason Suzuki, SOTA SYSTEMS CO., LTD.
Junji Nakano, The Institute of Statistical Mathematics
Agenda
• Motivation and Background
• Inter-University Research Institute Corp.
• Development of a new evaluation index using a scientific journal database
• Centrality in graph analysis
• Betweenness centrality in co-author networks
• IR (Institutional Research) tool
• Cloud environment for big data analysis
Overview of ISM
• Objective: research on statistical theory and methodologies for solving real-world problems
• Mission: promote the development of human resources for data-based decision making
• History: founded in 1944; Inter-University Research Institute in 1985; Independent Agency, Research Organization in 2004
Inter-University Research Institute Corp.
[Figure: the four Inter-University Research Institute Corporations (Research Organization of Information and Systems, High-Energy Accelerator Research Organization, National Institutes for Humanities, National Institutes of Natural Sciences) and their research collaboration with universities and colleges (national universities, private universities, junior colleges). The Research Organization of Information and Systems comprises the National Institute of Genetics, the National Institute of Polar Research, the National Institute of Informatics and The Institute of Statistical Mathematics.]
[Figure: institutes of each Inter-University Research Institute Corp. and the graduate schools and departments associated with them]
• National Institutes for Humanities: National Museum of Japanese History, National Institute of Japanese Literature, Int'l Research Center for Japanese Studies, National Museum of Ethnology
• High-Energy Accelerator Research Org.: Accelerator Laboratory, Institute of Particle and Nuclear Studies, Institute of Materials Structure Science
• National Institutes of Natural Sciences: National Astronomical Observatory of Japan, National Institute for Fusion Science, Institute for Molecular Science, National Institute for Basic Biology, National Institute for Physiological Sciences
• Research Org. of Information and Systems: The Institute of Statistical Mathematics, National Institute of Informatics, National Institute of Genetics, National Institute of Polar Research
• School of Cultural and Social Studies: Dept. of Comparative Studies, Dept. of Japanese Studies, Dept. of Japanese History, …
• School of Physical Sciences: Dept. of Structural Molecular Science, Dept. of Functional Molecular Science, Dept. of Astronomical Science, Dept. of Fusion Science, Dept. of Space and Astronautical Science
• School of High Energy Accelerator Science: Dept. of Accelerator Science, Dept. of Materials Structure Science, Dept. of Particle and Nuclear Physics
• School of Multidisciplinary Sciences: Dept. of Statistical Science, Dept. of Polar Science, Dept. of Informatics
• School of Life Science: …
• School of Advanced Sciences: Dept. of Evolutionary Studies of Biosystems
Development of a new evaluation index
• To understand the contribution of researchers
• To capture their potential capacity, not only the impact of their research
• Our goal is to visualize research collaboration
• Big data analysis:
• data from a scientific journal database
• a cloud environment for graph analysis
Design of the index
• Represents the contribution to research collaboration
• Analyzes the structure of the co-author network built from a scientific journal database
• Co-author network:
• nodes = researchers
• edges = co-authorship relationships
• Centrality: identifies the most important nodes within a graph
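A minimal sketch of how such a co-author network could be built, assuming each paper has already been reduced to a list of author names. The input format and the function name `build_coauthor_graph` are hypothetical, and parsing an actual Web of Science export is not shown. Nodes are researchers, and an edge joins any two researchers who appear on the same paper.

```python
from itertools import combinations


def build_coauthor_graph(papers):
    """Return an undirected co-author network as {author: set(coauthors)}.

    papers: iterable of author-name lists, one list per paper
    (hypothetical input format, not the original data pipeline).
    """
    graph = {}
    for authors in papers:
        unique = list(dict.fromkeys(authors))  # drop repeated names, keep order
        for a in unique:
            graph.setdefault(a, set())         # single-author papers still add a node
        for a, b in combinations(unique, 2):   # every pair of co-authors gets an edge
            graph[a].add(b)
            graph[b].add(a)
    return graph


papers = [["A", "B", "C"], ["C", "D"], ["D", "E"]]
graph = build_coauthor_graph(papers)
print(sorted(graph["C"]))  # ['A', 'B', 'D']
```

In this sketch repeated collaborations collapse into a single unweighted edge; counting shared papers as edge weights would be an alternative design.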
Centrality in networks
• Centrality measures: degree, closeness, betweenness (Freeman 1979)
• Betweenness centrality of a node: for every pair of other nodes, the fraction of shortest paths between them that pass through the node, summed over all pairs
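As a formula, C_B(v) = Σ σ_st(v) / σ_st, summed over unordered pairs {s, t} with s ≠ v ≠ t, where σ_st is the number of shortest paths between s and t and σ_st(v) is the number of those passing through v. The sketch below computes this with Brandes' algorithm for an unweighted, undirected graph stored as an adjacency dict. The slides do not say which software or normalisation produced the reported scores, so treat this as an illustration: each unordered pair is counted once and no normalisation is applied.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph.

    graph: dict mapping each node to a set of neighbours.
    Returns raw scores with each unordered pair {s, t} counted once.
    """
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Breadth-first search from s, counting shortest paths.
        stack = []
        preds = {v: [] for v in graph}   # predecessors on shortest s-v paths
        sigma = dict.fromkeys(graph, 0)  # number of shortest s-v paths
        sigma[s] = 1
        dist = dict.fromkeys(graph, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair was visited from both endpoints.
    return {v: c / 2.0 for v, c in bc.items()}


print(betweenness_centrality({"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}))
# {'A': 0.0, 'B': 1.0, 'C': 0.0}
```

On the path A-B-C only B lies between another pair, so it alone scores 1. Graph libraries such as igraph and NetworkX provide the same measure with optional normalisation.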
Visualize research collaboration
Idea: regard the magnitude of a researcher's betweenness centrality in the co-author network as the strength of their contribution to research collaboration
Sample data
• Database: Thomson Reuters Web of Science™
• Search condition: keyword [Inst Stat Math], category [organization], timespan [2013-2013]
• Search result: 135 papers in total (502 authors); 51 authors with bc > 0 (10.16%)
Example
Result of analysis (1)
• Characteristics of high betweenness centrality
• many co-authored papers, not as first author -> leadership position
• many co-authors in a single paper -> large-scale research
• links to other high-scoring researchers -> cross-organizational collaboration
Rank | No.1 | No.2 | No.3 | No.4 | No.5
Score of B.C. | 6014 | 3345 | 1800 | 633 | 517
Num. of co-authored papers | 13 | 8 | 5 | 3 | 5
Num. with high rankers (*) | 5 | 2 | 3 | 3 | 2
Num. of co-authors in a single paper | 16 | 17 | 9 | 5 | 5
Degree | 59 | 39 | 18 | 9 | 9
* high rankers: top 30
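The indicators in the ranking table can be tabulated per author with a short script. This sketch assumes papers given as author-name lists (first author first) plus a dict of betweenness scores. The function name `author_profile` and the exact indicator definitions are our reading of the table headings, not the authors' definitions; in particular, "with high rankers" is interpreted here as the number of distinct top-30 co-authors.

```python
from collections import defaultdict


def author_profile(papers, bc, top_k=30):
    """Per-author indicators resembling the ranking table (assumed definitions).

    papers: list of author-name lists, first author first.
    bc: dict author -> betweenness score.
    """
    ranked = sorted(bc, key=bc.get, reverse=True)
    top = set(ranked[:top_k])
    coauthors = defaultdict(set)
    n_papers = defaultdict(int)     # co-authored papers (2+ authors)
    n_not_first = defaultdict(int)  # of those, papers where the author is not first
    max_team = defaultdict(int)     # most co-authors on any single paper
    for authors in papers:
        authors = list(dict.fromkeys(authors))
        if len(authors) < 2:
            continue
        for i, a in enumerate(authors):
            n_papers[a] += 1
            if i > 0:
                n_not_first[a] += 1
            max_team[a] = max(max_team[a], len(authors) - 1)
            coauthors[a].update(x for x in authors if x != a)
    return {
        a: {
            "rank": rank,
            "bc": bc[a],
            "coauthored_papers": n_papers[a],
            "not_first_author": n_not_first[a],
            "max_coauthors_single_paper": max_team[a],
            "links_to_top": len(coauthors[a] & (top - {a})),
            "degree": len(coauthors[a]),
        }
        for rank, a in enumerate(ranked, start=1)
    }


papers = [["A", "B", "C"], ["B", "D"], ["E", "B"]]
bc = {"B": 5.0, "A": 0.0, "C": 0.0, "D": 0.0, "E": 0.0}
print(author_profile(papers, bc, top_k=2)["B"])
```

Reading the output row by row gives the kind of evidence behind the bullets above: a high "not_first_author" count suggests a leadership role, and a large "max_coauthors_single_paper" suggests large-scale projects.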
Result of analysis (2)
Year | 2013 | 2012 | 2011 | 2010 | 2009
Total of bc | 14297 | 2272 | 318 | 1278 | 485
Total num. of nodes | 502 | 427 | 345 | 365 | 276
Num. of nodes with bc > 0 | 51 | 34 | 22 | 38 | 27
Structural changes of the research activities
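Because the network size changes from year to year, raw bc totals are hard to compare directly. This small sketch recomputes two simple ratios from the yearly table: the share of nodes with bc > 0 and the mean bc per node. These ratios are illustrative additions, not part of the original analysis, and the name `yearly_summary` is hypothetical.

```python
def yearly_summary(table):
    """table: {year: (total_bc, total_nodes, nodes_with_bc)}.

    Returns {year: (percent of nodes with bc > 0, mean bc per node)},
    both rounded to two decimals, in ascending year order.
    """
    return {
        year: (round(100.0 * positive / nodes, 2), round(total / nodes, 2))
        for year, (total, nodes, positive) in sorted(table.items())
    }


# Values copied from the yearly table on this slide.
TABLE = {
    2013: (14297, 502, 51),
    2012: (2272, 427, 34),
    2011: (318, 345, 22),
    2010: (1278, 365, 38),
    2009: (485, 276, 27),
}

for year, (share, mean_bc) in yearly_summary(TABLE).items():
    print(f"{year}: {share:.2f}% of nodes with bc > 0, mean bc {mean_bc:.2f}")
```

The 2013 share computed this way, 10.16%, agrees with the figure given for the 2013 sample data.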
IR (Institutional Research)
• refers to activities in the management layer of higher education
• has been introduced in Japan
• supports
• planning
• decision making
• policy making
• evaluation
As an IR tool
• Quantitative analysis: Num. of Papers
• Qualitative analysis: Num. of Citations, Impact Factor
• Betweenness centrality: collaboration impact, interdisciplinarity, diversity, …
EIR (Exploratory Institutional Research)
ISM Supercomputer systems
Assimilation, Advanced
Cloud, Community
Intelligent, Investigate
Communal Cloud Computer System
Specifications
• Server: DELL PowerEdge R620 x 64
• Peak performance: 29 TFLOPS
• Total memory capacity: 16 TB
• Storage: 364 TB
Software
• [Cloud infrastructure] Apache CloudStack 4.2
• [Virtualization infrastructure] KVM
• [Compiler] Intel Cluster Studio XE 2013
• [Data analysis] R (parallelized R library Rhpc), Hadoop, Mahout, Spotfire, RapidMiner
• [Batch] TORQUE 4.2
Concluding remarks
• Development of a new evaluation index
• calculating B.C. in co-author networks
• Future plan (platform for the EIR environment)
• query interface
• Apache Spark GraphX
Evaluating research performance
• Number of papers
• (Top 10%)
• Hitachi HITAC M-280H
• Our first computer with SAS installed (1988)
• Hitachi mainframe
• Main memory: 24 MB
History of ISM
• Foundation of ISM: 1944/06/05
• Inter-University Research Institute: 1985/04/01
• Graduate University for Advanced Studies: 1988/10/01
• Independent Agency, Research Organization: 2004/04/01
• Moved to Tachikawa Campus: 2009/10/01
• Foundation of Akaike Guest House: 2010/05/29
History of the ISM
• Hayashi's quantification methods
• AIC (Akaike's Information Criterion)
• Particle filter or sequential Monte Carlo
• Akaike Guest House
Structure of Research System
• Research staff: Professors, Associate Prof., Assistant Prof. (figure values: 19, 16, 11)
• Basic Research: Statistical Modeling; Data Science; Mathematical Analysis & Inference
• NOE-type Research: Risk Analysis Research Center; Research and Development Center for Data Assimilation; Survey Science Center; Research Center for Statistical Machine Learning; Service Science Research Center
• Professional Development: School of Statistical Thinking
• Research Support: Center for Engineering and Technical Support; Library
As of Jan. 19, 2012
[Figure: library holdings. Books: foreign 73%, Japanese 27%. Periodicals: foreign 46%, Japanese 54%. Counts shown: 61,368; 45,834; 16,980; 2,167; 1,179.]
Library and Publications
Library
Our library has a large collection of books and journals related to statistical science and provides several services over the Internet.
• Journals: 200 leading scientific journals related to statistics, mathematics, computational science and information science
• Online access: an Online Public Access Catalog is available to search materials in the ISM Library, and a great variety of electronic journals, electronic books and databases can be used
Publications
We publish academic journals and reports and provide publications on our website.
• Annals of the Institute of Statistical Mathematics: international journal edited by ISM and distributed by Springer
• Proceedings of the Institute of Statistical Mathematics: biannual journal in Japanese with English summaries
• Computer Science Monographs: online technical reports on computer programs and software for statistical science
• The ISM Research Report: technical reports on surveys, mainly in Japanese
• Research Memorandum: prompt academic technical reports
• ISM Report on Research and Education: archives of workshops and lectures
2013/8/2
Supercomputer System for Data Assimilation: Specifications
• Machine: SGI UV2000 x 2
Node specifications
• SGI Rackable C1104G-RP5: [CPU] Intel Xeon E5-2600 x 2, [Coprocessors] Intel Xeon Phi 5110P
• Processor: Intel Xeon E5-4600 2.4 GHz, 10 cores x 256 sockets (2,560 cores x 2)
• Memory: 64 TB (32 GB DDR3-1866 x 2,048) x 2
• Storage: SAS, SSD, 816 TB in total
• Interconnect: NUMAlink
Software
• [OS] SUSE Linux Enterprise Server 11 SP3
• [Batch] PBS Professional 12.0.2
• [Compiler] Intel Cluster Studio XE 2013
Consists of two UV2000 units; total: 5,120 cores / 128 TB
Supercomputer System for Statistical Science: Specifications
• Machine: SGI ICE X
Total specifications
• Number of nodes: 520
• Number of cores: 12,960
• Peak performance: 336 TFLOPS
• Total memory capacity: 100 TB
Node specifications
• [CPU] Intel Xeon E5-2697v2 x 2, [Coprocessors] Intel Xeon Phi 5110P
• Storage: SAS, SSD, 2.5 PB in total
Software
• [OS] SUSE Linux Enterprise Server 11 SP3
• [Batch] PBS Professional 12.0.2
• [Compiler] Intel Cluster Studio XE 2013
Table of available VM images
Name | Max. parallelization | OS | Applications
MPI Cloud | 8 | CentOS 6 | Intel MPI (Intel Cluster Studio XE 2013)
Statistical analysis (1) | 8 | CentOS 6 | R (3.0.X), Hadoop (1.1.2-1), Mahout (0.8)
Statistical analysis (2) | 8 | CentOS 6 | Open MPI (1.6.5-8), R (3.0.X)
Statistical analysis (3) | 4 (2 cores) | Ubuntu 14 | Open MPI (1.6.5-8), R (3.0.X)
Minimal Cloud | 1 | CentOS 6 | -
Matlab Cloud | 8 | CentOS 6 | MATLAB (R2014a)
Spotfire Cloud | 1 (joint use) | Windows | Spotfire (5.5)
Challenges of big data analysis
• Cloud computing as HPC
• SAS Solutions for Hadoop
• Cray URIKA-XA
• Our resources
• human resources
• computational resources
• research collaborators in industry
Other systems
• Physical random number generator
• 4K-3D visualization system
Thank you !!
Budget: $14M