Date post: | 28-Jul-2015 |
Upload: | shenghui-wang |
Shenghui Wang, Rob Koopman
Exploring a world of networked information built from free-text metadata
OCLC Research EMEA
ELAG2015
What would you do if you are interested in a topic?
Difficult to answer these questions:
• What are the different aspects of this topic?
• Are there related aspects missing in my search terms?
• Who are the most prominent authors about this topic?
• Which journals publish most about this topic?
• How have others (e.g. librarians) described and classified this topic?
How do we do this?
• OFFLINE: generates a semantic representation for each entity
• ONLINE: finds the most related entities and uses multidimensional scaling to display them
Build semantic representation
• Basic assumptions
  – An entity can be represented by its context
  – Entities which share more context are more likely to be related
• Context is the textual environment where an entity occurs
• The effects of state prekindergarten programs on young children’s school readiness in five states
• [author:jung kwanghee]
• [subject:readiness for school]
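The example above can be turned into a small sketch of how context might be collected. The slides do not spell out the exact procedure, so this assumes that an entity's context is simply the title words of every record it appears in. The record layout (`title`, `entities`) and the function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase a free-text field and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_context_counts(records):
    """Map each entity label to a Counter of the title words it co-occurs with."""
    contexts = defaultdict(Counter)
    for record in records:
        words = tokenize(record["title"])
        for entity in record["entities"]:
            contexts[entity].update(words)
    return contexts

# Hypothetical record layout, filled with the example from the slide.
records = [
    {
        "title": "The effects of state prekindergarten programs on young "
                 "children's school readiness in five states",
        "entities": ["author:jung kwanghee", "subject:readiness for school"],
    },
]

contexts = build_context_counts(records)
print(contexts["author:jung kwanghee"].most_common(5))
```

Each Counter is one sparse row of a co-occurrence matrix: the entity is the row, the words are the columns.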
Dataset
● ArticleFirst, 65 million articles
● Selected 4 million entities (topical terms, authors, ISSNs, Dewey decimal codes)
● Represented by 1 million topical terms
But a matrix of 4M x 1M is too big to process
Dimension reduction based on Random Projection
C: a co-occurrence matrix
R: a random matrix of +/-1
C’: approximation of C after random projection (the semantic matrix)
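In matrix terms the projection is C' = C R: if C has one row per entity and one column per topical term, R has one row per term and k columns, so C' keeps one row per entity but only k columns. The slides do not state k or how R is generated beyond "+/-1". The toy sizes, the seed and the function names below are assumptions, and this is a minimal pure-Python sketch rather than the authors' implementation.

```python
import random

def random_sign_matrix(n_terms, n_dims, seed=0):
    """Build R: an n_terms x n_dims matrix with entries drawn from {-1, +1}."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n_dims)] for _ in range(n_terms)]

def project(C, R):
    """Return C' = C R, computed row by row."""
    n_dims = len(R[0])
    projected = []
    for row in C:
        out = [0] * n_dims
        for j, count in enumerate(row):
            if count:  # co-occurrence rows are mostly zeros, so skip them
                for d in range(n_dims):
                    out[d] += count * R[j][d]
        projected.append(out)
    return projected

# Toy co-occurrence matrix: 3 entities x 6 terms (illustrative numbers only).
C = [
    [2, 0, 1, 0, 0, 3],
    [2, 0, 1, 0, 0, 2],
    [0, 4, 0, 1, 2, 0],
]
R = random_sign_matrix(n_terms=6, n_dims=4)
C_prime = project(C, R)
print(C_prime)
```

Many formulations also scale the result by 1/sqrt(k). That factor is left out here because a constant scale does not change cosine similarity between rows.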
Online interface
• Find mutual nearest neighbors
• Use multidimensional scaling to display
Nearest neighbors
Mutual nearest neighbors
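The difference between the two neighbor notions can be sketched as follows. An entity is kept as a mutual nearest neighbor only if each of the two appears in the other's top-k list. Cosine similarity, the value of k, and the toy vectors and labels are assumptions for illustration. The slides do not name the similarity measure used.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_neighbors(vectors, k):
    """Map each entity to its k most similar other entities."""
    result = {}
    for name, vec in vectors.items():
        scored = [
            (cosine(vec, other), other_name)
            for other_name, other in vectors.items()
            if other_name != name
        ]
        # Highest similarity first; ties broken by name for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[name] = [other_name for _, other_name in scored[:k]]
    return result

def mutual_nearest_neighbors(vectors, k):
    """Keep only neighbors that list the entity among their own top k."""
    nn = nearest_neighbors(vectors, k)
    return {
        name: [other for other in neighbors if name in nn[other]]
        for name, neighbors in nn.items()
    }

# Hypothetical 2-dimensional semantic vectors.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.9, 0.1],
    "c": [0.8, 0.3],
    "d": [0.0, 1.0],
}
nn = nearest_neighbors(vectors, k=1)
mutual = mutual_nearest_neighbors(vectors, k=1)
print(nn)
print(mutual)
```

In this toy data "d" has "c" as its nearest neighbor, but "c" is closer to "b", so "d" ends up with no mutual neighbor. Filtering this way drops one-sided links from outlying entities.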
Possible applications
• Explorative interface
• Context-based search:
  – brain
• Journal finder
  – Arctic ice journals
  – http://brain.oxfordjournals.org/
• Author name disambiguation
  – pre kindergarten
Context matters!
• What does “young” mean in
  – ArticleFirst
  – WorldCat
  – Astrophysics
  – Art
Ariadne (demo): http://thoth.pica.nl/relate
• An extremely fast way of navigating large-scale heterogeneous entities
• Generalisable to different datasets
  – Full WorldCat
  – Small but highly curated astrophysics dataset
• Supports explorative information retrieval and entity disambiguation
Explore. Share. Magnify.
Thank you
Shenghui Wang, Rob Koopman
OCLC Research EMEA