Intelligent Database Systems Lab
Presenter: Chang, Chun-Chih
Authors: David Milne*, Ian H. Witten
2012, AI
An open-source toolkit for mining Wikipedia
Outlines
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments
Motivation
• The online encyclopedia Wikipedia is a vast, constantly evolving tapestry of interlinked articles.
• For developers and researchers it represents a giant multilingual database of concepts and semantic relations, a potential resource for natural language processing.
Objectives
• Present the Wikipedia Miner toolkit, an open-source software system that allows researchers and developers to integrate Wikipedia’s rich semantics into their own applications.
• Wikipedia Miner is intended to be a platform for sharing data mining techniques.
Methodology - Architecture of the Wikipedia Miner toolkit
Methodology - Measuring relatedness between concepts
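The slide does not reproduce the formula itself. As a minimal sketch, assuming the measure is the link-based one from Milne and Witten's earlier work (modelled on the normalized Google distance), relatedness between two articles can be computed from the sets of articles that link to each of them. The function name `link_relatedness` and its parameters are illustrative only and are not part of the toolkit's API.

```python
import math


def link_relatedness(links_a, links_b, total_articles):
    """Relatedness in [0, 1] from the sets of articles linking to A and to B.

    Assumption: a distance modelled on the normalized Google distance,
    turned into a similarity by subtracting it from 1.
    """
    a, b = set(links_a), set(links_b)
    common = a & b
    if not common:
        # No shared links: treat the two articles as unrelated.
        return 0.0

    larger = max(len(a), len(b))
    smaller = min(len(a), len(b))
    if total_articles <= smaller:
        raise ValueError("total_articles must exceed the size of both link sets")

    distance = (math.log(larger) - math.log(len(common))) / (
        math.log(total_articles) - math.log(smaller)
    )
    # Clamp so that very weak overlaps do not yield negative scores.
    return max(0.0, 1.0 - distance)
```

Articles with identical link sets score 1, articles with no links in common score 0, and partial overlap falls in between.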
Methodology - Features for measuring article relatedness
Experiments - Impact of thresholds for disambiguation and detection
Experiments - Impact of relatedness dependencies
Experiments - Impact of training data
Experiments - Performance of the disambiguator
Experiments - Performance of the detector
Conclusions
• Our aim in releasing this work open source is not to provide a complete and polished product, but rather a resource for the research community to collaborate around and continue building together.
Comments
• Advantages
• Applications
  - Wikipedia
  - Disambiguation
  - Annotation