Date post: | 12-Jan-2016 |
Upload: | antony-parks |
1
LiveClassifier: Creating Hierarchical Text Classifiers through Web Corpora
Chien-Chung Huang
Shui-Lung Chuang
Lee-Feng Chien
Presented by: Vu LONG
2
Outline
1. Introduction
2. LiveClassifier
3. Evaluation
4. Contribution
5. Future work
3
Introduction
http://140.109.19.252:8080/charles/index.jsp
• Uses Web search-result pages as the corpus source
• Exploits the structure information in the topic hierarchy to train the classifier
• Creates key terms to amend the insufficiency of the topic hierarchy
4
LiveClassifier (Demo version)
Classify documents
Computer Science Classifier is chosen
• Three classifiers (topics) have been created from the Yahoo! directory: Computer Science, Europe, and Scientists
5
LiveClassifier
Classify documents
Pseudo class
6
LiveClassifier
• Users can create their own classifiers
7
LiveClassifier
• Feature Extractor
- Interacts with the search engine and extracts highly ranked search snippets as an effective feature source
- Outputs feature vectors that describe both topic classes and text objects
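The feature-extraction step above can be illustrated with a minimal sketch. It assumes the snippets have already been fetched from a search engine, and it uses plain term frequency with cosine similarity; the slides do not say which weighting or similarity measure LiveClassifier uses, so both are assumptions. The function names and the sample snippets are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def feature_vector(snippets):
    """Build a unit-length term-frequency vector from a list of snippets.

    Assumption: raw term counts are used as weights; the actual system
    may weight terms differently.
    """
    counts = Counter()
    for snippet in snippets:
        counts.update(tokenize(snippet))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {term: c / norm for term, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (dicts)."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(w * v.get(term, 0.0) for term, w in u.items())


# Hypothetical snippets standing in for real search results.
class_vec = feature_vector([
    "database systems and query processing",
    "relational database query optimization",
])
other_vec = feature_vector(["travel guide to european capitals"])
doc_vec = feature_vector(["query optimization in database systems"])

# The text object is closer to the first class than to the second.
print(cosine(doc_vec, class_vec) > cosine(doc_vec, other_vec))
```

The same function describes a topic class and a text object, so the two can be compared directly in one vector space.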
8
LiveClassifier
• Hier-Concept-Query-Formulation
- Formulates queries from the topic hierarchy
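One way to picture query formulation from a topic hierarchy is sketched below. It assumes that the query for a class joins the quoted names on the path from the root to that class; the slides do not give the exact formulation rule, so this is an illustration only. The nested-dict representation, the function names, and the subclass names are hypothetical.

```python
def formulate_query(path):
    """Join the quoted class names on the path from the root to a class."""
    return " ".join(f'"{name}"' for name in path)


def queries_for_hierarchy(tree, prefix=()):
    """Yield (path, query) for every class in a nested-dict hierarchy."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path, formulate_query(path)
        yield from queries_for_hierarchy(children, path)


# Hypothetical fragment of a topic hierarchy.
hierarchy = {
    "Computer Science": {
        "Artificial Intelligence": {},
        "Databases": {},
    }
}

queries = dict(queries_for_hierarchy(hierarchy))
for path, query in queries.items():
    print(" > ".join(path), "=>", query)
```

Including the ancestor names in the query is one plausible way to use the hierarchy's structure to disambiguate a class name that has several senses.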
9
LiveClassifier
• Text Classifier
10
Evaluation
• Overall performance evaluation
11
Evaluation
• Granularity & Diversity
- Classifying text objects into different levels of the topic hierarchy yields roughly the same results.
12
Evaluation
• Thematic Metadata for Textual Data
13
Evaluation
• Paper Title Classification
- Collect data from 4 CS conferences in 2002
- Classify them into 36 second-level CS classes
14
Contribution
• Finds ways to collect and organize corpora effectively
• Creates key terms to amend the insufficiency of the topic hierarchy
• Classifies text objects automatically without a pre-labeled training set
• Cooperates with Web information services and other systems easily
• Helps to create more refined data (thematic metadata) for textual data
15
Future work
• Optimize the classifier by focusing on the training stage rather than only on organizing corpora
• Improve response time
• Find appropriate pseudo classes