Applying Text Classification in Conference Management: Some Lessons Learned
Andreas Pesenhofer, Helmut Berger, Michael Dittenbach, Andreas Rauber
Overview
- Conference Management Systems
- Classification & Clustering
- Case Studies
  - ECDL 2005
  - ECR
- Conclusions
Conference Management Systems
- Set of tools to support the conference workflow
- Basic support for paper submission & review collection
- Many tasks for further automation
  - Selection of the program committee
  - Topic assignment of submissions
  - Paper-to-reviewer assignment
  - Support in review generation
  - Poster arrangement
  - Post-conference access to papers
Classification & Clustering
- Topic assignment of submissions
  - Problem: authors are uncertain about the precise topic assignment (conference terminology)
  - Solution: support by automatic assignment
  - Method: automatic text classification (ATC) based on abstracts
- Poster arrangement & post-conference access to papers
  - Problem: topic-based arrangement
  - Solution: clustering
  - Method: self-organizing map (SOM) & Mnemonic SOM
ATC for topic assignment
1. Train a model based on previous conferences
2. Abstract submission
3. Automatic assignment
4. Confirmation
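The four steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not name the classifier, so a cosine-similarity match against per-topic word profiles stands in for it, the three training abstracts are made up, and the idea that the author performs the confirmation step is also an assumption. The names `train`, `suggest_topic`, and `final_topic` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(labelled_abstracts):
    """Build one word-count profile per topic from (text, topic) pairs."""
    profiles = defaultdict(Counter)
    for text, topic in labelled_abstracts:
        profiles[topic].update(text.lower().split())
    return profiles

def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def suggest_topic(profiles, abstract):
    """Return the topic whose profile is most similar to the new abstract."""
    words = Counter(abstract.lower().split())
    return max(sorted(profiles), key=lambda topic: cosine(words, profiles[topic]))

def final_topic(suggestion, author_choice=None):
    """Assumed confirmation step: keep the suggestion unless it is overridden."""
    return suggestion if author_choice is None else author_choice

# Made-up stand-ins for abstracts of previous conferences.
previous = [
    ("web archiving and long term preservation", "Digital Preservation"),
    ("user interfaces and user studies", "User Studies"),
    ("query ranking for information retrieval search", "Information Retrieval"),
]
model = train(previous)
suggestion = suggest_topic(model, "a study of search ranking")
confirmed = final_topic(suggestion)
overridden = final_topic(suggestion, "User Studies")
```

Keeping the confirmation as a separate function mirrors the slide's workflow: the automatic assignment is a proposal, and a person has the last word.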
Clustering for organization
- Arrange posters thematically
- Non-rectangular SOMs reflecting the conference site
- Mnemonic SOMs simplify post-conference paper access
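To make the idea of a non-rectangular map concrete, here is a minimal SOM sketch in which units exist only at the grid positions switched on in a mask. It is an illustration, not the authors' implementation: the masking approach, the L-shaped `floor_plan`, the two-dimensional toy `posters` vectors, and all training parameters are assumptions chosen for the example.

```python
import math
import random

def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x."""
    return min(weights, key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))

def train_som(data, mask, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a SOM whose units exist only where mask[row][col] is True."""
    rng = random.Random(seed)
    dim = len(data[0])
    units = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    weights = {u: [rng.random() for _ in range(dim)] for u in units}
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # shrinks linearly towards 0
        lr = lr0 * decay                      # learning rate
        radius = max(radius0 * decay, 0.5)    # neighbourhood radius on the grid
        for x in rng.sample(data, len(data)): # present the inputs in random order
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_dist2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_dist2 / (2 * radius ** 2))  # Gaussian neighbourhood
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights

# Hypothetical L-shaped room: only five of the nine grid cells hold units.
floor_plan = [
    [True, True, True],
    [True, False, False],
    [True, False, False],
]
# Toy document vectors; real input would be the weighted term vectors.
posters = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
som = train_som(posters, floor_plan)
placement = [best_matching_unit(som, p) for p in posters]
```

Each entry of `placement` is a `(row, column)` position inside the masked shape, so a poster can never be assigned to a cell that does not exist in the room layout.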
ECDL 2005 – ATC data
- English abstracts of previous ECDL conferences
- Topics of the conference call -> seven defined categories
- Pre-processing: removal of all numbers, punctuation marks, and special characters; transformation to lower case
- tf-idf weighting
- 4,141 unique terms
- Information gain (IG): 3,460 top-ranked terms, average accuracy over all categories of 58.60%
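The pre-processing and weighting steps can be sketched as follows. The slides do not say which tf-idf variant was used, so the plain `tf * log(N / df)` form below is an assumption, and the three sample abstracts are made up. Note that the regular expression keeps only the letters a to z, which suits English abstracts but would drop accented characters.

```python
import math
import re
from collections import Counter

def preprocess(text):
    """Lower-case the text and drop numbers, punctuation and special characters."""
    text = text.lower()
    # Everything that is not a letter or whitespace becomes a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()

def tfidf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [preprocess(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weighted = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weighted.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weighted

# Made-up sample abstracts.
abstracts = [
    "Digital libraries: 3 case studies!",
    "Web archiving & long-term access (2005).",
    "Metadata for digital libraries.",
]
vectors = tfidf(abstracts)
vocabulary = sorted({term for vector in vectors for term in vector})
```

The size of `vocabulary` corresponds to the "unique terms" figure on the slide; terms that occur in every document receive weight 0 under this variant.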
ECDL – training data

| class-ID | class description | sum |
|---|---|---|
| 1 | Concepts of Digital Libraries, Concepts of Documents and Metadata | 34 |
| 2 | System Architectures, Open Archives, Collection Building, Integration and Interoperability | 40 |
| 3 | Information Retrieval, Information Organization, Search and Usage | 67 |
| 4 | User Studies, System Evaluation, Personalization, User Interfaces and User Centered Design | 50 |
| 5 | Digital Preservation, Web Archiving and Long Term Access | 12 |
| 6 | Digital Library Applications and Case Studies | 65 |
| 7 | Multimedia, Mixed Media, Audio, Video, 3D and non-traditional Objects | 43 |
| | sum over the selected abstracts | 311 |
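ECDL 2005 – classification results

| class-ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | total | recall | F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 2 | 2 | . | 1 | 1 | 8 | 0.13 | 0.17 |
| 2 | 1 | 17 | 1 | . | . | . | . | 19 | 0.89 | 0.77 |
| 3 | 1 | 3 | 26 | 6 | . | 2 | . | 38 | 0.68 | 0.69 |
| 4 | . | . | 4 | 21 | . | 2 | 1 | 28 | 0.75 | 0.71 |
| 5 | 1 | 1 | 3 | . | . | 1 | 1 | 7 | 0.00 | 0.00 |
| 6 | . | 3 | 1 | 2 | . | 12 | 1 | 19 | 0.63 | 0.65 |
| 7 | . | . | . | . | . | . | 3 | 3 | 1.00 | 0.60 |
| precision | 0.25 | 0.68 | 0.70 | 0.68 | 0.00 | 0.67 | 0.43 | | | |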
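The recall, precision, and F1 figures follow directly from the confusion matrix, reading each row as the documents that belong to a class and each column as the documents assigned to it. The sketch below recomputes them from the ECDL counts above (with "." entered as 0); the function name `per_class_metrics` is hypothetical, and the results agree with the table up to rounding.

```python
def per_class_metrics(matrix):
    """Return (recall, precision, F1) per class.

    matrix[i][j] is the number of documents of class i+1 assigned to class j+1.
    """
    n = len(matrix)
    results = []
    for k in range(n):
        true_positive = matrix[k][k]
        row_total = sum(matrix[k])                       # documents belonging to the class
        col_total = sum(matrix[i][k] for i in range(n))  # documents assigned to the class
        recall = true_positive / row_total if row_total else 0.0
        precision = true_positive / col_total if col_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append((recall, precision, f1))
    return results

# ECDL 2005 confusion matrix from the table above.
ecdl = [
    [1, 1, 2, 2, 0, 1, 1],
    [1, 17, 1, 0, 0, 0, 0],
    [1, 3, 26, 6, 0, 2, 0],
    [0, 0, 4, 21, 0, 2, 1],
    [1, 1, 3, 0, 0, 1, 1],
    [0, 3, 1, 2, 0, 12, 1],
    [0, 0, 0, 0, 0, 0, 3],
]
metrics = per_class_metrics(ecdl)
for class_id, (recall, precision, f1) in enumerate(metrics, start=1):
    print(f"class {class_id}: recall={recall:.2f} precision={precision:.2f} F1={f1:.2f}")
```

Class 5 shows why the guards against zero denominators matter: no test document was assigned to it, so its column total is 0 and precision is reported as 0.00.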
ECDL 2005 – SOM data
- Poster and paper organization
  - Full text of accepted posters of ECDL 2005
  - Term selection based on minimal word length and document frequencies
  - 30 posters, 569 terms
- Post-conference access
  - 71 papers and posters, 5,654 terms
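Term selection by word length and document frequency could look like the sketch below. The slides give no thresholds, so `min_length`, `min_df`, and `max_df_ratio` are hypothetical parameters with arbitrary defaults, and the tokenized documents are made up.

```python
from collections import Counter

def select_terms(tokenized_docs, min_length=3, min_df=2, max_df_ratio=0.9):
    """Keep terms that are long enough and neither too rare nor too common."""
    n_docs = len(tokenized_docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized_docs for term in set(tokens))
    return sorted(
        term
        for term, count in df.items()
        if len(term) >= min_length and min_df <= count <= max_df_ratio * n_docs
    )

# Made-up tokenized poster texts.
docs = [
    ["som", "of", "posters", "map"],
    ["som", "map", "digital"],
    ["som", "digital", "of"],
    ["som", "posters", "xml"],
]
selected = select_terms(docs)
```

In this toy input, "of" is dropped for being too short, "xml" for appearing in only one document, and "som" for appearing in all of them.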
ECR - Data
- Abstracts of the ECR: European Congress of Radiology
- Training set: ECR 2003 & 2004, 1,952 documents
- Test set: ECR 2005, 924 documents
- Same steps as for the ECDL data
- Resulting in 14,887 unique terms
- IG: 5,720 top-ranked terms, average accuracy over all categories of 73.57%
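Ranking terms by information gain (IG) and keeping the top-ranked ones can be sketched as below. This uses the common definition of IG for binary term presence, the class entropy minus the entropy that remains once it is known whether a term occurs; the slides do not spell out the exact formulation, so treat it as an assumption. The four toy documents are made up, and only the class names are borrowed from the ECR categories.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a collection of class counts."""
    total = sum(counts)
    if not total:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def information_gain(docs, labels):
    """Return {term: IG} for term presence; docs is a list of token sets."""
    n = len(docs)
    class_counts = Counter(labels)
    base = entropy(class_counts.values())
    # For each term, count the classes of the documents that contain it.
    with_term = {}
    for tokens, label in zip(docs, labels):
        for term in tokens:
            with_term.setdefault(term, Counter())[label] += 1
    gains = {}
    for term, present in with_term.items():
        absent = class_counts - present   # classes of documents lacking the term
        n_present = sum(present.values())
        n_absent = n - n_present
        remaining = (n_present / n) * entropy(present.values()) \
            + (n_absent / n) * entropy(absent.values())
        gains[term] = base - remaining
    return gains

def top_terms(gains, k):
    """Return the k terms with the highest IG (ties broken alphabetically)."""
    return [t for t, _ in sorted(gains.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]

# Made-up documents labelled with two ECR class names.
docs = [{"mri", "brain"}, {"mri", "knee"}, {"ct", "brain"}, {"ct", "knee"}]
labels = ["Neuro", "Musculoskeletal", "Neuro", "Musculoskeletal"]
gains = information_gain(docs, labels)
best = top_terms(gains, 2)
```

Here "brain" and "knee" separate the two classes perfectly and get the full one bit of gain, while "mri" and "ct" occur equally in both classes and get none.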
ECR – training data

| class-ID | class description | 2003 | 2004 | sum |
|---|---|---|---|---|
| 1 | Abdominal and Gastrointestinal | 160 | 119 | 279 |
| 2 | Breast | 80 | 59 | 139 |
| 3 | Cardiac | 70 | 70 | 140 |
| 4 | Chest | 60 | 70 | 130 |
| 5 | Computer Applications | 30 | 30 | 60 |
| 6 | Contrast Media | 40 | 39 | 79 |
| 7 | Genitourinary | 70 | 60 | 130 |
| 8 | Head and Neck | 40 | 40 | 80 |
| 9 | Interventional Radiology | 130 | 117 | 247 |
| 10 | Musculoskeletal | 90 | 80 | 170 |
| 11 | Neuro | 90 | 99 | 189 |
| 12 | Pediatric | 30 | 40 | 70 |
| 13 | Physics in Radiology | 40 | 40 | 80 |
| 14 | Radiographers | 10 | 10 | 20 |
| 15 | Vascular | 69 | 70 | 139 |
| | sum over the selected abstracts | 1009 | 943 | 1952 |
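ECR 2005 – classification results

| class-ID | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | total | recall | F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 111 | 1 | . | 1 | 2 | 2 | 2 | . | 2 | 1 | 2 | . | 1 | . | 1 | 126 | 0.88 | 0.79 |
| 2 | 1 | 61 | . | . | . | . | . | . | 1 | . | . | . | 6 | . | . | 69 | 0.88 | 0.87 |
| 3 | 1 | . | 73 | . | . | . | . | . | . | 1 | . | . | 3 | . | 2 | 80 | 0.91 | 0.86 |
| 4 | 6 | . | 5 | 49 | 1 | . | . | . | 3 | . | 1 | . | . | . | 5 | 70 | 0.70 | 0.77 |
| 5 | 2 | 2 | . | 3 | 10 | . | . | . | . | . | 3 | . | 7 | . | 3 | 30 | 0.33 | 0.43 |
| 6 | 12 | . | . | 1 | . | 26 | 2 | . | 1 | 2 | 1 | . | 1 | . | 3 | 49 | 0.53 | 0.61 |
| 7 | 5 | . | . | . | . | 1 | 38 | . | 5 | 3 | 2 | . | 3 | . | 1 | 58 | 0.66 | 0.73 |
| 8 | 4 | . | . | 1 | . | 2 | 4 | 8 | 2 | 2 | 4 | . | 2 | . | 1 | 30 | 0.27 | 0.39 |
| 9 | 2 | 4 | 2 | . | 1 | 3 | . | . | 99 | 2 | 2 | . | . | . | 5 | 120 | 0.83 | 0.81 |
| 10 | 2 | 2 | . | 1 | 1 | 1 | . | . | 2 | 60 | 5 | 1 | 2 | . | 1 | 78 | 0.77 | 0.78 |
| 11 | 1 | . | 1 | 1 | . | . | . | 1 | 4 | . | 64 | 2 | 1 | . | 4 | 79 | 0.81 | 0.73 |
| 12 | 4 | . | 1 | . | . | . | . | 1 | 1 | 1 | 10 | 11 | . | . | 1 | 30 | 0.37 | 0.50 |
| 13 | . | 1 | 3 | . | . | . | . | . | . | 1 | 2 | . | 39 | . | 2 | 48 | 0.81 | 0.68 |
| 14 | 2 | . | . | . | 1 | . | . | . | . | 3 | . | . | . | 2 | . | 8 | 0.25 | 0.40 |
| 15 | 2 | . | 4 | . | . | 1 | . | 1 | 3 | . | . | . | 1 | . | 37 | 49 | 0.76 | 0.64 |
| precision | 0.72 | 0.86 | 0.82 | 0.86 | 0.63 | 0.72 | 0.83 | 0.73 | 0.80 | 0.79 | 0.67 | 0.79 | 0.59 | 1.00 | 0.56 | | | |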
Conclusions
- Classification quality grows with the number of training documents
- Structure of the classes (overlapping?)
- The bulk of submissions can be dealt with automatically
- May be used for session assignment
- Arrange posters & papers thematically
- Easy to memorize & find
Questions?
E-Commerce Competence Center
Donau-City-Strasse 1
1220 Vienna Austria
Phone: +43/1/522 71 71-20
Fax: +43/1/522 71 71-71
Internet: http://www.ec3.at/