
Hierarchical Classification: Comparison with Flat Method

Hierarchical Classification: Comparison with Flat Method Yongwook Yoon Jun 12, 2003 NLP Lab., POSTECH
Page 1: Hierarchical Classification:  Comparison with Flat Method

Hierarchical Classification:

Comparison with Flat Method

Yongwook YoonJun 12, 2003

NLP Lab., POSTECH

Page 2: Hierarchical Classification:  Comparison with Flat Method

Contents

Hierarchical vs. Flat Classifier
System Overview
Experiment & Results
Comparison with Flat Methods
Future Work

Page 3: Hierarchical Classification:  Comparison with Flat Method

Hierarchical Classification

A natural method for very large groups of classes
Top-down classification
Promotes human readability

Better performance than the flat method
Much better recall and precision
Flexible strategy: applicable at different levels of the hierarchy

Page 4: Hierarchical Classification:  Comparison with Flat Method

Flat vs. Hierarchical classification

[Diagram: flat classification assigns documents D1 … Dn directly under Root; hierarchical classification routes them through intermediate nodes, e.g. Business → Grain / Oil, with documents D1 … Dn at the leaves]
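The top-down routing pictured above can be sketched in a few lines. This is an illustrative sketch, not the actual POSTECH system; the `Node` structure and the toy keyword classifiers are assumptions:

```python
# Minimal sketch of top-down hierarchical classification: each internal node
# holds its own classifier, which routes a document to one child; routing
# repeats until a leaf class is reached.

class Node:
    def __init__(self, name, children=None, classifier=None):
        self.name = name
        self.children = children or {}   # child name -> Node
        self.classifier = classifier     # function: doc -> child name (None at leaves)

def classify(node, doc):
    """Route `doc` from the given node down to a leaf class label."""
    while node.children:
        child_name = node.classifier(doc)   # local decision at this level
        node = node.children[child_name]
    return node.name

# Toy hierarchy mirroring the diagram: Root -> Business -> {Grain, Oil}.
# The lambda classifiers are hypothetical keyword rules, not trained models.
grain = Node("Grain")
oil = Node("Oil")
business = Node("Business", {"Grain": grain, "Oil": oil},
                classifier=lambda d: "Grain" if "wheat" in d else "Oil")
root = Node("Root", {"Business": business}, classifier=lambda d: "Business")

print(classify(root, "wheat harvest report"))  # -> Grain
```

In the real system each node's rule would be a trained Naïve Bayes classifier rather than a keyword test, but the control flow is the same.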

Page 5: Hierarchical Classification:  Comparison with Flat Method

System Overview (1)

Baseline system currently uses a BOW (bag-of-words) approach
No syntactic or semantic features yet
Naïve Bayes classifier
Several feature selection methods
Pruning the vocabulary by document count or word count
Information gain
Hierarchical classification added
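Information-gain feature selection, mentioned above, scores a word by how much knowing its presence reduces class uncertainty. A minimal sketch (the real system uses the Bow library's implementation; the toy documents below are invented):

```python
# Sketch of information-gain scoring for vocabulary pruning:
# IG(w) = H(class) - H(class | w present/absent).
import math
from collections import Counter

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def info_gain(docs, labels, word):
    h_class = entropy(list(Counter(labels).values()))
    present = [l for d, l in zip(docs, labels) if word in d]
    absent = [l for d, l in zip(docs, labels) if word not in d]
    n = len(labels)
    h_cond = 0.0
    for part in (present, absent):
        if part:
            h_cond += len(part) / n * entropy(list(Counter(part).values()))
    return h_class - h_cond

# Invented toy corpus: word sets with class labels.
docs = [{"wheat", "farm"}, {"oil", "rig"}, {"wheat", "price"}, {"oil", "price"}]
labels = ["grain", "oil", "grain", "oil"]
print(info_gain(docs, labels, "wheat"))  # 1.0: perfectly separates the classes
print(info_gain(docs, labels, "price"))  # 0.0: carries no class information
```

Keeping only the top-scoring words gives the pruned vocabularies (900 or 20,000 words) used in the experiments.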

Page 6: Hierarchical Classification:  Comparison with Flat Method

System Overview (2)

Many extensions possible
The Bow library (by McCallum) supplies
Support Vector Machine (batch only, not online)
Maximum Entropy
K-nearest neighbor
Pr-TFIDF
Bow does not supply
Online learning methods
Syntactic or semantic classification features

Page 7: Hierarchical Classification:  Comparison with Flat Method

System Overview (3)

Functions implemented for hierarchical classification
Constructing the hierarchy prototype as a directory structure
Dividing documents into the different levels of the hierarchy
Training each classifier in the hierarchy
Testing documents automatically from the root down to the leaf classifiers
Logging and evaluation

Page 8: Hierarchical Classification:  Comparison with Flat Method

Construct Hierarchical Structure

[Diagram: Documents → make up the logical hierarchy → divide documents into the hierarchy → train each classifier (training parameters); example subtree: Business → Grain / Oil]

Page 9: Hierarchical Classification:  Comparison with Flat Method

Classifying documents

[Diagram: a document enters at the root node (Level_0); from each intermediate result it goes further down (e.g. Grain / Oil / Steel), until a final classification into one of the leaf classes Class_1 … Class_N]

Page 10: Hierarchical Classification:  Comparison with Flat Method

Experiment

Data: 20 Newsgroups documents
20 classes, 1,000 documents per class
Intrinsic hierarchy, e.g. rec.sport.hockey, talk.politics.mideast
Two major trials: flat vs. hierarchical
Evaluation and comparison

Page 11: Hierarchical Classification:  Comparison with Flat Method

Experiment detail

Experiments in the same setting as SIGIR-2001 (Bekkerman et al.)
News headers excluded, except for the Subject line, which is kept
All characters lowercased
Multi-labeled classes allowed
4-fold cross-validation

Page 12: Hierarchical Classification:  Comparison with Flat Method

4-fold cross validation

Applied identically to the flat and hierarchical cases
The 20,000 documents are randomly divided into 4 sets of 5,000 documents each
3 of the 4 sets are combined for training and the remaining 1 is used for testing
4 experiment runs are performed
The final evaluation averages the 4 runs
Same evaluation method as SIGIR-2001
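The splitting procedure above can be sketched as follows; `four_fold_splits` is a hypothetical helper, not part of the original system:

```python
# Sketch of the 4-fold split described above: 20,000 documents are shuffled
# and divided into 4 sets of 5,000; each run trains on 3 sets, tests on 1.
import random

def four_fold_splits(doc_ids, seed=0):
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)          # random division, reproducible
    k = len(ids) // 4
    folds = [ids[i * k:(i + 1) * k] for i in range(4)]
    for test_idx in range(4):
        train = [d for i, f in enumerate(folds) if i != test_idx for d in f]
        yield train, folds[test_idx]

splits = list(four_fold_splits(range(20000)))
# 4 runs, each with 15,000 training and 5,000 test documents;
# the final score averages the 4 per-run evaluations.
```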

Page 13: Hierarchical Classification:  Comparison with Flat Method

[Diagram: the 20 Newsgroups hierarchy. root → alt, comp, misc, rec, sci, soc, talk; alt → atheism; comp → graphics, os.ms-windows.misc, sys (ibm.pc.hardware, mac.hardware), windows.x; misc → forsale; rec → autos, motorcycles, sport (baseball, hockey); sci → crypt, electronics, med, space; soc → religion.christian; talk → politics (guns, mideast, misc), religion.misc]

A total of 8 classifiers is required.

Page 14: Hierarchical Classification:  Comparison with Flat Method

Result – Flat method

The result of 4-fold cross-validation: ./test_out/cv3_info900.stats
Correct: 16573 out of 19996 (82.88% accuracy)

   classname                  0    1    2    3    4    5    6    7    8   18   19 :total
 0 alt.atheism              825    .    .    .    1    .    .    1   23   11   82 :1000 82.50%
 1 comp.graphics              .  866   33   17   14   42    4    2    1    .    1 :1000 86.60%
 2 comp.os.ms-windows.misc    .   25  873   34    2   65    1    .    .    .    . :1000 87.30%
 3 comp.sys.ibm.pc.hardware   1   60   97  662  126   12    8    1    .    .    . :1000 66.20%
 4 comp.sys.mac.hardware      .   33   13   25  902    5    4    1    .    .    . :1000 90.20%
 5 comp.windows.x             .  122   65    8    1  783    1    2    1    .    1 :1000 78.30%
 6 misc.forsale               .    8    5   25   17    6  801   31   61    4    1 :1000 80.10%
 7 rec.autos                  .    1    .    .    2    2   11  846   98    2    . :1000 84.60%
 8 rec.motorcycles            .    2    .    .    .    .    2   17  971    .    . :1000 97.10%
 9 rec.sport.baseball         .    .    .    .    .    1    1   11   55    1    . :1000 86.60%
10 rec.sport.hockey           2    .    .    .    .    1    2    4   11    1    . :1000 96.70%
11 sci.crypt                  .    8    6    .    .    8    .    .    2    1    . : 999 93.99%
12 sci.electronics            .   30    6   13   31    3   10   35   12    .    . :1000 74.30%
13 sci.med                    .   16    .    .    2    4    1    5    8    3    . :1000 93.00%
14 sci.space                  2   13    3    .    3    1    .    3    2    3    1 :1000 95.70%
15 soc.religion.christian     4    .    .    .    .    .    .    .    .    1    2 : 997 99.10%
16 talk.politics.guns         1    .    1    .    .    .    2    3   12   68   22 :1000 87.00%
17 talk.politics.mideast      6    2    .    .    .    7    3    .   10  149    2 :1000 78.50%
18 talk.politics.misc         5    1    .    .    1    .    .    6    6  673   38 :1000 67.30%
19 talk.religion.misc       330    3    .    .    .    .    .    4    2  107  326 :1000 32.60%

Page 15: Hierarchical Classification:  Comparison with Flat Method

Evaluation measure

From the per-class entries α_i, β_i, γ_i of the confusion matrix we compute

Precision = Σ_i α_i / Σ_i (α_i + β_i)
Recall = Σ_i α_i / Σ_i (α_i + γ_i)

Micro-averaged BEP = (P + R) / 2
Overall recall and precision are the same

α_i: documents of class C_i correctly classified as C_i
β_i: documents not in C_i but classified as C_i
γ_i: documents in C_i but classified as some other class C_j
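A minimal sketch of the micro-averaged measures defined above, with invented per-class counts; it also illustrates why overall precision and recall coincide in the single-label case (every misclassified document adds one to some β_i and one to some γ_j):

```python
# Micro-averaged precision/recall from per-class counts, as defined above:
# alpha_i = docs correctly assigned to C_i, beta_i = docs wrongly assigned
# to C_i, gamma_i = docs of C_i assigned elsewhere. Counts are illustrative.
def micro_prf(alpha, beta, gamma):
    p = sum(alpha) / (sum(alpha) + sum(beta))
    r = sum(alpha) / (sum(alpha) + sum(gamma))
    return p, r, (p + r) / 2              # BEP = (P + R) / 2

# In single-label classification every misassigned document is counted once
# in some beta_i and once in some gamma_j, so sum(beta) == sum(gamma) and
# micro-averaged P and R are necessarily equal.
p, r, bep = micro_prf(alpha=[90, 80], beta=[20, 10], gamma=[10, 20])
print(p, r, bep)  # 0.85 0.85 0.85
```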

Page 16: Hierarchical Classification:  Comparison with Flat Method

Result – Hierarchical (1)

# level_0
The result of 4-fold cross-validation: ./test_out/cv3_info20000.stats
Correct: 18440 out of 19996 (92.22% accuracy)

   classname                 0     1    2     3     4    5     6 :total
 0 alt.atheism             962     1    .     3    12    4    18 :1000 96.20%
 1 comp.                     1  4807   32    13   147    .     . :5000 96.14%
 2 misc.forsale              .   117  771    62    49    .     1 :1000 77.10%
 3 rec.                      1    18   14  3916    43    .     8 :4000 97.90%
 4 sci.                      6   187   23    39  3730    .    14 :3999 93.27%
 5 soc.religion.christian    4     1    .     .     .  978    14 : 997 98.09%
 6 talk.                   409    17    2    60   160   76  3276 :4000 81.90%

# level_1/comp.
The result of 4-fold cross-validation: ./test_out/cv3_info900.stats
Correct: 4276 out of 4807 (88.95% accuracy)

   classname                 0    1     2    3 :total
 0 comp.graphics           802   33    35   67 : 937 85.59%
 1 comp.os.ms-windows.misc  13  848    55   75 : 991 85.57%
 2 comp.sys.                42   78  1743   44 :1907 91.40%
 3 comp.windows.x           47   31    11  883 : 972 90.84%

Page 17: Hierarchical Classification:  Comparison with Flat Method

Result – Hierarchical (2)

# level_1/sci.
The result of 4-fold cross-validation: ./test_out/cv3_info900.stats
Correct: 3596 out of 3730 (96.41% accuracy)

   classname         0    1    2    3 :total
 0 sci.crypt       969    5    1    2 : 977 99.18%
 1 sci.electronics  20  731    6   43 : 800 91.38%
 2 sci.med           8   14  931   14 : 967 96.28%
 3 sci.space         6   10    5  965 : 986 97.87%

# level_1/talk.
The result of 4-fold cross-validation: ./test_out/cv3_info900.stats
Correct: 3014 out of 3276 (92.00% accuracy)

   classname             0    1 :total
 0 talk.politics.     2680  113 :2793 95.95%
 1 talk.religion.misc  149  334 : 483 69.15%

# level_2/comp.sys.
The result of 4-fold cross-validation: ./test_out/cv3_info10.stats
Correct: 1709 out of 1743 (98.05% accuracy)

   classname                  0    1 :total
 0 comp.sys.ibm.pc.hardware 815   22 : 837 97.37%
 1 comp.sys.mac.hardware     12  894 : 906 98.68%

Page 18: Hierarchical Classification:  Comparison with Flat Method

Evaluation measure in Hierarchical

Recall
Same as in the flat method case
The count of correct classifications at the leaf nodes

Precision
Must take into account the incorrect counts from the upper-level classifications
Divide each upper level's incorrect count by the number of classes at the lower level
Finally, the computation at the leaf node takes all of the averaged incorrect counts of the upper levels into account

Page 20: Hierarchical Classification:  Comparison with Flat Method

Evaluation measure in Hierarchical – Cont'd

(level_0 and level_1/comp. confusion matrices as shown in Result – Hierarchical (1))

Averaged upper-level error passed down to the comp. subtree:
(1 + 117 + 18 + 187 + 1 + 17) / 5 = 68.2

Precision of comp.graphics = 802 / (802 + 13 + 42 + 47 + 68.2) = 82.5%
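The worked example above can be checked numerically. This reproduces the slide's arithmetic, where 5 is the number of leaf classes under comp.:

```python
# Hierarchical precision for comp.graphics, as computed on the slide:
# misclassifications at level 0 that fell into comp. are averaged over the
# 5 leaf classes under comp., then added to comp.graphics' false positives.
upper_errors = 1 + 117 + 18 + 187 + 1 + 17   # non-comp docs routed into comp.
avg_upper = upper_errors / 5                 # spread over 5 comp. leaf classes
correct = 802                                # comp.graphics true positives
lower_fp = 13 + 42 + 47                      # level-1 docs wrongly put in graphics
precision = correct / (correct + lower_fp + avg_upper)
print(round(avg_upper, 1), round(100 * precision, 1))  # 68.2 82.5
```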

Page 21: Hierarchical Classification:  Comparison with Flat Method

Comparison of Flat vs. Hierarchical

Algorithm           BEP            Condition
Flat                82.9           Naïve Bayes, infogain – 900 words
Hierarchical        85.8           Naïve Bayes, infogain – 10 ~ 20,000 words
SIGIR-2001 (flat)   86.3 (unfair)  SVM, MI – 15,000 words
SIGIR-2001 (flat)   88.6           SVM, IB clustering – 300 groups

Page 22: Hierarchical Classification:  Comparison with Flat Method

Analysis

Hierarchical shows far better performance than the flat case
This is the result with only Naïve Bayes and word features
Other methods such as SVM and Maximum Entropy should also be applied
In the hierarchical setting, a different classifier and a different feature selection method can be used at each level
More flexible application; much room for tuning to improve performance
In the experiment, a different number of features was applied per level
Level_0: 20,000; Level_1: 900; Level_2: 10 ~ 900 words
Adding or deleting a class has only a local effect
Only the relevant level needs to be retrained

Page 23: Hierarchical Classification:  Comparison with Flat Method

Future Work

Apply other classifiers in place of Naïve Bayes
Support Vector Machine, Maximum Entropy, K-nearest neighbor, Pr-TFIDF, etc.
Hierarchy + adaptive learning
Find adaptive learning methods well suited to the hierarchy
Apply other feature selection methods

Page 24: Hierarchical Classification:  Comparison with Flat Method

Discussion

Is it coincidence or necessity that overall precision and recall coincide?
Is it because they are micro-averaged, or a consequence of every class having the same number of documents?

At each stage of hierarchical classification, instead of extracting only the correctly classified documents from one level's result and passing them down, all documents should be passed down to the next level (the realistic, rational application); in practice, the correct answers are not known at the intermediate stages.

Also, when computing hierarchical precision, rather than averaging the upper-level errors down to the lower levels, wouldn't it be more correct to pass everything, including the misclassified documents, down to the leaf nodes and compute precision there against the correct answers? This needs to be compared with the current computation method.

On Future Work: the research direction after the baseline needs to be settled.
Covering class addition and deletion with incremental learning is a hard problem!
Versus simple hierarchical + adaptive learning

Page 25: Hierarchical Classification:  Comparison with Flat Method

The End

