Date posted: 21-Dec-2015
Text Categorization
Numerous applications:
• Search Engines/Portals
• Customer Service
• …
Domains:
• Topics
• Genres
• Languages
Making $$$
How do people deal with a large number of classes?
Use fast multiclass algorithms (e.g., Naïve Bayes), which build one model per class
Use binary classification algorithms (e.g., SVMs) and break an n-class problem into n binary problems
What happens with a 1000-class problem? Can we do better?
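The one-per-class decomposition above can be sketched as follows; the class names and data are hypothetical stand-ins, not a real corpus:

```python
# Minimal sketch of one-vs-rest decomposition: an n-class problem
# becomes n binary problems, one per class (toy labels below).
def one_vs_rest_labels(labels, target):
    """Relabel a multiclass problem as binary: target vs. everything else."""
    return [1 if y == target else 0 for y in labels]

labels = ["sports", "politics", "tech", "sports"]
classes = sorted(set(labels))          # 3 classes -> 3 binary problems
binary_problems = {c: one_vs_rest_labels(labels, c) for c in classes}
print(binary_problems["sports"])       # → [1, 0, 0, 1]
```

With 1000 classes this means training 1000 binary classifiers, which motivates the more compact encoding below.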
ECOC to the Rescue!
An n-class problem can be solved by solving only ⌈log2 n⌉ binary problems
More efficient than one-per-class
Does it actually perform better?
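The ⌈log2 n⌉ claim can be checked with a toy encoder (a sketch, not the proposal's actual code construction): give each class a distinct binary codeword, and each bit position becomes one binary problem.

```python
import math

def minimal_codewords(n):
    """Assign each of n classes a distinct codeword of ceil(log2 n) bits."""
    bits = math.ceil(math.log2(n))
    return {c: format(c, f"0{bits}b") for c in range(n)}

codes = minimal_codewords(1000)
print(len(codes[0]))   # → 10  (10 binary problems instead of 1000)
```

Note that error-correcting codes deliberately use more than ⌈log2 n⌉ bits to buy redundancy, but this is the lower bound on code length.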
What is ECOC?
Solve multiclass problems by decomposing them into multiple binary problems
Use a base learner to learn each binary problem
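The decode step can be sketched as nearest-codeword (Hamming-distance) decoding; the code matrix and topic names below are hypothetical, not the proposal's actual codes:

```python
def hamming(a, b):
    """Number of bit positions where two codewords differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical 4-class code matrix: one row (codeword) per class,
# one column per binary problem a base learner is trained on.
code_matrix = {
    "earn":  (0, 0, 1, 1, 0),
    "trade": (0, 1, 0, 1, 1),
    "crude": (1, 0, 0, 0, 1),
    "grain": (1, 1, 1, 0, 0),
}

def decode(predicted_bits, codes):
    """Pick the class whose codeword is closest in Hamming distance."""
    return min(codes, key=lambda c: hamming(codes[c], predicted_bits))

# Even with one bit flipped by a weak binary learner, decoding
# still recovers the intended class.
print(decode((0, 1, 0, 0, 1), code_matrix))  # → trade
```

The redundancy in the codewords is what lets ECOC correct individual classifier errors, which is the intuition behind the performance gains reported below.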
Preliminary Results
[Chart: classification performance and efficiency, NB vs. ECOC]
This Proposal
ECOC reduces the error of the Naïve Bayes Classifier by 66% with no increase in computational cost
Proposed Solutions
Design codewords that minimize cost and maximize “performance”
Investigate the assignment of codewords to classes
Learn the decoding function
Incorporate unlabeled data into ECOC
Use unlabeled data
Current learning algorithms using unlabeled data (EM, Co-Training) don’t work well with a large number of categories
ECOC works well with a large number of classes, but there is no framework for using unlabeled data
Use Unlabeled Data
ECOC decomposes multiclass problems into binary problems
Co-Training works great with binary problems
ECOC + Co-Training = learn each binary problem in ECOC with Co-Training (or variants of Co-Training such as Co-EM)
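The combination can be sketched as a co-training loop applied to one ECOC binary problem. Everything below is a toy stand-in: the `ThresholdClf`, the two-view numeric examples, and the confidence measure are illustrative assumptions, not the proposal's actual learners, views, or data.

```python
class ThresholdClf:
    """Toy binary classifier that sees only one view of a two-view example."""
    def __init__(self, view):
        self.view = view          # index of the feature view this classifier uses

    def fit(self, examples):      # examples: list of ((view0, view1), label)
        pos = [x[self.view] for x, y in examples if y == 1]
        neg = [x[self.view] for x, y in examples if y == 0]
        self.t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

    def predict(self, x):
        return 1 if x[self.view] > self.t else 0

    def confidence(self, x):      # distance from the threshold as confidence
        return abs(x[self.view] - self.t)

def co_train(clf_a, clf_b, labeled, unlabeled, rounds=2):
    """Co-training: each view labels its most confident unlabeled
    example and hands it to the other view's training set."""
    labeled_a, labeled_b = list(labeled), list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        clf_a.fit(labeled_a)
        clf_b.fit(labeled_b)
        for clf, other in ((clf_a, labeled_b), (clf_b, labeled_a)):
            if not pool:
                break
            best = max(pool, key=clf.confidence)
            other.append((best, clf.predict(best)))
            pool.remove(best)
    return clf_a, clf_b

labeled = [((1.0, 1.2), 0), ((5.0, 4.8), 1)]                  # small seed set
unlabeled = [(0.8, 0.9), (5.2, 5.1), (1.1, 1.0), (4.9, 5.3)]  # no labels
clf_a, clf_b = co_train(ThresholdClf(0), ThresholdClf(1), labeled, unlabeled)
```

In the full proposal this loop would run once per column of the ECOC code matrix, so each bit classifier benefits from unlabeled data while decoding still happens over the combined codeword.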