2016-04-12 | Seminar | Extreme Classification | 1

Seminar aus Data Mining und Maschinellem Lernen

Extreme Classification

Seminar aus Data Mining und Maschinellem Lernen

Time and place: Wednesdays, 17:10 - 18:50, Room E202
First presentation: probably in three weeks
Two talks each Wednesday

Seminar aus Data Mining und Maschinellem Lernen

Each student is expected to give a 30-minute talk on the assigned material, followed by 15 minutes of questions.

The talk and the slides may be in either English or German, but we strongly encourage students to give the talk in English.

Students are expected to participate in the discussions.

Important! The content of the talk should go beyond the scope of the paper and demonstrate a thorough understanding of the material.

Follow the guidelines on the seminar site and at https://www.ke.tu-darmstadt.de/lehre/arbeiten/giving-a-talk-at-a-ke-seminar-1

Extreme Classification

Hot topic in the last one or two years
Roughly: all types of classification problems where the target space, i.e. the set of categories/classes/labels, is large
In practice often multilabel classification problems: the assignment of several classes instead of only one class
Basic tutorial at https://www.ke.tu-darmstadt.de/staff/eneldo

Image annotation

scene dataset consists of 2407 images assigned to 6 labels

Example label sets (images not reproduced): {Fall foliage, Field}, {Beach, Urban}

Matthew R. BOUTELL, Jiebo LUO, Xipeng SHEN, Christopher M. BROWN: Learning Multi-Label Scene Classification. In: Pattern Recognition, vol. 37 (9): pp. 1757–1771, 2004.

EUR-Lex repository

19328 (freely accessible) documents of the Directory of Community legislation in force of the European Union
Documents available in several European languages
Multiple classifications of the same documents
Most challenging one: EUROVOC descriptors associated with each document
3965 descriptors, on average 5.37 labels per document
Descriptors are organized in a hierarchy with up to 7 levels
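
The average of 5.37 labels per document is what the multilabel literature usually calls the label cardinality of a dataset. The following is a minimal Python sketch of how such a statistic can be computed. The toy documents and descriptor names are made up for illustration and are not taken from EUR-Lex, and the function names are hypothetical.

```python
def label_cardinality(label_sets):
    """Average number of labels per document."""
    return sum(len(s) for s in label_sets) / len(label_sets)


def label_density(label_sets, num_labels):
    """Label cardinality divided by the size of the label space."""
    return label_cardinality(label_sets) / num_labels


# Hypothetical toy collection: one label set per document.
toy_labels = [
    {"agriculture", "trade"},
    {"fisheries"},
    {"trade", "customs", "tariffs"},
    set(),  # a document without any label
]
# The label space is the union of all labels that occur (5 here).
all_labels = set().union(*toy_labels)

print(label_cardinality(toy_labels))                # 1.5
print(label_density(toy_labels, len(all_labels)))   # 0.3
```

A low density relative to the size of the label space, as in EUR-Lex with 3965 descriptors, means that the label vectors are very sparse.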

Formal definition

Given input:
  a set of training objects $x_1, \dots, x_m$, each $x_i$ a vector in $\mathbb{R}^a$
  a set of label mappings $y_1, \dots, y_m$, each a subset of $Y = \{\lambda_1, \dots, \lambda_n\}$

Objective: find a function $h: \mathbb{R}^a \to 2^Y$ which maps $x_i$ to $y_i$
  as accurately as possible,
  as efficiently as possible

i    x1   x2   x3   ...  xa    y
1    A    1    0    ...  0.1   {λ1, λn}
2    B    2    1    ...  0.3   {λ2}
3    C    3    0    ...  0.5   {}
4    D    4    1    ...  0.6   {λ1}
...

Formal definition

Alternative view:
  a set of training objects $x_1, \dots, x_m$, each $x_i$ a vector in $\mathbb{R}^a$
  $n$ binary target variables $y_1, \dots, y_n$, each in $\{0,1\}$

Objective: find a function $h: \mathbb{R}^a \to Y = \{0,1\}^n$ which maps $x_i$ to a binary vector
  as accurately as possible,
  as efficiently as possible

i    x1   x2   x3   ...  xa    y1   y2   ...  yn
1    A    1    0    ...  0.1   1    0    ...  1
2    B    2    1    ...  0.3   0    1    ...  0
3    C    3    0    ...  0.5   0    0    ...  0
4    D    4    1    ...  0.6   1    0    ...  0
...

i    x1   x2   x3   ...  xa    y
1    A    1    0    ...  0.1   {λ1, λn}
2    B    2    1    ...  0.3   {λ2}
3    C    3    0    ...  0.5   {}
4    D    4    1    ...  0.6   {λ1}
...
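
The two views carry the same information: a label set can be encoded as a 0/1 indicator vector and decoded again. The sketch below illustrates this on the four example rows from the tables. The label space is shortened to three labels for illustration, and the helper names `to_binary` and `to_label_set` are hypothetical.

```python
def to_binary(label_set, labels):
    """Encode a label set as a 0/1 vector ordered like `labels`."""
    return [1 if lam in label_set else 0 for lam in labels]


def to_label_set(vector, labels):
    """Decode a 0/1 vector back into the set of relevant labels."""
    return {lam for lam, bit in zip(labels, vector) if bit == 1}


# Shortened label space (assumption: only three labels, for illustration).
labels = ["λ1", "λ2", "λn"]

# Label sets of the four example rows.
y_sets = [{"λ1", "λn"}, {"λ2"}, set(), {"λ1"}]

# Binary target vectors, one per row.
y_bin = [to_binary(y, labels) for y in y_sets]
print(y_bin)  # [[1, 0, 1], [0, 1, 0], [0, 0, 0], [1, 0, 0]]

# The encoding is lossless: decoding recovers the original sets.
assert [to_label_set(v, labels) for v in y_bin] == y_sets
```

With thousands of labels most entries of these vectors are zero, so a sparse representation (for example, storing only the label sets) is usually preferable in practice.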

Extreme Classification: Topics

Approaches can be roughly classified into
  Problem Transformation Approaches
  Binary Relevance & simplifications
  Hashing
  Decision Trees
  Landmark-based label selection
  Label Space Transformations
  Neural Networks and Embeddings
  Topic Models and Generative Approaches
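
As an illustration of the simplest approach in this list, Binary Relevance trains one independent binary classifier per label and concatenates their outputs. The sketch below is a minimal pure-Python version under stated assumptions: the perceptron base learner is only a stand-in chosen to keep the example self-contained (the slides do not prescribe a base learner), and the toy data and function names are hypothetical.

```python
def train_perceptron(X, y, epochs=20):
    """Train a linear perceptron for a single 0/1 target."""
    w = [0.0] * (len(X[0]) + 1)  # feature weights, last entry is the bias
    for _ in range(epochs):
        for x, target in zip(X, y):
            pred = predict_perceptron(w, x)
            if pred != target:
                step = target - pred  # +1 or -1
                for j, xj in enumerate(x):
                    w[j] += step * xj
                w[-1] += step
    return w


def predict_perceptron(w, x):
    """Return 1 if the linear score is positive, else 0."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    return 1 if score > 0 else 0


def train_binary_relevance(X, Y):
    """Train one independent classifier per label (per column of Y)."""
    n_labels = len(Y[0])
    return [train_perceptron(X, [row[k] for row in Y]) for k in range(n_labels)]


def predict_binary_relevance(models, x):
    """Predict a binary label vector by querying every per-label model."""
    return [predict_perceptron(w, x) for w in models]


# Hypothetical toy data: label k is relevant exactly when feature k is 1.
X = [[1, 0], [0, 1], [1, 1], [0, 0]]
Y = [[1, 0], [0, 1], [1, 1], [0, 0]]

models = train_binary_relevance(X, Y)
print([predict_binary_relevance(models, x) for x in X])
```

Training and prediction cost grow linearly with the number of labels, which is the main reason plain Binary Relevance becomes expensive in the extreme setting and motivates the simplifications and alternatives listed above.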

Extreme Classification: Intro and Basics

1 Arturo Montejo Ráez, Luís Alfonso Ureña López, Ralf Steinberger. Adaptive Selection of Base Classifiers in One-Against-All Learning for Large Multi-labeled Collections
  I. Katakis, G. Tsoumakas, and I. Vlahavas, Multilabel text classification for automated tag suggestion

2 G. Tsoumakas, I. Katakis, and I. Vlahavas, Effective and efficient multilabel classification in domains with large number of labels
  S. Bengio, J. Weston, and D. Grangier. Label embedding trees for large multi-class tasks

3 Qinfeng Shi, James Petterson, Gideon Dror, John Langford, Alex Smola, Alex Strehl, Vishy Vishwanathan. Hash Kernels. Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, 2009

4 Kilian Weinberger, Anirban Dasgupta, Josh Attenberg, John Langford, Alex Smola. Feature Hashing for Large Scale Multitask Learning. ICML, 2009

5 D. Hsu, S. Kakade, J. Langford, and T. Zhang, Multi-Label Prediction via Compressed Sensing, in NIPS, 2009.

6 F. Tai, and H. Lin, Multi-label Classification with Principal Label Space Transformation, in Neural Computation, 2012.

Extreme Classification: Basics/Decision Trees

7 S. Ji, L. Tang, S. Yu, and J. Ye, Extracting Shared Subspaces for Multi-label Classification, in KDD, 2008.

8 C. Vens, J. Struyf, L. Schietgat, S. Džeroski, H. Blockeel. Decision trees for hierarchical multi-label classification, Machine Learning, 2008

9 R. Agrawal, A. Gupta, Y. Prabhu, and M. Varma, Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages, in WWW, 2013.

10 Y. Prabhu, and M. Varma, FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning, in KDD, 2014.

Extreme Classification: Label Space Transformations

11 Wei Bi, James Tin-Yau Kwok: Multi-Label Classification on Tree- and DAG-Structured Hierarchies. In: Proceedings of the 28th International Conference on Machine Learning, 2011.

12 Y. Chen, and H. Lin, Feature-aware Label Space Dimension Reduction for Multi-label Classification, in NIPS, 2012.

13 H. Yu, P. Jain, P. Kar, and I. Dhillon, Large-scale Multi-label Learning with Missing Labels, in ICML, 2014.

14 Z. Lin, G. Ding, M. Hu, and J. Wang, Multi-label Classification via Feature-aware Implicit Label Space Encoding, in ICML, 2014.

15 M. Cisse, N. Usunier, T. Artieres, and P. Gallinari, Robust Bloom Filters for Large Multilabel Classification Tasks, in NIPS, 2013.

Extreme Classification: Neural Networks and Embeddings

16 J. Weston, S. Bengio, and N. Usunier, WSABIE: Scaling Up To Large Vocabulary Image Annotation, in IJCAI, 2011.

17 Jinseok Nam, Eneldo Loza Mencía and Johannes Fürnkranz, All-in Text: Learning Document, Label, and Word Representations Jointly, in: Proceedings of the 30th AAAI Conference on Artificial Intelligence, 2016

18 K. Bhatia, H. Jain, P. Kar, M. Varma, and P. Jain, Sparse Local Embeddings for Extreme Multi-label Classification, in NIPS, 2015.

19 P. Mineiro, and N. Karampatziakis, Fast Label Embeddings via Randomized Linear Algebra, Preprint, 2015.

20 N. Karampatziakis, and P. Mineiro, Scalable Multilabel Prediction via Randomized Methods, Preprint, 2015.

Extreme Classification: Landmark-based label selection

21 K. Balasubramanian, and G. Lebanon, The Landmark Selection Method for Multiple Output Prediction, ICML, 2012.

22 W. Bi, and J. Kwok, Efficient Multi-label Classification with Many Labels, in ICML, 2013.

Extreme Classification: BR / Topic Models

23 B. Hariharan, S. Vishwanathan, and M. Varma, Efficient max-margin multi-label classification with applications to zero-shot learning, in Machine Learning Journal, 2012.

24 Timothy N. Rubin, America Chambers, Padhraic Smyth, Mark Steyvers. Statistical topic models for multi-label document classification, Machine Learning, 2011

25 Piyush Rai, Changwei Hu, Ricardo Henao, Lawrence Carin. Large-Scale Bayesian Multi-Label Learning via Topic-Based Label Embeddings, NIPS, 2015

