AN ADAPTIVE CLUSTERING APPROACH FOR THE DIVERSIFICATION OF IMAGE RETRIEVAL RESULTS
MAIA ZAHARIEVA VIENNA UNIVERSITY OF TECHNOLOGY & UNIVERSITY OF VIENNA, AUSTRIA
October 20-21, Hilversum, Netherlands
RETRIEVING DIVERSE SOCIAL IMAGES TASK
THE IDEA
QUERY: SAILING BOAT
QUERY: TREES REFLECTED IN WATER
Different queries, different features
Different clustering approaches
Potentially highly imbalanced groupings
Varying dimensionalities
THE WORKFLOW
RANKED IMAGE SET

FEATURE EXTRACTION
VISUAL
▸ CNN (ad, gen)
▸ DCT
▸ Intensity histogram (IH)
▸ KANSEI shape (KS)
▸ MPEG-7 color layout (CL)
▸ MPEG-7 color structure (CS)
▸ MPEG-7 edge histogram (EH)
▸ MPEG-7 homogeneous texture (HT)
▸ MPEG-7 region-based shape (RS)
▸ MPEG-7 scalable color (SC)
TEXTUAL
▸ TF-IDF (title)
▸ TF-IDF (tags)
▸ TF-IDF (description)
▸ TF-IDF (title+tags)
▸ TF-IDF (title+tags+description)

CLUSTERING / DIVERSIFICATION
APPROACHES
▸ Affinity Propagation (AP)
▸ Expectation Maximisation (EM)
▸ k-Means (KM)
▸ X-Means (XM)

INTERNAL EVALUATION
Compactness:
- Sum of squares
- C-Index
Separability:
- Single linkage
Compactness + Separability:
- Calinski-Harabasz
- Davies-Bouldin
- Silhouette
Consistency-based:
- Gamma
- Tau
…

ROUND ROBIN

RE-RANKED IMAGE LIST
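
The selection and re-ranking steps of the workflow can be illustrated with a minimal sketch. It is not the author's implementation: it assumes the candidate clusterings have already been computed, uses the silhouette index as the only internal evaluation measure, and takes round robin to mean visiting clusters in order of their best-ranked image. The function names and the labels `hypothetical-A` and `hypothetical-B` are illustrative only.

```python
from collections import defaultdict
from math import dist


def silhouette(points, labels):
    """Mean silhouette width: higher means compact, well-separated clusters."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat as worst case
    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters contribute 0 by convention
        # a: mean distance to the other members of the same cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def select_clustering(candidates):
    """candidates: list of (name, points, labels); return the best-scoring one."""
    return max(candidates, key=lambda c: silhouette(c[1], c[2]))


def round_robin(ranked_ids, labels):
    """Re-rank by taking the best remaining image from each cluster in turn."""
    queues = {}
    for image_id, label in zip(ranked_ids, labels):
        queues.setdefault(label, []).append(image_id)
    # dicts keep insertion order, so clusters are visited by their top image
    reranked = []
    while any(queues.values()):
        for queue in queues.values():
            if queue:
                reranked.append(queue.pop(0))
    return reranked


# Toy data: six images in their initial rank order with 2-D features.
ranked_ids = ["a", "b", "c", "d", "e", "f"]
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
candidates = [
    ("hypothetical-A", features, [0, 0, 0, 1, 1, 2]),
    ("hypothetical-B", features, [0, 1, 0, 1, 0, 1]),
]
name, points, labels = select_clustering(candidates)
reranked = round_robin(ranked_ids, labels)
```

In practice each candidate would pair one feature representation with one clustering approach, so the candidates may live in feature spaces of different dimensionality. Whether index values are comparable across such spaces is a design question this sketch does not address.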
EVALUATION RESULTS
Columns: Affinity Propagation (AP), Expectation Maximization (EM), k-Means (KM), X-Means (XM), ∑
Each row lists its non-empty cells in column order, separated by "|", followed by the row total.

CNN ad              11, 25 | 29 | 38, 51               ∑ 5
CNN gen             4, 59 | 54                         ∑ 3
DCT                 48 | 19, 67 | 50                   ∑ 4
Intensity histogram 45 | 2 | 28, 53 | 6, 65            ∑ 6
KANSEI shape        9, 20, 26, 36, 46 | 57, 13         ∑ 7
MPEG7 CL            22, 47, 55 | 5, 61 | 12, 21        ∑ 7
MPEG7 CS            15 | 7, 62, 70                     ∑ 4
MPEG7 EH            32, 68 | 14, 40, 42, 52            ∑ 6
MPEG7 HT            44 | 8, 27, 63 | 24                ∑ 5
MPEG7 RS            23, 69 | 10, 41 | 35, 64, 66       ∑ 7
MPEG7 SC            3, 43, 56 | 30, 37                 ∑ 5
TITLE               16                                 ∑ 1
TAGS                58, 60                             ∑ 2
TITLE+TAGS          1, 49                              ∑ 2
TITLE+TAGS+DESCR    17, 39 | 33 | 18, 31, 34           ∑ 6

∑                   AP 2 | EM 26 | KM 26 | XM 16       ∑ 70
OPTIMAL SOLUTION
Approach                        Features     P@20    CR@20   F1@20
Flickr Baseline                              0.6979  0.3717  0.4674
Optimal Solution                visual       0.8179  0.6575  0.7122
                                text         0.8136  0.6453  0.7043
                                visual+text  0.8250  0.6634  0.7186
Best performing fixed settings  visual       0.6657  0.4453  0.5237
                                text         0.6636  0.4274  0.5045
                                visual+text  0.6657  0.4453  0.5237
Adaptive approach               visual       0.6500  0.4398  0.5123
                                text         0.6729  0.4230  0.5029
                                visual+text  0.6286  0.4061  0.4803
DEVELOPMENT DATA
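
For readers unfamiliar with the three metrics, the following is a minimal sketch of how they can be computed for a single query. It assumes the usual definitions: P@N is the fraction of relevant images among the top N, CR@N is the fraction of ground-truth clusters represented in the top N, and F1@N is their harmonic mean. The slides do not spell these out, so the definitions and all function names here are assumptions.

```python
def precision_at(ranked, relevant, n=20):
    """Fraction of the top-n results that are relevant."""
    return sum(1 for image in ranked[:n] if image in relevant) / n


def cluster_recall_at(ranked, gt_cluster, n=20):
    """Fraction of ground-truth clusters covered by the top-n results.

    gt_cluster maps each relevant image to its ground-truth cluster;
    irrelevant images are simply absent from the mapping.
    """
    total = len(set(gt_cluster.values()))
    covered = {gt_cluster[image] for image in ranked[:n] if image in gt_cluster}
    return len(covered) / total


def f1_at(ranked, gt_cluster, n=20):
    """Harmonic mean of precision and cluster recall for one query."""
    p = precision_at(ranked, gt_cluster.keys(), n)
    cr = cluster_recall_at(ranked, gt_cluster, n)
    return 2 * p * cr / (p + cr) if p + cr else 0.0


# Toy query: "x" is irrelevant, "d" is relevant but not retrieved.
ranked = ["a", "x", "b", "c"]
gt_cluster = {"a": 1, "b": 1, "c": 2, "d": 3}
p = precision_at(ranked, gt_cluster.keys(), n=4)   # 3 of 4 results are relevant
cr = cluster_recall_at(ranked, gt_cluster, n=4)    # 2 of 3 clusters are covered
f1 = f1_at(ranked, gt_cluster, n=4)
```

The F1@20 values in the table are not the harmonic mean of the listed P@20 and CR@20 values (for the Flickr baseline that would be about 0.485 rather than 0.4674). This would be consistent with F1 being computed per query and then averaged, though the slides do not state this.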
BENCHMARK RESULTS 2016
run    configuration          P@20    CR@20   F1@20
run 1  adaptive, visual       0.5141  0.4024  0.4292
run 2  adaptive, text         0.5406  0.4130  0.4463
run 3  adaptive, visual+text  0.5430  0.4130  0.4471
run 4  fixed, visual          0.4969  0.3603  0.4006
LESSONS LEARNED
MEDIAEVAL BENCHMARK 2016: RETRIEVING DIVERSE SOCIAL IMAGES TASK
▸ Different queries favour different feature representations
▸ Approaches to image search diversification need to generalize better and be more flexible
▸ Multiple solutions can be considered correct
▸ How to make the evaluation more objective?