Placing Images with Refined Language Models and Similarity Search with PCA-reduced VGG Features
Giorgos Kordopatis-Zilos1, Adrian Popescu2, Symeon Papadopoulos1 and Yiannis Kompatsiaris1
1 Information Technologies Institute (ITI), CERTH, Greece
2 CEA LIST, 91190 Gif-sur-Yvette, France
MediaEval 2016 Workshop, Oct. 20-21, 2016, Hilversum, Netherlands.
Summary
Tag-based location estimation (1 run)
• Built upon the scheme of our 2015 participation [1] (Kordopatis-Zilos et al., MediaEval 2015)
• Based on a refined probabilistic Language Model
Visual-based location estimation (1 run)
• Extract PCA-reduced VGG features to compute image similarities
• Geospatial clustering scheme of the most visually similar images
Hybrid location estimation (3 runs)
• Combination of the textual and visual approaches using a set of rules
Training sets
• Training set released by the organisers (≈4.7M geotagged items)
• YFCC dataset, excluding images from users in the test set (≈40M geotagged items)
• External data derived from gazetteers, i.e. Geonames and OpenStreetMap
Tag-based location estimation
• Processing steps of the approach
– Offline: language model construction
– Online: location estimation
Pre-processing
• Tags and titles of the training set items are processed
• Apply
– URL decoding
– lowercase transformation
– tokenization
• Remove
– accents
– symbols
– punctuation
• Multi-word tags are split into their individual terms, which are also included in the item's term set
• Numeric terms and terms shorter than three characters are discarded
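The pre-processing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `normalize` and `preprocess` are hypothetical, and the sketch assumes that multi-word tags arrive URL-encoded (e.g. `new+york`) and that the whole tag is kept as a term next to its parts.

```python
import re
import unicodedata
from urllib.parse import unquote_plus


def normalize(text):
    """URL-decode, lowercase, strip accents, drop symbols/punctuation, tokenize."""
    text = unquote_plus(text).lower()
    # Decompose accented characters, then drop the combining marks.
    text = "".join(ch for ch in unicodedata.normalize("NFKD", text)
                   if not unicodedata.combining(ch))
    # Replace every run of non-word characters (and underscores) with a space.
    return re.sub(r"[\W_]+", " ", text).split()


def preprocess(tags, title=""):
    """Return the term set of one item from its tags and title."""
    terms = set(normalize(title))
    for tag in tags:
        tokens = normalize(tag)
        terms.update(tokens)
        if len(tokens) > 1:
            # Assumption: the multi-word tag itself stays in the term set.
            terms.add(" ".join(tokens))
    # Discard numeric terms and terms shorter than three characters.
    return {t for t in terms if len(t) >= 3 and not t.isdigit()}
```

For example, the tags `new+york`, `caf%C3%A9`, `2016` and `ny` with the title "Sunset in NY!" yield the terms `new`, `york`, `new york`, `cafe` and `sunset`.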
Language Model (LM)
• LM-based estimation
– The Most Likely Cell (mlc) is the cell with the highest probability; it is used to produce the estimation
\[ mlc_j = \arg\max_i \sum_{k=1}^{T_j} p(t_k \mid c_i) \cdot w(t_k) \]
Inspired by [3] (Popescu, MediaEval 2013)
• LM generation scheme
– divide the earth's surface into rectangular cells with a side length of 0.01°
– calculate term-cell probabilities: \[ p(t \mid c) = N_u / N_t \]
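A minimal sketch of the language model construction and the most-likely-cell estimation follows. It assumes that N_u is the number of distinct users who used term t inside cell c and that N_t is the number of distinct users who used t anywhere; the slide does not spell these out. The function names and the input layout `(user, lat, lon, terms)` are hypothetical.

```python
import math
from collections import defaultdict


def cell_of(lat, lon, side=0.01):
    """Index of the grid cell (side length in degrees) containing a point."""
    return (math.floor(lat / side), math.floor(lon / side))


def build_lm(items, side=0.01):
    """items: iterable of (user, lat, lon, terms). Returns {term: {cell: p(t|c)}}."""
    users_in_cell = defaultdict(lambda: defaultdict(set))  # term -> cell -> users
    users_of_term = defaultdict(set)                       # term -> users
    for user, lat, lon, terms in items:
        cell = cell_of(lat, lon, side)
        for t in terms:
            users_in_cell[t][cell].add(user)
            users_of_term[t].add(user)
    # p(t|c) = N_u / N_t under the assumed definitions of N_u and N_t.
    return {t: {c: len(u) / len(users_of_term[t]) for c, u in cells.items()}
            for t, cells in users_in_cell.items()}


def most_likely_cell(terms, lm, weight=None):
    """argmax over cells of sum_k p(t_k|c) * w(t_k); None if no term is known."""
    weight = weight or {}
    scores = defaultdict(float)
    for t in terms:
        for cell, p in lm.get(t, {}).items():
            scores[cell] += p * weight.get(t, 1.0)  # unweighted terms count as 1
    return max(scores, key=scores.get) if scores else None
```

Returning `None` when no query term appears in the model corresponds to the case where the text approach cannot produce an estimation.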
Feature Selection and Weighting
Feature Weighting
• Locality weight function: based on the term's relative position in T
• Spatial Entropy weight function: a Gaussian function of the term's spatial entropy
• Linear combination of the two weights
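The slide gives only the general shape of the two weight functions, so the sketch below fills in the details with labeled assumptions: the locality weight is taken as the relative rank of the term when T is sorted by descending locality, the spatial entropy is the Shannon entropy of the term's cell probabilities, the Gaussian is centred on the mean entropy with the standard deviation of all term entropies, and the mixing factor `omega` is a hypothetical parameter.

```python
import math


def locality_weights(locality):
    """Assumed form: relative position of each term in T, ranked by locality."""
    ranked = sorted(locality, key=locality.get, reverse=True)
    n = len(ranked)
    return {t: (n - i) / n for i, t in enumerate(ranked)}


def spatial_entropy(cell_probs):
    """Shannon entropy of a term's distribution over cells (assumed definition)."""
    return -sum(p * math.log(p) for p in cell_probs.values() if p > 0)


def entropy_weights(lm):
    """Assumed form: Gaussian of each term's entropy around the mean entropy."""
    e = {t: spatial_entropy(cells) for t, cells in lm.items()}
    mu = sum(e.values()) / len(e)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in e.values()) / len(e)) or 1.0
    return {t: math.exp(-((v - mu) ** 2) / (2 * sigma ** 2)) for t, v in e.items()}


def combined_weights(w_loc, w_ent, omega=0.5):
    """Linear combination of the two weights; omega is a hypothetical setting."""
    return {t: omega * w_loc[t] + (1 - omega) * w_ent[t] for t in w_loc}
```

The resulting dictionary can serve as the term weights w(t) in the most-likely-cell score.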
Feature Selection
• Calculate term locality using a grid of 0.01°×0.01° cells
• When a user uses a given term, the user is assigned to the entire cell neighborhood instead of a single cell as in [1]
\[ l(t) = N_t \cdot \frac{\sum_{c \in C} \sum_{u \in U_{t,c}} \left| \{ u' \mid u' \in U_{t,c},\ u' \neq u \} \right|}{N_t^2} \]
• Terms with non-zero locality score form the term set 𝑇
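The locality score can be computed directly from the per-cell user sets of a term. The sketch below assumes that U_{t,c} is the set of users who used term t in cell c and that N_t is the number of distinct users of t; the 3×3 neighborhood in `expand_to_neighborhood` is also an assumption, since the slide does not state the neighborhood size.

```python
from collections import defaultdict


def locality(users_by_cell):
    """users_by_cell: {cell: set of users who used the term in that cell}."""
    all_users = set().union(*users_by_cell.values()) if users_by_cell else set()
    n_t = len(all_users)
    if n_t == 0:
        return 0.0
    # For each user u in U_{t,c}, the set {u' in U_{t,c}, u' != u} has
    # |U_{t,c}| - 1 members, so the double sum is sum_c |U| * (|U| - 1).
    pairs = sum(len(u) * (len(u) - 1) for u in users_by_cell.values())
    return n_t * pairs / n_t ** 2


def expand_to_neighborhood(users_by_cell):
    """Assign each user to the 3x3 block of cells around the cell of use."""
    out = defaultdict(set)
    for (i, j), users in users_by_cell.items():
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                out[(i + di, j + dj)] |= users
    return dict(out)
```

A term used by a single user, or by users who never share a cell, scores zero and is therefore left out of T. With the neighborhood expansion, users in adjacent cells start to count as co-located.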
Refinements
• Multiple Grids
– build an additional LM using a finer grid (cell side length of 0.001°)
– combine the MLCs of the individual language models
• Similarity search [5] (Van Laere et al., ICMR 2011)
– determine the k_t most similar training images in the MLC
– their center-of-gravity is the final location estimate
From [2]: (Kordopatis-Zilos et al., PAISI 2015)
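Both refinements can be illustrated with a short sketch. The slides do not say how the two MLCs are combined or which similarity measure is used, so the following are assumptions: the fine-grid MLC is preferred only when it lies inside the coarse-grid MLC, similarity is the Jaccard index of term sets, and the center-of-gravity is an unweighted mean of coordinates. All function names and the default `k_t` are hypothetical.

```python
def combine_mlcs(coarse_mlc, fine_mlc, ratio=10):
    """Assumed rule: use the fine MLC when it falls inside the coarse MLC."""
    if coarse_mlc is not None and fine_mlc is not None:
        # A 0.001-degree cell (i, j) lies in the 0.01-degree cell (i // 10, j // 10).
        if (fine_mlc[0] // ratio, fine_mlc[1] // ratio) == coarse_mlc:
            return "fine", fine_mlc
    return "coarse", coarse_mlc


def jaccard(a, b):
    """Jaccard index of two term sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def refine_in_cell(query_terms, cell_items, k_t=5):
    """cell_items: list of (lat, lon, terms) training items located in the MLC."""
    ranked = sorted(cell_items, key=lambda it: jaccard(query_terms, it[2]),
                    reverse=True)[:k_t]
    if not ranked:
        return None
    # Unweighted center-of-gravity of the k_t most similar items.
    lat = sum(it[0] for it in ranked) / len(ranked)
    lon = sum(it[1] for it in ranked) / len(ranked)
    return lat, lon
```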
Visual-based location estimation
• Main Objectives
– Ensure that the visual features are generic and transferable
– Provide a compact representation of the features
• Model building
• CNN features extracted by fine-tuning the VGG model [4]
• Training: ~5K Points Of Interest (POIs) and over 7M Flickr images, collected using queries with:
– the POI name and a radius of 5km around its coordinates
– the POI name and the associated city name
• Compressed outputs of the fc7 layer (4096d) to 128d using PCA, learned on a subset of 250,000 training images
• Similarity Search based on the PCA-reduced CNN features
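The PCA compression and the similarity search can be sketched with NumPy. Random vectors stand in for the fc7 activations, and the dimensions are scaled down (64 to 8 instead of 4096 to 128) so the example runs quickly. L2 normalization and cosine similarity are assumptions; the slides do not name the similarity measure.

```python
import numpy as np


def fit_pca(X, n_components):
    """Learn a PCA projection from the rows of X (samples x dimensions)."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def project(X, mean, components):
    """Project onto the principal directions and L2-normalize each row."""
    Z = (X - mean) @ components.T
    return Z / np.maximum(np.linalg.norm(Z, axis=1, keepdims=True), 1e-12)


def most_similar(query, index, k):
    """Indices of the k rows of index with the highest cosine similarity."""
    sims = index @ query
    return np.argsort(-sims)[:k]


rng = np.random.default_rng(0)
train = rng.normal(size=(500, 64))                  # stand-in for fc7 features
mean, comps = fit_pca(train[:200], n_components=8)  # PCA learned on a subset
index = project(train, mean, comps)
neighbours = most_similar(index[3], index, k=5)
```

Learning the projection on a subset and applying it to the full collection mirrors the setup described above, where PCA is learned on 250,000 training images.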
Location Estimation
• Geospatial clustering of the k_v = 20 most visually similar images
• The largest cluster (or the first in case of equal size) is selected and its centroid is used as the location estimate
Visual Confidence
• Confidence metric for the visual estimation is based on the size of the largest cluster
\[ conf_v(i) = \max\left( \frac{n(i) - n_t}{k_v - n_t},\ 0 \right) \]
n(i): number of neighbors in the largest cluster of image i
n_t: configuration parameter controlling the "strictness" of the confidence score
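A minimal sketch of the visual estimation and its confidence score is given below. The clustering method is not specified on the slide, so a simple incremental scheme with a hypothetical 1 km radius is assumed; the default `n_t=5` is likewise only an illustrative value.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def centroid(points):
    return (sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points))


def cluster(points, radius_km=1.0):
    """Assumed scheme: join the nearest cluster within radius_km, else start one."""
    clusters = []
    for p in points:
        best = min(clusters, key=lambda c: haversine_km(p, centroid(c)), default=None)
        if best is not None and haversine_km(p, centroid(best)) <= radius_km:
            best.append(p)
        else:
            clusters.append([p])
    return clusters


def visual_estimate(neighbours, k_v=20, n_t=5):
    """neighbours: (lat, lon) of the most similar images, most similar first."""
    clusters = cluster(neighbours[:k_v])
    if not clusters:
        return None, 0.0
    largest = max(clusters, key=len)  # max() keeps the first cluster on ties
    conf = max((len(largest) - n_t) / (k_v - n_t), 0)
    return centroid(largest), conf
```

With k_v = 20 and n_t = 5, a largest cluster of 12 neighbors gives a confidence of 7/15, while any cluster of five or fewer gives zero.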
Hybrid-based location estimation
• A set of rules to determine the source of estimation between the text and visual approaches
• The visual estimation is chosen when:
→ no estimation could be produced by the text approach
→ the visual estimation falls inside the borders of the mlc
→ a comparison of the confidence scores conf_v and conf_t favours the visual estimation [1]
• Otherwise the text estimation is selected
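The rule set can be written as a small decision function. This is a sketch under stated assumptions: the third rule is taken to mean conf_v > conf_t, which the slide does not state explicitly, and the function names and argument layout are hypothetical.

```python
import math


def in_cell(point, cell, side=0.01):
    """True if a (lat, lon) point falls inside the grid cell with this index."""
    return (math.floor(point[0] / side), math.floor(point[1] / side)) == cell


def hybrid_estimate(text_est, visual_est, mlc, conf_t, conf_v, side=0.01):
    """Choose between the text and visual estimates, each a (lat, lon) or None."""
    if visual_est is None:
        return text_est
    if text_est is None:
        return visual_est          # no estimation from the text approach
    if mlc is not None and in_cell(visual_est, mlc, side):
        return visual_est          # visual estimate lies inside the mlc
    if conf_v > conf_t:
        return visual_est          # assumed form of the confidence comparison
    return text_est                # otherwise the text estimation is selected
```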
Runs and Results
RUN-1: Tag-based location estimation + released training set
RUN-2: Visual-based location estimation + released training set
RUN-3: Hybrid location estimation + released training set
RUN-4: Hybrid location estimation + YFCC dataset
RUN-5: Hybrid location estimation + YFCC + External data
RUN-E: Visual-based location estimation + entire YFCC dataset
Images: results table (RUN-1 to RUN-5, RUN-E) not recoverable from the slide
Videos: results table (RUN-1 to RUN-5) not recoverable from the slide
References
[1] G. Kordopatis-Zilos, A. Popescu, S. Papadopoulos, and Y. Kompatsiaris. SocialSensor at MediaEval Placing Task 2015. In MediaEval 2015 Placing Task, 2015.
[2] G. Kordopatis-Zilos, S. Papadopoulos, and Y. Kompatsiaris. Geotagging social media content with a refined language modelling approach. In Intelligence and Security Informatics, pages 21–40, 2015.
[3] A. Popescu. CEA LIST's participation at MediaEval 2013 Placing Task. In MediaEval 2013 Placing Task, 2013.
[4] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
[5] O. Van Laere, S. Schockaert, and B. Dhoedt. Finding locations of Flickr resources using language models and similarity search. In ICMR ’11, pages 48:1–48:8, New York, NY, USA, 2011. ACM.
Thank you!
Data/Code:
– https://github.com/MKLab-ITI/multimedia-geotagging/
Get in touch:
– Giorgos Kordopatis-Zilos: [email protected]
– Symeon Papadopoulos: [email protected] / @sympap