CERTH/CEA LIST at MediaEval Placing Task 2015
Giorgos Kordopatis-Zilos¹, Adrian Popescu², Symeon Papadopoulos¹ and Yiannis Kompatsiaris¹
¹ Information Technologies Institute (ITI), CERTH, Greece
² CEA LIST, 91190 Gif-sur-Yvette, France
MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany
Summary
#2
Tag-based location estimation (2 runs)
• Based on a geographic Language Model
• Built upon the scheme of our 2014 participation [2] (Kordopatis-Zilos et al., MediaEval 2014)
• Extensions from [3]: improved feature selection and weighting (Kordopatis-Zilos et al., PAISI 2015)

Visual-based location estimation (1 run)
• Geospatial clustering scheme of the most visually similar images

Hybrid location estimation (2 runs)
• Combination of the textual and visual approaches

Training sets
• Training set released by the organisers (≈4.7M geotagged items)
• YFCC dataset, excl. images from users in the test set (≈40M geotagged items)
Tag-based location estimation
#3
• Processing steps of the approach
– Offline: language model construction
– Online: location estimation
Language Model (LM)
• LM generation scheme
– divide the earth surface into rectangular cells with a side length of 0.01°
– calculate tag-cell probabilities based on the users that used the tag inside the cell
• LM-based estimation
– the probability of each cell is calculated as the sum of the respective tag-cell probabilities
– the Most Likely Cell (MLC) is the cell with the highest probability and is used to produce the estimation
Inspired by [4]: (Popescu, MediaEval 2013)
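The offline/online steps above can be sketched as follows. The cell encoding, data layout, and function names are illustrative assumptions, not the authors' exact implementation; the tag-cell probability follows the slide's description (based on the distinct users of a tag inside a cell).

```python
from collections import defaultdict

CELL_SIDE = 0.01  # degrees, as in the coarse grid above

def cell_of(lat, lon, side=CELL_SIDE):
    """Map a coordinate to a rectangular grid cell id."""
    return (int(lat // side), int(lon // side))

def build_language_model(items):
    """Offline step. items: iterable of (user_id, lat, lon, tags).
    Tag-cell probability is based on the distinct users that used the
    tag inside the cell (an assumption on the exact normalisation)."""
    cell_users = defaultdict(set)   # (tag, cell) -> users of tag in cell
    tag_users = defaultdict(set)    # tag -> all users of tag
    for user, lat, lon, tags in items:
        c = cell_of(lat, lon)
        for t in tags:
            cell_users[(t, c)].add(user)
            tag_users[t].add(user)
    lm = defaultdict(dict)
    for (t, c), users in cell_users.items():
        lm[t][c] = len(users) / len(tag_users[t])
    return lm

def most_likely_cell(lm, tags):
    """Online step: sum tag-cell probabilities per cell, return the MLC."""
    scores = defaultdict(float)
    for t in tags:
        for c, p in lm.get(t, {}).items():
            scores[c] += p
    return max(scores, key=scores.get) if scores else None
```

The MLC's centre (or a refinement inside it, as described later) then serves as the location estimate.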
#4
Feature Selection and Weighting
Feature Selection
• The final tag set is the intersection of the two tag sets (accuracy and locality)

Feature Weighting
• Locality weight function: sort tags based on their locality score
• Normalize the weights with the Spatial Entropy (SE) function
• Combine the two weighting functions
#5
Accuracy
• Partition the training set into p folds (p = 10)
• Withhold one partition at a time, and build the LM with the remaining p − 1
• Estimate the location of every item in the withheld partition
• Accuracy score of every tag t:

  acc(t) = N_correct(t) / N_total(t)

  where N_correct(t) is the number of correctly geotagged items tagged with t, and N_total(t) the total number of items tagged with t
• Tags with a non-zero accuracy score form the accuracy tag set

From [3]: Kordopatis-Zilos et al., PAISI 2015
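A sketch of the cross-validation loop above. The `build_lm`, `estimate`, and `is_correct` interfaces are assumed placeholders (the correctness criterion, e.g. a distance threshold, is not specified on the slide):

```python
from collections import Counter

def accuracy_tag_set(items, build_lm, estimate, is_correct, p=10):
    """Leave-one-fold-out accuracy score per tag.
    items: list of (user, lat, lon, tags); returns tags with
    non-zero accuracy score and their scores."""
    total, correct = Counter(), Counter()
    folds = [items[i::p] for i in range(p)]
    for i in range(p):
        held_out = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        lm = build_lm(train)
        for user, lat, lon, tags in held_out:
            est = estimate(lm, tags)
            ok = est is not None and is_correct(est, (lat, lon))
            for t in tags:
                total[t] += 1
                if ok:
                    correct[t] += 1
    acc = {t: correct[t] / total[t] for t in total}
    return {t: a for t, a in acc.items() if a > 0}
```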
#6
(Figure: estimated locations)
Locality
#7
• Captures the spatial awareness of tags
• When a user uses a tag, he/she is assigned to the respective location cell
• Each cell has a set of users assigned to it
• All users assigned to the same cell are considered neighbours
• Locality score of every tag t, computed from:
  – N_t: total occurrences of t
  – C: set of all cells
  – U_t(c): set of users that used tag t inside cell c
• Tags with a non-zero locality score form the locality tag set
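The exact locality formula from [3] did not survive in this transcript; the sketch below uses the quantities defined above with one plausible formulation (an assumption, not the paper's formula): count, per tag, the neighbour pairs among its users and normalise by the tag's total occurrences, so tags whose users cluster in the same cells score high.

```python
from collections import defaultdict

def locality_scores(occurrences):
    """occurrences: iterable of (user, tag, cell) usages.
    Users of a tag in the same cell are neighbours."""
    cell_users = defaultdict(set)   # (tag, cell) -> U_t(c)
    n_occ = defaultdict(int)        # tag -> N_t
    for user, tag, cell in occurrences:
        cell_users[(tag, cell)].add(user)
        n_occ[tag] += 1
    pairs = defaultdict(int)
    for (tag, _), users in cell_users.items():
        k = len(users)
        pairs[tag] += k * (k - 1) // 2   # neighbour pairs in this cell
    # normalise by total occurrences of the tag
    return {t: pairs[t] / n_occ[t] for t in n_occ}
```

Under this formulation, a tag used by many users in one cell (e.g. a city name) scores high, while a camera-generated tag scattered across single users scores zero.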
Locality – value distribution
#8
High locality: london (6975), paris (5452), nyc (3917)
Low locality: luminancehdr (0.0035), dsc6362 (0.003), air photo (0.002)
Extensions
• Spatial Entropy (SE) function
– calculate entropy values by applying the Shannon entropy formula to the tag-cell probabilities
– build a Gaussian weight function based on the tag SE values
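A minimal sketch of the SE weighting above. The Gaussian's centre and width (`mu`, `sigma`) are assumed parameters, e.g. fitted on the training set; the slide does not give their values:

```python
import math

def spatial_entropy(cell_probs):
    """Shannon entropy of a tag's tag-cell probability distribution."""
    ps = [p for p in cell_probs if p > 0]
    total = sum(ps)
    return -sum((p / total) * math.log2(p / total) for p in ps)

def se_weight(entropy, mu, sigma):
    """Gaussian weight: tags whose spatial entropy lies near mu
    (the typical entropy of location-revealing tags) get weight
    close to 1, outliers get down-weighted."""
    return math.exp(-((entropy - mu) ** 2) / (2 * sigma ** 2))
```

A tag concentrated in one cell has entropy 0; a tag spread uniformly over many cells has high entropy, and both extremes fall away from the Gaussian's centre.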
#9
• Internal Grid
– build an additional LM using a finer grid, with a cell side length of 0.001°
– combine the MLCs of the individual language models
• Similarity search [6] (Van Laere et al., ICMR 2011)
– determine the most similar training images in the MLC
– their center-of-gravity is the final location estimation
From [2]: (Kordopatis-Zilos et al., MediaEval 2014)
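The center-of-gravity step above reduces to averaging the coordinates of the retained images; a plain mean is a fine approximation here since the points all lie inside one small cell:

```python
def center_of_gravity(coords):
    """Mean latitude/longitude of the most similar training images."""
    lats, lons = zip(*coords)
    return sum(lats) / len(lats), sum(lons) / len(lons)
```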
Visual-based location estimation
#10
Model building
• CNN features adapted by fine-tuning the VGG model [5] (Simonyan & Zisserman, ICLR 2015)
• Training: ~1K Points Of Interest (POIs), ~1200 images/POI
• Features extracted with Caffe [1] (Jia et al., arXiv 2014)
• Outputs of the fc7 layer (4096d) compressed to 128d using PCA
• CNN features used to compute image similarities

Location Estimation
• Geospatial clustering of the visually most similar images
• If the j-th image is within 1 km of the closest of the previous j − 1 images, it is assigned to that image's cluster; otherwise it forms its own cluster
• The largest cluster (or the first in case of equal size) is selected and its centroid is used as the location estimate
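The greedy geospatial clustering above can be sketched as follows; the images are assumed to arrive already ordered by visual similarity, and the haversine distance is used as one reasonable choice of geodesic distance:

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def cluster_neighbours(locations, radius_km=1.0):
    """locations: (lat, lon) of the most visually similar images, in
    decreasing similarity order. Each image joins the cluster of the
    closest previous image if within radius_km, else starts a new
    cluster. Returns the centroid of the largest cluster (first wins
    on ties, since max keeps the first maximum)."""
    clusters = []  # each cluster: list of (lat, lon)
    for loc in locations:
        best, best_d = None, float("inf")
        for cl in clusters:
            d = min(haversine_km(loc, p) for p in cl)
            if d < best_d:
                best, best_d = cl, d
        if best is not None and best_d <= radius_km:
            best.append(loc)
        else:
            clusters.append([loc])
    largest = max(clusters, key=len)
    lats, lons = zip(*largest)
    return sum(lats) / len(lats), sum(lons) / len(lons)
```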
Hybrid location estimation

Model building
• Combination of the textual and visual approaches
• Build the LM using the tag-based approach above and use it for MLC selection

Similarity Calculation
• Combination of the visual and textual similarities
• Normalize the visual similarities to the range [0, 1]
• Compute a combined similarity between each pair of images
• The final estimation is the center-of-gravity of the most similar images

Low Confidence Estimations
• For test images with no estimate or with confidence lower than 0.02 (≈10% of the test set), the visual approach is used to produce the estimated locations
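The exact combination formula did not survive in this transcript; the sketch below normalises the visual similarities as described and combines the two signals with a plain convex combination (the weighting scheme and `alpha` are assumptions, not the paper's formula):

```python
def hybrid_similarities(visual, textual, alpha=0.5):
    """Combine per-candidate visual and textual similarities.
    visual: raw visual similarities (any scale); textual: similarities
    already in [0, 1]. Visual scores are min-max normalised first."""
    lo, hi = min(visual), max(visual)
    span = (hi - lo) or 1.0  # guard against a constant visual signal
    v_norm = [(v - lo) / span for v in visual]
    return [alpha * v + (1 - alpha) * t for v, t in zip(v_norm, textual)]
```

The candidates with the highest combined similarity then feed the center-of-gravity estimation above.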
#11
Confidence
• Evaluate the confidence of the LM estimation for each query image
• Measures how localized the language model cell estimations are, based on cell probabilities
• Confidence measure computed from:
  – p(c|i): cell probability of cell c for image i
  – D(c, mlc): distance between cell c and the mlc
  – mlc: Most Likely Cell
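The exact confidence formula is not reproduced in this transcript; the sketch below uses the quantities defined above with one illustrative assumption: confidence is the share of probability mass lying near the MLC, so well-localized estimations score close to 1.

```python
def lm_confidence(cell_probs, mlc, distance_km, radius_km=100.0):
    """cell_probs: cell -> p(c|i) for the query image; mlc: the Most
    Likely Cell; distance_km(a, b): distance between two cells.
    Returns the fraction of probability mass within radius_km of the
    MLC (an assumed formulation, not the paper's exact formula)."""
    total = sum(cell_probs.values())
    near = sum(p for c, p in cell_probs.items()
               if distance_km(c, mlc) <= radius_km)
    return near / total if total else 0.0
```

Query images scoring below the 0.02 threshold would then fall back to the visual approach, as described above.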
#12
Runs and Results
#13
measure             RUN-1   RUN-2   RUN-3   RUN-4   RUN-5
acc(1m)   (%)        0.15    0.01    0.15    0.16    0.16
acc(10m)  (%)        0.61    0.08    0.62    0.75    0.76
acc(100m) (%)        6.40    1.76    6.52    7.73    7.83
acc(1km)  (%)       24.33    5.19   24.61   27.30   27.54
acc(10km) (%)       43.07    7.43   43.41   46.48   46.77
median error (km)      69    5663      61      24      22
RUN-1: Tag-based location estimation + released training set
RUN-2: Visual-based location estimation + released training set
RUN-3: Hybrid location estimation + released training set
RUN-4: Tag-based location estimation + YFCC dataset
RUN-5: Hybrid location estimation + YFCC dataset
Thank you!
• Code: https://github.com/MKLab-ITI/multimedia-geotagging
• Get in touch:
  @sympapadopoulos / papadop@iti.gr
  @georgekordopatis / georgekordopatis@iti.gr
#14
References
#15
[1] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
[2] G. Kordopatis-Zilos, G. Orfanidis, S. Papadopoulos, and Y. Kompatsiaris. SocialSensor at MediaEval Placing Task 2014. In MediaEval 2014 Placing Task, 2014.
[3] G. Kordopatis-Zilos, S. Papadopoulos, and Y. Kompatsiaris. Geotagging social media content with a refined language modelling approach. In Intelligence and Security Informatics (PAISI), pages 21–40, 2015.
[4] A. Popescu. CEA LIST's participation at MediaEval 2013 Placing Task. In MediaEval 2013 Placing Task, 2013.
[5] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
[6] O. Van Laere, S. Schockaert, and B. Dhoedt. Finding locations of Flickr resources using language models and similarity search. In ICMR '11, pages 48:1–48:8, New York, NY, USA, 2011. ACM.