CERTH/CEA LIST at MediaEval Placing Task 2015
Giorgos Kordopatis-Zilos¹, Adrian Popescu², Symeon Papadopoulos¹ and Yiannis Kompatsiaris¹
¹ Information Technologies Institute (ITI), CERTH, Greece
² CEA LIST, 91190 Gif-sur-Yvette, France
MediaEval 2015 Workshop, Sept. 14-15, 2015, Wurzen, Germany
Summary
#2
Tag-based location estimation (2 runs)
• Based on a geographic Language Model
• Built upon the scheme of our 2014 participation [2] (Kordopatis-Zilos et al., MediaEval 2014)
• Extensions from [3]: improved feature selection and weighting (Kordopatis-Zilos et al., PAISI 2015)

Visual-based location estimation (1 run)
• Geospatial clustering scheme of the most visually similar images

Hybrid location estimation (2 runs)
• Combination of the textual and visual approaches

Training sets
• Training set released by the organisers (≈4.7M geotagged items)
• YFCC dataset, excl. images from users in the test set (≈40M geotagged items)
Tag-based location estimation
#3
• Processing steps of the approach
– Offline: language model construction
– Online: location estimation
Language Model (LM)
• LM generation scheme
– divide the earth surface into rectangular cells with a side length of 0.01°
– calculate tag-cell probabilities based on the users that used the tag inside the cell
• LM-based estimation
– the probability of each cell is calculated as the sum of the respective tag-cell probabilities
– the Most Likely Cell (MLC) is the cell with the highest probability and is used to produce the estimation
Inspired by [4]: (Popescu, MediaEval 2013)
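The offline/online steps above can be sketched as follows. The cell encoding, data layout, and function names are illustrative assumptions, not the authors' exact implementation; the tag-cell probability follows the slide's description (based on the distinct users of a tag inside a cell).

```python
from collections import defaultdict

CELL_SIDE = 0.01  # degrees, as in the coarse grid above

def cell_of(lat, lon, side=CELL_SIDE):
    """Map a coordinate to a rectangular grid cell id."""
    return (int(lat // side), int(lon // side))

def build_language_model(items):
    """Offline step. items: iterable of (user_id, lat, lon, tags).
    Tag-cell probability is based on the distinct users that used the
    tag inside the cell (an assumption on the exact normalisation)."""
    cell_users = defaultdict(set)   # (tag, cell) -> users of tag in cell
    tag_users = defaultdict(set)    # tag -> all users of tag
    for user, lat, lon, tags in items:
        c = cell_of(lat, lon)
        for t in tags:
            cell_users[(t, c)].add(user)
            tag_users[t].add(user)
    lm = defaultdict(dict)
    for (t, c), users in cell_users.items():
        lm[t][c] = len(users) / len(tag_users[t])
    return lm

def most_likely_cell(lm, tags):
    """Online step: sum tag-cell probabilities per cell, return the MLC."""
    scores = defaultdict(float)
    for t in tags:
        for c, p in lm.get(t, {}).items():
            scores[c] += p
    return max(scores, key=scores.get) if scores else None
```

The MLC's centre (or a refinement inside it, as described later) then serves as the location estimate.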
#4
Feature Selection and Weighting
Feature Selection
• The final tag set is the intersection of the two tag sets (accuracy and locality)

Feature Weighting
• Locality weight function: sort tags based on their locality score
• Normalize the weights with the Spatial Entropy (SE) function
• Combine the two weighting functions
#5
Accuracy
• Partition the training set into p folds (p = 10)
• Withhold one partition at a time, and build the LM with the remaining p − 1
• Estimate the location of every item in the withheld partition
• Accuracy score of every tag t:

  acc(t) = N_correct(t) / N_total(t)

  where N_correct(t) is the number of correctly geotagged items tagged with t, and N_total(t) the total number of items tagged with t
• Tags with a non-zero accuracy score form the accuracy tag set

From [3]: Kordopatis-Zilos et al., PAISI 2015
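A sketch of the cross-validation loop above. The `build_lm`, `estimate`, and `is_correct` interfaces are assumed placeholders (the correctness criterion, e.g. a distance threshold, is not specified on the slide):

```python
from collections import Counter

def accuracy_tag_set(items, build_lm, estimate, is_correct, p=10):
    """Leave-one-fold-out accuracy score per tag.
    items: list of (user, lat, lon, tags); returns tags with
    non-zero accuracy score and their scores."""
    total, correct = Counter(), Counter()
    folds = [items[i::p] for i in range(p)]
    for i in range(p):
        held_out = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        lm = build_lm(train)
        for user, lat, lon, tags in held_out:
            est = estimate(lm, tags)
            ok = est is not None and is_correct(est, (lat, lon))
            for t in tags:
                total[t] += 1
                if ok:
                    correct[t] += 1
    acc = {t: correct[t] / total[t] for t in total}
    return {t: a for t, a in acc.items() if a > 0}
```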
#6
(Figure: estimated locations)
Locality
#7
• Captures the spatial awareness of tags
• When a user uses a tag, he/she is assigned to the respective location cell
• Each cell has a set of users assigned to it
• All users assigned to the same cell are considered neighbours
• Locality score of every tag t, computed from:
  – N_t: total occurrences of t
  – C: set of all cells
  – U_t(c): set of users that used tag t inside cell c
• Tags with a non-zero locality score form the locality tag set
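The exact locality formula from [3] did not survive in this transcript; the sketch below uses the quantities defined above with one plausible formulation (an assumption, not the paper's formula): count, per tag, the neighbour pairs among its users and normalise by the tag's total occurrences, so tags whose users cluster in the same cells score high.

```python
from collections import defaultdict

def locality_scores(occurrences):
    """occurrences: iterable of (user, tag, cell) usages.
    Users of a tag in the same cell are neighbours."""
    cell_users = defaultdict(set)   # (tag, cell) -> U_t(c)
    n_occ = defaultdict(int)        # tag -> N_t
    for user, tag, cell in occurrences:
        cell_users[(tag, cell)].add(user)
        n_occ[tag] += 1
    pairs = defaultdict(int)
    for (tag, _), users in cell_users.items():
        k = len(users)
        pairs[tag] += k * (k - 1) // 2   # neighbour pairs in this cell
    # normalise by total occurrences of the tag
    return {t: pairs[t] / n_occ[t] for t in n_occ}
```

Under this formulation, a tag used by many users in one cell (e.g. a city name) scores high, while a camera-generated tag scattered across single users scores zero.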
Locality – value distribution
#8
High locality: london (6975), paris (5452), nyc (3917)
Low locality: luminancehdr (0.0035), dsc6362 (0.003), air photo (0.002)
Extensions
• Spatial Entropy (SE) function
– calculate entropy values by applying the Shannon entropy formula to the tag-cell probabilities
– build a Gaussian weight function based on the tag SE values
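A minimal sketch of the SE weighting above. The Gaussian's centre and width (`mu`, `sigma`) are assumed parameters, e.g. fitted on the training set; the slide does not give their values:

```python
import math

def spatial_entropy(cell_probs):
    """Shannon entropy of a tag's tag-cell probability distribution."""
    ps = [p for p in cell_probs if p > 0]
    total = sum(ps)
    return -sum((p / total) * math.log2(p / total) for p in ps)

def se_weight(entropy, mu, sigma):
    """Gaussian weight: tags whose spatial entropy lies near mu
    (the typical entropy of location-revealing tags) get weight
    close to 1, outliers get down-weighted."""
    return math.exp(-((entropy - mu) ** 2) / (2 * sigma ** 2))
```

A tag concentrated in one cell has entropy 0; a tag spread uniformly over many cells has high entropy, and both extremes fall away from the Gaussian's centre.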
#9
• Internal Grid
– build an additional LM using a finer grid, with a cell side length of 0.001°
– combine the MLCs of the individual language models
• Similarity search [6] (Van Laere et al., ICMR 2011)
– determine the most similar training images in the MLC
– their center-of-gravity is the final location estimation
From [2]: (Kordopatis-Zilos et al., MediaEval 2014)
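The center-of-gravity step above reduces to averaging the coordinates of the retained images; a plain mean is a fine approximation here since the points all lie inside one small cell:

```python
def center_of_gravity(coords):
    """Mean latitude/longitude of the most similar training images."""
    lats, lons = zip(*coords)
    return sum(lats) / len(lats), sum(lons) / len(lons)
```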
Visual-based location estimation
#10
Model building
• CNN features adapted by fine-tuning the VGG model [5] (Simonyan & Zisserman, ICLR 2015)
• Training: ~1K Points Of Interest (POIs), ~1200 images/POI
• Features extracted with Caffe [1] (Jia et al., arXiv 2014)
• Outputs of the fc7 layer (4096d) compressed to 128d using PCA
• CNN features used to compute image similarities

Location Estimation
• Geospatial clustering of the visually most similar images
• If the j-th image is within 1 km of the closest of the previous j − 1 images, it is assigned to that image's cluster; otherwise it forms its own cluster
• The largest cluster (or the first in case of equal size) is selected and its centroid is used as the location estimate
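The greedy geospatial clustering above can be sketched as follows; the images are assumed to arrive already ordered by visual similarity, and the haversine distance is used as one reasonable choice of geodesic distance:

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def cluster_neighbours(locations, radius_km=1.0):
    """locations: (lat, lon) of the most visually similar images, in
    decreasing similarity order. Each image joins the cluster of the
    closest previous image if within radius_km, else starts a new
    cluster. Returns the centroid of the largest cluster (first wins
    on ties, since max keeps the first maximum)."""
    clusters = []  # each cluster: list of (lat, lon)
    for loc in locations:
        best, best_d = None, float("inf")
        for cl in clusters:
            d = min(haversine_km(loc, p) for p in cl)
            if d < best_d:
                best, best_d = cl, d
        if best is not None and best_d <= radius_km:
            best.append(loc)
        else:
            clusters.append([loc])
    largest = max(clusters, key=len)
    lats, lons = zip(*largest)
    return sum(lats) / len(lats), sum(lons) / len(lons)
```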
Hybrid location estimation

Model building
• Combination of the textual and visual approaches
• Build the LM using the tag-based approach above and use it for MLC selection

Similarity Calculation
• Combination of the visual and textual similarities
• Normalize the visual similarities to the range [0, 1]
• Compute a combined similarity between each pair of images
• The final estimation is the center-of-gravity of the most similar images

Low Confidence Estimations
• For test images with no estimate or with confidence lower than 0.02 (≈10% of the test set), the visual approach is used to produce the estimated locations
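The exact combination formula did not survive in this transcript; the sketch below normalises the visual similarities as described and combines the two signals with a plain convex combination (the weighting scheme and `alpha` are assumptions, not the paper's formula):

```python
def hybrid_similarities(visual, textual, alpha=0.5):
    """Combine per-candidate visual and textual similarities.
    visual: raw visual similarities (any scale); textual: similarities
    already in [0, 1]. Visual scores are min-max normalised first."""
    lo, hi = min(visual), max(visual)
    span = (hi - lo) or 1.0  # guard against a constant visual signal
    v_norm = [(v - lo) / span for v in visual]
    return [alpha * v + (1 - alpha) * t for v, t in zip(v_norm, textual)]
```

The candidates with the highest combined similarity then feed the center-of-gravity estimation above.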
#11
Confidence
• Evaluate the confidence of the LM estimation for each query image
• Measures how localized the language model cell estimations are, based on cell probabilities
• Confidence measure computed from:
  – p(c|i): cell probability of cell c for image i
  – D(c, mlc): distance between cell c and the mlc
  – mlc: Most Likely Cell
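The exact confidence formula is not reproduced in this transcript; the sketch below uses the quantities defined above with one illustrative assumption: confidence is the share of probability mass lying near the MLC, so well-localized estimations score close to 1.

```python
def lm_confidence(cell_probs, mlc, distance_km, radius_km=100.0):
    """cell_probs: cell -> p(c|i) for the query image; mlc: the Most
    Likely Cell; distance_km(a, b): distance between two cells.
    Returns the fraction of probability mass within radius_km of the
    MLC (an assumed formulation, not the paper's exact formula)."""
    total = sum(cell_probs.values())
    near = sum(p for c, p in cell_probs.items()
               if distance_km(c, mlc) <= radius_km)
    return near / total if total else 0.0
```

Query images scoring below the 0.02 threshold would then fall back to the visual approach, as described above.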
#12
Runs and Results
#13
measure             RUN-1   RUN-2   RUN-3   RUN-4   RUN-5
acc(1m)   (%)        0.15    0.01    0.15    0.16    0.16
acc(10m)  (%)        0.61    0.08    0.62    0.75    0.76
acc(100m) (%)        6.40    1.76    6.52    7.73    7.83
acc(1km)  (%)       24.33    5.19   24.61   27.30   27.54
acc(10km) (%)       43.07    7.43   43.41   46.48   46.77
median error (km)      69    5663      61      24      22
RUN-1: Tag-based location estimation + released training set
RUN-2: Visual-based location estimation + released training set
RUN-3: Hybrid location estimation + released training set
RUN-4: Tag-based location estimation + YFCC dataset
RUN-5: Hybrid location estimation + YFCC dataset
Thank you!
• Code: https://github.com/MKLab-ITI/multimedia-geotagging
• Get in touch:
  @sympapadopoulos / papadop@iti.gr
  @georgekordopatis / georgekordopatis@iti.gr
#14
References
#15
[1] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
[2] G. Kordopatis-Zilos, G. Orfanidis, S. Papadopoulos, and Y. Kompatsiaris. SocialSensor at MediaEval Placing Task 2014. In MediaEval 2014 Placing Task, 2014.
[3] G. Kordopatis-Zilos, S. Papadopoulos, and Y. Kompatsiaris. Geotagging social media content with a refined language modelling approach. In Intelligence and Security Informatics (PAISI), pages 21–40, 2015.
[4] A. Popescu. CEA LIST's participation at MediaEval 2013 Placing Task. In MediaEval 2013 Placing Task, 2013.
[5] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations, 2015.
[6] O. Van Laere, S. Schockaert, and B. Dhoedt. Finding locations of Flickr resources using language models and similarity search. In ICMR '11, pages 48:1–48:8, New York, NY, USA, 2011. ACM.