Associating Relevant Photos to Georeferenced
Textual Documents through Rank Aggregation
Rui Candeias and Bruno Martins
Instituto Superior Técnico, INESC-ID
ISWC Workshop Terra Cognita 2011
Bonn, Germany
Introduction
The illustration problem, a.k.a. Cross Media Retrieval:
  Given a textual document (e.g., a travelogue) as the query
  Find relevant images (e.g., photos from Flickr) to illustrate the text
Many practical applications
  Illustrating travelogues with landmark photos
Very challenging problem
  Semantic gap between photos and textual documents
  Vocabulary mismatch between document terms and photo tags
Proposal: Geographically-aware cross media retrieval!
Related Work
Geographic Information Retrieval
  Resolving place references in textual documents
  Retrieving documents through geospatial similarity
Multimedia and Cross Media Retrieval
  Exploring descriptions (e.g., tags) associated with photos
Associating Photos to Travelogues [Lu et al., 2010]
  Probabilistic topic models to avoid vocabulary mismatch
The Proposed Method
1. Resolve place references in the document
2. Collect nearby georeferenced photos from Flickr
3. Select the best photos to associate with the document
  I. Compute multiple relevance estimators
  II. Combine the estimators through rank aggregation
  III. Select the top-ranked photo(s)
Resolving Places and Collecting Photos
Yahoo! Placemaker
  Delimiting placenames and associating them with the corresponding geospatial coordinates
Flickr’s API
  Retrieving metadata (e.g., tags) for photos taken close to a given pair of geospatial coordinates
Resolving Places and Collecting Photos
The Bonn Minster is one of Germany's oldest churches, having been built between the 11th and 13th centuries.
Since the 13th century, when the people of Bonn included the Minster in their city’s coat of arms, it has been the emblem of the City of Bonn.
The basilica of Bonn as we know it today was built on the site of the graves of the two martyrs Cassius and Florentius, the city’s patrons. The whole of its development is recorded, from its beginnings as a small place of worship in the late Roman period to its becoming the first large church complex in the Rhineland, and later a significant example of medieval Rhenish church architecture.
<extents>
  <center>
    <latitude>51.3346</latitude><longitude>1.31407</longitude>
  </center>
  <southWest>
    <latitude>38.7051</latitude><longitude>-91.5391</longitude>
  </southWest>
  <northEast>
    <latitude>55.0581</latitude><longitude>15.0421</longitude>
  </northEast>
</extents>
<placeDetails>
  <placeId>2</placeId>
  <place>
    <woeId>640161</woeId>
    <type>Town</type>
    <name>Bonn, North Rhine-Westphalia, DE</name>
    <centroid>
      <latitude>50.7323</latitude>
      <longitude>7.10169</longitude>
    </centroid>
  </place>
</placeDetails>
<photo id="3882008927" dateuploaded="1251930164" isfavorite="0" views="483">
  <owner nsid="26021670@N00" username="Claude@Munich" realname="Claudia" location="Munich, Germany"/>
  <title>Bonn Minster</title>
  <description> The Bonn Minster is one of Germany's oldest churches having been built between the 11th and 13th centuries. … </description>
  <dates posted="1251930164" taken="2009-08-26 21:07:05" lastupdate="1306931029" />
  <comments>6</comments>
  <tags>
    <tag raw="Germany" author="26021670@N00">germany</tag>
    <tag raw="Nordrhein-Westfalen" author="26021670@N00">nordrheinwestfalen</tag>
    <tag raw="church" author="26021670@N00">church</tag>
    <tag raw="Bonn Minster" author="26021670@N00">bonnminster</tag>
    <tag raw="Minster" author="26021670@N00">minster</tag>
  </tags>
  <location latitude="50.733155" longitude="7.100451" place_id="Uvujcu1XVrp6hVQ" woeid="640161">
    <locality place_id="Uvujcu1XVrp6hVQ" woeid="640161">Bonn</locality>
    <county place_id="MlysCMBQUL9JMVQ_Og" woeid="12597065">Stadtkreis Bonn</county>
    <region place_id="RoiqFqRTUb6tlYAO" woeid="2345487">North Rhine-Westphalia</region>
    <country place_id="h7eZVDlTUb50Btij9Q" woeid="23424829">Germany</country>
  </location>
</photo>
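As a rough illustration of how the fields used later by the relevance estimators could be pulled out of such a photo record, the sketch below parses an abridged copy of the XML above with Python's standard library. The function name `parse_photo` and the returned dictionary layout are assumptions made for this example; they are not part of the Flickr API or of the original system.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the photo record shown above (straight quotes only).
PHOTO_XML = """
<photo id="3882008927" views="483">
  <title>Bonn Minster</title>
  <dates posted="1251930164" taken="2009-08-26 21:07:05"/>
  <comments>6</comments>
  <tags>
    <tag raw="Germany">germany</tag>
    <tag raw="church">church</tag>
    <tag raw="Bonn Minster">bonnminster</tag>
  </tags>
  <location latitude="50.733155" longitude="7.100451" woeid="640161"/>
</photo>
"""


def parse_photo(xml_text):
    """Extract title, tags, coordinates, date and popularity from one <photo>.

    Hypothetical helper: the output layout is chosen for illustration only.
    """
    root = ET.fromstring(xml_text.strip())
    location = root.find("location")
    return {
        "id": root.get("id"),
        "title": root.findtext("title", default=""),
        "tags": [tag.text for tag in root.findall("tags/tag")],
        "lat": float(location.get("latitude")),
        "lon": float(location.get("longitude")),
        "taken": root.find("dates").get("taken"),
        "views": int(root.get("views", "0")),
        "comments": int(root.findtext("comments", default="0")),
    }


photo = parse_photo(PHOTO_XML)
print(photo["title"], photo["tags"], photo["lat"], photo["lon"])
```

The title and tags feed the textual estimator, the coordinates feed the geospatial one, the `taken` date the temporal one, and the view and comment counts the importance estimator.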
The Relevance Estimators
Textual Similarity
  Photos having words (title, tags) that occur in the text of the document are more likely to be relevant
Geospatial Proximity
  Photos taken close to the places mentioned in the document are more likely to be relevant
Temporal Cohesion
  Photos taken in the same semester as the temporal period discussed in the document are more likely to be relevant
Photo Importance and Interestingness
  Photos having more views or more comments should be more interesting, and thus also more likely to be relevant
Computing The Estimators
Textual Similarity
  Term Frequency x Inverse Document Frequency (TF-IDF)
  Stopwords were first removed
  Tags considered twice as important as the title
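A minimal sketch of one way such a textual estimator could be computed is given below. The slides only state that TF-IDF is used, that stopwords are removed, and that tags count twice as much as the title; the cosine comparison, the smoothed IDF formula, the tiny stopword list, and all function names are assumptions made for this example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption; the original list is not given).
STOPWORDS = {"the", "of", "is", "a", "an", "in", "and", "to", "one"}


def tokens(text):
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def photo_terms(title, tags, tag_weight=2):
    """Term counts for a photo, with each tag occurrence counted tag_weight times."""
    counts = Counter(tokens(title))
    for tag in tags:
        for word in tokens(tag):
            counts[word] += tag_weight
    return counts


def tfidf(counts, doc_freq, n_docs):
    """TF-IDF weights with a smoothed IDF (one common variant, assumed here)."""
    return {
        term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
        for term, tf in counts.items()
    }


def cosine(u, v):
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def text_similarity(document, photos):
    """Score each photo against the document text; higher means more similar."""
    doc_counts = Counter(tokens(document))
    photo_counts = [photo_terms(p["title"], p["tags"]) for p in photos]
    collection = [doc_counts] + photo_counts
    # Document frequency: in how many texts of the collection each term occurs.
    doc_freq = Counter(term for counts in collection for term in counts)
    n_docs = len(collection)
    doc_vec = tfidf(doc_counts, doc_freq, n_docs)
    return [cosine(doc_vec, tfidf(c, doc_freq, n_docs)) for c in photo_counts]


document = "The Bonn Minster is one of Germany's oldest churches"
photos = [
    {"title": "Bonn Minster", "tags": ["church", "minster"]},
    {"title": "Sunset", "tags": ["beach"]},
]
scores = text_similarity(document, photos)
print(scores)
```

In this toy run the first photo shares the terms "bonn" and "minster" with the document and gets a positive score, while the second shares none and scores zero.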
Geospatial Proximity
  Great circle distance
  Since documents have multiple locations, we used the minimum and the average distances
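The geospatial estimator can be sketched with the haversine formula, one standard way to compute great-circle distance. The function names, the mean Earth radius of 6371 km, and the choice of example points (the photo location, the Bonn centroid, and the extents center from the samples above) are assumptions for illustration; the slides do not say which formula or radius was used.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_features(photo_latlon, place_latlons):
    """Minimum and average distance from a photo to the document's places."""
    distances = [
        great_circle_km(photo_latlon[0], photo_latlon[1], lat, lon)
        for lat, lon in place_latlons
    ]
    return min(distances), sum(distances) / len(distances)


photo_latlon = (50.733155, 7.100451)                  # photo location from the sample record
places = [(50.7323, 7.10169), (51.3346, 1.31407)]     # Bonn centroid, extents center
d_min, d_avg = distance_features(photo_latlon, places)
print(round(d_min, 3), round(d_avg, 3))
```

Because smaller distances indicate better matches, these values would need to be inverted (or negated) before being combined with similarity-style scores; how that was done in the original system is not stated.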
Score-Based Rank Aggregation
The relevance estimators are combined through score-based rank aggregation schemes:
1. Multiple scores are computed from the relevance estimators
2. Scores for each relevance estimator are normalized through a min-max procedure
3. Final ranking is obtained by aggregating the normalized scores [Fox & Shaw, 1999?]
The CombSUM method
  Multiple scores are summed
The CombMNZ method
  Multiple scores are summed and multiplied by the number of non-zero scores
Candidate   Ranker 1 Score   Ranker 2 Score
A           0.0              0.8
B           0.6              0.1
C           0.4              0.1
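The two aggregation schemes can be sketched as follows, using the two-ranker example above. The function names are hypothetical, and the demonstration applies CombSUM and CombMNZ directly to the tabulated scores, under the assumption that they are already normalized; `min_max` is included only to show the normalization step.

```python
def min_max(scores):
    """Min-max normalize one ranker's scores into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {c: 0.0 for c in scores}
    return {c: (s - lo) / (hi - lo) for c, s in scores.items()}


def comb_sum(rankers):
    """CombSUM: sum each candidate's scores over all rankers."""
    candidates = set().union(*rankers)
    return {c: sum(r.get(c, 0.0) for r in rankers) for c in candidates}


def comb_mnz(rankers):
    """CombMNZ: CombSUM multiplied by the number of non-zero scores."""
    sums = comb_sum(rankers)
    return {
        c: total * sum(1 for r in rankers if r.get(c, 0.0) > 0)
        for c, total in sums.items()
    }


def ranking(scores):
    """Candidates sorted by decreasing score (ties broken by name)."""
    return sorted(scores, key=lambda c: (-scores[c], c))


ranker1 = {"A": 0.0, "B": 0.6, "C": 0.4}
ranker2 = {"A": 0.8, "B": 0.1, "C": 0.1}

print(ranking(comb_sum([ranker1, ranker2])))  # order under CombSUM
print(ranking(comb_mnz([ranker1, ranker2])))  # order under CombMNZ
```

On these scores CombSUM ranks A first (0.8 versus roughly 0.7 and 0.5), whereas CombMNZ doubles the sums of B and C, which both rankers scored above zero, and so ranks B first and A last. This illustrates how CombMNZ rewards candidates supported by several estimators.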
Experimental Evaluation
Dataset with 450 georeferenced Flickr photos
  Total of 50 photos for each of 9 popular tourist destinations (i.e., capitals)
  Large textual descriptions, containing placenames, used as the queries
  Photos described with tags, title, geospatial coordinates, ...
Experiments with different configurations of the proposed method
  Text Similarity versus different combinations of relevance estimators
  CombSUM versus CombMNZ method
Results evaluated with Precision@1 and Reciprocal Rank
  Only one relevant photo per document
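With a single relevant photo per document, the two evaluation measures reduce to simple checks on where that photo appears in the ranking. The sketch below shows one way to compute them and average over queries; the function names and the toy rankings are hypothetical and are not results from the experiments.

```python
def precision_at_1(ranked, relevant):
    """1.0 if the top-ranked photo is the relevant one, else 0.0."""
    return 1.0 if ranked and ranked[0] == relevant else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the relevant photo, or 0.0 if it was not retrieved."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == relevant:
            return 1.0 / position
    return 0.0


def evaluate(runs):
    """Average both measures over (ranking, relevant photo) pairs."""
    n = len(runs)
    mean_p1 = sum(precision_at_1(r, rel) for r, rel in runs) / n
    mean_rr = sum(reciprocal_rank(r, rel) for r, rel in runs) / n
    return mean_p1, mean_rr


# Toy data: two queries with made-up photo identifiers.
runs = [
    (["p3", "p1", "p2"], "p3"),  # relevant photo ranked first
    (["p5", "p4"], "p4"),        # relevant photo ranked second
]
mean_p1, mean_rr = evaluate(runs)
print(mean_p1, mean_rr)
```

For the toy data the averaged Precision@1 is 0.5 and the mean reciprocal rank is 0.75.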
The Obtained Results
Conclusions
We proposed and evaluated a novel geographically-aware cross media retrieval method
The method leverages resolved place references to avoid the semantic gap between photos and texts
Combining relevance estimators leads to more accurate cross-media retrieval results
The CombMNZ and CombSUM rank aggregation methods are adequate for the task, obtaining similar results
Future Work
Supervised Learning to Rank (L2R)
  Experiments with many different L2R algorithms
  Experiments made afterwards showed significantly better results
Many more relevance estimators
  Topical similarity based on the Latent Dirichlet Allocation (LDA) model
  Features from visual image clusters
  Experiments made afterwards showed a slight increase in result quality
Outlier removal
  Remove outlier cities recognized by Yahoo! Placemaker
Improve the evaluation procedure
  Still to be done...