Multipedia: Enriching DBpedia with Multimedia information

Multipedia: Enriching DBpedia with Images

Andrés García-Silva†, Asunción Gómez-Pérez†, Max Jakob*, Pablo Mendez* and Chris Bizer*

† {hgarcia, ocorcho, asun}@fi.upm.es
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo s/n, 28660 Boadilla del Monte, Madrid, Spain

* [email protected]
Systems Group, Freie Universität Berlin, Germany

Garcia-Silva et al.

Multipedia Introduction


• Enriching ontologies with multimedia
  • Images and videos complement the information about concepts/entities in existing knowledge bases.
  • Multimodal ontologies can help in QA systems, user interfaces, search and recommendation processes.

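[Figure: example of a multimodal ontology. Concepts: Bone, Pathology. Relations: isA, occurs, depicts (linking images to concepts). Example query: «Show me X-ray Images with fractures of the Femur»]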

Radhouani, S., Lim, J.H., Chevallet, J.-P., Falquet, G.: Combining textual and visual ontologies to solve medical multimodal queries. In: IEEE International Conference on Multimedia and Expo, pp. 1853-1856 (2006).


Multipedia Introduction

• Goal: Populate a general-purpose ontology with images from the Web.
  - Find relevant images for ontology instances with ambiguous names.

• DBpedia knowledge base
  • Collects facts from Wikipedia, covering 3.5 million entities.
  • Classified into a consistent cross-domain ontology: 272 classes and 1.6 million instances.
  • Has evolved into a hub in the linked data cloud.

• Images in DBpedia
  • Wikipedia images are represented in DBpedia (foaf:depiction).
  • About 70% of the Wikipedia articles don't have images.


http://richard.cyganiak.de/2007/10/lod/imagemap.html

Multipedia Introduction

• Challenges
  • Ambiguity of instance labels

Querying the web for images related to the resource dbpedia:hornet


Multipedia Related Work

Approach | Technique | Contextual Information | Ontology
Taneva et al., 2010 | Search engine, training data, and visual similarity | Wikipedia infobox properties | YAGO instances
Deng et al., 2009 (ImageNet) | Search engine, visual similarity, Amazon Mechanical Turk to assess quality | WordNet synonyms and words from parent synset | WordNet noun synsets
Popescu et al., 2007 (RetrievOnto) | Search engine, content-based image retrieval | WordNet synonyms | WordNet synsets under Placental
Russell et al., 2008 (LabelMe) | Collaborative manual annotation of a set of images | - | WordNet synsets
Flickr Wrapper | Search engine, exact term match | Geographic info (latitude, longitude) | DBpedia resources


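Multipedia Enriching DBpedia with Multimedia

[Figure: processing pipeline for a DBpedia resource (dbpr)]

1) Get Context: the Wikipedia-based Context Index provides related terms.
2) Retrieve Images: one query per context term & dbpr name is sent to the image search engines. Output: rankings of images (one per query) and a list of images annotated with tags.
3) Aggregate: the per-query rankings are merged into a ranking of images.
4) Generate tag-based ranking: the list of tagged images is turned into a ranking of images.
5) Aggregate: the rankings are merged into the final ranking of images.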

Get Context

dbpr:Hornet → Wikipedia article → Wikipedia-based Context Index → context terms: family, wasps, insect



Retrieve Images

dbpr:Hornet, context terms: family, wasps, insect

Queries sent to the image search engines:
Q0 = Hornet
Q1 = Hornet and Family
Q2 = Hornet and Wasps
Q3 = Hornet and insect

Image rankings (one per query):
R0 = img0,1; img0,2 ... img0,k
R1 = img1,1; img1,2 ... img1,l
R2 = img2,1; img2,2 ... img2,m
R3 = img3,1; img3,2 ... img3,n
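The query construction on this slide can be sketched in a few lines. This is only an illustration: the function name `build_queries` is hypothetical, and joining terms with the word "and" simply mirrors the Q1..Q3 strings shown above; how a given search engine interprets that string is not specified in the slides.

```python
def build_queries(resource_name, context_terms):
    """Return one plain query for the resource name (Q0) plus one
    query per context term (Q1..Qn), as in the Hornet example."""
    queries = [resource_name]  # Q0: the resource name alone
    for term in context_terms:
        # Qi: resource name combined with a single context term
        queries.append(f"{resource_name} and {term}")
    return queries


if __name__ == "__main__":
    for i, q in enumerate(build_queries("Hornet", ["Family", "Wasps", "insect"])):
        print(f"Q{i} = {q}")
```

Each query would then be submitted separately, yielding one image ranking Ri per query Qi.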





Aggregate

R0, R1, R2, R3 → Rcontext-based = img1; img2 ... imgp

Borda's count
• Positional method, very easy to compute
• Each query result Ri is a voter and the images imgj are candidates:
  - For each candidate imgj in Ri, Si(imgj) = number of candidates ranked below imgj in Ri.
  - $S(img_j) = \sum_{i=0}^{|C|} S_i(img_j)$
• Output: imgj ordered by S(imgj) value
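A minimal sketch of Borda's count as described above, where each ranking is a voter and each image a candidate. Two details are assumptions not stated on the slide: an image missing from a ranking receives 0 points from that voter, and ties are broken alphabetically by image identifier. The name `borda_aggregate` is hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Merge several rankings (lists ordered best-first) into one.

    For each ranking Ri, an image scores the number of candidates
    ranked below it in Ri; the scores are summed over all rankings.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, img in enumerate(ranking):
            # n - 1 - position candidates are ranked below this image
            scores[img] += n - 1 - position
    # Highest total first; alphabetical tie-break (assumption)
    return sorted(scores, key=lambda img: (-scores[img], img))


if __name__ == "__main__":
    r0 = ["a", "b", "c"]
    r1 = ["b", "a"]
    r2 = ["c", "b", "d"]
    print(borda_aggregate([r0, r1, r2]))
```

In this toy input, image "b" never tops more than one list but is ranked consistently well, so it ends up first in the aggregated ranking.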





Generate tag-based ranking

List of images: L = R0 ∪ R1 ∪ R2 ∪ R3

1) Measure the relatedness between a DBpedia resource and an image:
   - Overlap of terms between the context of the former and the tags of the latter.
2) Use the Vector Space Model to represent the DBpedia resource and the images:
   - TF as weighting scheme,
   - cosine function to measure similarity.
3) Generate a ranking of images according to the similarity value:
   Rtag-based = img1; img2 ... imgq

Aggregate

Rtag-based = img1; img2 ... imgq
Rcontext-based = img1; img2 ... imgp
→ Rfinal = img1; img2 ... imgl
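The tag-based ranking can be illustrated with raw term frequencies and cosine similarity. This is a sketch under assumptions: terms are compared as exact lowercase strings with no stemming or stop-word removal, ties are broken alphabetically, and the names `cosine_tf` and `rank_by_tags` as well as the sample tags are hypothetical.

```python
import math
from collections import Counter


def cosine_tf(terms_a, terms_b):
    """Cosine similarity between two term lists using raw TF weights."""
    tf_a, tf_b = Counter(terms_a), Counter(terms_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_tags(context_terms, tagged_images):
    """Order image ids by similarity between their tags and the context."""
    scored = [(cosine_tf(context_terms, tags), img)
              for img, tags in tagged_images.items()]
    # Most similar first; alphabetical tie-break (assumption)
    return [img for _, img in sorted(scored, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    context = ["family", "wasps", "insect"]
    images = {
        "img_a": ["insect", "wasps", "nest"],
        "img_b": ["car", "green"],
        "img_c": ["insect"],
    }
    print(rank_by_tags(context, images))
```

The resulting Rtag-based list could then be merged with Rcontext-based using the same positional aggregation applied to the per-query rankings.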


Multipedia Experiments

• How many context words produce the best results?

Apple context: «juice, fruit, apples, capital, michigan, orange»


Multipedia Experiments

• Ambiguity
  • Search engines work well for:
    • unambiguous names
    • ambiguous names referring to a dominant sense, e.g., dbpedia:Stonehenge
  • However, they fail for ambiguous names:
    • lacking a dominant sense, e.g., dbpedia:Apple
    • when they do not refer to the dominant sense, e.g., dbpedia:Blackberry



• Dominance:

• Dataset:
  • 10 classes, with 15 dbpr randomly selected per class
  • Each dbpr must: 1) be popular, 2) have a dominance under 0.7
  • We found dbpr for Mammals, Birds and Insects
  • Increasing the dominance limit to 0.9, we found dbpr for the rest of the classes.



• 15 people evaluated the results of three approaches
• Each image was rated by 3 evaluators




Multipedia Conclusions

• Multipedia: an approach to automatically populate an ontology with images related to existing instances

• We focused on the particularly challenging problem of ambiguity in instance names

• Human-driven evaluation of the approach involving 15 users and a total of 2250 image ratings, covering DBpedia resources from several classes.

• A variation of Multipedia improves average precision by 9.4% over a baseline of keyword queries to commercial image search engines

• We have validated that, in contrast to the baseline, our approach achieves the highest precision with ambiguous names lacking a dominant sense.