Date post: | 02-Nov-2014 |
Category: |
Technology |
Upload: | thomas-francart |
Mondeca’s approach to enriching search engines using business knowledge
Mondeca
The intersection points of several domains
[Diagram: three intersecting domains (Content, Knowledge, Search). Intersection labels: knowledge-based enhanced search; content semantic annotation; smart structure-based indexing of content. Other labels: LERUDI use case, CMS, WCM, AFSITM, SDK.]
What knowledge are we talking about?
• Internal business/reference vocabularies:
– Thesauri (multilingual)
– Dictionaries
– Named entities lists
– Classification rules
– Thesaurus alignments
– …
• Structured data - Always
• Linked Data:
– E.g.: GEMET thesaurus, subset of DBpedia named entities, etc.
At which level do we bring value?
• At 2 different levels:
– when indexing content
  • via index enrichment
– when users perform search
  • by assisting them in the query (re)formulation
• The preferred / most useful technique is to enrich content during the indexing phase
– but this implies that content must be reindexed every time business knowledge evolves or changes
The search engine we used to demonstrate this
• Lucene SolR:
– Open-source
– Has advanced plain text search capabilities
– Allows faceted search
– Offers a highlight feature
– Has spellchecker capabilities
– Includes a « More Like This » (find related content) feature
– Is UIMA compliant
– … full feature list available at: http://lucene.apache.org/solr/features.html
• Principles discussed in the next slides may be applied to other search engines
SolR explorer : a test interface
• SolR returns an XML feed in response to an HTTP request
– http://localhost:8080/solr/select?q=lac&start=0&rows=10
• SolR explorer:
– A web interface to visualize / navigate / test the returned XML feed
– Definitely not meant for end users!
– https://issues.apache.org/jira/browse/SOLR-1163
The data set
• Structured catalogue of an e-tourism portal
– Hotels
– Restaurants
– Activities
– Contacts
– Etc.
• Each resource is linked to a web site
Starting point: simple web indexing, without enrichment
Plan
1 - Synonyms
2 - Translations
3 - Specific terms
4 - Spelling mistakes
5 - Facets
6 - Dynamic classification
7 - Vocabulary alignments
8 - Disambiguation
1 : enrichment using synonyms
• Why?
– Increase recall, expand a request using similar terms
• How?
– By providing a list of equivalent terms to the search engine
– SolR configuration:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!-- in this example, we will only use synonyms at index time -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <!-- ... -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <!-- <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> -->
    <!-- ... -->
  </analyzer>
</fieldType>
Thomas Francart - Enrichissement des moteurs de recherche à partir de connaissances métier
Format of the synonyms file
• One line for each group of equivalent terms
• Option 1: use all equivalent terms
– If one term is found, all equivalent terms are added to the index
déréglementation,libéralisation,dérégulation
croisière,croisière de plaisance,croisière maritime
spectacle,attraction,show
vacances familiales,tourisme familial
pêche,pêche à la ligne,permis de pêche,article de pêche,pêche au gros,pêche touristique
office de tourisme,otsi,office municipal de tourisme,syndicat d'initiative
• Option 2: use the controlled term only
– If one of the terms is found, only the controlled term is added to the index
libéralisation,dérégulation => déréglementation
croisière de plaisance,croisière maritime => croisière
attraction,show => spectacle
tourisme familial => vacances familiales
pêche à la ligne,permis de pêche,article de pêche,pêche au gros,pêche touristique => pêche
otsi,office municipal de tourisme,syndicat d'initiative => office de tourisme
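The difference between the two options can be illustrated with a small Python sketch. It is a toy model of the file format only, not SolR's own implementation: the function names `parse_synonym_rules` and `index_terms` are made up for this example, and matching is done on whole terms rather than on token streams.

```python
def parse_synonym_rules(lines):
    """Map each source term to the list of terms that would be indexed for it."""
    rules = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if "=>" in line:
            # Option 2: explicit mapping towards the controlled term(s)
            left, right = line.split("=>", 1)
            sources = [t.strip().lower() for t in left.split(",")]
            targets = [t.strip().lower() for t in right.split(",")]
        else:
            # Option 1: every term of the line is equivalent to all the others
            sources = [t.strip().lower() for t in line.split(",")]
            targets = sources
        for source in sources:
            indexed = rules.setdefault(source, [])
            for target in targets:
                if target not in indexed:
                    indexed.append(target)
    return rules


def index_terms(term, rules):
    """Terms written to the index for `term`; unknown terms are indexed as-is."""
    return rules.get(term.lower(), [term.lower()])


option_1 = parse_synonym_rules(["spectacle,attraction,show"])
option_2 = parse_synonym_rules(["attraction,show => spectacle"])

print(index_terms("show", option_1))  # all equivalent terms
print(index_terms("show", option_2))  # controlled term only
```

With option 1 the index grows by every equivalent term; with option 2 only the controlled term is stored, which is the trade-off discussed in the following slides.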
Generation of a synonyms file
• Generation of « synonyms.txt » from a SKOS file
– E.g.: using the World Tourism Organisation thesaurus
<skos:Concept rdf:about="http://thes.world-tourism.org#VACANCES">
  <skos:altLabel xml:lang="en">long stays</skos:altLabel>
  <skos:altLabel xml:lang="fr">marché des vacances</skos:altLabel>
  <skos:altLabel xml:lang="fr">genre de vacances</skos:altLabel>
  <skos:altLabel xml:lang="fr">long séjour</skos:altLabel>
  <skos:altLabel xml:lang="en">holiday markets</skos:altLabel>
  <skos:altLabel xml:lang="en">vacations</skos:altLabel>
  <skos:altLabel xml:lang="es">mercado de vacaciones</skos:altLabel>
  <skos:altLabel xml:lang="fr">activité de vacances</skos:altLabel>
  <skos:altLabel xml:lang="fr">type de vacances</skos:altLabel>
  <skos:altLabel xml:lang="en">holiday tourism</skos:altLabel>
  <skos:altLabel xml:lang="fr">congés payés</skos:altLabel>
  <skos:altLabel xml:lang="es">estancia larga</skos:altLabel>
  <skos:altLabel xml:lang="fr">06.09</skos:altLabel>
  <skos:broader rdf:resource="http://thes.world-tourism.org#FLUX_TOURISTIQUE" />
  <skos:inScheme rdf:resource="http://thes.world-tourism.org#_06_FLUX_TOURISTIQUE" />
  <!-- … -->
  <skos:narrower rdf:resource="http://thes.world-tourism.org#VACANCES_D'HIVER" />
  <skos:narrower rdf:resource="http://thes.world-tourism.org#VACANCES_D'ETE" />
  <skos:prefLabel xml:lang="en">HOLIDAYS</skos:prefLabel>
  <skos:prefLabel xml:lang="fr">VACANCES</skos:prefLabel>
  <skos:prefLabel xml:lang="es">VACACIONES</skos:prefLabel>
</skos:Concept>
…
Activités nautiques
HOLIDAYS,VACANCES,long stays,marché des vacances,genre de vacances,long séjour,holiday markets,vacations,activité de vacances,type de vacances,holiday tourism,congés payés,06.09
KOREA DPR,COREE RDP,20.03.05.03
TOURISM IN NATIONAL ECONOMIES,TOURISME DANS L'ECONOMIE NATIONALE,04.04.04,place du tourisme dans l'économie
…
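A minimal sketch of such a generation step, in Python with the standard library only. It assumes the SKOS file is RDF/XML with `skos:Concept` elements as in the excerpt; the function name `skos_to_synonym_lines`, the choice of keeping only English and French labels, and the decision to skip concepts with a single label are assumptions made for this example, not a description of the tool actually used.

```python
import xml.etree.ElementTree as ET

SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def skos_to_synonym_lines(rdf_xml, languages=("en", "fr")):
    """Build one comma-separated line of equivalent labels per SKOS concept."""
    root = ET.fromstring(rdf_xml)
    lines = []
    for concept in root.iter(f"{{{SKOS}}}Concept"):
        labels = []
        # Preferred labels first, then alternative labels, in document order
        for tag in ("prefLabel", "altLabel"):
            for label in concept.findall(f"{{{SKOS}}}{tag}"):
                text = (label.text or "").strip()
                if label.get(XML_LANG) in languages and text and text not in labels:
                    labels.append(text)
        if len(labels) > 1:  # a single label has no synonym to declare
            lines.append(",".join(labels))
    return lines


sample = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://thes.world-tourism.org#VACANCES">
    <skos:altLabel xml:lang="en">long stays</skos:altLabel>
    <skos:altLabel xml:lang="fr">long séjour</skos:altLabel>
    <skos:altLabel xml:lang="es">estancia larga</skos:altLabel>
    <skos:prefLabel xml:lang="en">HOLIDAYS</skos:prefLabel>
    <skos:prefLabel xml:lang="fr">VACANCES</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""

for line in skos_to_synonym_lines(sample):
    print(line)
```

Labels that themselves contain a comma would need escaping before being written to the synonyms file; the sketch does not handle that case.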
Result
Handle synonyms at index-time or query-time ?
• In most cases, it is recommended to handle synonyms at index time, because at query time:
– A multi-word synonym (e.g.: « nautical activities ») is tokenised and will not be correctly identified
  • Even when using quotes?
– Synonym expansion impacts the search engine’s scoring algorithms (IDF)
– Prefix queries (« naut* ») or fuzzy queries (« activities~ ») are not analysed and will not be expanded to synonyms
• But:
– The index grows larger
– If synonyms change, reindexing must be done
To expand, or not to expand queries…?
• One possible solution to avoid inflating the index:
– Avoid expanding from a list of synonyms…
spectacle,attraction,show
pêche,pêche à la ligne,permis de pêche,article de pêche,pêche au gros,pêche touristique
office de tourisme,otsi,office municipal de tourisme,syndicat d'initiative
– …but rather restrict expansion to one controlled value…
attraction,show => spectacle
pêche à la ligne,permis de pêche,article de pêche,pêche au gros,pêche touristique => pêche
otsi,office municipal de tourisme,syndicat d'initiative => office de tourisme
– … which could be the URI of a concept
attraction,show,spectacle => http://thes.world-tourism.org#SPECTACLE
pêche, pêche à la ligne,permis de pêche,article de pêche,pêche au gros,pêche touristique => http://thes.world-tourism.org#PECHE
office de tourisme, otsi,office municipal de tourisme,syndicat d'initiative => http://thes.world-tourism.org#OFFICE_DE_TOURISME
• Advantages:
– Index size does not inflate
– No impact on scoring algorithms
• But it requires analysis when indexing and querying
• Does not solve the issue of multi-word synonyms
Mixed approaches
• Use two synonym lists:
– One tailored for indexing
– Another one tailored for search expansion at query time
• When new synonyms are needed:
– Add them to the synonym list tailored for search
  • They can be leveraged in real time, no need for reindexing
  • Does not solve the question of multi-word synonyms
– Add them to the synonym list tailored for indexing too
  • They will be leveraged at the next indexing phase
• At the next indexing phase:
– Empty the synonyms list tailored for search
• Another mixed approach:
– Process single-word synonyms when searching
– Process multi-word synonyms at the indexing phase
2 : enrich using translations
• Why?
– Add multilingual capabilities to the search engine / allow searching for content in a language different from the one used in the query
• Same methodology as for synonyms
– Translations are declared as equivalent synonyms
• Example
– Using the GEMET thesaurus (sustainable development)
– Can be downloaded in SKOS at http://www.eionet.europa.eu/gemet
…
achat,purchase,compra
mosaïque,mosaic,mosaico
station de montagne,mountain resort,centro turístico de montaña
…
<rdf:Description rdf:about="concept/10910">
  <skos:prefLabel xml:lang="fr">station de montagne</skos:prefLabel>
  <skos:prefLabel xml:lang="en">mountain resort</skos:prefLabel>
  <skos:prefLabel xml:lang="es">centro turistico de montana</skos:prefLabel>
</rdf:Description>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="gemet.txt" ignoreCase="true" expand="true" />
  </analyzer>
  <!-- … -->
</fieldType>
Result
!?
• Why would a search using « mosaic » match « poterie » and « vitrail »?
• In GEMET, the only information available is:
…
achat,purchase,compra
mosaïque,mosaic,mosaico
station de montagne,mountain resort,centro turístico de montaña
…
• BUT, in WTO, we also find the following information:
ARTISANAT,vitrail,orfèvrerie,mécanique,dentelle,plomberie,tapisserie,ébénisterie,mosaïque,modélisme,tissage,porcelaine,crafts,artisanat d'art,menuiserie,cristallerie,joaillerie,émaux,peinture sur soie,poterie
• As we are using both the GEMET and WTO synonym dictionaries, the result when indexing is:
– « Poterie » → « mosaïque » → « mosaic »
• We are exploiting both WTO synonyms and translations from GEMET
– Beware of any unwanted interactions!
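The chaining effect can be reproduced with a toy Python model of two synonym filters applied one after the other. This is only a sketch: the helper names `build_expansion` and `apply_filter` are invented for the example, the WTO group is shortened, and the real filters work on token streams rather than on sets of terms.

```python
def build_expansion(groups):
    """Map each term to the full set of terms of every group it belongs to."""
    expansion = {}
    for group in groups:
        for term in group:
            expansion.setdefault(term, set()).update(group)
    return expansion


def apply_filter(tokens, expansion):
    """Expand each token with its equivalents, like one synonym filter in the chain."""
    result = set()
    for token in tokens:
        result |= expansion.get(token, {token})
    return result


# Shortened extracts of the two dictionaries shown above
wto = build_expansion([["artisanat", "vitrail", "mosaïque", "poterie"]])
gemet = build_expansion([["mosaïque", "mosaic", "mosaico"]])

# Same order as in the field type: WTO synonyms first, then GEMET translations
indexed = apply_filter(apply_filter({"poterie"}, wto), gemet)
print("mosaic" in indexed)
```

In this model a document that only mentions « poterie » ends up indexed under « mosaic », because the output of the first filter becomes the input of the second one.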
3 : enrichment using specific terms
• Why?
– To increase recall. Allows searching on generic notions
• The GEMET and WTO thesauri rely on a hierarchy of terms
– Loisirs > loisirs de plein air > randonnée > randonnées cycliste
– Loisirs > sorties > spectacle > cirque
• A search on « sorties » should find documents containing « spectacle » or « cirque »
– A search on « Loisirs » (leisure) should find documents containing « randonnée » (trek) or « spectacle » (show)
– Etc.
• How?
– Same methodology as for the synonyms
– Specific terms are translated at index time, so that each specific term is mapped to all of its generic terms
  • If done at search time, we would translate from generic to specific
• If « peinture » (painting) is in the text, then we must add « loisirs culturels » and « loisirs », which are the generic terms of that specific one
Generation of the specific terms file
<skos:Concept rdf:about="http://thes.world-tourism.org#LOISIRS">
  <skos:narrower rdf:resource="http://thes.world-tourism.org#LOISIRS_DE_PLEIN_AIR" />
  <skos:narrower rdf:resource="http://thes.world-tourism.org#SORTIE" />
  <skos:narrower rdf:resource="http://thes.world-tourism.org#LOISIRS_D'INTERIEUR" />
  <skos:narrower rdf:resource="http://thes.world-tourism.org#OISIVETE" />
  <skos:narrower rdf:resource="http://thes.world-tourism.org#LOISIRS_CULTURELS" />
  <skos:narrower rdf:resource="http://thes.world-tourism.org#JEU" />
  <skos:narrower rdf:resource="http://thes.world-tourism.org#ARTISANAT" />
  <skos:prefLabel xml:lang="fr">LOISIRS</skos:prefLabel>
</skos:Concept>
<skos:Concept rdf:about="http://thes.world-tourism.org#LOISIRS_CULTURELS">
  <skos:altLabel xml:lang="fr">loisirs artistiques</skos:altLabel>
  <skos:broader rdf:resource="http://thes.world-tourism.org#LOISIRS" />
  <skos:narrower rdf:resource="http://thes.world-tourism.org#PEINTURE" />
  <skos:prefLabel xml:lang="fr">LOISIRS CULTURELS</skos:prefLabel>
</skos:Concept>
<skos:Concept rdf:about="http://thes.world-tourism.org#PEINTURE">
  <skos:altLabel xml:lang="fr">09.03.07</skos:altLabel>
  <skos:broader rdf:resource="http://thes.world-tourism.org#LOISIRS_CULTURELS" />
  <skos:prefLabel xml:lang="fr">PEINTURE</skos:prefLabel>
</skos:Concept>
RESEAU => TRAFIC,TRANSPORT
PEINTURE => LOISIRS CULTURELS,LOISIRS
FETE => MANIFESTATION CULTURELLE,MANIFESTATION TOURISTIQUE
TRANSPORT FLUVIAL => MODE DE TRANSPORT,TRANSPORT
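One way to produce such lines is to walk the `skos:broader` links upwards for every concept. The Python sketch below assumes the hierarchy has already been read into a dictionary of preferred labels (`broader`, a hypothetical structure for this example); the function names are also illustrative.

```python
def generic_terms(term, broader):
    """Return all ancestors of `term`, nearest first, following broader links."""
    ancestors = []
    queue = list(broader.get(term, []))
    while queue:
        parent = queue.pop(0)
        # The membership test also protects against cycles in the thesaurus
        if parent != term and parent not in ancestors:
            ancestors.append(parent)
            queue.extend(broader.get(parent, []))
    return ancestors


def specific_term_lines(broader):
    """Build one 'specific => generic,generic,...' line per concept with ancestors."""
    lines = []
    for term in sorted(broader):
        parents = generic_terms(term, broader)
        if parents:
            lines.append(f"{term} => {','.join(parents)}")
    return lines


# Hierarchy taken from the SKOS excerpt above (preferred labels only)
broader = {
    "PEINTURE": ["LOISIRS CULTURELS"],
    "LOISIRS CULTURELS": ["LOISIRS"],
}

for line in specific_term_lines(broader):
    print(line)
```

As far as the synonym file format goes, an explicit `=>` mapping replaces the term on the left with the terms on the right, so if the specific term itself should stay searchable it probably needs to be repeated on the right-hand side.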
Result
4 : spell checking
• Why?
– Suggest similar terms to users when an entry is misspelled
  • e.g.: « retsaurant » → « Did you mean ‘restaurant’? »
• How? There are two ways to build smart spellchecking:
– By using the index as a dictionary
  • Spelling corrections are in fact existing entries in the index
    – Hence an almost 100% chance of finding results, except when spellchecked terms are combined with other terms from the query
  • But not all of the controlled/business terms are necessarily available for spell checking
    – If they do not exist in the indexed content
– By using a list of controlled terms
  • The suggested spelling corrections will not necessarily trigger results
    – There is no guarantee that any of the indexed documents contains the proposed terms
  • But all business terms are available for controlled searches
Spellchecking using an authority list
• SolR configuration: solrconfig.xml
<config>
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpell</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">name</str>
      <str name="spellcheckIndexDir">./spellchecker</str>
    </lst>
    <lst name="spellchecker">
      <str name="classname">solr.FileBasedSpellChecker</str>
      <str name="name">file</str>
      <str name="sourceLocation">spellcheck.txt</str>
      <str name="characterEncoding">UTF-8</str>
      <str name="accuracy">0.8</str>
      <str name="spellcheckIndexDir">./spellcheckerFile</str>
    </lst>
  </searchComponent>
  <requestHandler name="standard" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.extendedResults">false</str>
      <str name="spellcheck.count">1</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>
</config>
Generation of spellcheck.txt
• Generation of spellcheck.txt from the WTO SKOS
<skos:Concept rdf:about="http://thes.world-tourism.org#VACANCES">
  <skos:altLabel xml:lang="en">long stays</skos:altLabel>
  <skos:altLabel xml:lang="fr">marché des vacances</skos:altLabel>
  <skos:altLabel xml:lang="fr">genre de vacances</skos:altLabel>
  <skos:altLabel xml:lang="fr">long séjour</skos:altLabel>
  <skos:altLabel xml:lang="en">holiday markets</skos:altLabel>
  <skos:altLabel xml:lang="en">vacations</skos:altLabel>
  <skos:altLabel xml:lang="es">mercado de vacaciones</skos:altLabel>
  <skos:altLabel xml:lang="fr">activité de vacances</skos:altLabel>
  <skos:altLabel xml:lang="fr">type de vacances</skos:altLabel>
  <skos:altLabel xml:lang="en">holiday tourism</skos:altLabel>
  <skos:altLabel xml:lang="fr">congés payés</skos:altLabel>
  <skos:altLabel xml:lang="es">estancia larga</skos:altLabel>
  <skos:altLabel xml:lang="fr">06.09</skos:altLabel>
  <skos:broader rdf:resource="http://thes.world-tourism.org#FLUX_TOURISTIQUE" />
  <skos:inScheme rdf:resource="http://thes.world-tourism.org#_06_FLUX_TOURISTIQUE" />
  <!-- … -->
  <skos:narrower rdf:resource="http://thes.world-tourism.org#VACANCES_D'HIVER" />
  <skos:narrower rdf:resource="http://thes.world-tourism.org#VACANCES_D'ETE" />
  <skos:prefLabel xml:lang="en">HOLIDAYS</skos:prefLabel>
  <skos:prefLabel xml:lang="fr">VACANCES</skos:prefLabel>
  <skos:prefLabel xml:lang="es">VACACIONES</skos:prefLabel>
</skos:Concept>
ORGANISMO DE CREDITO
14.11.02
Activités nautiques
HOLIDAYS
VACANCES
VACACIONES
marché des vacances
genre de vacances
long séjour
activité de vacances
type de vacances
congés payés
06.09
KOREA DPR
…
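To give an intuition of what a term list like this makes possible, here is a small Python sketch that suggests the closest controlled term for a misspelled word. It uses `difflib` from the standard library as a stand-in; it is not the algorithm used by SolR's file-based spellchecker, and the `cutoff` of 0.8 only loosely echoes the `accuracy` setting shown earlier (the two similarity measures are not the same). The function name `suggest` and the sample term list are assumptions for the example.

```python
import difflib


def suggest(word, controlled_terms, cutoff=0.8):
    """Return the closest controlled term for `word`, or None if nothing is close."""
    by_lower = {term.lower(): term for term in controlled_terms}
    matches = difflib.get_close_matches(
        word.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None


# Hypothetical extract of a spellcheck.txt file, one term per line
terms = ["restaurant", "VACANCES", "HOLIDAYS", "congés payés"]

print(suggest("retsaurant", terms))
print(suggest("vacanses", terms))
```

As noted above, a suggestion coming from the controlled list is always a valid business term, but nothing guarantees that an indexed document contains it.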
Result
5 : content semantic structuring
• « Smarter content = smarter index »
• It takes content semantic structuring to enhance the search experience
– Associate meaningful metadata to content
– Meaningful metadata bring unambiguous values from reference vocabularies (identification using URIs)
• Associating structured metadata to content enables faceted navigation
• This is a wide-ranging process which we will not describe in detail in this presentation
– E.g.: use of text mining and/or integration middleware such as Mondeca’s CA Manager
  • SolR supports UIMA integration in its indexing chain to add text mining tools
– E.g.: manual tagging in the case of tourism catalogues
Structured catalogue in RDF
Index schema configuration
• 1 index field for each metadata
– In conf/schema.xml
<field name="Mot_Cle_103696" multiValued="true" type="string" indexed="true" stored="true" />
<field name="animaux_acceptes" multiValued="false" type="string" indexed="true" stored="true" />
<field name="bassin_touristique_at" multiValued="true" type="string" indexed="true" stored="true" />
<field name="bordereau_Tourinfrance_103952" multiValued="true" type="string" indexed="true" stored="true" />
<field name="commune_at" multiValued="true" type="string" indexed="true" stored="true" />
<field name="zone_geographique_at" multiValued="true" type="string" indexed="true" stored="true" />
<field name="paiement_accepte" multiValued="true" type="string" indexed="true" stored="true" />
<field name="label_at" multiValued="true" type="string" indexed="true" stored="true" />
<field name="langue_parlee" multiValued="true" type="string" indexed="true" stored="true" />
<field name="type_h" multiValued="true" type="string" indexed="true" stored="true" />
<field name="classement" multiValued="true" type="string" indexed="true" stored="true" />
<field name="tarif_nuit_mini" multiValued="true" type="string" indexed="true" stored="true" />
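Once each metadata has its own field, facets can be requested with SolR's standard `facet`, `facet.field` and `fq` parameters. The Python sketch below only builds the request URL; the base URL, the helper name `facet_query_url` and the filter value are assumptions for the example.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, query, facet_fields, filters=None):
    """Build a select URL that asks for facet counts on the given fields."""
    params = [("q", query), ("facet", "true"), ("facet.mincount", 1)]
    # One facet.field parameter per metadata field to count on
    params += [("facet.field", field) for field in facet_fields]
    # Each selected facet value becomes a filter query (fq)
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return f"{base_url}/select?{urlencode(params)}"


url = facet_query_url(
    "http://localhost:8080/solr",   # assumed local instance
    "lac",
    ["type_h", "classement"],       # fields declared in conf/schema.xml above
    filters={"type_h": "Hotel"},    # hypothetical facet value picked by the user
)
print(url)
```

Clicking a facet value in a user interface typically amounts to adding one more `fq` parameter to the previous request.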
Result: facets
7 : dynamic content classification
• Why?
– The classification plan used in the catalogue is not meant to be understood by end users
  • « objective » vs. « subjective » vision of the content
– There is a need to adapt the classification plan:
  • To different types of audiences
  • For different channels
– The same catalogue needs to be presented according to different perspectives
– To increase content repurposing
• Looking for a place to stay?
  • Simple?
  • Classic?
  • Elegant?
ITM-rules : rule creation
Rules definitions: format
• New hierarchical classifications are in SKOS
• A SPARQL classification rule (generated from ITM Rule Editor) is associated to each entry in the SKOS file
<skos:Concept rdf:about="itm:n#_migration_taxo_106544">
  <skos:prefLabel xml:lang="fr">Raffiné</skos:prefLabel>
  <skos:definition>
    PREFIX r: <itm:n#>
    PREFIX q: <http://www.nievre-tourisme.com/onto#>
    CONSTRUCT { ?SEARCHED_TOPIC <http://purl.org/dc/terms/subject> r:_migration_taxo_106544 . }
    WHERE { ?SEARCHED_TOPIC a q:Hebergement . ?SEARCHED_TOPIC q:classement q:class_CAT4 . }
  </skos:definition>
  <skos:definition>
    PREFIX j: <itm:n#>
    PREFIX i: <http://www.nievre-tourisme.com/onto#>
    CONSTRUCT { ?SEARCHED_TOPIC <http://purl.org/dc/terms/subject> j:_migration_taxo_106544 . }
    WHERE { ?SEARCHED_TOPIC a i:Hebergement . ?SEARCHED_TOPIC i:classement i:class_4EP . }
  </skos:definition>
</skos:Concept>
Content Classifier : rules execution
[Diagram: the taxonomy (classification rules, SKOS + SPARQL), the terminology (SKOS + RDF) and the RDF content metadata feed the classification engine, which outputs classification metadata.]
Classification engine:
• Based on RDF triplestore
• Loads terminology and metadata
• Infers on terminology
– OWL & SKOS inference
– Custom rules
• Applies SPARQL classification rules, e.g.:
– ?x is a <Hotel> and price(?x) < 50
– ?x is a <Camping> and size(?x) > 300
– …
• Optionally, simplifies RDF structure
Content classified with additional dcterms:subject and dc:subject properties
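The rule execution step can be sketched in plain Python on an in-memory set of triples. This is a deliberately simplified stand-in for a triplestore running SPARQL CONSTRUCT queries: it only handles rules that are a conjunction of (predicate, object) conditions on one subject, and the resource identifiers `ex:hotel1`, `ex:camping1` and the value `class_CAT1` are made up for the example.

```python
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ONTO = "http://www.nievre-tourisme.com/onto#"


def classify(triples, rules):
    """Return new (subject, dcterms:subject, category) triples for every rule match."""
    subjects = {s for s, _, _ in triples}
    inferred = set()
    for category, conditions in rules:
        for subject in subjects:
            # A rule matches when the subject carries every required (predicate, object)
            if all((subject, p, o) in triples for p, o in conditions):
                inferred.add((subject, DCTERMS_SUBJECT, category))
    return inferred


# Hypothetical catalogue metadata
catalogue = {
    ("ex:hotel1", RDF_TYPE, ONTO + "Hebergement"),
    ("ex:hotel1", ONTO + "classement", ONTO + "class_CAT4"),
    ("ex:camping1", RDF_TYPE, ONTO + "Hebergement"),
    ("ex:camping1", ONTO + "classement", ONTO + "class_CAT1"),
}

# The two rules attached to the « Raffiné » entry in the rules definition slide
RAFFINE = "itm:n#_migration_taxo_106544"
rules = [
    (RAFFINE, [(RDF_TYPE, ONTO + "Hebergement"), (ONTO + "classement", ONTO + "class_CAT4")]),
    (RAFFINE, [(RDF_TYPE, ONTO + "Hebergement"), (ONTO + "classement", ONTO + "class_4EP")]),
]

for triple in sorted(classify(catalogue, rules)):
    print(triple)
```

The inferred triples are the additional classification metadata that would then be sent to the index alongside the original catalogue metadata.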
Catalogue classified with additional metadata
Additional index fields for the new classifications
• In conf/schema.xml
<field name="taxo_confort" multiValued="true" type="string" indexed="true" stored="true" />
<field name="taxo_generale" multiValued="true" type="string" indexed="true" stored="true" />
Dynamic classification: result
8 : using reference vocabulary alignments
• Why?
– What if content A is annotated using thesaurus A, and users want to search content using thesaurus B?
– Allows queries on a corpus annotated with a thesaurus different from the one used to control queries
[Diagram: thesaurus alignment between WTO and GEMET]
ITM-align : creation of alignments
Alignment formats
<map>
  <Cell rdf:about="150046">
    <entity1>
      <edoal:Class rdf:about="http://eurlex-directory-codes.europa.eu/0350" />
    </entity1>
    <entity2>
      <edoal:Class rdf:about="http://eurovoc.europa.eu/2897" />
    </entity2>
    <relation>fr.inrialpes.exmo.align.impl.rel.EquivRelation</relation>
    <measure rdf:datatype="http://www.w3.org/2001/XMLSchema#float">1.0</measure>
  </Cell>
</map>
<map>
  <Cell rdf:about="152849">
    <entity1>
      <edoal:Class rdf:about="http://eurlex-directory-codes.europa.eu/0350" />
    </entity1>
    <entity2>
      <edoal:Class rdf:about="http://eurovoc.europa.eu/2479" />
    </entity2>
    <relation>fr.inrialpes.exmo.align.impl.rel.EquivRelation</relation>
    <measure rdf:datatype="http://www.w3.org/2001/XMLSchema#float">1.0</measure>
  </Cell>
</map>
Each cell gives the aligned concepts (entity1, entity2), the relation type (relation) and the score (measure).
« EDOAL » format from INRIA: http://alignapi.gforge.inria.fr/edoal.html
Using alignments
• When indexing
• The original document annotations are translated using the alignment
– from thesaurus A to thesaurus B
• The index is enriched with concepts from thesaurus B
– The index now contains annotations based on thesaurus A and thesaurus B
• One can then search the corpus using concepts from thesaurus B
• The alignment is interpreted by specific code in the indexing chain; there is no specific configuration in SolR
– except to specify a dedicated field which will be used for the result of the alignment translation
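The "specific code in the indexing chain" could look like the following Python sketch. It assumes the alignment cells have already been parsed into (source concept, target concept, score) tuples, here oriented from eurovoc to eurlex since the relation is an equivalence; the field names `keywords_eurovoc` and `keywords_eurlex`, the document structure and the function name are hypothetical.

```python
def translate_annotations(annotations, alignment, min_score=1.0):
    """Return the thesaurus B concepts aligned with the given thesaurus A concepts."""
    translated = []
    for source, target, score in alignment:
        if source in annotations and score >= min_score and target not in translated:
            translated.append(target)
    return translated


# The two cells of the alignment excerpt, read as eurovoc -> eurlex
alignment = [
    ("http://eurovoc.europa.eu/2897", "http://eurlex-directory-codes.europa.eu/0350", 1.0),
    ("http://eurovoc.europa.eu/2479", "http://eurlex-directory-codes.europa.eu/0350", 1.0),
]

# Hypothetical document about to be indexed, annotated with eurovoc only
doc = {"id": "doc1", "keywords_eurovoc": ["http://eurovoc.europa.eu/2897"]}

# Dedicated field receiving the result of the alignment translation
doc["keywords_eurlex"] = translate_annotations(doc["keywords_eurovoc"], alignment)
print(doc)
```

The `min_score` threshold is one possible design choice: it lets the indexing chain ignore low-confidence cells when the alignment contains scores below 1.0.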
Reference vocabulary alignments: result
Keywords from the source thesaurus (eurovoc)
Keywords from concepts translated using alignments (from eurovoc to eurlex)
Disambiguation
• Why?
– Match a user’s searched term to a controlled entity
  • « loisirs » → http://thes.world-tourism.org#LOISIRS
• Disambiguating entities when searching only makes sense if the same entities have been disambiguated when indexing
– Either the document was explicitly categorized using a controlled entity (its URI)
– Or the entity was extracted using text mining tools
• Disambiguation of an entity from a controlled vocabulary by the search engine is possible only if the controlled vocabulary has itself been indexed by the search engine
Disambiguation: principle
1. Use reference vocabulary when indexing
   http://www.z.fr/e1 → doc1
   http://www.z.fr/e1 → doc2
2. Indexing of reference vocabulary
   venus → http://www.z.fr/e1
   cupidon → http://www.z.fr/e2
3. Keyword disambiguation using a controlled entity
4. Search on controlled entity id
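The four steps can be sketched with two small lookup tables in Python. Plain dictionaries stand in for the two search engine indexes here, which is an assumption made to keep the example self-contained; the function names are illustrative.

```python
# Step 1: documents indexed with the URIs of the controlled entities
document_index = {"http://www.z.fr/e1": ["doc1", "doc2"]}

# Step 2: the reference vocabulary itself is indexed (label -> entity URI)
vocabulary_index = {
    "venus": "http://www.z.fr/e1",
    "cupidon": "http://www.z.fr/e2",
}


def disambiguate(keyword, vocabulary_index):
    """Step 3: resolve a user keyword to a controlled entity URI, if any."""
    return vocabulary_index.get(keyword.strip().lower())


def search(keyword, vocabulary_index, document_index):
    """Step 4: search documents on the controlled entity id."""
    uri = disambiguate(keyword, vocabulary_index)
    if uri is None:
        return []
    return document_index.get(uri, [])


print(search("Venus", vocabulary_index, document_index))
print(search("cupidon", vocabulary_index, document_index))
```

A keyword that resolves to an entity with no annotated document, like « cupidon » here, returns nothing, which illustrates why disambiguation at search time only pays off when the same entities were used at indexing time.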
Disambiguation: result