NeOn: Lifecycle Support for Networked Ontologies
Integrated Project (IST-2005-027595)
Priority: IST-2004-2.4.7 – “Semantic-based knowledge and content systems”

D3.2.4 Context-sensitive Search of Ontologies

Deliverable Co-ordinator: Boštjan Pajntar
Deliverable Co-ordinating Institution: J. Stefan Institute (JSI)
Other Authors: Dunja Mladenić (JSI), Marko Grobelnik (JSI)

This deliverable provides an extension of the web application SearchPoint, which enhances ontology web search scenarios with the use of other ontologies that provide context for the search. Finding a good ranking is one of the main problems of ontology search engines. In SearchPoint we propose a solution that automatically generates topics related to a query and its results. These topics are then visualized in a panel which the user can interact with. Every point on this panel represents a weighted ranking. Every such ranking is aligned with the selected point, i.e. results semantically closer to the topics near the selected point get ranked higher. Effectively, the user can get a specifically tailored ranking of the results with a single click.

The context-sensitive search of ontologies as proposed in this deliverable provides one of the three core technologies for contextualization developed in NeOn WP3. Topics are generated via classification of results into a given ontology. Concepts that are the most represented get selected as topics. Since SearchPoint easily upgrades any textual search engine, it effectively contextualizes any textual general knowledge by re-ranking it in accordance with the user-selected importance of the concepts in the ontology.

Document Identifier: NEON/2009/D3.2.4/v1.0
Class Deliverable: NEON EU-IST-2005-027595
Project start date: March 1, 2006
Project duration: 4 years
Date due: August 31, 2009
Submission date: August 31, 2009
Version: V1.0
State: Final
Distribution: Public

2006–2009 © Copyright lies with the respective authors and their institutions.
NeOn-project.org
NeOn Consortium

This document is a part of the NeOn research project funded by the IST Programme of the Commission of the European Communities by the grant number IST-2005-027595. The following partners are involved in the project:

Open University (OU) – Coordinator
Knowledge Media Institute – KMi, Berrill Building, Walton Hall, Milton Keynes, MK7 6AA, United Kingdom
Contact person: Martin Dzbor, Enrico Motta
E-mail address: {m.dzbor, e.motta}@open.ac.uk

Universität Karlsruhe – TH (UKARL)
Institut für Angewandte Informatik und Formale Beschreibungsverfahren – AIFB, Englerstrasse 11, D-76128 Karlsruhe, Germany
Contact person: Peter Haase

Universidad Politécnica de Madrid (UPM)
Campus de Montegancedo, 28660 Boadilla del Monte, Spain
Contact person: Asunción Gómez Pérez

Software AG (SAG)
Uhlandstrasse 12, 64297 Darmstadt, Germany
Contact person: Walter Waterfeld
E-mail address: walter.waterfeld@softwareag.com

Intelligent Software Components S.A. (ISOCO)
Calle de Pedro de Valdivia 10, 28006 Madrid, Spain
Contact person: Jesús Contreras
E-mail address: jcontreras@isoco.com

Institut ‘Jožef Stefan’ (JSI)
Jamova 39, SI-1000 Ljubljana, Slovenia
Contact person: Marko Grobelnik

Institut National de Recherche en Informatique et en Automatique (INRIA)
ZIRST – 655 avenue de l'Europe, Montbonnot Saint Martin, 38334 Saint-Ismier, France
Contact person: Jérôme Euzenat
E-mail address: jerome.euzenat@inrialpes.fr

University of Sheffield (USFD)
Dept. of Computer Science, Regent Court, 211 Portobello street, S1 4DP Sheffield, United Kingdom
Contact person: Hamish Cunningham
E-mail address: hamish@dcs.shef.ac.uk

Universität Koblenz-Landau (UKO-LD)
Universitätsstrasse 1, 56070 Koblenz, Germany
Contact person: Steffen Staab
E-mail address: staab@uni-koblenz.de

Consiglio Nazionale delle Ricerche (CNR)
Institute of cognitive sciences and technologies, Via S. Martino della Battaglia, 44 - 00185 Roma-Lazio, Italy
Contact person: Aldo Gangemi
E-mail address: aldo.gangemi@istc.cnr.it

Ontoprise GmbH. (ONTO)
Amalienbadstr. 36 (Raumfabrik 29), 76227 Karlsruhe, Germany
Contact person: Jürgen Angele
E-mail address: angele@ontoprise.de

Food and Agriculture Organization of the United Nations (FAO)
Viale delle Terme di Caracalla 1, 00100 Rome, Italy
Contact person: Marta Iglesias

Atos Origin S.A. (ATOS)
Calle de Albarracín, 25, 28037 Madrid, Spain
Contact person: Tomás Pariente Lobo
E-mail address: tomas.parientelobo@atosorigin.com

Laboratorios KIN, S.A. (KIN)
C/Ciudad de Granada, 123, 08018 Barcelona, Spain
Contact person: Antonio López
D3.2.4 Context-sensitive Search of Ontologies Page 3 of 40
2006–2009 © Copyright lies with the respective authors and their institutions.
Work package participants
The following partners have taken an active part in the work leading to the elaboration of this document, even if they might not have directly contributed to the writing of this document or its parts: JSI, OU.
Change Log
Version  Date        Amended by              Changes
0.1      20-07-2009  Boštjan Pajntar         Overall structure of the report
0.2      01-08-2009  Boštjan Pajntar         Executive Summary, introduction
0.3      03-08-2009  Boštjan Pajntar         Began approach description chapter
0.4      15-08-2009  Dunja Mladenić          Chapter 2, Overall revision
0.5      28-08-2009  Boštjan Pajntar         Chapter 3
0.6      12-09-2009  Marko Grobelnik         Overall revision
0.7      28-09-2009  Boštjan Pajntar         Figures, Overall revision
0.8      15-10-2009  Christopher Buttenshaw  Final QA
Executive Summary
This report describes a software deliverable developed as an extension of the web application SearchPoint [Pajntar and Grobelnik, 2008]. The main goal of SearchPoint is to enhance search engines by allowing users to get multiple rankings of the results for each query. We achieve this by generating topics for the given query and its result set and visualizing these topics on a panel named “Ranking Space”. Each point in this ranking space maps to a specific ranking. For example, if a point is selected near one topic, results that are on that topic are ranked higher.
The topics can be generated by clustering the results, and we have implemented this method as a baseline to compare with. The more advanced method for generating topics is to classify the hits and the query into a selected ontology and then select the concepts with the most results to serve as topics. This keeps the number of visualized topics small enough to be understood by the user, whilst retaining as much as possible of the domain covered by the current result set.
Topics must be visualized in an intelligent way. Since selecting a point between two topics promotes hits that cover either of them, it makes sense to visualize similar topics close together. In the baseline scenario, where the centroids of clusters serve as topics, they are visualized by drawing a complete weighted graph in which each node represents a topic and each edge is weighted by the similarity of its two nodes. In the scenario of classifying and selecting the most prominent concepts from the ontology, we visualize them in accordance with the underlying structure of the ontology.
This is the last of the three core contextualizing technologies stemming from WP3. It provides a means to study any non-structured, general knowledge in the context of a selected ontology. The other two core contextualizing technologies are complementary. First, the context can be provided by one networked ontology for another; this was implemented in OntoConto (D3.2.2, D4.5.2), which consumes the Alignment Server (D3.3.1, D3.3.2). Second, general background knowledge can provide the means for the contextualization of an ontology, which was implemented in OntoAtlas (D3.7.1, D4.3.1, D3.7.2).
Table of Contents
1. Introduction
2. Approach Description
2.1 System architecture
2.2 Topic generation – Clustering
2.3 Topic generation – Classification
2.4 Visualization and ranking space
2.5 Calculation of the ranking space
3. Example usage of the system
3.1 Basic description of the functionalities
3.2 Discussion on Usability of SearchPoint
3.3 Showcase
4. Conclusion and future work
Appendix A
A.1 The current WSDL file to the web service
List of Figures
Figure 1: Illustration of basic SearchPoint in action. For a query “ontology” ambiguous hits get returned. To see only the hits in the context of philosophy the red focus point is moved near the automatically generated topic “philosophy”. Hits returned are initially low ranked (99, 11, 68...) but all talk about philosophy, logic, Aristotle...
Figure 2: The same result page, only the focus point is moved between topics “metadata”, “owl”, “online”. The user immediately gets returned a list of hits about, e.g., semantic mark-up, information and computer science.
Figure 3: Architecture of SearchPoint
Figure 4: The hand cursor is above "Philosophy" topic. The additional words are: exist, concerns, kinds, part, nature.
Figure 5: Classifier method for the query “ontology”. The ontology (taxonomy) used is DMOZ. The relevant concepts classified are: Philosophy, Knowledge Management, Social Sciences, Languages, Internet and Artificial Intelligence. The four main categories are Society, Reference, Science and Computers. Mouse over the concept “Internet” shows Top/Computers/Software/Internet. The concept Software was left out of the visualization in order to make it clearer.
Figure 6: Visualization for the query “semantic web”. It is immediately seen that this topic is mostly modelled with Computers and its descendants. Additionally there is some Business and Knowledge Management linked to it.
Figure 7: Yahoo web search in the context of Dmoz.
Figure 8: Yahoo web search with clustering.
Figure 9: Yahoo web search in the context of EuroVoc.
Figure 10: Swoogle Ontology Search in the context of EuroVoc.
1. Introduction
Big search engines (e.g. www.google.com, www.yahoo.com) use a simple, effective and user-friendly method. A user must enter a query in the form of typed words or sentences which constitutes the engine's input, at which point the engine returns a ranked result-set. Calculating the correct result-set is a well known problem with a well known solution. On the other hand, there is no apparent best way of calculating the ranking. A lot of resources are spent on calculating an optimal ranking and a lot of features are used for this process. There is continuous debate on how much the ranking should be personalized. Usually results are very efficient in the general web search setting. The rankings are at least initially calculated from the underlying graph of linked web pages. However, the ranking quickly becomes worse if there is no underlying structure. For example, corporate or specialized (image search, ontology search) search scenarios do not have a good solution for rankings.
Some web applications build on top of the usual search method. It is possible to identify subcategories of the result-set. These categories are in one way or another presented beside the hits and a user can reformulate his query by clicking on them. Examples of such sites are: www.vivisimo.com, www.clusty.com, www.kartoo.com, www.ujiko.com.
On mindset.research.yahoo.com another approach is presented. After the user receives the results of his query, he can tune the ranking by defining whether results should be more in the sense of "shopping" or of "researching". This web application gave us the initial idea for our approach.
The SearchPoint web application builds on the idea of re-ranking hits in accordance with the selected topics. We have extended the idea in the sense that topics are not predefined but dependent on the query and result set. There is also an arbitrary number of topics per query, which poses a challenge as to how the user can select his preferences.
The main work of this deliverable lies in extending SearchPoint with the possibility of adding any pre-trained OntoLight Classifier [Grobelnik et al., 2008] for the classification of hits into an ontology, which in turn provides the best concepts for the visualized topics (Section 2.3). Apart from the classification into ontologies, we also use clustering of results for topic generation (Section 2.2).
Since there are more than two topics, it is not trivial to position them on the ranking space. With two topics it was easy, since a simple slider bar sufficed for determining how much of each topic the user selects. Here, we must use the similarity of topics to draw a graph of topics. In this way, similar topics are visualized closer together than non-similar ones. This enables the user to select points between topics, with a single click, if he is interested in results featuring two similar topics (Section 2.4).
The remainder of this deliverable initially describes the approach (Section 2), first giving an overview of the architecture, before describing topic generation and visualization techniques. Then we offer some discussion about the possible usage (Section 3.2) and showcase the system with screenshots (Section 3.3). In the end we discuss future work that will incorporate current effort in the NeOn toolkit (Section 4).
2. Approach Description
SearchPoint is in essence a search engine add-on. This means it needs a search engine it can consume in order to provide additional functionality. The basic search functionality (query in, ranked result-set out) is unhindered: the user provides a query and a ranked result-set is returned. Besides this, topics of interest are calculated. How this is done will be explained later; for now, all we have to know is that these topics correspond to the search query and the top portion of the result-set.
Topics are placed on a plane as part of a graphical user interface (GUI) in such a manner that similar topics lie close together. A focus point is placed at the origin. This focus can be moved either by dragging it to a desired location or by clicking a point on the plane, in which case the focus moves there. Each position of this focus corresponds to one ranking.
For example, a user who clicks near one topic will get hits ordered mostly by the sense of that topic. If the focus is moved to a position between two topics, results that share similarity with both these topics will tend to be ranked higher. In truth, all the topics influence the ranking at any moment; however, the influence decreases with the distance between the topic and the focus point. For an illustration of usability see Figure 1 and Figure 2.
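The distance-dependent influence of topics on the ranking can be illustrated with a minimal sketch. The decay function `1 / (1 + distance)`, the data layout and the names `rerank`, `topics` and `hits` are assumptions made for illustration only; the text does not specify the formula SearchPoint actually uses.

```python
import math

def rerank(hits, topics, focus):
    """Order hits by topic similarity, weighted by closeness of each topic to the focus.

    hits:   list of (hit_id, {topic_name: similarity}) pairs
    topics: {topic_name: (x, y)} positions on the ranking space
    focus:  (x, y) position of the focus point
    """
    # Assumed decay: every topic contributes, but less so the farther it is
    # from the focus point. Any decreasing function could be used instead.
    influence = {
        name: 1.0 / (1.0 + math.dist(position, focus))
        for name, position in topics.items()
    }

    def score(similarities):
        return sum(influence[t] * similarities.get(t, 0.0) for t in influence)

    return sorted(hits, key=lambda hit: score(hit[1]), reverse=True)

# Two hypothetical topics and two hypothetical hits.
topics = {"philosophy": (1.0, 0.0), "owl": (-1.0, 0.0)}
hits = [
    ("hit-a", {"philosophy": 0.9, "owl": 0.1}),
    ("hit-b", {"philosophy": 0.1, "owl": 0.9}),
]
near_philosophy = [hit_id for hit_id, _ in rerank(hits, topics, (0.9, 0.0))]
near_owl = [hit_id for hit_id, _ in rerank(hits, topics, (-0.9, 0.0))]
```

Moving the focus from one side of the plane to the other swaps the order of the two hits, which is the behaviour described above.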
This approach can be used with any search engine that provides textual results. However, the problem of general web search has been mostly solved, as the underlying graph of linked web sites offers a lot of information about the importance of a single node (a web page). Our approach is most useful in search scenarios without an underlying graph structure. For example, corporate search engines must work on a relatively small site with a mostly tree-like structure, yet important content could be anywhere on this graph. Other examples are specialized search scenarios such as image search or ontology search.
In this deliverable, we have concentrated on the ontology search scenarios. There are several ontology search engines, so we tested our approach on swoogle.umbc.edu and on google.com search with the result type restricted to .owl or .rdf. This work will be integrated with Watson search [d'Aquin 2008] and the NeOn toolkit platform as part of the effort in WP4.
Figure 1: Illustration of basic SearchPoint in action. For a query “ontology” ambiguous hits get returned. To see only the hits in the context of philosophy the red focus point is moved near the automatically generated topic “philosophy”. Hits returned are initially low ranked (99, 11, 68...) but all talk about philosophy, logic, Aristotle...
Figure 2: The same result page, only the focus point is moved between topics “metadata”, “owl”, “online”. The user immediately gets returned a list of hits about, e.g., semantic mark-up, information and computer science.
2.1 System architecture
SearchPoint is a tool for searching. However, this is not the usual search we are accustomed to, for example, on the web. In fact, the usual search engine is merely one module in a longer chain of processes, which we will call modules.
The implementation of SearchPoint is modular, as this makes it very easy to change a single module, for example to change the search engine, adopt a new classifier, or provide a new distribution channel. The architecture of SearchPoint (Figure 1) consists of the following modules chained into a pipeline (Figure 3):
1. Agent (usually a user) provides a query (usually in a GUI)
2. Search engine processes the query and returns a result-set of short textual documents (snippets)
3. Automatic topic generation module (implemented as a web service)
4. Graph drawing and sub-graph extraction for visualization of the topics (implemented as a web service)
5. Graphical User Interface (GUI) for the visualization of the topics, focus point selection and rearranging the results dynamically.
Some comments on the modules:
In the current application, modules 3 and 4 are enveloped inside a single web service. This is in order to minimize the number of calls made; however, if needed, they could be easily separated. Next, modules 1 and 5 are components of the same GUI, provided in the form of a web application. It seems natural to provide the user with a single place for posing queries and for receiving and manipulating results. It is, however, very easy to change the distribution channel (e.g. to the NeOn Toolkit) or to separate them (for use in an even bigger solution).
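The chaining of modules can be sketched as a function that receives each module as a replaceable callable. This is only an illustration of the modular design: the function `run_pipeline`, its parameter names and the three stand-in modules below are hypothetical and do not reflect the actual SearchPoint web service interfaces.

```python
from typing import Callable, Dict, List, Tuple

Snippet = str
Position = Tuple[float, float]

def run_pipeline(
    query: str,
    search: Callable[[str], List[Snippet]],                      # module 2
    generate_topics: Callable[[str, List[Snippet]], List[str]],  # module 3
    layout: Callable[[List[str]], Dict[str, Position]],          # module 4
) -> dict:
    """Chain modules 2 to 4; the GUI (modules 1 and 5) supplies the query
    and renders the returned state."""
    snippets = search(query)
    topics = generate_topics(query, snippets)
    positions = layout(topics)
    return {"snippets": snippets, "topics": topics, "positions": positions}

# Hypothetical stand-ins, used only to show that each module can be swapped.
def fake_search(query: str) -> List[Snippet]:
    return [f"{query} in philosophy", f"{query} in owl"]

def last_word_topics(query: str, snippets: List[Snippet]) -> List[str]:
    return sorted({snippet.split()[-1] for snippet in snippets})

def line_layout(topics: List[str]) -> Dict[str, Position]:
    return {topic: (float(i), 0.0) for i, topic in enumerate(topics)}

state = run_pipeline("ontology", fake_search, last_word_topics, line_layout)
```

Replacing the search engine, the topic generator or the layout then amounts to passing a different callable, which mirrors the reason given above for the modular design.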
Figure 3: Architecture of SearchPoint
2.2 Topic generation – Clustering
The most obvious way to automatically generate topics out of a corpus of documents is to cluster the available results into a predefined number of clusters and use the centroid or medoid of each cluster as an individual topic. The technical details of the implementation follow below.
Document clustering (Steinbach et al., 2000) is based on a general data clustering algorithm adapted for textual data by representing each document as a word-vector, which for each word contains some weight proportional to the number of occurrences of the word (usually the TFIDF weight as given in equation (2.1)).
\[
d^{(i)} = TF(W_i, d) \cdot IDF(W_i), \quad \text{where } IDF(W_i) = \log\frac{D}{DF(W_i)} \qquad (2.1)
\]
Where D is the number of documents; document frequency DF(W) is the number of documents the word W occurred in at least once; and TF(W,d) is the number of times word W occurred in document d. The exact formula used in different approaches may vary somewhat but the basic idea remains the same – namely, that the weighting is a measure of how frequently the given word occurs in the document at hand and of how common (or otherwise) the word is in an entire document collection.
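The weighting in equation (2.1), together with the cosine-similarity k-means clustering used for topic generation in this section, can be sketched as follows. This is a minimal illustration under stated assumptions, not the deployed web service: whitespace tokenization, the natural logarithm, and the quality measure (mean similarity of documents to their own centroid, a simplification of the intra/inter similarity ratio) are choices made here, and all function names are hypothetical.

```python
import math
import random
from collections import Counter

def tfidf_vectors(docs):
    """Weight each word by TF(W, d) * log(D / DF(W)), as in equation (2.1)."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(word for words in tokenized for word in set(words))
    n_docs = len(docs)
    return [
        {w: tf * math.log(n_docs / df[w]) for w, tf in Counter(words).items()}
        for words in tokenized
    ]

def cosine(a, b):
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    total = Counter()
    for vec in vectors:
        total.update(vec)  # Counter.update adds the weights word by word
    return {word: value / len(vectors) for word, value in total.items()}

def kmeans_once(vectors, initial_centroids, max_iterations=20):
    """One k-means run: assign to the most similar centroid, then recompute."""
    centroids = list(initial_centroids)
    assignment = None
    for _ in range(max_iterations):
        new_assignment = [
            max(range(len(centroids)), key=lambda c: cosine(vec, centroids[c]))
            for vec in vectors
        ]
        if new_assignment == assignment:
            break  # assignments stopped changing
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = centroid(members)
    return assignment, centroids

def kmeans(vectors, k, restarts=10, seed=0):
    """Repeat k-means from random initial centroids and keep the best run."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        assignment, centroids = kmeans_once(vectors, rng.sample(vectors, k))
        # Assumed quality measure: mean similarity to the own centroid.
        quality = sum(
            cosine(v, centroids[a]) for v, a in zip(vectors, assignment)
        ) / len(vectors)
        if best is None or quality > best[0]:
            best = (quality, assignment, centroids)
    return best[1], best[2]

def top_words(centroid_vector, n=3):
    """The heaviest words of a centroid serve as the topic label."""
    return sorted(centroid_vector, key=centroid_vector.get, reverse=True)[:n]

docs = [
    "aristotle logic philosophy",
    "philosophy logic metaphysics",
    "owl rdf metadata",
    "rdf owl schema",
]
vectors = tfidf_vectors(docs)
assignment, centroids = kmeans(vectors, k=2)
```

Calling `top_words` on each returned centroid gives candidate labels for the corresponding topic.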
The similarity of two documents is commonly measured by the cosine-similarity between the word-vector representations of the documents (see equation (2.2)). The clustering algorithm groups documents based on their similarity, putting similar documents in the same group.
Several clustering algorithms can be used on TFIDF-represented documents. For our approach we have selected k-means because of its high speed, since the clustering must be done at run time when returning results and topics to the users.
The basic k-means clustering algorithm [Kanungo et al., 2002] is one of the oldest and simplest clustering algorithms to be applied to text, and it still produces good results because of its fast execution and apparent suitability to textual data distributions. It involves randomly choosing k points to be the centroids of clusters, and grouping documents around centroids based on proximity with regard to a certain distance/similarity measure. We have chosen cosine similarity in our approach, because it is invariant to the length of each document. Then, centroids are iteratively recomputed for each cluster, and documents regrouped, until there is sufficiently little change in centroid positions. This algorithm depends heavily on the choice of k, which we specify as a parameter in our web service. We chose random initial positioning of centroids. The whole k-means algorithm is run ten times with different random positions of the initial centroids. The final cluster assignments are determined on the basis of the best ratio of intra-cluster to inter-cluster similarity among the independent clustering results.
The calculated centroids are actually vectors in TFIDF space, so each centroid has a ranked list of words which are particularly prominent for the whole cluster. From this we derive the best word to represent the cluster, which is visualized in the GUI, and we also list several top words which the user can read with some interaction.
The web services for this method and other topic generation methods can be found at:
http://searchpoint.ijs.si/Classifier/WS_Classify.asmx
More information is given in the Appendix.
2.3 Topic generation – Classification
The clustering approach is very efficient in separating the documents into a selected number of topics. The main disadvantage, however, is the presentation of these topics to the user. The most important words found in the centroid, which define and separate the topic the most, are not necessarily also the most informative to the user.
Therefore, we choose a different approach. Instead of automatically generating the topics through clustering, we classify each document into a preset list of topics that have been generated by a human and can therefore also be understood by a human.
Instead of classifying into an arbitrary list of topics, we can do much better with classification into a given ontology. Such a classification provides a deeper understanding of the available topics and can, besides being used as a query refinement tool, also be used to investigate the actual subtopics of a given interest of the user.
Any knowledge that is represented in a textual corpus, and is therefore searchable, can in this way be contextualized with the use of different ontologies. On the same data set, searched with the same query, different ontologies return different topics the user can navigate through, providing different rankings of the hits. Each ranking can be considered as a different view into the whole corpus.
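Selecting ontology concepts as topics can be sketched as below, assuming a classifier has already produced a similarity between every hit and every candidate concept. The scoring rule (summing, per concept, only the similarities above a threshold) follows the description given for OntoLight in this section; the threshold value, the function name `select_topics` and the sample scores are hypothetical.

```python
from collections import defaultdict

def select_topics(doc_concept_sims, threshold=0.2, top_n=10):
    """Rank concepts by the summed similarity of sufficiently similar hits.

    doc_concept_sims: one {concept: similarity} dict per hit, as returned by
    some classifier (assumed to be available).
    """
    scores = defaultdict(float)
    for similarities in doc_concept_sims:
        for concept, similarity in similarities.items():
            if similarity > threshold:  # ignore weak document-concept matches
                scores[concept] += similarity
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical classifier output for three hits.
classified_hits = [
    {"Philosophy": 0.8, "Internet": 0.1},
    {"Philosophy": 0.6, "Artificial Intelligence": 0.5},
    {"Internet": 0.7, "Artificial Intelligence": 0.4},
]
topics = select_topics(classified_hits, threshold=0.2, top_n=2)
```

Summing similarities instead of counting best matches lets a concept that is moderately relevant to many hits surface as a topic even when it is never the single best match for any of them.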
2.3.1 Text Classification
Text classification can be applied when a set of predefined categories (classes), such as “arts, education, science”, is provided together with a set of documents labelled with those categories. The task is to classify new (previously unseen) documents by assigning each document one or more content categories. This is usually performed by representing documents as word-vectors (usually referred to as the ‘bag-of-words’ representation) and using documents that have already been assigned categories to generate a model for assigning content categories to new documents. In the word-vector representation of a document, a vector of word frequencies is formed over all the words occurring in all the documents (usually several thousand words). The representation of a particular document contains many zeros, as most of the words from the collection do not occur in that document. The categories can be organized into an ontology, for example the MeSH ontology for medical subject headings or the Yahoo! hierarchy of Web documents, which can be seen as a topic ontology. Other applications of document categorization into hierarchies/taxonomies include US patents, Web documents (McCallum et al., 1998; Mladenić, 1998; Mladenić and Grobelnik, 2003), and Reuters news articles (Koller and Sahami, 1997).
Cosine-similarity, which is commonly used in document clustering, can also be used for document classification as follows. Given a new document, cosine-similarity is used to find the most similar documents (e.g., using the k-Nearest Neighbour algorithm (Mitchell, 1997)). The cosine-similarity between the new document and all the labelled documents is used to find the k most similar documents, whose categories (topics) are then used to assign categories to the new document. For documents d_i and d_j, the similarity is calculated as given in equation (2.2). Note that the cosine similarity between two identical documents is 1 and between two documents that share no words is zero.
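The nearest-neighbour scheme just described can be sketched in a few lines. Raw word counts stand in for the word-vector weights, a simple majority vote assigns a single category, and the labelled documents are invented for the example; all of these are assumptions for illustration rather than the classifier used in SearchPoint.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Word-vector of raw word frequencies (whitespace tokenization assumed)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_classify(text, labelled_docs, k=3):
    """Assign the majority category among the k most similar labelled documents."""
    query = bag_of_words(text)
    ranked = sorted(
        labelled_docs,
        key=lambda item: cosine(query, bag_of_words(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled documents.
labelled_docs = [
    ("painting sculpture gallery", "arts"),
    ("museum painting exhibition", "arts"),
    ("physics experiment laboratory", "science"),
    ("chemistry laboratory experiment", "science"),
]
arts_label = knn_classify("modern painting gallery", labelled_docs)
science_label = knn_classify("laboratory experiment results", labelled_docs)
```

To assign several categories per document, as the task definition allows, the vote could return every category above a chosen share of the k neighbours instead of only the most common one.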
\[
\cos(d_i, d_j) = \frac{\sum_k d_{ik}\, d_{jk}}{\sqrt{\sum_l d_{il}^2 \, \sum_m d_{jm}^2}} \qquad (2.2)
\]
2.3.2 OntoLight
OntoLight [Grobelnik et al., 2008] is a software suite which implements basic reasoning functionalities for contextualized ontologies. It is limited to light-weight ontologies which are grounded with appropriate text corpora. The representation and reasoning scale to the largest currently available ontologies, comprising up to one million concepts. In particular, OntoLight currently incorporates the following five ontologies: AgroVoc and ASFA (relevant for the Food and Agriculture Organization of the UN), EuroVoc (EU legislation), Cyc (common-sense knowledge) and DMoz (a WWW directory).
There are two basic reasoning mechanisms implemented in OntoLight. First, new textual instances without a known class can be classified into the selected ontology. Second, soft (probabilistic) mappings between a pair of selected ontologies can be computed, thus providing a contextual relationship between the ontologies.
OntoLight was used as a basic building block for extensions to OntoGen [Fortuna et al., 2006], where contextual mappings are used to improve semi-automatic construction of light-weight ontologies from text corpora. The same mechanism of contextual reasoning will be used to extend OntoGen to support simultaneous, collaborative development of an ontology. Soft mappings between grounded ontologies also complement methods for ontology alignment, where mappings are computed on the basis of common background ontologies (as provided by Swoogle, for example) [Sabo et al. 2008]. The main functionality we cover is the contextualization of ontologies through the generation of soft mappings between ontologies, thus enabling us to view concepts of one ontology through the perspective of another one. OntoLight also supports the scalability needed for large case studies, i.e. being able to deal with large ontologies such as AgroVoc and ASFA. To
achieve this, the representation is constrained to a light-weight ontology model which covers the targeted functionality needed in the case studies.
For topic generation in SearchPoint we are able to use any of the classifiers that are trained with OntoLight. OntoLight is a tool that easily transforms any grounded ontology into a classifier. We can use this classifier to find the concepts that are most relevant for the current query and use them to provide topics.
Since ontologies can be huge and we can only visualize so many distinct topics, we choose only those concepts that have the most documents classified into them. Instead of just counting how many documents are classified into each concept, we follow a different approach. When classifying a document into an ontology, OntoLight can provide document-concept similarities. In order not to miss any concept that covers many documents in the corpus but is not the most important one for any of them, we use these similarities and, for each concept, sum the similarities of all similar documents. Similar documents here means that we only take into account those similarities that are above a particular threshold. In this way we get a ranked list of concepts to be used in the visualization and GUI.
2.4 Visualization and ranking space
Our main goal is to visualize automatically generated topics and position them on the screen in such a way that similar or related topics lie close together. Apart from this, the user can select a focus point by dragging a red point that is also part of the GUI.
Once we achieve this, it is possible to calculate new rankings of the documents for each position of the focus point. Since each point on the panel maps to a specific ranking, we call it the ranking space.
2.4.1 Graph drawing
The input for the visualization is always a list of topics, which are represented as vectors. When using the clustering method we can select the desired number of topic centroids by selecting k in the k-means algorithm. In SearchPoint we leave this number as a parameter, which is ten by default. When using OntoLight classifiers, we get a long list of ranked classes (concepts), from which we likewise select the desired top n (ten by default).
While the visualization described below could be used for both the clustering and the classifier methods, we use it only for clustering. The changes for the classifier will be described in the next subsection.
To visualize these topics we need to position them. Our application, in which the user is able to select a point between topics, also requires that similar topics lie close together. In both cases the topics are represented as vectors, so we can calculate their cosine similarities.
In order to position or draw the topics, we model them as a full weighted graph. Each topic represents a node, and the similarity between two topics gives a weighted edge. In the classifier model we can also weight the nodes by the classifier score of the topics.
To draw the graph we use the Fruchterman-Reingold (FR) algorithm [Fruchterman and Reingold 1991], which is very robust and, for such a small graph of around ten nodes, works in a fraction of a second. We restate some of the observations in [Pajntar 2006].
D3.2.4 Context-sensitive Search of Ontologies Page 13 of 40
2006–2009 © Copyright lies with the respective authors and their institutions.
In this algorithm only two criteria are demanded for a good graph drawing:
• Connected nodes should be close.
• No two different vertices should be too close.
There is no penalization for edge crossings in this algorithm. Edge crossings are generally very important for a clear drawing of a graph; however, we mostly need the final positions of the nodes, and the graph structure is not very interesting since we have a full graph. Calculating all the edge crossings in every iteration would also carry a heavy computational penalty (on the order of n^4).
In every iteration of the algorithm we calculate the attracting forces from connected vertices and the repulsing forces from all the nodes. Since there is no scalar penalty for edge crossings, the result is a vector, which indicates not just how far out of place the vertex is (vector size) but also in which direction it should move. The size of the actual move is limited by the current temperature, which is decreased linearly in every iteration.
This very simple algorithm provides excellent results. Its main advantage is its speed and robustness. Even today, years after its creation, it is still one of the most popular algorithms for graph drawing.
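The force computation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the SearchPoint implementation: the function names, the constants (`iterations`, `size`, the starting temperature) and the choice to scale the attracting force by the cosine similarity of the two topics are all assumptions made for the example.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fr_layout(topic_vectors, iterations=100, size=1.0, seed=0):
    """Position topics in 2D with a Fruchterman-Reingold style iteration.

    The graph is full: every pair of topics repels, and every pair attracts
    with a strength scaled by its cosine similarity (an assumption of this sketch).
    """
    rng = random.Random(seed)
    n = len(topic_vectors)
    k = size / math.sqrt(n)  # ideal distance between nodes
    weight = [[cosine(a, b) for b in topic_vectors] for a in topic_vectors]
    pos = [[rng.uniform(-size / 2, size / 2) for _ in range(2)] for _ in range(n)]
    start_temp = size / 10  # hypothetical starting temperature
    for step in range(iterations):
        temp = start_temp * (1 - step / iterations)  # linear cooling
        disp = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9
                repulsion = k * k / d
                attraction = weight[i][j] * d * d / k
                force = (repulsion - attraction) / d
                disp[i][0] += dx * force
                disp[i][1] += dy * force
        for i in range(n):
            length = math.hypot(disp[i][0], disp[i][1])
            if length > 0:
                move = min(length, temp)  # the move is capped by the temperature
                pos[i][0] += disp[i][0] / length * move
                pos[i][1] += disp[i][1] / length * move
    return pos
```

Calling `fr_layout([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])` returns one `[x, y]` pair per topic; the two nearly parallel vectors should end up closer to each other than to the third.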
After we get positions we can visualize the topics on the screen. We also visualize the most important word or bigram (a common pair of words) on top of each topic. Since for the clustering method there is a possibility of a poor word on the top place, we also visualize some additional top ranking words in a tooltip on a mouse over (Figure 4).
Figure 4: The hand cursor is above "Philosophy" topic. The additional words are: exist, concerns, kinds, part, nature.
2.4.2 Subgraph Extraction
In the classifier model we could easily adopt graph drawing to visualize the best topics (classes). However, we would like to retain some of the structure that is inherent in the ontology model. The problem is that we can only visualize a small number of topics for one query. We would like our solution to work on any ontology, even on simple taxonomies. This is why we first extract a connected subgraph that contains all the relevant concepts and then modify it so it can be visualized.
From the given set of relevant concepts we must first extract a subgraph. Because we want everything to work even on taxonomies, we must use a very basic relation from which we will extract this subgraph. Since the only relation in a treelike taxonomy is the subsumption relation, in the case where the graph provided by the SubConceptOf relation is not connected, we connect every top concept in every connected part to a virtual node we name “Top”. The end result of this process is a connected tree, reaching all the relevant concepts that provide topics.
To visualize this we first need some sort of abstraction. After qualitative analysis we have decided to visualize three levels of this tree. We want to visualize as much as possible; however, more than three levels are too much information for the user and also negatively affect the reranking done with focus point selection.
For the first level we choose the root of the tree. This basically shows the user which part of the ontology models the current query and its results. For the third level we chose to visualize the computed relevant topics. In the ontology these concepts are not generally on the same level. However, we found it important to unify the visualization of the results of the classifying phase. For the second level we chose to visualize the actual second level below the root concept of the ontology. This informs the user of which part of the ontology the third level topics belong to, which is especially useful when the root topic is also the root of the ontology, or the virtual “Top” node.
The actual visualization is straightforward. We position the root at the origin and draw each consecutive level on concentric circles. Each third level node has equal space, while second level nodes get space proportional to the third level nodes they parent. The links between levels are also visualized and represent the ancestry relation between nodes.
Some visual aids have also been implemented in the GUI. The actual score of third level topics is visualized as the size of the nodes, and for the second level nodes we sum the scores of all the descendant concepts in the ontology. The actual score from the classifier could be used for the visualization; however, we found it provides more information to visualize how much the whole subpart of the ontology models the current results, in contrast to how much a single concept and its instances do. On mouse over a single topic, the ancestor list of the concept is written in the tooltip. All of this can be seen in Figures 5 and 6.
Figure 5: Classifier method for the query “ontology”. The ontology (taxonomy) used is DMOZ. The relevant concepts classified are: Philosophy, Knowledge Management, Social Sciences, Languages, Internet and Artificial Intelligence. The four main categories are Society, Reference, Science and Computers. Mouse over the concept “Internet” shows Top/Computers/Software/Internet. The concept Software was left out of the visualization in order to make it clearer.
Figure 6: Visualization for the query “semantic web”. Immediately it is seen that this topic is mostly modelled with Computers and its descendants. Additionally there is some business and Knowledge management linked to it.
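The three-level abstraction of the classifier method (a root, the branch directly below it, and the relevant concepts themselves) can be illustrated with a small Python sketch. It is not the authors' code: the `parent` dictionary stands in for the SubConceptOf relation, the scores are made up, and the rule for choosing the root (the deepest ancestor shared by all relevant concepts) is an assumption of this example.

```python
TOP = "Top"  # virtual node that connects all top concepts


def path_from_top(concept, parent):
    """Ancestor list of a concept, starting at the virtual Top node."""
    path = [concept]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    path.append(TOP)
    return path[::-1]


def three_level_tree(scores, parent):
    """Return (root, second_level) for the relevant concepts in `scores`.

    second_level maps each branch below the root to the summed score of the
    relevant concepts under it and to the list of those concepts (third level).
    """
    paths = {c: path_from_top(c, parent) for c in scores}
    shortest = min(len(p) for p in paths.values())
    depth = 0
    # Descend while all paths agree and every concept stays below the root.
    while depth + 2 < shortest and len({p[depth + 1] for p in paths.values()}) == 1:
        depth += 1
    root = next(iter(paths.values()))[depth]
    second_level = {}
    for concept, path in paths.items():
        branch = path[depth + 1]
        entry = second_level.setdefault(branch, {"score": 0.0, "topics": []})
        entry["score"] += scores[concept]
        entry["topics"].append(concept)
    return root, second_level


# Toy taxonomy fragment (child -> parent); None marks a top concept.
parent = {
    "Internet": "Software", "Software": "Computers", "Computers": None,
    "Artificial Intelligence": "Computers",
    "Philosophy": "Society", "Society": None,
}
# Hypothetical classifier scores of the relevant concepts.
scores = {"Internet": 0.5, "Artificial Intelligence": 0.25, "Philosophy": 0.125}
root, second_level = three_level_tree(scores, parent)
```

In this toy case the intermediate concept Software is skipped, as in Figure 5, and the branch Computers receives the summed score of its two relevant descendants.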
2.5 Calculation of the ranking space
Once the topics are positioned and visualized and all the documents are equipped with similarities to each topic, we must calculate the ranking space. The ranking for each selected focus point is calculated at runtime, when the user actually moves it. When dragging, this happens tens of times per second, so it must be done efficiently.
When SearchPoint starts, we also want to maintain the original ordering that is provided by the search engine, as we do not want to hinder the basic functionality. Because of this, we add a virtual invisible node and position it in the origin, the same as the starting position of the focus point. We also add similarities between each document and the virtual node in such a way that they observe the original ranking.
When the focus point moves, we must calculate a score for each document. We first define the position of the focus f(x, y), the positions of the topics t_j(x, y), the documents d_i, and the Documents-Topics similarity matrix S(i, j). The score for each document is proportional to its similarity to each topic and inversely proportional to the distance from the focus to that topic:
$$\mathrm{score}(d_i) = \sum_{t_j \in T} \frac{S(i, j)}{\mathrm{dist}(f, t_j) + \varepsilon}$$
The set T of all topics t_j also contains the invisible node that provides the original ranking. We take the Euclidean distance for dist(f, t_j) and we add a small ε in order to prevent division by zero. The hits get ordered and visualized by this scoring function.
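As a rough illustration of the scoring function of Section 2.5, the following Python sketch reranks three documents for two positions of the focus point. The topic positions, the similarity values and the value of ε are invented for the example; in particular, the similarities to the invisible origin node (first column) are simply chosen to decrease with the original rank, which is only one possible way to preserve the search engine's ordering.

```python
import math


def score(doc_similarities, topic_positions, focus, eps=1e-6):
    """Sum over topics of similarity / (Euclidean distance from focus + eps)."""
    return sum(
        s / (math.dist(focus, t) + eps)
        for s, t in zip(doc_similarities, topic_positions)
    )


def rerank(similarity_matrix, topic_positions, focus, eps=1e-6):
    """Return document indices ordered by descending score for this focus."""
    return sorted(
        range(len(similarity_matrix)),
        key=lambda i: score(similarity_matrix[i], topic_positions, focus, eps),
        reverse=True,
    )


# Topic 0 is the invisible node at the origin; topics 1 and 2 are real topics.
topic_positions = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.0)]
# One row per document: [origin node, topic 1, topic 2] (hypothetical values).
S = [
    [1.0, 0.0, 0.9],   # original rank 1, about topic 2
    [0.5, 0.9, 0.0],   # original rank 2, about topic 1
    [0.25, 0.1, 0.1],  # original rank 3, weakly about both
]

at_origin = rerank(S, topic_positions, focus=(0.0, 0.0))
near_topic_1 = rerank(S, topic_positions, focus=(0.9, 0.0))
```

With the focus at the origin the invisible node dominates and the original order is kept; moving the focus towards topic 1 promotes the document most similar to that topic.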
3. Example usage of the system
In this section we briefly describe the basic functionalities of the prototype (Section 3.1), then look more closely at the scenarios in which SearchPoint benefits the users (Section 3.2), and finally showcase one use case accompanied by screenshots (Section 3.3).
3.1 Basic description of the functionalities
The GUI consists of an input field where the user can provide the query. The user can submit a query in several ways: each method for topic extraction is connected with its own submit button. Apart from this, there is a result list and the ranking space, where the user is able to choose the focus point.
The search engine is selected by the URL of the web application. For the testing in this deliverable the following search engines have been used:
• Yahoo: http://searchpoint.ijs.si/
• Swoogle: http://searchpoint.ijs.si/swoogle/
• Google ontology search: http://searchpoint.ijs.si/googleowl/
After the user inputs the query and selects the method for topic extraction the result list is returned by the current search engine. At the beginning the results are ranked the same way as in the search engine. Each result item is visualized together with the original ranking. At the start this is therefore (1, 2, 3, 4…). The user can now inspect the topics for the current search and select any point on the ranking space or drag the focus point around the ranking space. The hits get reordered in real time, so the user is able to pinpoint a good selection for the focus easily, by just observing the quality of the displayed results.
There is also a history of the positions of the focus point. In case the user has found a good position and then moved the focus to see other rankings it is easily possible to navigate to a previous position by clicking on one of the three buttons of the history bar.
The history bar has three buttons:
• Back: for returning the focus to one step back
• Forward: to move it to the next position
• Origin: to clear the history and return to default ranking provided by the search engine.
The buttons are only visible when they are available. For example, the Forward button is not visible until Back has been clicked. No button is visible at the start, or after clicking the Origin button, since there is no history at that time.
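The behaviour of the history bar can be summarised with a small state sketch in Python. The class and method names are hypothetical, and the rule that the Origin button is shown whenever any history exists is an assumption; the text only states that no button is visible without history and that Forward appears after Back.

```python
class FocusHistory:
    """Back/Forward/Origin history of focus point positions (illustrative only)."""

    ORIGIN = (0.0, 0.0)

    def __init__(self):
        self.current = self.ORIGIN
        self._past = []    # positions visited before the current one
        self._future = []  # positions undone with Back

    def move(self, position):
        """The user selects a new focus point."""
        self._past.append(self.current)
        self.current = position
        self._future.clear()

    def back(self):
        if self._past:
            self._future.append(self.current)
            self.current = self._past.pop()

    def forward(self):
        if self._future:
            self._past.append(self.current)
            self.current = self._future.pop()

    def origin(self):
        """Clear the history and return to the search engine's default ranking."""
        self._past.clear()
        self._future.clear()
        self.current = self.ORIGIN

    def visible_buttons(self):
        buttons = []
        if self._past:
            buttons.append("Back")
        if self._future:
            buttons.append("Forward")
        if self._past or self._future:
            buttons.append("Origin")
        return buttons
```

For example, after `h = FocusHistory()` and `h.move((1.0, 2.0))`, `h.visible_buttons()` lists Back and Origin; after `h.back()` the Forward button becomes available as well.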
3.2 Discussion on Usability of SearchPoint
SearchPoint provides several benefits for the user. Here we will discuss and give examples for many of them. However, first we would like to point out that the basic functionality of the underlying search engine is not hindered in any way. For example, if the search engine provides adequate top ranked results for the query, the user can immediately follow that information without interacting with SearchPoint. SearchPoint provides additional functionalities at no extra cost to the user.
3.2.1 Query disambiguation
The most obvious use of SearchPoint is query disambiguation. Users are accustomed to writing very short queries, usually consisting only of a single word. When this word is ambiguous, as for example owl, ontology, jaguar, a4 or kiwi, the results of both or all the meanings get returned.
This can be a problem when results of one meaning dominate the search space, since in such a case it is very hard to find the few results of the other meaning(s). Big web search engines deal with this problem by ranking the results of the less represented meanings higher in order to present a variety of meanings in the first ten results. Another possibility for the user is to refine the query, providing some additional words that further separate the meanings. This approach is better than the former in the sense that after the refinement, the user actually has ten results to choose from. However, as simple as it is to refine a query, it still puts a burden on the user.
SearchPoint actually serves as semi-automatic query refinement. The user is presented with topics, which usually contain the one meaning the user wants, and the one-click re-ranking of the hits towards the selected topic can be seen by the user as a redefinition of the query. Another advantage is that the user does not have to come up with the best word for separating the meanings, which can cause problems to a user new to the topic or to a non-native speaker. The user merely has to recognize the topic, or even only recognize the results while randomly moving the focus point.
3.2.2 Sub-Topic exploration
Even when the actual meaning of the query is not ambiguous there can be several subtopics related to the results. For an uninformed user the very summarization of the main subtopics provided by the topics in the ranking space can be beneficial. What is more, the user can select a subtopic and immediately get documents about it. When two subtopics are related, they are visualized close together and the user can select the focus point in between to get documents touching both of the subtopics.
As an example of such a sub-topic exploration, we give the query “password” with the clustering method returning the following sub-topics:
• “Password Manager”... for the documents about software for managing passwords
• “Administrator”... for tools to moderate the passwords
• “Encryption”... for the various algorithms connected with passwords
• “Recovery”... for tools when losing a password
• “Forgotten Password”... also dealing with lost passwords, but more in the context of the procedure of what to do than tools
• “Change Password”... for best practices when dealing with passwords
Even for the user who is familiar with the topic it is sometimes hard to find the best query to get just some subtopics in the results. SearchPoint is extremely useful when dealing with subtopic profiling as it is designed to enable the user to select several subtopics with a single click.
3.2.3 Ranking in a general corpus
The biggest advantage of SearchPoint is the creation of the ranking space. In a general corpus, which is usually a bag of unrelated documents, the usual search engines fail to produce any ranking. When the corpus is extensive, returning a large number of results for a single query, this becomes a real problem. The visualized ranking is usually nothing more than a randomized order of all the relevant documents.
Most of the special searches have this problem. This is the reason we have chosen ontology search engines to showcase our solution. Another example would be a repository of images. We have an example at: http://searchpoint.ijs.si/photo12. This is a repository of textually annotated images provided by a company Photos12 that sells them. SearchPoint provides a useful service by providing ranking, sub-topic profiling and also by quickly finding images with two motifs which are presented as two topics in the ranking space.
3.3 Showcase
In the showcase we will demonstrate the contextualization power of SearchPoint. The main goal is to contextualize search over a given background knowledge with the use of automatic topic generation techniques and re-ranking possibilities of the ranking space.
The scenario will be that of a user tasked with finding the most suitable ontology to model the domain of fisheries. The user can first get a general overview of the domain by profiling the sub topics of the general web search engine results.
On searchpoint.ijs.si the Yahoo search engine is used. By querying for “fisheries” and selecting the method of classification into Dmoz, the user can get a basic model of which concepts are connected with fisheries in the everyday context of the general public (Figure 7).
Next, the user can assess how well represented the topics actually are on the web. This can be achieved using the clustering method. As can be seen in Figure 8, several topics (Alaska, Lake) seem to be over-represented. This can probably be explained by the great importance the fishing industry has for particular regions. Science, the most represented topic before, is much less prominent. This is mostly on account of the industry topics (products, processes, fisheries management) that were missing before.
For the same query, “fisheries”, there is an obvious context switch from the more scientifically oriented Dmoz to the more economical World Wide Web.
The last context the user can interpret the fisheries results with is that of a legal vocabulary of European Union (EuroVoc). As can be seen in Figure 9, there is a broad range of topics. There is a cluster of more environmental topics (Aquaculture, Fishing regulations and Fisheries policy), however, more in a governing context than a scientific one. Industry is also well represented (Fishing industry, Fishery Product, Fishery Produce).
The user now has a firm understanding of the domain and several ways of modelling it. On searchpoint.ijs.si/swoogle he can access the Swoogle ontology search engine. The third method (EuroVoc) is chosen for contextualization. When the search engine is changed, the topics also change (Figure 10), because the ontology search space is different from general web search. The topics are smaller, probably due to the somewhat lacking descriptions of the result ontologies. Also, the industrial concepts seem to dominate over the environmental concepts in the ontology space.
The user can nevertheless search for the ontology more easily and with respect to the chosen context.
Figure 7: Yahoo web search in the context of Dmoz.
General topics of interest as provided in the Dmoz open directory. The topics most related to fishing mostly come from science (Agriculture, Environment, Biodiversity, Earth Sciences). There is also some notion of regional fisheries. Fisheries also seem connected to recreational fishing and there is an environmental concern in the society.
The focus point is moved to the centre of the science group and in the first eight results mostly institutions and science departments are ranked on top.
Figure 8: Yahoo web search with clustering.
Several prominent topics are extracted. There is a clear presence of economical topic (fisheries management, processing, products) and some more specific (Alaska, Lake).
The focus is moved towards Alaska topic, and there truly are many websites talking about Alaskan fisheries.
Figure 9: Yahoo web search in the context of EuroVoc.
Both industrial (Fishing Industry, Fishery Product, Fishery Produce) and environmental (Aquaculture, Fishing regulations, Fisheries policy) aspects are well represented. However, this time more in the context of a government.
The focus is moved to the environmental part of the panel near the Aquaculture topic. Websites ranging from environmental (124 Philippine environment laws, 17) and environmentally aware business (145, 22) to fishing societies (49, 28) share the Aquaculture context.
Figure 10: Swoogle Ontology Search in the context of EuroVoc.
The concepts in the ontology search space show that most ontologies deal with business-oriented modelling. Also, the concepts are much smaller than before, because of the lacking descriptions provided by the Swoogle search engine.
4. Conclusion and future work
In this work we have extended a search engine add-on, SearchPoint, in order to enable searching within a context. Any general knowledge represented in a textual form can now be easily reranked by the user by selecting a focus point in the ranking space. The ranking space is created by visualizing topics stemming from clustering or from classification into an ontology. Since a different choice of topic extraction provides different results for the user, the selection of the ranking space ontology provides contextualization with general knowledge.
Any textual background knowledge can be indexed, and documents containing query words can easily be retrieved. However, when the corpus being searched is extensive, the problem of good ranking occurs. This problem has been solved for web search mainly due to the underlying graph structure of the linking pages and also due to the redundancy of information on the web. This approach does not work on an arbitrary corpus of documents or on other more specific search tasks where there is no underlying graph structure, and indeed different rankings are needed for different users and even for the same user when searching in a different context.
SearchPoint enables the user to get a continuum of rankings for one query, which helps to disambiguate the query, provides a one-click query reformulation, and can even be used for subtopic profiling of the current result space. All this is provided by the standard SearchPoint. On top of this, it is possible for the user to exchange the ranking space ontology and, through this, change the context in which the ranking space is calculated. Only topics that are relevant to both the search results and the current context are visualized, and they create a special contextualized ranking space in which the user can rerank the results.
This is a finished working prototype available as a web application. This work will continue in WP4, where it will be considered how to best integrate it inside the NeOn Toolkit platform so that it can be used most easily and intuitively by the users. Thought will also be given to how it will be integrated with the Watson ontology search engine, probably with key concepts (developed in WP4) and ontology similarities (developed in WP3) to help with the automatic topic extraction and the actual ranking process.
Appendix A
Here we give some technical information about the SOAP/HTTP web service which performs the automatic topic extraction and also provides the positioning of the topics with graph drawing. The WSDL specification of this web service can be found at http://searchpoint.ijs.si/Classifier/WS_Classify.asmx?WSDL and is also given in A.1.
The main methods are:
• ClassifyKMeans which does the clustering method for topic extraction and graph drawing for the positioning of the topics
• ClassifyDMoz which does the classification to DMOZ open directory for topic generation and positions the topics by extracting the relevant sub-tree.
• ClassifyEuroVoc which does the classification into EuroVoc Terminology extended with Acquis communautaire legislation documents that provide grounding.
The data is returned as a BowPartStruct structure, which contains three sub structures:
• Information about the topics, with the relevant keywords for each cluster or the place in the ontology for each concept, is stored in the Node structure.
• The similarities between each pair of topics are stored in the Links structure.
• Each result is given a vector of similarities to each topic. This is stored in the Documents structure.
Several search engines can be called for the actual results:
• Big web search engines: Google, Yahoo, Bing
• Newspapers: NYTimes, About.com, Mladina (Slovenian)
• Ontology search: Watson, Swoogle, GoogleOWL
• Some specific search engines: EBay, Enron, CCA, Photo12
• It is also possible to provide results of any search engine:
o As a serialized xml: String
o As a url to such an xml: File
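To make the interface more concrete, the sketch below builds a SOAP 1.1 request body for the ClassifyKMeans method in Python. The element names and their order (DS, Query, NumHits, NumMinHits, NumCategories) and the target namespace are taken from the WSDL in A.1; the function name and the default parameter values are assumptions for illustration only. The sketch only constructs the message and does not contact the service, whose availability and required HTTP headers are not covered here.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
TNS = "http://searchpoint.ijs.si/Classifier"           # targetNamespace from the WSDL


def build_classify_kmeans_request(query, data_source="Yahoo", num_hits=200,
                                  num_min_hits=50, num_categories=10):
    """Return a SOAP envelope (as a string) calling ClassifyKMeans.

    The numeric defaults are hypothetical; only num_categories=10 mirrors
    the default number of topics mentioned in the text.
    """
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{TNS}}}ClassifyKMeans")
    # Child elements in the order required by the WSDL sequence.
    parameters = [
        ("DS", data_source),
        ("Query", query),
        ("NumHits", num_hits),
        ("NumMinHits", num_min_hits),
        ("NumCategories", num_categories),
    ]
    for name, value in parameters:
        ET.SubElement(call, f"{{{TNS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_classify_kmeans_request("fisheries")
```

The value passed as `data_source` must be one of the DataSource enumeration values listed in the WSDL, for example Yahoo, Swoogle or Watson.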
A.1 The current WSDL file to the web service
<?xml version="1.0" encoding="utf-8"?>
<wsdl:definitions xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/" xmlns:tm="http://microsoft.com/wsdl/mime/textMatching/" xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/" xmlns:mime="http://schemas.xmlsoap.org/wsdl/mime/" xmlns:tns="http://searchpoint.ijs.si/Classifier" xmlns:s="http://www.w3.org/2001/XMLSchema" xmlns:soap12="http://schemas.xmlsoap.org/wsdl/soap12/" xmlns:http="http://schemas.xmlsoap.org/wsdl/http/" targetNamespace="http://searchpoint.ijs.si/Classifier" xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/">
<wsdl:types>
<s:schema elementFormDefault="qualified" targetNamespace="http://searchpoint.ijs.si/Classifier">
<s:import namespace="http://searchpoint.ijs.si/Classifier/BowPart.xsd" />
<s:import schemaLocation="http://localhost:60107/Classifier/WS_Classify.asmx?schema=BowPart" namespace="http://searchpoint.ijs.si/Classifier/BowPart.xsd" />
<s:element name="ClassifyKMeans">
<s:complexType>
<s:sequence>
<s:element minOccurs="1" maxOccurs="1" name="DS" type="tns:DataSource" />
<s:element minOccurs="0" maxOccurs="1" name="Query" type="s:string" />
<s:element minOccurs="1" maxOccurs="1" name="NumHits" type="s:int" />
<s:element minOccurs="1" maxOccurs="1" name="NumMinHits" type="s:int" />
<s:element minOccurs="1" maxOccurs="1" name="NumCategories" type="s:int" />
</s:sequence>
</s:complexType>
</s:element>
<s:simpleType name="DataSource">
<s:restriction base="s:string">
<s:enumeration value="Google" />
<s:enumeration value="Yahoo" />
<s:enumeration value="Live" />
<s:enumeration value="About" />
<s:enumeration value="AboutViaGoogle" />
<s:enumeration value="AboutViaYahoo" />
<s:enumeration value="EBay" />
<s:enumeration value="NYTimes" />
<s:enumeration value="Mladina" />
<s:enumeration value="Watson" />
<s:enumeration value="Enron" />
<s:enumeration value="CCA" />
<s:enumeration value="Photo12" />
<s:enumeration value="String" />
<s:enumeration value="File" />
<s:enumeration value="GoogleOnto" />
<s:enumeration value="Swoogle" />
</s:restriction>
</s:simpleType>
<s:element name="ClassifyKMeansResponse">
<s:complexType>
<s:sequence>
<s:element minOccurs="0" maxOccurs="1" name="ClassifyKMeansResult">
<s:complexType>
<s:sequence>
<s:any namespace="http://searchpoint.ijs.si/Classifier/BowPart.xsd" />
</s:sequence>
</s:complexType>
</s:element>
</s:sequence>
</s:complexType>
</s:element>
<s:element name="ClassifyKMeansIn">
<s:complexType>
<s:sequence>
<s:element minOccurs="1" maxOccurs="1" name="DS" type="tns:DataSource" />
<s:element minOccurs="0" maxOccurs="1" name="Query" type="s:string" />
<s:element minOccurs="1" maxOccurs="1" name="NumHits" type="s:int" />
<s:element minOccurs="1" maxOccurs="1" name="NumMinHits" type="s:int" />
<s:element minOccurs="1" maxOccurs="1" name="NumCategories" type="s:int" />
</s:sequence>
</s:complexType>
</s:element>
<s:element name="ClassifyKMeansInResponse">
<s:complexType>
<s:sequence>
<s:element minOccurs="1" maxOccurs="1" name="ClassifyKMeansInResult" type="tns:BowPartStruc" />
</s:sequence>
</s:complexType>
</s:element>
<s:complexType name="BowPartStruc">
<s:sequence>
<s:element minOccurs="0" maxOccurs="1" name="Clusters" type="tns:ArrayOfCluster" />
<s:element minOccurs="0" maxOccurs="1" name="Links" type="tns:ArrayOfLink" />
<s:element minOccurs="0" maxOccurs="1" name="Documents" type="tns:ArrayOfDocument" />
</s:sequence>
</s:complexType>
<s:complexType name="ArrayOfCluster">
<s:sequence>
<s:element minOccurs="0" maxOccurs="unbounded" name="Cluster" type="tns:Cluster" />
</s:sequence>
</s:complexType>
<s:complexType name="Cluster">
<s:sequence>
<s:element minOccurs="1" maxOccurs="1" name="Clusters_Id" type="s:int" />
<s:element minOccurs="0" maxOccurs="1" name="Title" type="s:string" />
<s:element minOccurs="0" maxOccurs="1" name="Color" type="s:string" />
<s:element minOccurs="1" maxOccurs="1" name="Quality" type="s:double" />
<s:element minOccurs="1" maxOccurs="1" name="X" type="s:double" />
<s:element minOccurs="1" maxOccurs="1" name="Y" type="s:double" />
</s:sequence>
</s:complexType>
<s:complexType name="ArrayOfLink">
<s:sequence>
<s:element minOccurs="0" maxOccurs="unbounded" name="Link" type="tns:Link" />
</s:sequence>
</s:complexType>
<s:complexType name="Link">
<s:sequence>
<s:element minOccurs="1" maxOccurs="1" name="id" type="s:int" />
<s:element minOccurs="1" maxOccurs="1" name="id1" type="s:int" />
<s:element minOccurs="1" maxOccurs="1" name="id2" type="s:int" />
<s:element minOccurs="1" maxOccurs="1" name="Quality" type="s:double" />
</s:sequence>
</s:complexType>
<s:complexType name="ArrayOfDocument">
<s:sequence>
<s:element minOccurs="0" maxOccurs="unbounded" name="Document" type="tns:Document" />
</s:sequence>
</s:complexType>
<s:complexType name="Document">
<s:sequence>
<s:element minOccurs="1" maxOccurs="1" name="id" type="s:double" />
<s:element minOccurs="1" maxOccurs="1" name="relevance" type="s:double" />
<s:element minOccurs="0" maxOccurs="1" name="dcSim" type="s:string" />
</s:sequence>
</s:complexType>
<s:element name="ClassifyDMoz">
<s:complexType>
<s:sequence>
<s:element minOccurs="1" maxOccurs="1" name="DS" type="tns:DataSource" />
<s:element minOccurs="0" maxOccurs="1" name="Query" type="s:string" />
<s:element minOccurs="1" maxOccurs="1" name="NumHits" type="s:int" />
<s:element minOccurs="1" maxOccurs="1" name="NumMinHits" type="s:int" />
<s:element minOccurs="1" maxOccurs="1" name="NumCategories" type="s:int" />
</s:sequence>
</s:complexType>
</s:element>
<s:element name="ClassifyDMozResponse">
<s:complexType>
<s:sequence>
<s:element minOccurs="0" maxOccurs="1" name="ClassifyDMozResult">
<s:complexType>
<s:sequence>
<s:any namespace="http://searchpoint.ijs.si/Classifier/BowPart.xsd" />
</s:sequence>
</s:complexType>
</s:element>
</s:sequence>
</s:complexType>
</s:element>
<s:element name="ClassifyDMozContext">
<s:complexType>
<s:sequence>
<s:element minOccurs="1" maxOccurs="1" name="DS" type="tns:DataSource" />
<s:element minOccurs="0" maxOccurs="1" name="Query" type="s:string" />
<s:element minOccurs="1" maxOccurs="1" name="NumHits" type="s:int" />
<s:element minOccurs="1" maxOccurs="1" name="NumMinHits" type="s:int" />
<s:element minOccurs="1" maxOccurs="1" name="NumCategories" type="s:int" />
</s:sequence>
</s:complexType>
</s:element>
<s:element name="ClassifyDMozContextResponse">
<s:complexType>
<s:sequence>
<s:element minOccurs="0" maxOccurs="1" name="ClassifyDMozContextResult">
<s:complexType>
<s:sequence>
<s:any namespace="http://searchpoint.ijs.si/Classifier/BowPart.xsd" />
</s:sequence>
</s:complexType>
</s:element>
</s:sequence>
</s:complexType>
</s:element>
<s:element name="ClassifyGY">
<s:complexType>
<s:sequence>
<s:element minOccurs="1" maxOccurs="1" name="DS" type="tns:DataSource" />
<s:element minOccurs="0" maxOccurs="1" name="Query" type="s:string" />
<s:element minOccurs="1" maxOccurs="1" name="NumHits" type="s:int" />
<s:element minOccurs="1" maxOccurs="1" name="NumCategories" type="s:int" />
</s:sequence>
</s:complexType>
</s:element>
<s:element name="ClassifyGYResponse">
<s:complexType>
<s:sequence>
<s:element minOccurs="0" maxOccurs="1" name="ClassifyGYResult">
<s:complexType>
<s:sequence>
<s:any namespace="http://searchpoint.ijs.si/Classifier/BowPart.xsd" />
</s:sequence>
</s:complexType>
</s:element>
</s:sequence>
</s:complexType>
</s:element>
<s:element name="ClassifyEuroVoc">
<s:complexType>
<s:sequence>
<s:element minOccurs="1" maxOccurs="1" name="DS" type="tns:DataSource" />
<s:element minOccurs="0" maxOccurs="1" name="Query" type="s:string" />
<s:element minOccurs="1" maxOccurs="1" name="NumHits" type="s:int" />
<s:element minOccurs="1" maxOccurs="1" name="NumMinHits" type="s:int" />
<s:element minOccurs="1" maxOccurs="1" name="NumCategories" type="s:int" />
</s:sequence>
</s:complexType>
</s:element>
<s:element name="ClassifyEuroVocResponse">
<s:complexType>
<s:sequence>
<s:element minOccurs="0" maxOccurs="1" name="ClassifyEuroVocResult">
<s:complexType>
<s:sequence>
<s:any namespace="http://searchpoint.ijs.si/Classifier/BowPart.xsd" />
</s:sequence>
</s:complexType>
</s:element>
</s:sequence>
</s:complexType>
</s:element>
<s:element name="BowPart" nillable="true">
<s:complexType>
<s:sequence>
<s:any namespace="http://searchpoint.ijs.si/Classifier/BowPart.xsd" />
</s:sequence>
</s:complexType>
</s:element>
<s:element name="BowPartStruc" type="tns:BowPartStruc" />
</s:schema>
</wsdl:types>
<wsdl:message name="ClassifyKMeansSoapIn">
<wsdl:part name="parameters" element="tns:ClassifyKMeans" />
</wsdl:message>
<wsdl:message name="ClassifyKMeansSoapOut">
<wsdl:part name="parameters" element="tns:ClassifyKMeansResponse" />
</wsdl:message>
<wsdl:message name="ClassifyKMeansInSoapIn">
<wsdl:part name="parameters" element="tns:ClassifyKMeansIn" />
</wsdl:message>
<wsdl:message name="ClassifyKMeansInSoapOut">
<wsdl:part name="parameters" element="tns:ClassifyKMeansInResponse" />
</wsdl:message>
<wsdl:message name="ClassifyDMozSoapIn">
<wsdl:part name="parameters" element="tns:ClassifyDMoz" />
</wsdl:message>
<wsdl:message name="ClassifyDMozSoapOut">
<wsdl:part name="parameters" element="tns:ClassifyDMozResponse" />
</wsdl:message>
<wsdl:message name="ClassifyDMozContextSoapIn">
<wsdl:part name="parameters" element="tns:ClassifyDMozContext" />
</wsdl:message>
<wsdl:message name="ClassifyDMozContextSoapOut">
<wsdl:part name="parameters" element="tns:ClassifyDMozContextResponse" />
</wsdl:message>
<wsdl:message name="ClassifyGYSoapIn">
<wsdl:part name="parameters" element="tns:ClassifyGY" />
</wsdl:message>
<wsdl:message name="ClassifyGYSoapOut">
<wsdl:part name="parameters" element="tns:ClassifyGYResponse" />
</wsdl:message>
<wsdl:message name="ClassifyEuroVocSoapIn">
<wsdl:part name="parameters" element="tns:ClassifyEuroVoc" />
</wsdl:message>
<wsdl:message name="ClassifyEuroVocSoapOut">
<wsdl:part name="parameters" element="tns:ClassifyEuroVocResponse" />
</wsdl:message>
<wsdl:message name="ClassifyKMeansHttpGetIn">
<wsdl:part name="DS" type="s:string" />
<wsdl:part name="Query" type="s:string" />
<wsdl:part name="NumHits" type="s:string" />
<wsdl:part name="NumMinHits" type="s:string" />
<wsdl:part name="NumCategories" type="s:string" />
</wsdl:message>
<wsdl:message name="ClassifyKMeansHttpGetOut">
<wsdl:part name="Body" element="tns:BowPart" />
</wsdl:message>
<wsdl:message name="ClassifyKMeansInHttpGetIn">
<wsdl:part name="DS" type="s:string" />
<wsdl:part name="Query" type="s:string" />
<wsdl:part name="NumHits" type="s:string" />
<wsdl:part name="NumMinHits" type="s:string" />
<wsdl:part name="NumCategories" type="s:string" />
</wsdl:message>
<wsdl:message name="ClassifyKMeansInHttpGetOut">
<wsdl:part name="Body" element="tns:BowPartStruc" />
</wsdl:message>
<wsdl:message name="ClassifyDMozHttpGetIn">
<wsdl:part name="DS" type="s:string" />
<wsdl:part name="Query" type="s:string" />
<wsdl:part name="NumHits" type="s:string" />
<wsdl:part name="NumMinHits" type="s:string" />
<wsdl:part name="NumCategories" type="s:string" />
</wsdl:message>
<wsdl:message name="ClassifyDMozHttpGetOut">
<wsdl:part name="Body" element="tns:BowPart" />
</wsdl:message>
<wsdl:message name="ClassifyDMozContextHttpGetIn">
<wsdl:part name="DS" type="s:string" />
<wsdl:part name="Query" type="s:string" />
<wsdl:part name="NumHits" type="s:string" />
<wsdl:part name="NumMinHits" type="s:string" />
<wsdl:part name="NumCategories" type="s:string" />
</wsdl:message>
<wsdl:message name="ClassifyDMozContextHttpGetOut">
<wsdl:part name="Body" element="tns:BowPart" />
</wsdl:message>
<wsdl:message name="ClassifyGYHttpGetIn">
<wsdl:part name="DS" type="s:string" />
<wsdl:part name="Query" type="s:string" />
<wsdl:part name="NumHits" type="s:string" />
<wsdl:part name="NumCategories" type="s:string" />
</wsdl:message>
<wsdl:message name="ClassifyGYHttpGetOut">
<wsdl:part name="Body" element="tns:BowPart" />
</wsdl:message>
<wsdl:message name="ClassifyEuroVocHttpGetIn">
<wsdl:part name="DS" type="s:string" />
<wsdl:part name="Query" type="s:string" />
<wsdl:part name="NumHits" type="s:string" />
<wsdl:part name="NumMinHits" type="s:string" />
<wsdl:part name="NumCategories" type="s:string" />
</wsdl:message>
<wsdl:message name="ClassifyEuroVocHttpGetOut">
<wsdl:part name="Body" element="tns:BowPart" />
</wsdl:message>
<wsdl:portType name="WS_ClassifySoap">
<wsdl:operation name="ClassifyKMeans">
<wsdl:input message="tns:ClassifyKMeansSoapIn" />
<wsdl:output message="tns:ClassifyKMeansSoapOut" />
</wsdl:operation>
<wsdl:operation name="ClassifyKMeansIn">
<wsdl:input message="tns:ClassifyKMeansInSoapIn" />
<wsdl:output message="tns:ClassifyKMeansInSoapOut" />
</wsdl:operation>
<wsdl:operation name="ClassifyDMoz">
<wsdl:input message="tns:ClassifyDMozSoapIn" />
<wsdl:output message="tns:ClassifyDMozSoapOut" />
</wsdl:operation>
<wsdl:operation name="ClassifyDMozContext">
<wsdl:input message="tns:ClassifyDMozContextSoapIn" />
<wsdl:output message="tns:ClassifyDMozContextSoapOut" />
</wsdl:operation>
<wsdl:operation name="ClassifyGY">
<wsdl:input message="tns:ClassifyGYSoapIn" />
<wsdl:output message="tns:ClassifyGYSoapOut" />
</wsdl:operation>
<wsdl:operation name="ClassifyEuroVoc">
<wsdl:input message="tns:ClassifyEuroVocSoapIn" />
<wsdl:output message="tns:ClassifyEuroVocSoapOut" />
</wsdl:operation>
</wsdl:portType>
<wsdl:portType name="WS_ClassifyHttpGet">
<wsdl:operation name="ClassifyKMeans">
<wsdl:input message="tns:ClassifyKMeansHttpGetIn" />
<wsdl:output message="tns:ClassifyKMeansHttpGetOut" />
</wsdl:operation>
<wsdl:operation name="ClassifyKMeansIn">
<wsdl:input message="tns:ClassifyKMeansInHttpGetIn" />
<wsdl:output message="tns:ClassifyKMeansInHttpGetOut" />
</wsdl:operation>
<wsdl:operation name="ClassifyDMoz">
<wsdl:input message="tns:ClassifyDMozHttpGetIn" />
<wsdl:output message="tns:ClassifyDMozHttpGetOut" />
</wsdl:operation>
<wsdl:operation name="ClassifyDMozContext">
<wsdl:input message="tns:ClassifyDMozContextHttpGetIn" />
<wsdl:output message="tns:ClassifyDMozContextHttpGetOut" />
</wsdl:operation>
<wsdl:operation name="ClassifyGY">
<wsdl:input message="tns:ClassifyGYHttpGetIn" />
<wsdl:output message="tns:ClassifyGYHttpGetOut" />
</wsdl:operation>
<wsdl:operation name="ClassifyEuroVoc">
<wsdl:input message="tns:ClassifyEuroVocHttpGetIn" />
<wsdl:output message="tns:ClassifyEuroVocHttpGetOut" />
</wsdl:operation>
</wsdl:portType>
<wsdl:binding name="WS_ClassifySoap" type="tns:WS_ClassifySoap">
<soap:binding transport="http://schemas.xmlsoap.org/soap/http" />
<wsdl:operation name="ClassifyKMeans">
<soap:operation soapAction="http://searchpoint.ijs.si/Classifier/ClassifyKMeans" style="document" />
<wsdl:input>
<soap:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap:body use="literal" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="ClassifyKMeansIn">
<soap:operation soapAction="http://searchpoint.ijs.si/Classifier/ClassifyKMeansIn" style="document" />
<wsdl:input>
<soap:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap:body use="literal" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="ClassifyDMoz">
<soap:operation soapAction="http://searchpoint.ijs.si/Classifier/ClassifyDMoz" style="document" />
<wsdl:input>
<soap:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap:body use="literal" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="ClassifyDMozContext">
<soap:operation soapAction="http://searchpoint.ijs.si/Classifier/ClassifyDMozContext" style="document" />
<wsdl:input>
<soap:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap:body use="literal" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="ClassifyGY">
<soap:operation soapAction="http://searchpoint.ijs.si/Classifier/ClassifyGY" style="document" />
<wsdl:input>
<soap:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap:body use="literal" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="ClassifyEuroVoc">
<soap:operation soapAction="http://searchpoint.ijs.si/Classifier/ClassifyEuroVoc" style="document" />
<wsdl:input>
<soap:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap:body use="literal" />
</wsdl:output>
</wsdl:operation>
</wsdl:binding>
<wsdl:binding name="WS_ClassifySoap12" type="tns:WS_ClassifySoap">
<soap12:binding transport="http://schemas.xmlsoap.org/soap/http" />
<wsdl:operation name="ClassifyKMeans">
<soap12:operation soapAction="http://searchpoint.ijs.si/Classifier/ClassifyKMeans" style="document" />
<wsdl:input>
<soap12:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap12:body use="literal" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="ClassifyKMeansIn">
<soap12:operation soapAction="http://searchpoint.ijs.si/Classifier/ClassifyKMeansIn" style="document" />
<wsdl:input>
<soap12:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap12:body use="literal" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="ClassifyDMoz">
<soap12:operation soapAction="http://searchpoint.ijs.si/Classifier/ClassifyDMoz" style="document" />
<wsdl:input>
<soap12:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap12:body use="literal" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="ClassifyDMozContext">
<soap12:operation soapAction="http://searchpoint.ijs.si/Classifier/ClassifyDMozContext" style="document" />
<wsdl:input>
<soap12:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap12:body use="literal" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="ClassifyGY">
<soap12:operation soapAction="http://searchpoint.ijs.si/Classifier/ClassifyGY" style="document" />
<wsdl:input>
<soap12:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap12:body use="literal" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="ClassifyEuroVoc">
<soap12:operation soapAction="http://searchpoint.ijs.si/Classifier/ClassifyEuroVoc" style="document" />
<wsdl:input>
<soap12:body use="literal" />
</wsdl:input>
<wsdl:output>
<soap12:body use="literal" />
</wsdl:output>
</wsdl:operation>
</wsdl:binding>
<wsdl:binding name="WS_ClassifyHttpGet" type="tns:WS_ClassifyHttpGet">
<http:binding verb="GET" />
<wsdl:operation name="ClassifyKMeans">
<http:operation location="/ClassifyKMeans" />
<wsdl:input>
<http:urlEncoded />
</wsdl:input>
<wsdl:output>
<mime:mimeXml part="Body" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="ClassifyKMeansIn">
<http:operation location="/ClassifyKMeansIn" />
<wsdl:input>
<http:urlEncoded />
</wsdl:input>
<wsdl:output>
<mime:mimeXml part="Body" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="ClassifyDMoz">
<http:operation location="/ClassifyDMoz" />
<wsdl:input>
<http:urlEncoded />
</wsdl:input>
<wsdl:output>
<mime:mimeXml part="Body" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="ClassifyDMozContext">
<http:operation location="/ClassifyDMozContext" />
<wsdl:input>
<http:urlEncoded />
</wsdl:input>
<wsdl:output>
<mime:mimeXml part="Body" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="ClassifyGY">
<http:operation location="/ClassifyGY" />
<wsdl:input>
<http:urlEncoded />
</wsdl:input>
<wsdl:output>
<mime:mimeXml part="Body" />
</wsdl:output>
</wsdl:operation>
<wsdl:operation name="ClassifyEuroVoc">
<http:operation location="/ClassifyEuroVoc" />
<wsdl:input>
<http:urlEncoded />
</wsdl:input>
<wsdl:output>
<mime:mimeXml part="Body" />
</wsdl:output>
</wsdl:operation>
</wsdl:binding>
<wsdl:service name="WS_Classify">
<wsdl:port name="WS_ClassifySoap" binding="tns:WS_ClassifySoap">
<soap:address location="http://localhost:60107/Classifier/WS_Classify.asmx" />
</wsdl:port>
<wsdl:port name="WS_ClassifySoap12" binding="tns:WS_ClassifySoap12">
<soap12:address location="http://localhost:60107/Classifier/WS_Classify.asmx" />
</wsdl:port>
<wsdl:port name="WS_ClassifyHttpGet" binding="tns:WS_ClassifyHttpGet">
<http:address location="http://localhost:60107/Classifier/WS_Classify.asmx" />
</wsdl:port>
</wsdl:service>
</wsdl:definitions>
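The HTTP GET binding in the listing above maps each operation to a path under the service address and passes its message parts as URL-encoded query parameters. The following Python sketch shows one way a client might assemble such a request URL. The helper `build_get_url`, the table `OPERATION_PARAMS`, and the `DS` value `EXAMPLE_SOURCE` are illustrative assumptions and not part of the service. The values that `tns:DataSource` accepts are not shown in this part of the listing, and the `localhost` address would differ in a real deployment.

```python
from urllib.parse import urlencode

# Service address copied from the <wsdl:service> element of the listing.
# It points at a local development host (assumed to change per deployment).
BASE_URL = "http://localhost:60107/Classifier/WS_Classify.asmx"

# Query parameters of each operation, in the order of its *HttpGetIn message.
_FIVE = ("DS", "Query", "NumHits", "NumMinHits", "NumCategories")
OPERATION_PARAMS = {
    "ClassifyKMeans": _FIVE,
    "ClassifyKMeansIn": _FIVE,
    "ClassifyDMoz": _FIVE,
    "ClassifyDMozContext": _FIVE,
    "ClassifyGY": ("DS", "Query", "NumHits", "NumCategories"),
    "ClassifyEuroVoc": _FIVE,
}


def build_get_url(operation, **params):
    """Return the URL-encoded GET request URL for one WS_Classify operation.

    Hypothetical helper: it only builds the URL and does not contact the service.
    """
    expected = OPERATION_PARAMS[operation]  # KeyError for unknown operations
    if set(params) != set(expected):
        raise ValueError(f"{operation} expects exactly: {', '.join(expected)}")
    # Emit the parameters in the order declared by the WSDL message.
    query = urlencode([(name, params[name]) for name in expected])
    return f"{BASE_URL}/{operation}?{query}"


if __name__ == "__main__":
    # "EXAMPLE_SOURCE" is a placeholder, not a documented DataSource value.
    print(build_get_url("ClassifyGY", DS="EXAMPLE_SOURCE",
                        Query="semantic web", NumHits=100, NumCategories=10))
```

According to the binding, the response body is an XML document (`mime:mimeXml`) whose root is `BowPart`, or `BowPartStruc` for `ClassifyKMeansIn`, so a client would pass the returned bytes to an XML parser.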
References
[d'Aquin 2008] d'Aquin, M. Building Semantic Web Based Applications with Watson. 17th International World Wide Web Conference (WWW2008) Developers Track.
[Fortuna et all, 2005] B. Fortuna, M. Grobelnik, D. Mladenic: Semi-automatic Construction of Topic Ontology. Semantics, Web and Mining, Joint International Workshop, EWMF 2005 and KDO 2005, Porto, Portugal, October 3–7, 2005.
[Fruchterman and Reingold, 1991] T. M. J. Fruchterman and E. M. Reingold. Graph drawing by force directed placement. Softw. Pract. Exper., 1991.
[Grobelnik and Mladenić, 2006] Grobelnik, M., Mladenić, D., (2006). Automated Knowledge Discovery in Advanced Knowledge Management, Journal of Knowledge Management.
[Grobelnik et al., 2008] Marko Grobelnik, Janez Brank, Blaž Fortuna, Igor Mozetič. Contextualizing Ontologies with OntoLight: A Pragmatic Approach, Informatica. 2008.
[Kanungo et all, 2002] T. Kanungo, D. M. Mount, N. Netanyahu, C. Piatko, R. Silverman, and A. Y. Wu (2002). An efficient k-means clustering algorithm: Analysis and implementation, IEEE Trans. Pattern Analysis and Machine Intelligence, 24 (2002), 881-892.
[Koller and Sahami, 1997] Koller, D., Sahami, M., (1997). Hierarchically classifying documents using very few words, Proceedings of the 14th International Conference on Machine Learning ICML-97, pp. 170-178, Morgan Kaufmann, San Francisco, CA.
[McCallum et all, 1998] McCallum A., Rosenfeld R., Mitchell T., Ng A., (1998). Improving Text Classification by Shrinkage in a Hierarchy of Classes, Proceedings of the 15th International Conference on Machine Learning ICML-98, Morgan Kaufmann, San Francisco, CA.
[Mitchell, 1997] Mitchell, T.M. (1997). Machine Learning. The McGraw-Hill Companies, Inc.
[Mladenić, 1998] Mladenić, D. (1998). Turning Yahoo into an Automatic Web-Page Classifier. Proc. 13th European Conference on Artificial Intelligence (ECAI'98, John Wiley & Sons), 473–474.
[Mladenić and Grobelnik, 2003] Mladenić, D., Grobelnik, M. (2003). Feature selection on hierarchy of web documents. Journal of Decision support systems, 35, 45-87.
[Pajntar, 2006] Pajntar, B. (2006). Overview of algorithms for graph drawing. In Proc. of Slovenian KDD Conference 2006, Oct. 2006.
[Pajntar and Grobelnik, 2008] Boštjan Pajntar, Marko Grobelnik. SearchPoint – a New Paradigm of Web Search. 17th International World Wide Web Conference (WWW2008) Developers Track.
[Sabou et all, 2006] M. Sabou, M. d’Aquin, and E. Motta: Using the Semantic Web as Background Knowledge for Ontology Mapping, In Proceedings of the International Workshop on Ontology Matching (OM-2006), collocated with ISWC-06.
[Steingbach et all, 2000] Steinbach, M., Karypis, G. and Kumar, V. (2000). A comparison of document clustering techniques. Proc. KDD Workshop on Text Mining. (eds. Grobelnik, M., Mladenić, D. and Milic-Frayling, N.), Boston, MA, USA, 109–110.