Transcript
Page 1: SIMILARITY/CLOSENESS-BASED RESOURCE BROWSER

Industrial Ontologies Group

University of Jyväskylä

SIMILARITY/CLOSENESS-BASED RESOURCE BROWSER

Oleksiy Khriyenko

July 13 – 15, 2009, Cambridge, United Kingdom

University of Jyväskylä, Finland

9th IASTED International Conference on Visualization, Imaging, and Image Processing ~VIIP 2009~

Vagan Terziyan

IOG, Agora Center, MIT Department

presenter

Page 2: SIMILARITY/CLOSENESS-BASED RESOURCE BROWSER

Content

Web evolution trends

Heterogeneity of resources

Intelligent resource visualization

Resource closeness/similarity browser

Distance measurement function

Visualization component

Browser enhancement

Prototype

Future opportunities

Acknowledgements

Page 3: SIMILARITY/CLOSENESS-BASED RESOURCE BROWSER

Web evolution trends

The human becomes a very dynamic and proactive player in a large, highly heterogeneous and distributed environment with a huge amount of different kinds of data, services, devices, etc.

It is therefore necessary to provide technology and tools for easy and handy human information access and manipulation.

Context-awareness and intelligence of the user interface bring a new feature that lets the user get not just raw data, but the required information based on a specified context.

Resource closeness/similarity search is one of the most popular features that users need during the resource/information retrieval process. Similarity search has become a fundamental computational task in many applications: e-commerce, data mining and knowledge discovery, case-based reasoning, knowledge management, text, image and information retrieval, etc.

Thus, visualization of resources in the context of their similarity/closeness becomes an important functionality of GUIs and browsers.

Page 4: SIMILARITY/CLOSENESS-BASED RESOURCE BROWSER

Heterogeneity of resources

The current challenge is the distributed nature of information and, therefore, the high heterogeneity of the entities to be compared:

o information about these entities (resources) comes from different sources and is presented according to different schemas;

o it is quite challenging to automatically compare even obviously similar resources when they are represented by different properties;

o there are quite a lot of different definitions of closeness and corresponding mechanisms that provide various functions to calculate similarity, as well as attempts to smartly combine various similarity functions;

o the heterogeneity of the resource attributes themselves (numerical, nominal, logical, interval, textual, etc.) is a challenging problem; various methods exist both to prepare data for computing similarity and to compute it, and it is always a challenge to select the right one for a particular task.

Another important issue is how to represent the results of a similarity-based search to a human:

o what kind of visualization technique would be convenient for a user depending on his tasks.

We expect to experience a future Web which will be media rich, highly interactive and user oriented. The value of this Web will lie not only in the massive amount of information that will be stored within it, but in the ability of Web technologies to organize, interpret and bring this information to the user.

Page 5: SIMILARITY/CLOSENESS-BASED RESOURCE BROWSER

Intelligent resource visualization

The idea of intelligent resource visualization is to simplify the search and browsing processes via multidimensional associative resource visualization, which means visualization of a resource depending on a context, via association with various aspects of the resource (relations with other resources, domains, areas of interest, etc.).

Such visualization can give us a hint, turn us in the right direction, show us related objects and provide links to them. In other words, visualization will utilize context-based filtering and enrichment of the visualized scene with the relevant links.

Page 6: SIMILARITY/CLOSENESS-BASED RESOURCE BROWSER

Intelligent resource visualization

4I (FOR EYE) technology will enable creation of such smart human interfaces through flexible collaboration of an Intelligent GUI Shell, various visualization modules, which we refer to as MetaProvider-services, and the resources of interest.

Semantically enhanced context-dependent multidimensional resource visualization provides an opportunity to create an intelligent visual interface that presents relevant information in a form that is more suitable and personalized for the user.

Page 7: SIMILARITY/CLOSENESS-BASED RESOURCE BROWSER

Resource closeness/similarity browser

Resource closeness/similarity browsing is a part of the 4I Browser functionality. The main common interface part, the 4I GUI Shell, communicates with the resource repository and the repository of visualization contexts.

MetaProvider performs the visualization function: depending on its implementation, it collects all necessary data and visualizes the resource(s) in a context-dependent way. For the MetaProvider that presents resources in the context of their closeness to the selected one, a distance calculation is performed before the visualization phase.

4I GUI Shell uses XML-based resource storage. Such an architecture requires converting the data from the original format to an XML representation.

Comparison between the resources is performed based on common properties. The current implementation supports just five types of parameters (properties):

Text field types:

o word/sentence

o key words/sentences

o complex text field

Number field

Interval field
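
The comparison over common properties can be sketched as follows. This is a minimal illustration and not the 4I Browser code: the `FieldType` enum, the `Resource` class and the `common_properties` helper are hypothetical names, and a property is assumed to be "common" when both resources carry it with the same field type.

```python
from dataclasses import dataclass, field
from enum import Enum


class FieldType(Enum):
    """The five property types listed on the slide (identifier names are hypothetical)."""
    WORD = "word/sentence"
    KEYWORDS = "key words/sentences"
    COMPLEX_TEXT = "complex text field"
    NUMBER = "number field"
    INTERVAL = "interval field"


@dataclass
class Resource:
    name: str
    # Maps a property name to a (field type, value) pair.
    properties: dict = field(default_factory=dict)


def common_properties(a: Resource, b: Resource) -> list:
    """Return the names of properties that both resources have with the same type."""
    shared = a.properties.keys() & b.properties.keys()
    return sorted(
        key for key in shared
        if a.properties[key][0] == b.properties[key][0]
    )


# Made-up example resources, only to exercise the helper.
phone = Resource("phone", {
    "price": (FieldType.NUMBER, 300),
    "tags": (FieldType.KEYWORDS, ["mobile", "camera"]),
})
camera = Resource("camera", {
    "price": (FieldType.NUMBER, 450),
    "tags": (FieldType.KEYWORDS, ["camera", "zoom"]),
    "focal_length": (FieldType.INTERVAL, (24, 70)),
})
print(common_properties(phone, camera))
```

Only the properties returned by such a step would then be passed to the type-specific distance functions.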

Page 8: SIMILARITY/CLOSENESS-BASED RESOURCE BROWSER

Distance measurement function

General distance between two objects: $D(X, Y)$, expressed through the distances $d(x_i, y_j)$ between the values $x_i \in X$ and $y_j \in Y$. [Formula not recoverable from the transcript.]

Closeness of two resources (objects) equals:

$$closeness = 1 - D$$

Numbers

$$d(v_i, v_k) = \frac{|v_i - v_k|}{v_{\max} - v_{\min}},$$

where $v_{\max}$ and $v_{\min}$ are the maximum and minimum values from the sample.

Intervals

The distance $d(a_i, b_i, a_j, b_j)$ between two intervals is expressed through $D$, $r$ and $D_0$, where

$$D = \max(b_i, b_j) - \min(a_i, a_j), \qquad D_0 = \max_p(b_p) - \min_q(a_q),$$

and $r$, $r_1$ and $r_2$ are derived from the interval lengths $b_1 - a_1$ and $b_2 - a_2$. [The exact expressions for $d$ and $r$ are not recoverable from the transcript.]

A further interval formula [not recoverable from the transcript] uses the medians of the corresponding samples; its coefficients $k_m$ and $k_l$ regulate the significance of the distance between the intervals and of the difference between the lengths of the intervals.

Strings

The distance $D(V_1, V_2)$ is expressed through the distances $d(v_{1i}, v_{2i})$ between individual strings, with $v_{1i} \in V_1$ and $v_{2i} \in V_2$. [Formulas not recoverable from the transcript.]

Text field type 3

Text field type 2

$D(X, Y)$ is expressed through two counts: the number of matched/equal instances and the total number of all instances in the lists of the two compared objects. [Formula and subscripts not recoverable from the transcript.]
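
A minimal sketch of the number-field distance and the closeness value from this slide. It assumes that the number formula is the absolute difference normalized by the sample range (the operators are lost in the transcript, so this reading is an assumption), and the function names `number_distance` and `closeness` are hypothetical.

```python
def number_distance(v_i, v_k, sample):
    """Distance between two numbers, normalized by the range of the sample.

    Assumed form: |v_i - v_k| / (v_max - v_min), which lies in [0, 1]
    when both values belong to the sample.
    """
    v_max, v_min = max(sample), min(sample)
    if v_max == v_min:
        # All sample values are equal, so no two values can differ.
        return 0.0
    return abs(v_i - v_k) / (v_max - v_min)


def closeness(distance):
    """Closeness of two resources as stated on the slide: 1 - D."""
    return 1.0 - distance


# Made-up sample of one numeric property across four resources.
prices = [100, 250, 400, 600]
d = number_distance(250, 400, prices)
print(d, closeness(d))
```

Normalizing by the sample range keeps number fields on the same 0 to 1 scale as the other field types, so they can be combined into one general distance.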

Page 9: SIMILARITY/CLOSENESS-BASED RESOURCE BROWSER

Visualization component

[Figure label: Distance]

We decided to put the resources on a spiral that lies on the surface of a cone. The minimal distance between the resources is taken as a step along the axis (height) of the cone. Only that parameter (the distance along the axis/height) shows the closeness of the resources.

To avoid an overlap (when viewed from the top of the cone) of the images that belong to resources located next to each other, we calculate the location angle on each (step-based) cone cut. Additionally, we provide the possibility to rotate the cone to find the best viewpoint.
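
The cone layout can be sketched as below. This is an illustration under stated assumptions, not the browser's algorithm: the function name `cone_spiral_layout`, the cone half-angle and the constant angular increment between neighbouring resources are all hypothetical, since the slide does not give the formula for the location angle.

```python
import math


def cone_spiral_layout(distances, half_angle_deg=30.0, angle_step_deg=40.0):
    """Place resources on a spiral that lies on the surface of a cone.

    distances: distance of each resource to the selected one (the apex is 0).
    Returns (x, y, z) tuples ordered by increasing distance.
    """
    slope = math.tan(math.radians(half_angle_deg))
    positions = []
    for index, dist in enumerate(sorted(distances)):
        z = dist                    # height on the cone axis encodes the distance
        radius = slope * z          # radius of the cone cut at that height
        # Assumed rule: each next resource is rotated by a fixed angle, so
        # neighbours do not overlap when the cone is viewed from the top.
        angle = math.radians(index * angle_step_deg)
        positions.append((radius * math.cos(angle), radius * math.sin(angle), z))
    return positions


for point in cone_spiral_layout([0.4, 0.0, 0.2, 0.7]):
    print(point)
```

Rotating the cone, as mentioned above, would then amount to adding a common offset to every angle.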

Page 10: SIMILARITY/CLOSENESS-BASED RESOURCE BROWSER

Browser enhancement

The current implementation of the 4I GUI Shell supports visual configuration of the resource similarity visualization context.

The user can create, delete and modify similarity visualization contexts. Such a visualization context implies user specification of the significance of the resource properties and the presence of additional contextual information for the resource properties (depending on their types).

We consider the “absolute significance” of the resource fields as percentages of the full influence of the fields. In this case the sum of the fields’ significances should equal 100%. The same approach is applied to sub-fields, if there are any. For the “absolute significance” the system supports two modes:

o fully user-controlled mode;

o mode with automatic recalculation of the significances.

Sometimes the user prefers to specify a “relative significance” for a field/property. In this case the user estimates the significance of each field/property separately by a value from 0 to 100. With the “relative significance” the absolute values do not matter; only the comparative differences of the values are taken into account.
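
The two significance schemes can be illustrated with a short sketch. The function names are hypothetical, and the proportional rescaling used for the automatic recalculation mode is an assumption: the slide only says that the significances are recalculated so that they sum to 100%.

```python
def normalize_relative(significances):
    """Turn relative scores (each 0..100) into shares that sum to 100%."""
    total = sum(significances.values())
    if total == 0:
        raise ValueError("at least one field needs a non-zero significance")
    return {name: 100.0 * value / total for name, value in significances.items()}


def set_absolute(shares, changed, new_value):
    """Set one field's share and rescale the others so the sum stays 100%."""
    if not 0 <= new_value <= 100:
        raise ValueError("a share must lie between 0 and 100")
    others = {n: v for n, v in shares.items() if n != changed}
    old_total = sum(others.values())
    remaining = 100.0 - new_value
    result = {changed: float(new_value)}
    for name, value in others.items():
        # Keep the other fields' proportions; split evenly if they were all zero.
        weight = value / old_total if old_total else 1.0 / len(others)
        result[name] = remaining * weight
    return result


# Made-up field names and scores.
shares = normalize_relative({"price": 80, "title": 40, "tags": 40})
print(shares)
print(set_absolute(shares, "price", 60))
```

With relative scores only the ratios matter, so 80/40/40 and 40/20/20 give the same shares.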

Page 11: SIMILARITY/CLOSENESS-BASED RESOURCE BROWSER

Prototype

Page 12: SIMILARITY/CLOSENESS-BASED RESOURCE BROWSER

Future opportunities

A general adapter that converts data from any format to the required one does not exist. As a next step, we consider elaborating a general adaptation module that can be embedded into the Shell and will transform data from different formats to the internal resource representation format.

Increasing the number of distance measuring methods and of the types of compared resource description fields is considered a future direction.

1) Measures of Semantic Relatedness (MSRs) - remote services

2) Normalized Google Distance (NGD):

$$NGD(x, y) = \frac{\max\{\log f(x), \log f(y)\} - \log f(x, y)}{\log M - \min\{\log f(x), \log f(y)\}}$$

The same technique that we use for resource closeness visualization can be utilized for resource ranking. The only requirement is to describe a “virtual/abstract” reference (etalon) resource, or choose one from the existing resources, and calculate the distances of all other resources to it. Such an approach can be utilized for simple ranking methods. For complex methods we would have to elaborate appropriate modifications.
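
The Normalized Google Distance named on this slide can be computed as in the sketch below. It follows the standard NGD definition, in which f(x) and f(y) are the numbers of pages containing each term, f(x, y) is the number of pages containing both, and M is the total number of indexed pages. The function name `ngd` and the sample counts are hypothetical; obtaining real counts from a search service is outside this sketch.

```python
import math


def ngd(f_x, f_y, f_xy, m):
    """Normalized Google Distance from page counts.

    f_x, f_y: number of pages containing term x, term y
    f_xy:     number of pages containing both terms
    m:        total number of indexed pages
    """
    if min(f_x, f_y, f_xy) <= 0:
        raise ValueError("all counts must be positive to take logarithms")
    log_fx, log_fy = math.log(f_x), math.log(f_y)
    numerator = max(log_fx, log_fy) - math.log(f_xy)
    denominator = math.log(m) - min(log_fx, log_fy)
    return numerator / denominator


# Made-up counts: two terms that always occur together, then two that rarely do.
print(ngd(1000, 1000, 1000, 10**6))
print(ngd(100, 100, 10, 10**4))
```

A value near 0 means the terms almost always co-occur; larger values mean they are less related.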

Page 13: SIMILARITY/CLOSENESS-BASED RESOURCE BROWSER

Acknowledgements

University of Jyväskylä

www.cs.jyu.fi/ai/OntoGroup

www.cs.jyu.fi/ai/OntoGroup/UBIWARE_details.htm

