SIMILARITY/CLOSENESS-BASED RESOURCE BROWSER
Oleksiy Khriyenko, Vagan Terziyan
IOG, Agora Center, MIT Department, University of Jyväskylä, Finland
9th IASTED International Conference on Visualization, Imaging, and Image Processing (VIIP 2009)
July 13-15, 2009, Cambridge, United Kingdom
Content
Web evolution trends
Heterogeneity of resources
Intelligent resource visualization
Resource closeness/similarity browser
Distance measurement function
Visualization component
Future opportunities
Browser enhancement
Prototype
Acknowledgements
Web evolution trends
Humans are becoming very dynamic and proactive players in a large, highly heterogeneous and distributed environment with a huge amount of different kinds of data, services, devices, etc.
It is therefore necessary to provide technology and tools for easy and handy human access to and manipulation of information.
Context-awareness and intelligence of the user interface bring a new feature: the user can get not just raw data, but the required information based on a specified context.
Resource closeness/similarity search is one of the most popular features that users need during resource/information retrieval. Similarity search has become a fundamental computational task in many applications: e-commerce, data mining and knowledge discovery, case-based reasoning, knowledge management, text, image and information retrieval, etc.
Thus, visualization of resources in the context of their similarity/closeness becomes an important functionality of GUIs and browsers.
Heterogeneity of resources
The current challenge is the distributed nature of information and, therefore, the high heterogeneity of the entities to be compared:
information about these entities (resources) comes from different sources and is presented according to different schemas;
it is quite challenging to automatically compare even obviously similar resources when they are represented by different properties;
there are many different definitions of closeness, with corresponding mechanisms providing various functions to calculate similarity, as well as attempts to smartly combine various similarity functions;
the heterogeneity of the resource attributes themselves (numerical, nominal, logical, interval, textual, etc.) is a challenging problem; various methods exist for preparing data and for computing similarity, and it is always a challenge to select the right one for a particular task.
Another important issue is how to present the results of a similarity-based search to a human: what kind of visualization technique would be convenient for a user depending on his or her tasks.
We expect to experience a future Web which will be media rich, highly interactive and user oriented. The value of this Web will lie not only in the massive amount of information that will be stored within it, but in the ability of Web technologies to organize, interpret and bring this information to the user.
Intelligent resource visualization
The idea of intelligent resource visualization is to simplify the search and browsing processes via multidimensional associative resource visualization, that is, visualization of a resource depending on a context, via association with various aspects of the resource (relations with other resources, domains, areas of interest, etc.). Such visualization can give us a hint, point us in the right direction, show us related objects and provide links to them. In other words, visualization will utilize context-based filtering and enrichment of the visualized scene with the relevant links.
4I (FOR EYE) technology will enable the creation of such smart human interfaces through flexible collaboration of an Intelligent GUI Shell, various visualization modules, which we refer to as MetaProvider-services, and the resources of interest. Semantically enhanced, context-dependent multidimensional resource visualization provides an opportunity to create an intelligent visual interface that presents relevant information in a form that is more suitable and personalized for the user.
Resource closeness/similarity browser
Resource closeness/similarity browsing is a part of the 4I Browser functionality. The main common interface part, the 4I GUI Shell, communicates with the resource repository and the repository of visualization contexts. A MetaProvider performs the visualization function: depending on its implementation, it collects all necessary data and visualizes a resource or resources in a context-dependent way. For the MetaProvider that presents resources in the context of their closeness to a selected one, a distance calculation is performed before the visualization phase.
The 4I GUI Shell uses XML-based resource storage. Such an architecture requires converting the data from its original format to an XML representation. Comparison between resources is performed based on their common properties. The current implementation supports just five types of parameters (properties):
Text field types:
  word/sentence
  key words/sentences
  complex text field
Number field
Interval field
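The comparison on common properties described above can be illustrated with a minimal sketch. The XML layout used here (a `resource` element with `property` children carrying `name` and `type` attributes) and the function names `load_properties` and `common_properties` are assumptions made for illustration; the slides do not show the actual 4I storage schema.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse one resource into {name: (type, value)}.

    The <resource>/<property name=... type=...> layout is hypothetical.
    """
    root = ET.fromstring(xml_text)
    return {
        prop.get("name"): (prop.get("type"), (prop.text or "").strip())
        for prop in root.findall("property")
    }


def common_properties(a, b):
    """Names of properties present in both resources with the same type."""
    return sorted(
        name for name in a.keys() & b.keys() if a[name][0] == b[name][0]
    )


laptop = load_properties(
    '<resource>'
    '<property name="title" type="text">Laptop</property>'
    '<property name="price" type="number">900</property>'
    '</resource>'
)
tablet = load_properties(
    '<resource>'
    '<property name="title" type="text">Tablet</property>'
    '<property name="price" type="number">750</property>'
    '<property name="weight" type="number">0.5</property>'
    '</resource>'
)
shared = common_properties(laptop, tablet)
```

Only the properties in `shared` would then be passed to the distance calculation; properties that exist in just one of the two resources are ignored in this sketch.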
Distance measurement function
General distance between two objects: [formula]
Closeness of two resources (objects) equals: [formula]
Strings: [formula]
[formula] where [symbol] and [symbol] are the maximum and minimum values from the sample.
[formula] where [symbol] and [symbol] are the medians of the corresponding samples, and the coefficients [symbol] and [symbol] regulate the significance of the distance between the intervals and of the difference between the lengths of the intervals.
Text field type 3: [formula]
Text field type 2: [formula] where [symbol] is the number of matched/equal instances and [symbol] is the total number of instances in the lists of the two compared objects.
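Since the formulas themselves are not preserved in this text, the following is only a sketch of distance functions that are consistent with the verbal descriptions above; it is not the authors' exact definition. Assumptions: number distance is the absolute difference divided by the sample range (max minus min); key word distance is one minus the share of matched instances, taking the union of both lists as the total; interval distance mixes the distance between interval midpoints with the difference of interval lengths through two coefficients (`alpha`, `beta`, hypothetical names); the general distance is a significance-weighted average of the per-field distances.

```python
def number_distance(a, b, lo, hi):
    """Absolute difference normalized by the sample range [lo, hi]."""
    if hi == lo:
        return 0.0
    return abs(a - b) / (hi - lo)


def keyword_distance(a, b):
    """1 - matched/total, with total taken as the size of the union (assumption)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)


def interval_distance(a, b, lo, hi, alpha=0.5, beta=0.5):
    """Weighted mix of midpoint distance and length difference (assumption)."""
    if hi == lo:
        return 0.0
    span = hi - lo
    mid_a = (a[0] + a[1]) / 2.0
    mid_b = (b[0] + b[1]) / 2.0
    centre_term = abs(mid_a - mid_b) / span
    length_term = abs((a[1] - a[0]) - (b[1] - b[0])) / span
    return alpha * centre_term + beta * length_term


def general_distance(field_distances, significances):
    """Significance-weighted average of per-field distances."""
    total = sum(significances)
    if total == 0:
        return 0.0
    return sum(d * w for d, w in zip(field_distances, significances)) / total


d_price = number_distance(2, 6, lo=0, hi=10)
d_tags = keyword_distance(["a", "b"], ["b", "c"])
d_total = general_distance([d_price, d_tags], [75, 25])
```

With every per-field distance kept in the range 0 to 1, one possible closeness value is `1 - d_total`, again as an assumption rather than the definition used in the browser.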
Visualization component
We decided to place the resources on a spiral that lies on the surface of a cone. The minimal distance between the resources is taken as the step along the axis (height) of the cone. Only that parameter (the distance along the axis/height) shows the closeness of the resources. To avoid overlap (when viewed from the top of the cone) of the images that belong to resources located next to each other, we calculate the location angle on each (step-based) cone cut. Additionally, the cone can be rotated to find the best viewpoint.
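The placement on the cone can be sketched as follows. This is a simplified illustration under stated assumptions, not the prototype's algorithm: the selected resource sits at the apex, the height of each resource equals its raw distance (no quantization to a step), and the angle advances so that the arc between consecutive images is roughly one image width. The names `cone_spiral_positions`, `icon_size` and `half_angle_deg` are hypothetical.

```python
import math


def cone_spiral_positions(distances, icon_size=1.0, half_angle_deg=30.0):
    """Return (x, y, z) per resource, ordered by increasing distance.

    z is the position along the cone axis and is the only coordinate that
    encodes closeness; x and y only spread the images around the axis.
    """
    slope = math.tan(math.radians(half_angle_deg))
    positions = []
    angle = 0.0
    for d in sorted(distances):
        radius = d * slope  # radius of the cone cut at height d
        if radius > 0:
            # Advance by an angle whose arc at this radius is about one icon
            # wide, capped at half a turn for cuts very close to the apex.
            angle += min(icon_size / radius, math.pi)
        positions.append((radius * math.cos(angle), radius * math.sin(angle), d))
    return positions


positions = cone_spiral_positions([2.0, 0.0, 1.0])
```

Rotating the cone to find a better viewpoint then amounts to adding a constant offset to every angle before projecting the points.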
Browser enhancement
The current implementation of the 4I GUI Shell supports visual configuration of the resource similarity visualization context. The user can create, delete and modify similarity visualization contexts. Such a visualization context consists of the user's specification of the significance of the resource properties, together with any additional contextual information for those properties (depending on their types).
We treat the absolute significance of the resource fields as percentages of the full influence of the fields. In this case the sum of the field significances should be equal to 100%. The same approach is applied to sub-fields, if there are any. For absolute significance the system supports two modes:
fully user-controlled mode;
mode with automatic recalculation of the significances.
Sometimes the user prefers to specify a relative significance for a field/property. In this case the user estimates the significance of each field/property separately, by a value from 0 to 100. With relative significance the absolute values are not meaningful; only the comparative differences between the values are taken into account.
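The two significance schemes can be illustrated with a short sketch. How the Shell actually redistributes the percentages is not stated in the slides; here it is assumed that, in the automatic mode, the remaining fields are rescaled proportionally, and that relative values are turned into percentages by dividing by their sum. The function names are hypothetical.

```python
def rebalance(significances, changed, new_value):
    """Set one field's significance and rescale the others to total 100."""
    if not 0 <= new_value <= 100:
        raise ValueError("significance must be between 0 and 100")
    others = {k: v for k, v in significances.items() if k != changed}
    remaining = 100.0 - new_value
    old_total = sum(others.values())
    result = {changed: float(new_value)}
    for name, value in others.items():
        if old_total > 0:
            # Keep the proportions among the untouched fields.
            result[name] = remaining * value / old_total
        else:
            # All other fields were at zero: share the remainder equally.
            result[name] = remaining / len(others)
    return result


def relative_to_absolute(relative):
    """Convert independent 0..100 estimates to percentages summing to 100."""
    total = sum(relative.values())
    if total == 0:
        return {name: 100.0 / len(relative) for name in relative}
    return {name: 100.0 * value / total for name, value in relative.items()}


absolute = rebalance({"title": 50, "price": 30, "tags": 20}, "title", 60)
from_relative = relative_to_absolute({"title": 80, "price": 40})
```

In the fully user-controlled mode the same dictionary would simply be validated for a total of 100 instead of being rescaled.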
Future opportunities
A general adapter that converts data from any format to the required one does not exist. As a next step, we consider developing a general adaptation module that can be embedded in the Shell and will transform data from different formats to the internal resource representation format.
Increasing the number of distance measuring methods and the types of compared resource description fields is considered a future direction. Candidates include:
Normalized Google Distance (NGD);
Measures of Semantic Relatedness (MSRs), available as remote services.
The same technique that we use for resource closeness visualization can be utilized for resource ranking. The only requirement is to describe a virtual/abstract reference (etalon) resource, or choose one from the existing resources, and calculate the distances of all other resources to it. Such an approach can be utilized for simple ranking methods. More complex ranking methods will certainly require appropriate modifications.
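The ranking idea can be sketched in a few lines. This is a minimal illustration assuming that a distance function between two resources is already available; `rank_against_etalon` and the toy `price_distance` are hypothetical names, and resources are represented as plain dictionaries.

```python
def rank_against_etalon(resources, etalon, distance):
    """Return (distance, resource) pairs, closest to the etalon first."""
    scored = [(distance(resource, etalon), resource) for resource in resources]
    scored.sort(key=lambda pair: pair[0])
    return scored


def price_distance(a, b):
    """Toy distance on a single numeric property, for illustration only."""
    return abs(a["price"] - b["price"])


etalon = {"name": "ideal", "price": 100}
offers = [
    {"name": "A", "price": 150},
    {"name": "B", "price": 90},
    {"name": "C", "price": 300},
]
ranking = rank_against_etalon(offers, etalon, price_distance)
ranked_names = [resource["name"] for _, resource in ranking]
```

Any of the per-field or combined distances used for closeness visualization could be passed in place of `price_distance`.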
Acknowledgements
University of Jyväskylä