
Search Computing Overview

Date post: 08-May-2015
Description:
Search Computing keynote @ CAISE 2010
Transcript
Page 1: Search Computing Overview

Search Computing
Stefano Ceri, Keynote talk at CAISE, Hammamet, June 9, 2010

Joint work with: Adnan Abid, Mamoun Abu Helu, Davide Barbieri, Daniele Braga, Marco Brambilla, Alessandro Bozzon, Alessandro Campi, Sofia Ceppi, Francesco Corcoglioniti, Emanuele Della Valle, Davide Eynard, Piero Fraternali, Nicola Gatti, Giorgio Ghisalberghi, Michael Grossniklaus, Davide Martinenghi, Marco Masseroli, Maristella Matera, Chiara Pasini, Elena Pellizzotti, Stefania Ronchi, Marco Tagliasacchi, Luca Tettamanti, Salvatore Vadacca, Riccardo Volonterio, Serge Zagorac

Page 2: Search Computing Overview

Prof. Stefano Ceri, Database Management

Genesis of Search Computing

My “Gong Show” challenge at the 2003 Lowell Workshop: “Find an ethnic restaurant in a nice place close to Milano”.

Logically a composition of domains:
– Restaurants (ethnic)
– Geo-locations (nice place close to Milano)

Composing maps with “geo-located” information is now solved by all search engines …

… but in general no system is capable of composing arbitrary semantic domains

Page 3: Search Computing Overview


Motivating Examples

“Who are the strongest candidates in Europe for competing on software ideas?”


“Who is the best doctor who can cure insomnia in a close-by hospital?”

“Where can I attend an interesting scientific conference in my field and at the same time relax on a beautiful beach nearby?”

Page 4: Search Computing Overview


Their Common Aspect

Multi-domain queries

Individual answers are on the Web

A knowledgeable user would do the query step-by-step:
– Search database conferences, get their city
– Check that the city average temperature is warm enough
– Search low-cost flights via a broker for that city
– Search luxury hotels via another broker

We want a system for supporting this search process:
– Build several “solutions” which already integrate all dimensions
– Rank “solutions” according to a global rank function and output results in rank order
– Support user-friendly query definition and result browsing
– Add search domains while the search proceeds
– Possibly change the relative weight of each ranking
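The idea of ranking integrated “solutions” by a global rank function can be sketched as follows. This is only an illustration: the sample data, the weighted-sum ranking and the name `rank_solutions` are assumptions, not part of the talk, and the sketch combines every conference with every hotel instead of applying a real join condition.

```python
from itertools import product

# Hypothetical per-domain results: (label, local score in [0, 1]).
conferences = [("ConfA", 0.9), ("ConfB", 0.6)]
hotels = [("HotelX", 0.8), ("HotelY", 0.5)]


def rank_solutions(domains, weights):
    """Build one solution per combination of items (one item per domain)
    and sort the solutions by a weighted sum of the local scores."""
    solutions = []
    for combo in product(*domains):
        score = sum(w * s for w, (_, s) in zip(weights, combo))
        labels = tuple(label for label, _ in combo)
        solutions.append((labels, round(score, 3)))
    return sorted(solutions, key=lambda sol: sol[1], reverse=True)


# Conference quality matters most ...
top = rank_solutions([conferences, hotels], weights=[0.7, 0.3])
# ... or the user changes the relative weight of each ranking.
reweighted = rank_solutions([conferences, hotels], weights=[0.3, 0.7])
```

Changing the weights reorders the same set of solutions, which is the effect the last bullet above asks for.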

Page 5: Search Computing Overview

OVERALL FRAMEWORK

Page 6: Search Computing Overview

Search Computing architecture: overall view

[Figure: architecture diagram. Components: Front End, Query Analysis, Query To Domain Mapper, Query Planner, Query Engine (operators OP 1, OP 2, ... OP N) and Result Transformation, each with a Cache; Domain Framework with Domain Repository; WS-Framework with Service Repository; WS World. Artifacts along the main query flow: High-Level Query, Sub-queries, Low-level queries, Concrete Query Plan, Merged Results, Final User Results. Legend: main query flow; <Uses> relation.]

Page 7: Search Computing Overview

Search Computing architecture: overall view

[Figure: the architecture diagram of the previous slide, annotated with the decomposition of a high-level query into sub-queries.]

High-level query: “Where can I attend a DB scientific conference close to a beautiful beach reachable with cheap flights?”

Sub-query 1: “Where can I attend a DB scientific conference?”
Sub-query 2: “place close to a beautiful beach?”
Sub-query 3: “place reachable with cheap flight?”

Page 8: Search Computing Overview

Search Computing architecture: overall view

[Figure: the architecture diagram of the previous slides, annotated with the low-level queries derived from the sub-queries.]

Low-level query 1: ConfSearch(“DB”, PlaceX, DateY)
Low-level query 2: TourSearch(“Beach”, PlaceX)
Low-level query 3: Flight(“cost<200”, PlaceX, DateY)

Page 9: Search Computing Overview

Search Computing architecture: overall view

[Figure: the architecture diagram of the previous slides, annotated with: Query plan; Services invocations and operators execution; Results.]

Presented results:
– ESWC - Crete - Olympic
– CAISE - Hammamet - Alitalia
– TOOLS - Malaga - EasyJet

Page 10: Search Computing Overview

Search Computing architecture: incremental prototyping

Prototype 1: Core behaviour of the system.
• Engine-based execution of queries
• Domain repository
• Service repository
• Coarse result presentation

[Figure: architecture diagram with the same components as in the overall view; additional labels: Admin Interface, Low-level queries, Sub-queries, Concrete Query Plan. Legend: <Uses> relation.]

Page 11: Search Computing Overview

Search Computing architecture: incremental prototyping

[Figure: same architecture diagram and Prototype 1 description as on the previous slide.]

Prototype 2: Planning
• Automatic optimized query planning

Page 12: Search Computing Overview

Search Computing architecture: incremental prototyping

[Figure: same architecture diagram and Prototype 1 and 2 descriptions as on the previous slides.]

Prototype 3: Mapping and presentation
• Mapping to domains
• Presentation of results

Prototype 4: High level queries

Page 13: Search Computing Overview

CAISE FOCUS on: Service Registration

Service Marts:

• Conceptual representation of resources as entities and connections

• Logical representation of signatures

• Physical representation as service implementations

Page 14: Search Computing Overview

CAISE FOCUS on: Front-end

[Figure: Search Computing architecture diagram, with the same components as in the overall view.]

Liquid Query
• Client-side framework for configuration and automatic rendering of query and result interfaces
• User interaction primitives that allow users to perform exploratory search

Page 15: Search Computing Overview

CAISE FOCUS on: Development Process

[Figure: development process diagram. Roles: Final User, Expert User, SeCo Expert, Service Developer, Service Publisher. Artifacts and activities: Liquid Query, Liquid Result, Liquid Query Template, User Interface Specification, Search Services, SeCo platform, Wrapping, Materialization / Normalization, Registration of Service Mart, Service Mart Repository. Relations: <<uses>>, <<produces>>, <<defines>>, <<submits>>, <<manipulates>>, <<implements>>, <<deploys>>, <<performs>>. Phases: Deploy Time, Service Publishing Time, Config. Time, Execution Time.]

Development Support Environment

Tools supporting Service Registration

Query Design

Performance Monitoring

Page 16: Search Computing Overview

SERVICE REGISTRATION

Page 17: Search Computing Overview

Service Registration in SeCo

Objective: providing a framework for registering services as first-class citizens within SeCo

=> Service Marts
• High-level abstractions of “real world entities” that provide a simple interface to users and hide implementation details
• Inspired by Data Marts, a data modeling pattern used in data warehousing
• Each Service Mart can have multiple modalities of data access and can be mapped to multiple service implementations, possibly offered by different providers

=> Connection Patterns
• High-level abstractions of “real world relationships” that provide a simple interface to users and hide implementation details
• Built by means of attributes that share the same domains

Page 18: Search Computing Overview

Service Marts – Conceptual Level

Every SM definition includes a name and a collection of the exposed attributes, i.e. the attributes of the real world object described by the SM:

Movie(Title, Director, Year, Language, Genres(Genre), Actors(Name, Sex))

• Atomic, single-valued, typed attributes
• Repeating groups (multi-valued, typed attributes): each “repeating group” is a non-empty set of typed sub-attributes that collectively defines a property of the service mart

The model choices are:
• To support structural complexity with only one level of nesting (rather than an arbitrary level of nestings)
• To avoid explicit descriptions of relationships (using repeating groups for M:N relationships)
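A minimal sketch of this conceptual model, assuming Python dataclasses as the representation. The class names `ServiceMart` and `RepeatingGroup` are illustrative only, and attribute types are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class RepeatingGroup:
    """A multi-valued property made of a non-empty set of sub-attributes."""
    name: str
    sub_attributes: list[str]

    def __post_init__(self):
        if not self.sub_attributes:
            raise ValueError("a repeating group needs at least one sub-attribute")


@dataclass
class ServiceMart:
    """A name, atomic attributes, and one level of nesting (repeating groups)."""
    name: str
    atomic: list[str]
    groups: list[RepeatingGroup] = field(default_factory=list)

    def signature(self) -> str:
        parts = list(self.atomic)
        parts += [f"{g.name}({', '.join(g.sub_attributes)})" for g in self.groups]
        return f"{self.name}({', '.join(parts)})"


movie = ServiceMart(
    "Movie",
    ["Title", "Director", "Year", "Language"],
    [RepeatingGroup("Genres", ["Genre"]), RepeatingGroup("Actors", ["Name", "Sex"])],
)
```

Because a repeating group holds plain attribute names rather than further groups, the single-level nesting constraint is enforced by the structure itself.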

Page 19: Search Computing Overview

Service Marts – Logical Level

At this level, each SM is associated with one or more Access Patterns, e.g.:

Movie1(Title^O, Director^O, Score^RO, Year^O, Language^I, Genres.Genre^O, Actors.Name^O, Actors.Sex^O, Genres.Genre^I)

Movie2(Title^I, Director^O, Year^O, Language^O, Genres.Genre^O, Actors.Name^O, Actors.Sex^O)

Access patterns contain adorned attributes, i.e. attributes tagged with one of the following:
• I, if they are input attributes
• O, if they are output attributes
• R, if they are attributes used for ranking (they may or may not be visible in output)

Movie1 gives access to movies by Language and Genre (e.g., “action movies in English”), and results are ranked by Score (a new attribute).

Movie2 gives access to movies by Title (e.g. “Ben Hur”). We expect few (zero, one, more) results, which are not ranked.
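The adornments can be sketched as a plain mapping from attribute to tag. The dictionaries and helper names below are assumptions for illustration, and Genres.Genre is treated as an input of Movie1, following the description of Movie1 above.

```python
# Adornments per attribute: "I" input, "O" output, "R" ranking (combinable).
MOVIE1 = {
    "Title": "O", "Director": "O", "Score": "RO", "Year": "O",
    "Language": "I", "Genres.Genre": "I",
    "Actors.Name": "O", "Actors.Sex": "O",
}
MOVIE2 = {
    "Title": "I", "Director": "O", "Year": "O", "Language": "O",
    "Genres.Genre": "O", "Actors.Name": "O", "Actors.Sex": "O",
}


def inputs(access_pattern):
    """Attributes that must be bound before the service can be called."""
    return {attr for attr, tag in access_pattern.items() if "I" in tag}


def is_ranked(access_pattern):
    """True when at least one attribute is used for ranking."""
    return any("R" in tag for tag in access_pattern.values())


def callable_with(access_pattern, bound_attributes):
    """An access pattern is usable once all of its inputs are bound."""
    return inputs(access_pattern) <= set(bound_attributes)
```

For instance, a user who only knows a title can use Movie2 but not Movie1, which needs both a language and a genre.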

Page 20: Search Computing Overview

Service Marts – Physical Level

At this level, every Access Pattern can match different Service Implementations, each having:
• A physical URI to be called
• Physical properties which are specific to the implementation
• A mapping between logical and physical parameters

IMDBMovie1(MovieTitle^O, Director^O, Stars^RO, Year^O, Language^I, Genres.Genre^I, Actors.Name^O, Actors.Gender^O)

IMDBMovie (AP: Movie1)
Properties: TTL=6000, chunksize=10, cacheable=true, exposed=false, ...
URI: http://...
Logical (AP):  Title | Director | Score | Year | Language | ...
Physical (SI): MovieTitle | Director | Stars | Year | Lang | ...
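The logical-to-physical mapping can be sketched as a dictionary attached to each service implementation. The class and method names are hypothetical; the property values and the attribute pairs come from the slide, and the URI stays elided as in the source.

```python
from dataclasses import dataclass


@dataclass
class ServiceImplementation:
    name: str
    access_pattern: str
    uri: str
    properties: dict
    mapping: dict  # logical attribute -> physical parameter

    def to_physical(self, logical_args):
        """Rename logical input attributes to the parameters the service expects."""
        return {self.mapping[k]: v for k, v in logical_args.items()}

    def to_logical(self, physical_row):
        """Rename a physical result row back to logical attribute names."""
        inverse = {phys: log for log, phys in self.mapping.items()}
        return {inverse[k]: v for k, v in physical_row.items()}


imdb_movie = ServiceImplementation(
    name="IMDBMovie",
    access_pattern="Movie1",
    uri="http://...",  # elided in the source
    properties={"TTL": 6000, "chunksize": 10, "cacheable": True, "exposed": False},
    mapping={"Title": "MovieTitle", "Director": "Director", "Score": "Stars",
             "Year": "Year", "Language": "Lang"},
)
```

Keeping the mapping per implementation lets several providers serve the same access pattern with different parameter names.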

Page 21: Search Computing Overview

External and Selector Attributes

• External attributes, for supporting access and ranking
• Selector attributes, for supporting choices among service implementations

[Figure, external attributes:]
SM: Movie(Title, Director, Year, Language, …)
AP: Movie1: Title^O | Director^O | Year^O | … | Score^RO | Genre^I
AP: Movien: Title^O | Director^O | Year^O | … | Title^I

[Figure, selector attributes:]
SM: Movie(Title, Director, Year, Language, …)
SI: Movie Implementation 1
...
SI: Movie Implementation n
Selector: Language

Page 22: Search Computing Overview


Connection Patterns

Connections between marts only exist in terms of attributes that share the same domains, on different levels of abstraction:

Conceptually by a nondirected edge with a name: PlayingMovie(Movie,Theatre)

Logically by an edge (possibly directed) with name and join condition: PlayingMovie(Movie,Theatre): (Title=Movie.Title)

[Figure: service marts Movie and Theatre connected by an edge; access patterns Movie4 and Theatre2 connected by an edge.]

Page 23: Search Computing Overview

Connection Patterns – Logical Level

Directed edge: information is “piped” from one access pattern to another, along connection attributes which are in output in the first service and in input in the second service -> PIPE JOIN

[Figure: access pattern Movie1 (G.Genre, Language, Title, Director, Score, Year, A.Name, A.Sex) connected by a directed edge to access pattern Theatre1 (Name, Address, M.Start, M.Title).]
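A pipe join can be sketched with two stub services, where each output Title of the first call becomes the input of the second. The sample data and the function names `movie1`, `theatre1` and `pipe_join` are invented for illustration; real services would be remote calls returning chunks.

```python
# Hypothetical stand-ins for the two search services.
MOVIES = [
    {"Title": "Movie B", "Score": 0.7, "Language": "English", "Genre": "Action"},
    {"Title": "Movie A", "Score": 0.9, "Language": "English", "Genre": "Action"},
    {"Title": "Movie C", "Score": 0.8, "Language": "Italian", "Genre": "Action"},
]
SHOWS = [
    {"Name": "Theatre 1", "M.Title": "Movie B", "M.Start": "20:00"},
    {"Name": "Theatre 2", "M.Title": "Movie A", "M.Start": "21:30"},
    {"Name": "Theatre 1", "M.Title": "Movie A", "M.Start": "18:00"},
]


def movie1(language, genre):
    """Access by Language and Genre; results ranked by Score."""
    hits = [m for m in MOVIES if m["Language"] == language and m["Genre"] == genre]
    return sorted(hits, key=lambda m: m["Score"], reverse=True)


def theatre1(title):
    """Access by movie title."""
    return [s for s in SHOWS if s["M.Title"] == title]


def pipe_join(language, genre):
    """Feed each Title produced by movie1 into theatre1."""
    for movie in movie1(language, genre):
        for show in theatre1(movie["Title"]):
            yield {**movie, **show}


results = list(pipe_join("English", "Action"))
```

The second service is only ever asked about titles the first one produced, so the ranking of the first service drives the order of the combined results.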

Page 24: Search Computing Overview

Connection Patterns – Logical Level

Undirected edge: results are produced by both access patterns in output and then joined -> PARALLEL JOIN

[Figure: access pattern Movie1 (Title, Director, Score, …, G.Genre) connected by an undirected edge to access pattern Theatre1 (Name, Address, M.Start, M.Title).]

Page 25: Search Computing Overview

Join of two Services, Pipe Version, NY CitySearch only in NY

[Figure: the three levels for two service marts. Service Marts: Movie, Theatre. Access patterns: Movie1, Movie2, Theatre1. Service Interfaces: IMDB1, IMDB2, Hyperrev1, Google1, NYLocalSearch.]

Page 26: Search Computing Overview

JOIN OF TWO SEARCH SERVICES

Page 27: Search Computing Overview


JOIN of Web Services

Input: items resulting from TWO web service calls, possibly ranked

Output: composed items resulting from the concatenation of matching items, presented in a “global ranking order”

Matching condition using:
– value equality
– partial set matching
– term matching within a vocabulary
– ...

Services are known, their matching function is predefined: this is not service discovery!
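As a rough sketch of such a join, the function below pairs items from two result lists with a pluggable matching condition and orders the pairs by a global ranking function. The data, the product-of-scores ranking and all names are assumptions; the approach in the talk works incrementally on chunks, whereas this sketch materialises both lists.

```python
def equal(a, b):
    """Value equality."""
    return a == b


def partial_set_match(a, b):
    """Partial set matching: the two sets share at least one element."""
    return bool(set(a) & set(b))


def join(xs, ys, key_x, key_y, matches, rank):
    """Pair matching items and sort the pairs by the global ranking function."""
    pairs = [(x, y) for x in xs for y in ys if matches(x[key_x], y[key_y])]
    return sorted(pairs, key=lambda p: rank(p[0], p[1]), reverse=True)


# Hypothetical results of two web service calls, each with a local score.
service_x = [{"city": "Rome", "score": 0.9}, {"city": "Milano", "score": 0.8}]
service_y = [{"city": "Milano", "score": 0.7}, {"city": "Rome", "score": 0.5},
             {"city": "Paris", "score": 0.9}]

joined = join(service_x, service_y, "city", "city", equal,
              rank=lambda x, y: x["score"] * y["score"])
```

The matching function is passed in rather than discovered, which mirrors the point that the services and their matching function are known in advance.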

Page 28: Search Computing Overview

Join

[Figure: Service X with elements bx1 ... bx5 and Service Y with elements by1 ... by5; join results r1, r2, r3.]

Page 29: Search Computing Overview

Matching items

Page 30: Search Computing Overview

Choice of the join strategies

The join search space
– Different explorations for different join methods under different assumptions and with different guarantees

Any exploration trajectory for this space is a join strategy

[Figure: the join search space drawn as a grid; labels: t_ij, Chunk, Chunksize, Candidate join result.]

Page 31: Search Computing Overview

Nested Loop - Rectangular

Page 32: Search Computing Overview

Merge scan - Triangular
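The two exploration shapes can be sketched as visiting orders over the grid of chunk pairs (i, j), where i indexes chunks of the first service and j chunks of the second. This is one plausible reading of the slide titles, not the exact algorithms of the talk; the function names are illustrative.

```python
def nested_loop_order(n_x, n_y):
    """Rectangular exploration: all n_x chunks of X are available,
    and each new chunk of Y is combined with every one of them."""
    return [(i, j) for j in range(n_y) for i in range(n_x)]


def merge_scan_order(n_diagonals):
    """Triangular exploration: cells are visited diagonal by diagonal
    (i + j constant), alternating between the two services."""
    return [(i, d - i) for d in range(n_diagonals) for i in range(d + 1)]


rectangular = nested_loop_order(2, 2)
triangular = merge_scan_order(3)
```

The triangular order reaches pairs of top-ranked chunks from both services early, while the rectangular order favours exhausting one service first.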

Page 33: Search Computing Overview

Parallel and Pipe Joins

Parallel join of two search services

Pipe join of two search services

[Figure: two plan diagrams over services S1 and S2, with inputs marked (1) and (2). Annotations, in the order they appear on the slide: (1,10)5(0,1)n; period: 500 ms; size: 20; stop: 1; (1,2)n; period: 150 ms; C1; stop: 10; excess: (1,1).]

Page 34: Search Computing Overview

SUPPORT OF “SIMILARITY JOINS”

Page 35: Search Computing Overview

Supporting value similarity

Concept of “nearness” is widely implemented depending on different contexts, such as:
• Lexical near (similar strings)
• Spatial near (between addresses/geo locations)
• Temporal near (between dates/times)
• Economic near (between costs)

Context is defined according to the attributes involved

=> Semantics of nearness built bottom-up, starting from the physical layer (available services) up to the conceptual one.

Page 36: Search Computing Overview

Similarity comes from Shared Domains

The attribute “address” is shared by the 4 entities. Its semantic type, describing a location, enables “nearness” connections between each pair of entities (i.e. addresses can be compared for “nearness” within the same city, country, …)

[Figure: entities restaurant, theatre, apartment and hotel, each with an Address attribute, connected by “Spatial Near”.]

Page 37: Search Computing Overview

Supporting Nearness within Services

Several physical services natively support ranking by distances (e.g. GoogleMovies)

E.g.: GoogleMovies receives the user address as input, and returns theatres ranked by distance, each one with its address as output. UserAddress and Distance are external attributes.

GoogleMovies(UserAddress^I, Distance^R | Name^O, Address^O, Movie.Title^I, Movie.StartTime^O)

GoogleMovies (AP: Theatre1)
Properties: TTL=6000, chunksize=10, cacheable=true, provides=Spatial Near
URI: http://...
Logical (AP):  UserAddress | Name | Address | M.Title | M.StartTime | ...
Physical (SI): IAddr | Name | OAddr | MovieTit | MovieTime | ...

Page 38: Search Computing Overview

“Nearness” Support within Services

GoogleMovies (AP: Theatre1)
Properties: TTL=600, chunksize=10, cache=1, provides=Spatial Near
URI: http://...
Logical (AP):  UserAddr | Name | Address | M.Title | ...
Physical (SI): Addr | Name | Addr | MovieTit | ...

[Figure: service marts Theatre and Restaurant connected by “Spatial Near”; access patterns Theatre1 (UserAddress, Name, Address, M.Title, M.StartTime) and Restaurant2 (Name, Address, Cuisine, Price) connected by “Spatial near”; additional label: Distance.]

Page 39: Search Computing Overview

Nearness Services within the Execution Engine

Ad-hoc services providing the notion of distance at the physical level require two domain values as input and produce their distance as output:
• Two input attributes specify two values of the domain
• One output attribute specifies the distance in given units

SpatialNear System

TTL=600, chunksize=1, cacheable=1, ...

URI: http://...

Input1, Input2: Coordinates Output: Distance (Km)
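A nearness service of this shape, with two coordinate inputs and one distance output in km, could be sketched with the haversine great-circle formula. The formula, the function name `spatial_near` and the Earth-radius constant are assumptions made here; the talk does not say how the SpatialNear service computes distance.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def spatial_near(coord1, coord2):
    """Great-circle distance in km between two (latitude, longitude) pairs
    given in decimal degrees, using the haversine formula."""
    lat1, lon1 = map(radians, coord1)
    lat2, lon2 = map(radians, coord2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate coordinates, for illustration only.
milano = (45.4642, 9.1900)
rome = (41.9028, 12.4964)
distance_km = spatial_near(milano, rome)
```

Because the service takes exactly one pair of values per call, a chunk size of 1 as in the properties above is a natural fit.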

Page 40: Search Computing Overview

Supporting Nearness within the Execution Engine

[Figure: service marts Theatre and Restaurant connected by “Spatial Near”; access patterns Theatre1 (Name, Address, M.Title, M.StartTime) and Restaurant2 (Name, Address, Cuisine, Price) connected through a Spatial Near access pattern (Addr1, Addr2, Distance).]

SpatialNear System
TTL=600, chunksize=1, cacheable=1, ...
URI: http://...
Input1, Input2: Coordinates Output: Distance (Km)

Page 41: Search Computing Overview

Join of three Services at the three Levels in NY

[Figure: the three levels for three service marts. Service Marts: Movie, Theatre, Restaurant. Access patterns: Movie1, Movie2, Theatre1, Rest1, Rest2. Service Interfaces: IMDB1, IMDB2, Hyperrev1, Google1, NYLocalSearch, Yahoo1, Yahoo2. Annotations: “AP providing spatial near”, “Spatial Near”, “Search only in NY”.]

Page 42: Search Computing Overview

Three Levels with Connection Semantics

Level | Services | Connections
Conceptual | Service Mart | Name (with associated semantics)
Logical | Access Pattern | Join attributes, directed vs undirected edge (with nearness service APs added as needed)
Physical | Service Interface (with associated semantics and with system services) | Nearness Services

Between the conceptual and logical levels: bindings between SM and AP attributes, plus definition of extra attributes
Between the logical and physical levels: bindings between AP attributes and SI parameters

Page 43: Search Computing Overview

Resource graph

Specialized way for describing search service based knowledge available on the web [ER model, ontology, class diagram?]

[Figure: graph of connected resources: Concert, Artist, Exhibition, Restaurant, Hotel, Movie, Metro Station, Theatre, Photo, Landmark, News, Piece, Shopping Center, ...]

Page 44: Search Computing Overview

APPLICATION DEVELOPMENT PROCESS

Page 45: Search Computing Overview

SeCo development process

Main Roles:
• Service developer
• Service publisher
• Expert user
• SeCo expert

Dichotomy:
• Top-down vs. Bottom-up
• Run time vs. Design time

[Figure: development process in four phases.
Search Service Development: Implement search service (Service developer).
Service Adaptation and Registration: Wrap or materialize service; Register service mart and interface (Service publisher); Service Mart model.
Application Configuration: Design Liquid Query Template (Expert user); Liquid Query model.
Query Plan Refinement: decision “Manual optimization needed?” (Y/N); Panta Rhei plan refinement (SeCo expert); Query Plan model.]

Page 46: Search Computing Overview

The service registration process

[Figure: flowchart of the registration process. Steps: Service Description, SM Identification, SM CREATION, SM UPDATE, SM MAPPING, Associated SI Update (new connections), Service Physical Description, AP CREATION, END. Decisions (YES/NO): “Some SM retrieved?”, “Modification of the SM structure?”. Paths are labelled Bottom up Strategy, Top down Strategy and Hybrid Strategy.]

Page 47: Search Computing Overview

The SM Creation process, with semantic hints

[Figure: SM CREATION steps: SM Name and attributes schema definition; SM and attributes Semantical Description, with Synsets (and tags?) from WN; Connection patterns (CP) definition, with type conventions (Spatial_near, Textual_near, Temporal_near); Composition Language operators association; automatic recommendation of connectable SMs (Theatres, SM1 ... SMn).]

Schema example:
Movie(Title, Director, Score, Year, Genres(Genre), Openings(Country, Date), Actors(Name))

Semantic description example:
Movie: S: (n) movie, film, picture, moving picture, moving-picture show, motion picture, motion-picture show, picture show, pic, flick (a form of entertainment that enacts a story by sound and a sequence of images giving the illusion of continuous movement) "they went to a movie every Saturday night";
Director: S: (n) film director, director (the person who directs the making of a film)

Connection pattern example:
Defined CP: Shows, Textual_near
Shows(Movie, Theatre): [(Title=Title)]
Possible CP: Title (String): Textual_near; Year (Date): Temporal_near; …

Page 48: Search Computing Overview

The SM Mapping procedure

[Figure: SM MAPPING. The original SM, Movie(Title, Director, Score, Year, …), is mapped to the SI ImdbMovie: Title | Director | Score | Year | …. Labels: Corresponding SM attributes; Auxiliary attributes (i.e. query attributes); Selector attributes; Selector; f. Attribute detail: Director: String; Director: S: (the person who directs the making of a film).]

Page 49: Search Computing Overview


SeCo Tools

• Online tool suite that covers the whole development process

• Mashup-based

• Built by using state of the art technologies:

1. MVC on the client: Javascript MVC
2. UI organization and panels: Yahoo! User Interfaces
3. Diagram drawing and editing: WireIt

Page 50: Search Computing Overview

Service Mart Registration

Page 51: Search Computing Overview

Mapping editor

Page 52: Search Computing Overview

Query Registration Interface

Page 53: Search Computing Overview

Query Registration Editor, Logical Connections

Page 54: Search Computing Overview

LIQUID QUERY INTERFACE

Page 55: Search Computing Overview


Liquid Query

“A new paradigm allowing users to formulate and get responses to multi-domain queries through an exploratory information seeking approach, based upon structured information sources exposed as software services…”

• Composite answers obtained by aggregating search results from various domains
• Highlight the contribution of each search service
• Join of results based on the structural information afforded by the search service interfaces
• Refine the user query
• Re-shape the result list

Page 56: Search Computing Overview

Liquid query definition

It consists of subsetting and parametrizing the resource graph...

[Figure: the resource graph (Concert, Artist, Exhibition, Restaurant, Hotel, Movie, Metro Station, Theatre, Photo, Landmark, News, Piece, Shopping Center, ...) and the selected subset: Photo, Concert, Metro Station, Restaurant, News, Exhibition, Artist, Hotel. Legend: inputs, outputs; GR = global ranking.]

Page 57: Search Computing Overview

Liquid query definition

... And then characterizing the user interaction

[Figure: the selected subset of the resource graph (Photo, Concert, Metro Station, Restaurant, News, Exhibition, Artist, Hotel), annotated with an “Expand” interaction.]

Plus:
• Parametrization of global ranking
• Data visualization options
• ... and so on

Page 58: Search Computing Overview

Query Submission

Concert query conditions

Hotels query conditions

Page 59: Search Computing Overview


Query Execution & Result Presentation

Page 60: Search Computing Overview

SECO ENGINE

Page 61: Search Computing Overview

Overview

The tool is aimed at developers and allows them to compose, plan and run a SeCo query

Four panels, one for each query processing phase:

Splashscreen!

Page 62: Search Computing Overview

Query composition (1)

Service interface browser
• lists registered service interfaces
• input and output parameters are listed

Selected service’s statistics
• collected service statistics are displayed
• statistics may be edited for testing purposes

Page 63: Search Computing Overview

Query composition (2)

User-entered datalog-like query
• joins implicitly encoded by datalog vars
• $vars encode query inputs provided at runtime

Query optimisation parameters
• control the behaviour of the planner
• trigger the planning process
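How joins can be implicit in shared variables may be sketched as follows. The concrete query syntax of the tool is not shown in the slides, so the query is represented here as a hypothetical list of (service, terms) atoms; only the two conventions above (shared variables mean joins, $-prefixed terms are runtime inputs) come from the source.

```python
from collections import defaultdict

# Hypothetical query: two atoms sharing the variable "Title".
query = [
    ("Movie1", ["Title", "$language", "$genre"]),
    ("Theatre1", ["Name", "Address", "Title"]),
]


def analyse(atoms):
    """Return the join variables (shared by several atoms) and the runtime inputs."""
    seen = defaultdict(list)
    runtime_inputs = set()
    for service, terms in atoms:
        for term in terms:
            if term.startswith("$"):
                runtime_inputs.add(term)
            else:
                seen[term].append(service)
    joins = {var: services for var, services in seen.items() if len(services) > 1}
    return joins, runtime_inputs


joins, runtime_inputs = analyse(query)
```

A planner could then choose, for each join variable, between the pipe and parallel strategies described earlier.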

Page 64: Search Computing Overview


Logical planning

Page 65: Search Computing Overview


Physical planning

Page 66: Search Computing Overview

Query execution (1)

Execution session management
• a session corresponds to a single query execution, where multiple user commands may be issued
• query input parameters are specified at session initialisation

Execution status
• displays the current session status
• displays the status of the execution commands issued so far

Execution commands forms
• a more-all command requires more query results
• a more-one command requires more results by extracting more data from a specific service invoked by the query
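The two commands can be sketched as chunked extraction over a session. The `Session` class, the stub services and the reading of more-all as “one more chunk from every service” are assumptions for illustration; the engine in the talk produces joined query results rather than raw chunks.

```python
class Session:
    """One query execution; commands extract further chunks from the services."""

    def __init__(self, services, chunk_size):
        self.services = services      # service name -> full ranked result list (stub)
        self.chunk_size = chunk_size
        self.fetched = {name: 0 for name in services}

    def more_one(self, name):
        """Extract the next chunk from one specific service."""
        start = self.fetched[name]
        chunk = self.services[name][start:start + self.chunk_size]
        self.fetched[name] += len(chunk)
        return chunk

    def more_all(self):
        """Extract the next chunk from every service in the query."""
        return {name: self.more_one(name) for name in self.services}


session = Session({"S1": ["a", "b", "c"], "S2": ["x", "y"]}, chunk_size=2)
first = session.more_one("S1")
rest = session.more_all()
```

Keeping a per-service offset in the session is what lets several commands be issued against the same query execution.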

Page 67: Search Computing Overview


Query execution (2)

Query results

• Displays ranked results, as soon as computed

Execution timeline

• displays activation of execution units (e.g. service calls)

• useful to fine tune the engine and the join strategies

Page 68: Search Computing Overview

Query execution (3)

Service calls log
• displays service calls at the chunk granularity
• shows response times, statistics, cache behaviour

Page 69: Search Computing Overview


DEMO

http://demo.search-computing.eu

Page 70: Search Computing Overview

SUMMARY OF SECO RESULTS

Page 71: Search Computing Overview

Results after 18 months

Concepts
– Service marts, rank join methods, panta rhei, liquid query

Research results
– Springer LNCS: Search Computing Challenges and Directions
– Many publications (with VLDB, WWW), many ongoing submissions
– Filing of US Patent (top-k method, random & sequential services)

Prototypes
– Execution environment, focus on liquid query and on integration
– Design support environment, focus on mashups

Dissemination
– Fifteen keynote talks, twelve articles in the Italian press
– SeCo Web site, SeCo blog, Facebook, LinkedIn, Twitter communities
– Search Computing Graduate Course at PoliMi

Temporary research positions (1 PhD, 5 post-MS, 3 post-doc)

Page 72: Search Computing Overview

Prof. Stefano CeriDatabase Management

75Publications

SeCo- D. Braga, A. Campi, S. Ceri, A. Raffio Joining the results of heterogeneous search engines Information Systems, Vol. 33, Issues 7-8, (November-December 2008), Pages 658-680

- D. Braga, S. Ceri, F. Daniel, D. Martinenghi Optimization of Multi-Domain Queries on the Web VLDB 2008: 562-573, Auckland, New Zealand, August 2008

- D. Braga, S. Ceri, F. Daniel, D. Martinenghi Mashing Up Search Services, IEEE Internet Computing 12(5): 16-23 (2008)

- D. Braga, D. Calvanese, A. Campi, S. Ceri, F. Daniel, D. Martinenghi, P. Merialdo, R. Torlone, NGS: a framework for multi-domain query answering, ICDE Workshops 2008: 254-261

- S. Ceri, Search Computin Invited Paper, 25th International Conference on Data Engineering, Shanghai, March 29 - April 2, 2009

- D. Barbieri, A. Bozzon, D. Braga, M. Brambilla,A. Campi, S. Ceri, E. Della Valle, P. Fraternali, M. Grossniklaus, D. Martinenghi, S. Ronchi, M. Tagliasacchi Data-driven optimization of -

search service composition for answering multi-domain queries (USETIM 2009) workshop at VLDB 2009, Lyon, France, August 24-28, 2009

- M.Brambilla, S. Ceri, Engineering Search Computing Applications: Vision and Challenges The 7th joint meeting of the European Software Engineering Conference (ESEC) and the ACM

SIGSOFT Symposium on the Foundations of Software Engineering (FSE), Amsterdam, The Netherlands, August 24-28 2009

- S. Ceri Search Computing The 2009 IEEE/WIC/ACM International Conference on Web Intelligence, Milan, Italy, September 15-18 2009

- S. Ceppi and N. Gatti, An Automated Mechanism Design Approach for Sponsored Search Auctions with Federated Search Engines In Proceedings of the 12^th Workshop on Agent-

Mediated Electronic Commerce (AMEC) in the 9^th International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Toronto, Canada May 10 2010

- D. Martinenghi, M. Tagliasacchi, and S. Ceri Top-k pipe-join International Workshop on Ranking in Databases, Long Beach, USA, March 2010

- A. Bozzon, M. Brambilla, S. Ceri, P. Fraternali Liquid Query: Multi-Domain Exploratory Search on the Web WWW 2010 - 19th International World Wide Web Conference - Raleigh, North Carolina, April 26-30 2010

- A. Campi, S. Ceri, A. Maesani, S. Ronchi Designing Service Marts for Engineering Search Computing Applications The Tenth International Conference on Web Engineering, ICWE 2010, Vienna, Austria, July 5-9 2010

Related

- M. Brambilla, S. Ceri, I. Celino, D. Cerizza, E. Della Valle, F. M. Facca, A. Turati, C. Tziviskou Experiences in the Design of Semantic Services Using Web Engineering Methods and Tools Journal on Data Semantics 2008

- A. Raffio, D. Braga, S. Ceri, P. Papotti, M. Hernandez Clip: a Visual Language for Explicit Schema Mappings International Conference on Data Engineering (ICDE), April 2008

- D. Braga, D. Calvanese, A. Campi, S. Ceri, F. Daniel, D. Martinenghi, P. Merialdo, R. Torlone A New Generation Search Engine Supporting Cross Domain Queries Italian Symposium on Advanced Database Systems (SEBD), June 2008

- D. Braga, D. Calvanese, A. Campi, S. Ceri, F. Daniel, D. Martinenghi, P. Merialdo, R. Torlone NGS: a Framework for Multi-Domain Query Answering IIMAS, International Conference on Data Engineering Workshops (ICDE), April 2008

- A. Raffio, D. Braga, S. Ceri, P. Papotti, M. Hernandez Clip: a Tool for Mapping Hierarchical Schemas ACM SIGMOD/PODS Conference, Demo Session, June 2008

- A. Bozzon, M. Brambilla, P. Fraternali Conceptual Modeling of Multimedia Search Applications Using Rich Process Models ICWE 2009, Springer LNCS, vol. 5648, ISBN 978-3-642-02817-5.

- E. Della Valle, S. Ceri, D. F. Barbieri, D. Braga, A. Campi A First Step Towards Stream Reasoning Future Internet Symposium (FIS) 2008, pp. 72-81.

- A. Bozzon, M. Brambilla, F. M. Facca, G. Toffetti Carughi A Conceptual Modeling Approach to Business Service Mashup Development IEEE International Conference on Web Services, ICWS 2009, Los Angeles. IEEE Press, July 2009, pp. 751-758.

- P. Fraternali, M. Brambilla, A. Bozzon, Model-Driven Design of Audiovisual Indexing Processes for Search-Based Applications Content-Based Multimedia Indexing, 2009, CBMI '09, IEEE Press, ISBN: 978-1-4244-4265-2, pp. 120-125.

- D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle and M. Grossniklaus, C-SPARQL: SPARQL for Continuous Querying Proceedings of WWW 2009, 18th International World Wide Web Conference (Poster), Madrid, Spain, April 2009

- D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle and M. Grossniklaus Continuous Queries and Real-time Analysis of Social Semantic Data with C-SPARQL In Proceedings of SDoW 2009, 2nd ISWC Workshop on Social Data on the Web, Washington, DC, USA, October 2009

- D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle and M. Grossniklaus C-SPARQL: A Continuous Query Language for RDF Data Streams International Journal of Semantic Computing (IJSC), 2010, World Scientific Publishing

- D. F. Barbieri, D. Braga, S. Ceri and M. Grossniklaus An Execution Environment for C-SPARQL Queries In Proceedings of EDBT 2010, 13th International Conference on Extending Database Technology, Lausanne, Switzerland, March 2010

Web Site & Blog

Web Site

Tech Watch Blog

Blog stats: ~ 900 absolute unique visitors in the last two months

Accesses to Web Site & Blog

Provenance

Sources

Visits: 20% USA, 18% Italy, 6% UK, 4% India, 4% Canada

Search Computing First Workshop, June 17-19, 2009

Search Computing Challenges and Directions (LNCS, vol. 5950, Ceri-Brambilla eds.)

Part 1: Vision
– Ceri: Search computing
– Baeza-Yates: Next generation search
– Weikum: Search for knowledge

Part 2: Technology Watch
– Della Valle-Buganza-Gatti: The search engine industry
– Casati-Daniel-Soi: Mashup technologies
– Baumgartner-Campi-Gottlob-Herzog: Web data extraction
– Hedeler-Belhajjame-Campi-Embury-Fernandez-Paton: Dataspaces
– Bozzon-Fraternali: Multimedia and multimodal information retrieval

Part 3: Issues in Search Computing
– Campi-Ceri-Gottlob-Ronchi: Service marts
– Braga-Campi-Grossniklaus: Join methods and query optimization
– Ilyas-Martinenghi-Tagliasacchi: Rank aggregation
– Braga-Grossniklaus-Ceri: Panta Rhei, a query execution environment
– Brambilla-Ceri-Fraternali-Manolescu: Liquid queries and liquid results
– Brambilla-Ceri: Software engineering of search computing applications
– Masseroli-Paton-Spasic: Search computing and the life sciences

Second Workshop: Design Principles

Consolidate several ongoing research chapters touching the various aspects of the project

Develop connections to other research projects so as to share knowledge - and possibly build cooperations based on mutual complementarity.

Set internal deadlines for project evolution
– Being ready for the workshop
– Dump organisational responsibility to session chairs

Try a more discussion-oriented format
– Our view
– Guests' views
– Panel/discussion (sometimes driven, sometimes not)

Produce Proceedings as Springer LNCS, with each session contributing a short part

Second SeCo Workshop Last Week

Second Workshop: Sessions

Pre-Workshop (Milano, May 25)
– Search as a Process
– Business Models

Workshop (Como, May 26-28)
– Semantic Resource Framework
– Wrapping Technology and Ontological Annotation
– Design Tools and Mashup Languages
– Search Computing and Research Evaluation
– Query Processing
– Rank Join
– Search Computing for BioMedical Applications
– User-Centered Approach to Search Computing Applications

Post-Workshop (Milano, May 31)
– Visual Interfaces for Complex Search

Looking forward

Establish stronger co-operation with other projects
– Both for technology and applications

Strengthen SeCo “core research”
– Cover the process lifecycle with methods & tools
– Improve result visualization and user interaction
– Use semantics in service registration and query processing
– Turn Panta Rhei into a full Service Base Management System (SBMS) with new rank join methods, proximity, uncertainty…

Strengthen the prototypes
– Fully develop the registration environment
– Extend the execution environment, make it scalable over clouds
– Extend the liquid interface, cover mobile interfaces

Put a “killer” application online (usable!)

Explore exploitation options

