Date post: 21-Dec-2015
Constraint-based Information Integration Steven Minton Fetch Technologies Joint work with Craig Knoblock and Jose Luis Ambite (USC/ISI)
Transcript
Page 1

Constraint-based Information Integration

Steven Minton, Fetch Technologies

Joint work with Craig Knoblock and Jose Luis Ambite (USC/ISI)

Page 2

Example Application

Diagram: an Integration System combining Tiger MapServer, Geocoder, Zagat Restaurants Guide, and LA County Restaurant Health Ratings

Page 3

Outline
- Agents that access information sources on the web
- AgentBuilder: learning from examples
- ActiveAtlas: standardizing data from multiple sources
- Constraint-based Integration
- Heracles: putting it all together

Page 4

Information Agents

Diagram (top to bottom): Decision Support and Application Programs; Information Agent; Knowledge Bases, Databases, Computer Programs, and The Web

Page 5

Web Agents

Web agents provide a uniform query language for data access: “wrapping a web site”

Query: Restaurants in Santa Monica?

Name             Address
Chinois on Main  2709 Main St.
Chao Dara        13 Union Sq.
…                ...

Page 6

AgentBuilder
- Supervised learning: extraction rules are created from examples
- High precision
- High reliability

Page 7

Extraction technology
- Expressive extraction rule language:
  - An extraction rule is a sequence of landmarks
  - It describes how to find the beginning and end of each field

Example rule (extracts the Cuisine field from the page below):
Start: SkipTo(Cuisine :) SkipTo(<b>)
End: SkipTo(</b>)

PAGE: <html> Name:<b> KFC </b> Cuisine :<p> <b> Fast Food </b> <br>...
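To make the landmark idea concrete, here is a minimal Python sketch of how a start/end rule like the one above could be applied to the page. It assumes landmarks are matched as literal substrings and, as a simplification, applies the end rule forward from the start position. The helpers `skip_to` and `extract_field` are hypothetical illustrations, not AgentBuilder's actual API.

```python
from typing import Optional, Sequence


def skip_to(text: str, pos: int, landmark: str) -> int:
    """Index just past the next occurrence of `landmark` at or after `pos`, or -1."""
    idx = text.find(landmark, pos)
    return -1 if idx < 0 else idx + len(landmark)


def extract_field(text: str, start_rule: Sequence[str], end_rule: Sequence[str]) -> Optional[str]:
    # Start rule: consume the landmarks in order; the field begins right after the last one.
    pos = 0
    for landmark in start_rule:
        pos = skip_to(text, pos, landmark)
        if pos < 0:
            return None
    start = pos
    # End rule (simplified assumption): consume all but the last landmark,
    # then the field ends where the last landmark begins.
    for landmark in end_rule[:-1]:
        pos = skip_to(text, pos, landmark)
        if pos < 0:
            return None
    end = text.find(end_rule[-1], pos)
    if end < 0:
        return None
    return text[start:end].strip()


page = "<html> Name:<b> KFC </b> Cuisine :<p> <b> Fast Food </b> <br>..."
print(extract_field(page, ["Cuisine :", "<b>"], ["</b>"]))  # Fast Food
print(extract_field(page, ["Name:", "<b>"], ["</b>"]))      # KFC
```

Skipping to "Cuisine :" first matters: a bare SkipTo(<b>) would stop at the `<b>` in front of the restaurant name.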

Page 8

A Sequential Covering Algorithm for “Wrapper Induction”

Training Examples: Name: Del Taco <p> Phone (toll free) : <b> ( 800 ) 123-4567 </b><p>Cuisine ...

Name: Burger King <p> Phone : ( 310 ) 987-9876 <p> Cuisine: …

Page 9

A Sequential Covering Wrapper Induction Algorithm

Training Examples: Name: Del Taco <p> Phone (toll free) : <b> ( 800 ) 123-4567 </b><p>Cuisine ...

Name: Burger King <p> Phone : ( 310 ) 987-9876 <p> Cuisine: …

Initial candidate: SkipTo( ( )

Page 10

A Sequential Covering Wrapper Induction Algorithm

SkipTo( <b> ( ) ... SkipTo(Phone) SkipTo( ( ) ... SkipTo(:) SkipTo(()

Training Examples: Name: Del Taco <p> Phone (toll free) : <b> ( 800 ) 123-4567 </b><p>Cuisine ...

Name: Burger King <p> Phone : ( 310 ) 987-9876 <p> Cuisine: …

Initial candidate: SkipTo( ( )

Page 11

A Sequential Covering Wrapper Induction Algorithm

SkipTo( <b> ( ) ... SkipTo(Phone) SkipTo( ( ) ... SkipTo(:) SkipTo(()

Training Examples: Name: Del Taco <p> Phone (toll free) : <b> ( 800 ) 123-4567 </b><p>Cuisine ...

Name: Burger King <p> Phone : ( 310 ) 987-9876 <p> Cuisine: …

Initial candidate: SkipTo( ( )

… SkipTo(Phone) SkipTo(:) SkipTo( ( ) ...
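The refinement sequence on these slides can be sketched as a small sequential covering loop in Python. This is a hedged illustration, not the authors' learner. The candidate pool is copied by hand from the slides rather than generated by refinement. The acceptance test ("a rule must either land exactly on the phone number's opening parenthesis or fail outright on every example") is an assumption. All function names are hypothetical. Under these assumptions, SkipTo(() is rejected because it stops inside "(toll free)" on the Del Taco example, while SkipTo(Phone) SkipTo(:) SkipTo(() covers both examples.

```python
from typing import List, Sequence, Tuple

Rule = Tuple[str, ...]     # a start rule: landmarks consumed left to right
Example = Tuple[str, int]  # (page text, index where the field starts)


def apply_rule(text: str, rule: Rule) -> int:
    """Position just past the last landmark, or -1 if some landmark is missing."""
    pos = 0
    for landmark in rule:
        idx = text.find(landmark, pos)
        if idx < 0:
            return -1
        pos = idx + len(landmark)
    return pos


def covers(rule: Rule, example: Example) -> bool:
    text, target = example
    return apply_rule(text, rule) == target


def is_perfect(rule: Rule, examples: Sequence[Example]) -> bool:
    """Assumed acceptance test: on every example the rule hits the target or fails outright."""
    return all(apply_rule(text, rule) in (-1, target) for text, target in examples)


def sequential_covering(examples: Sequence[Example], candidates: Sequence[Rule]) -> List[Rule]:
    """Greedily pick acceptable rules until every example is covered (a disjunction of rules)."""
    perfect = [r for r in candidates if is_perfect(r, examples)]
    remaining = list(examples)
    learned: List[Rule] = []
    while remaining:
        best = max(perfect, key=lambda r: sum(covers(r, e) for e in remaining), default=None)
        if best is None or not any(covers(best, e) for e in remaining):
            break  # the candidate pool cannot cover what is left
        learned.append(best)
        remaining = [e for e in remaining if not covers(best, e)]
    return learned


del_taco = "Name: Del Taco <p> Phone (toll free) : <b> ( 800 ) 123-4567 </b><p>Cuisine ..."
burger_king = "Name: Burger King <p> Phone : ( 310 ) 987-9876 <p> Cuisine: ..."
# Target: the position just after the "(" that opens the area code.
examples = [(del_taco, del_taco.index("( 800") + 1), (burger_king, burger_king.index("( 310") + 1)]
candidates = [("(",), ("<b> (",), ("Phone", "("), (":", "("), ("Phone", ":", "(")]
print(sequential_covering(examples, candidates))  # [('Phone', ':', '(')]
```

If the pool lacked the three-landmark rule, the same loop would fall back to a disjunction, for example SkipTo(<b> () for Del Taco plus another rule for Burger King.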

Page 12

Outline
- Agents that access information sources on the web
- AgentBuilder: learning from examples
- Atlas: standardizing data from multiple sources
- Constraint-based Integration
- Heracles: putting it all together

Page 13

The Problem: Multi-Source Inconsistency

How can the same objects be identified when they are stored in inconsistent text formats?

Art’s Delicatessen; Ca’ Brea; CPK; The Grill; Patina; Philippe’s The Original; The Tillerman

Art’s Deli; California Pizza Kitchen; Campanile; Citrus; Grill, The; Philippe The Original; Spago

Sources: Zagat’s Restaurant Guide and Health Dept Restaurant Listings

Page 14

The Solution: Record Linkage

Name            Street                   Phone
Art’s Deli      12224 Ventura Boulevard  818-756-4124
Teresa's        80 Montague St.          718-520-2910
Steakhouse The  128 Fremont St.          702-382-1600
Les Celebrites  155 W. 58th St.          212-484-5113

Name                  Street                                 Phone
Art’s Delicatessen    12224 Ventura Blvd.                    818/755-4100
Teresa's              103 1st Ave. between 6th and 7th Sts.  212/228-0604
Binion's Coffee Shop  128 Fremont St.                        702/382-1600
Les Celebrites        160 Central Park S                     212/484-5113

Zagat’s Restaurants    Dept. of Health

Page 15

Zagat’s Agent    Dept. of Health Agent

Query

Record Linkage

Name            Street                   Phone
Art’s Deli      12224 Ventura Boulevard  818-756-4124
Teresa’s        80 Montague St.          718-520-2910
Steakhouse The  128 Fremont St.          702-382-1600
Les Celebrites  155 W. 58th St.          212-484-5113

Name                  Street                                 Phone
Art’s Delicatessen    12224 Ventura Blvd.                    818/755-4100
Teresa’s              103 1st Ave. between 6th and 7th Sts.  212/228-0604
Binion’s Coffee Shop  128 Fremont St.                        702/382-1600
Les Celebrites        5432 Sunset Blvd                       212/484-5113

Zagat’s    Dept of Health

Page 16

Approach to Record Linkage
- Learning attribute weighting rules
- Learning general transformation rules

Source          Name                Street                   Phone
Zagat’s         Art’s Deli          12224 Ventura Boulevard  818-756-4124
Dept of Health  Art’s Delicatessen  12224 Ventura Blvd.      818/756-4124

Zagat’s: Art’s Deli; California Pizza Kitchen; Philippe The Original
Dept of Health: Art’s Delicatessen; CPK; Philippe’s The Original

Transformation rules: Abbreviation, Acronym, Stemming
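The three transformation types can be illustrated with a minimal Python sketch that names the transformation relating two restaurant names. This is a hand-written approximation, not the learned transformation rules. "Stemming" is reduced here to dropping a possessive 's, "Abbreviation" to token-by-token prefix matching, and "Acronym" to a single token spelling the other name's initials. The function names are hypothetical.

```python
import re
from typing import List, Optional


def tokens(name: str) -> List[str]:
    """Lowercase word tokens; straight and curly apostrophes are kept inside tokens."""
    return re.findall(r"[\w'’]+", name.lower())


def stem(token: str) -> str:
    """Crude stand-in for stemming: drop a trailing possessive 's."""
    return re.sub(r"['’]s$", "", token)


def transformation(a: str, b: str) -> Optional[str]:
    """Name the (hypothetical) transformation that relates two names, or None."""
    ta, tb = tokens(a), tokens(b)
    if ta == tb:
        return "Equality"
    sa, sb = [stem(t) for t in ta], [stem(t) for t in tb]
    if sa == sb:
        return "Stemming"
    # Acronym: one side is a single token spelling the initials of the other side.
    for short, long_ in ((sa, sb), (sb, sa)):
        if len(short) == 1 and len(long_) > 1 and short[0] == "".join(t[:1] for t in long_):
            return "Acronym"
    # Abbreviation: same token count, each token equal to or a prefix of its counterpart.
    if len(sa) == len(sb) and all(x.startswith(y) or y.startswith(x) for x, y in zip(sa, sb)):
        return "Abbreviation"
    return None


pairs = [
    ("Art’s Deli", "Art’s Delicatessen"),
    ("California Pizza Kitchen", "CPK"),
    ("Philippe The Original", "Philippe’s The Original"),
]
for a, b in pairs:
    print(f"{a} / {b}: {transformation(a, b)}")
# Art’s Deli / Art’s Delicatessen: Abbreviation
# California Pizza Kitchen / CPK: Acronym
# Philippe The Original / Philippe’s The Original: Stemming
```

In the approach described on the slide, rules like these are learned from data rather than hard-coded. The sketch only shows what each transformation type checks.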

Page 17

Active Learning to Determine Matched Records [Tejada, Knoblock, Minton ’01, ’02]

Learn importance of attributes for matching records

Source          Name                Street                   Phone
Zagat’s         Art’s Deli          12224 Ventura Boulevard  818-756-4124
Dept of Health  Art’s Delicatessen  12224 Ventura Blvd.      818/755-4100

Mapping rules:
Name > .9 & Street > .87 => mapped
Name > .95 & Phone > .96 => mapped
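The mapping rules above form a disjunction of conjunctive threshold tests over per-attribute similarity scores. A minimal Python sketch of applying them follows. The slide does not show how the similarity scores are computed, so the scores in the usage lines are made-up inputs, and the names `MAPPING_RULES` and `is_mapped` are hypothetical.

```python
from typing import Dict, List, Tuple

# A mapping rule is a conjunction of (attribute, threshold) tests:
# every listed attribute's similarity must exceed its threshold.
Rule = List[Tuple[str, float]]

# The two rules from the slide.
MAPPING_RULES: List[Rule] = [
    [("Name", 0.90), ("Street", 0.87)],
    [("Name", 0.95), ("Phone", 0.96)],
]


def is_mapped(scores: Dict[str, float], rules: List[Rule] = MAPPING_RULES) -> bool:
    """A record pair is mapped if any rule has all of its threshold tests satisfied."""
    return any(all(scores.get(attr, 0.0) > threshold for attr, threshold in rule) for rule in rules)


# Hypothetical similarity scores for three candidate record pairs.
print(is_mapped({"Name": 0.92, "Street": 0.95, "Phone": 0.40}))  # True  (first rule)
print(is_mapped({"Name": 0.97, "Street": 0.50, "Phone": 0.98}))  # True  (second rule)
print(is_mapped({"Name": 0.93, "Street": 0.50, "Phone": 0.99}))  # False (neither rule)
```

The third pair shows why attribute weighting matters: a near-identical phone number cannot compensate for a name score below the 0.95 threshold of the second rule.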

Page 18

Active Atlas Mapping Rule Learner

Diagram: choose initial examples; generate a committee of learners; each learner learns rules and classifies the examples; the committee's votes are used to choose an example for the USER to label; the output is a set of mapped objects.

Page 19

Committee Disagreement
- Chooses an example based on the disagreement of the query committee
- CPK, California Pizza Kitchen is the most informative example

                                  Committee
Examples                          M1   M2   M3
Art’s Deli, Art’s Delicatessen    Yes  Yes  Yes
CPK, California Pizza Kitchen     Yes  No   Yes
Ca’Brea, La Brea Bakery           No   No   No
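Committee-based selection can be sketched in a few lines of Python using the votes from the table above. The disagreement measure here is vote entropy, a common choice for query-by-committee. It is an assumption: the slide does not say which measure Active Atlas uses. The function names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, List


def vote_entropy(votes: List[str]) -> float:
    """Committee disagreement on one example: 0.0 when unanimous, 1.0 for an even two-way split."""
    n = len(votes)
    h = 0.0
    for count in Counter(votes).values():
        p = count / n
        h -= p * log2(p)
    return h


def most_informative(votes_by_example: Dict[str, List[str]]) -> str:
    """Pick the example the committee disagrees on most; it is the one sent to the user."""
    return max(votes_by_example, key=lambda ex: vote_entropy(votes_by_example[ex]))


# Votes of committee members M1, M2, M3, copied from the slide.
committee_votes = {
    "Art’s Deli, Art’s Delicatessen": ["Yes", "Yes", "Yes"],
    "CPK, California Pizza Kitchen": ["Yes", "No", "Yes"],
    "Ca’Brea, La Brea Bakery": ["No", "No", "No"],
}
print(most_informative(committee_votes))  # CPK, California Pizza Kitchen
```

Unanimous examples score 0.0 and are skipped. Asking the user about them would not change any committee member's rules.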

Page 20

Outline
- Agents that access information sources on the web
- AgentBuilder: learning from examples
- ActiveAtlas: standardizing data from multiple sources
- Constraint-based Integration
- Heracles: putting it all together

Page 21

Constraint-based Integration

Integrating data from multiple sources often involves reasoning about the information

Constraints provide an approach to expressing relationships and filtering data

Page 22

Heracles
- Framework for building integrated applications
- Interleaves planning and information gathering
- Uses a constraint reasoner to decide what sources to query and to integrate the results

Page 23

The Travel Assistant

Page 24

Dynamically Updates Slots as Information Becomes Available

Page 25

Supports Informed Choices

Page 26

Changes Propagate Throughout

Page 27

User Can Specify High-Level Preferences

Page 28

Constraint Networks for Managing Information
- Constraint reasoning system:
  - Propagates information
  - Decides when to launch information requests
  - Evaluates constraints
  - Computes preferences
  - All run as asynchronous processes to support the user
- Components:
  - Representation of the variables
  - Representation of constraints
  - Hierarchical templates
  - Constraint propagation

Page 29

Constraint Networks for Integrating Information

Components:
- Representation of the variables
- Representation of constraints
- Hierarchical template representation
- Constraint propagation and cycle detection

Page 30

Constraint Variables

A constraint network consists of a set of variables, such as MeetingStartTime and MeetingLocation

Variables are related by constraints that determine the possible values of a solution

Page 31

Constraint Networks for Integrating Information

Components:
- Representation of the variables
- Representation of constraints
- Hierarchical template representation
- Constraint propagation and cycle detection

Page 32

Constraint Representation

Constraints are computable components:
- Local calculations (e.g., XQuery)
  - MeetingStartTime + MeetingDuration --> MeetingEndTime
- Web and database wrappers
  - ITN: DepartureAirport, ArrivalAirport, Date --> Flights
  - Yahoo Weather: City, Date --> Weather prediction
- External programs (Outlook, planners, etc.)
  - Outlook Calendar: Date --> Meetings

Results are cached in tables
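The idea that a constraint is a computable component whose results are cached in tables can be sketched in Python as a function from input-variable values to an output value, memoized on the input tuple. The `Constraint` class, its `calls` counter, and the 9:00 meeting start are hypothetical. A wrapper-backed constraint such as ITN would plug a remote call in as `compute`, and it would then run only on a cache miss.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Any, Callable, Dict, Tuple


@dataclass
class Constraint:
    """A computable constraint: maps the values of its input variables to an output value."""
    inputs: Tuple[str, ...]
    output: str
    compute: Callable[..., Any]
    cache: Dict[Tuple[Any, ...], Any] = field(default_factory=dict)  # the results table
    calls: int = 0  # how many times compute() actually ran

    def evaluate(self, values: Dict[str, Any]) -> Any:
        key = tuple(values[name] for name in self.inputs)
        if key not in self.cache:  # call the (possibly remote) source only on a cache miss
            self.calls += 1
            self.cache[key] = self.compute(*key)
        return self.cache[key]


# Local calculation from the slide: MeetingStartTime + MeetingDuration --> MeetingEndTime.
end_time = Constraint(("MeetingStartTime", "MeetingDuration"), "MeetingEndTime",
                      lambda start, duration: start + duration)
values = {"MeetingStartTime": datetime(2000, 9, 30, 9, 0), "MeetingDuration": timedelta(hours=2)}
print(end_time.evaluate(values))  # 2000-09-30 11:00:00
end_time.evaluate(values)         # answered from the cache
print(end_time.calls)             # 1
```

Caching matters most for the wrapper and external-program constraints, where each evaluation may be a slow web request.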

Page 33

Drive or Take a Taxi?

Figure: constraint network that decides how to get to the airport.
Variables (values shown on the slide): DepartureDate (Sep 30, 2000), ReturnDate (Oct 2, 2000), Duration (3 days), OriginAddress, DestinationAddress, DepartureAirport (LAX), Distance (15.1 miles), ParkingRate ($7.00/day), ParkingTotal ($21.00), TaxiFare ($23.00), ModeToAirport (Drive)
Constraints: computeDuration, FindClosestAirport, GetDistance, getParkingRate, multiply, GetTaxiFare, SelectModeToAirport
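The arithmetic behind the Drive decision can be checked with a short Python sketch. Three assumptions apply. Duration counts both the departure and return days, which gives the slide's 3 days. `SelectModeToAirport` simply picks the cheaper option, with ties going to Drive. The taxi fare is taken as given because the slide does not show how GetTaxiFare computes it. The function names are hypothetical.

```python
from datetime import date
from decimal import Decimal


def compute_duration(departure: date, ret: date) -> int:
    # Assumption: parking is billed per calendar day, counting both end days.
    return (ret - departure).days + 1


def select_mode_to_airport(parking_total: Decimal, taxi_fare: Decimal) -> str:
    # Assumption: the cheaper option wins; ties go to Drive.
    return "Drive" if parking_total <= taxi_fare else "Taxi"


duration = compute_duration(date(2000, 9, 30), date(2000, 10, 2))
parking_rate = Decimal("7.00")           # getParkingRate at LAX (value from the slide)
parking_total = parking_rate * duration  # the multiply constraint
taxi_fare = Decimal("23.00")             # GetTaxiFare output (value from the slide)
mode = select_mode_to_airport(parking_total, taxi_fare)
print(duration, parking_total, mode)     # 3 21.00 Drive
```

A four-day trip would raise ParkingTotal to $28.00. Under the same assumed preference, that would flip the choice to Taxi.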

Page 34

Constraint Networks for Integrating Information

Components:
- Representation of the variables
- Representation of constraints
- Hierarchical template representation
- Constraint propagation and cycle detection

Page 35

Hierarchically-Partitioned Constraint Networks
- Template:
  - Groups related variables and constraints
  - Organizes information for computation and presentation to the user
- Templates are organized hierarchically:
  - A template is decomposed into subtemplates
  - Choose among alternative subtemplates

Page 36

Template Structure

Template
- Arguments: input and output variables
- Variables: name, type, default values
- Constraints
- Expansions: alternative subtemplate calls
- GUI specification
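The template fields listed above map naturally onto a small data structure. The Python sketch below is one possible encoding, not Heracles' actual template format. The `Template` and `Variable` classes, the `find` helper, and the wiring of the example Drive/Taxi templates (which variables and constraints each holds) are hypothetical. The names themselves come from the travel-assistant slides.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Variable:
    name: str
    type_name: str                 # e.g. "date", "money", "airport"
    default: Optional[Any] = None


@dataclass
class Template:
    name: str
    inputs: List[str] = field(default_factory=list)             # argument variables passed in
    outputs: List[str] = field(default_factory=list)            # argument variables passed back
    variables: List[Variable] = field(default_factory=list)
    constraints: List[str] = field(default_factory=list)        # names of constraints over the variables
    expansions: List["Template"] = field(default_factory=list)  # alternative subtemplates
    gui: Dict[str, Any] = field(default_factory=dict)           # presentation hints

    def find(self, name: str) -> Optional["Template"]:
        """Depth-first search for a (sub)template by name."""
        if self.name == name:
            return self
        for sub in self.expansions:
            hit = sub.find(name)
            if hit is not None:
                return hit
        return None


# Hypothetical wiring of the ModeToAirport choice from the travel assistant.
drive = Template("Drive", inputs=["DepartureAirport", "Duration"], outputs=["ParkingTotal"],
                 variables=[Variable("ParkingRate", "money"), Variable("ParkingTotal", "money")],
                 constraints=["getParkingRate", "multiply"])
taxi = Template("Taxi", inputs=["Distance"], outputs=["TaxiFare"],
                variables=[Variable("TaxiFare", "money")], constraints=["GetTaxiFare"])
mode_to_airport = Template("ModeToAirport", outputs=["ModeToAirport"],
                           variables=[Variable("ModeToAirport", "string")],
                           constraints=["SelectModeToAirport"], expansions=[drive, taxi])
print([t.name for t in mode_to_airport.expansions])  # ['Drive', 'Taxi']
```

Keeping the alternatives as a list of subtemplates mirrors the OR nodes of the template hierarchy. Only the chosen expansion's variables need to be computed.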

Page 37

Partitioned Constraint Network

Figure: the travel assistant's variables grouped into partitions: Who, Company, Subject, Starting Time, Ending Time, Origin Addr., Dest. Addr., Origin Weather, Dest. Weather, Distance, Travel Mode, Depart Time, Depart Airport, Arrival Airport, Flight Num, Arrival Time, Parking Lot, Parking Rate, Mode to Airport, Dist. to Airport, Taxi Fare

Page 38

Template Hierarchy for the Travel Assistant

Figure: AND/OR hierarchy of templates. Template names: Trip, ModeToDestination, Drive, Fly, FlightDetail, ModeToAirport, ModeFromAirport (alternatives: Drive, Taxi), Hotel, ModeHotel, NoOvernight, ModeNext, End, Trip(Return Home), Trip(Return Office), Trip(New Leg)

Page 39

Dynamic Networks (Generalization of Constraint Networks)
- Variables can be active or inactive
- Normal constraints: $x_1 = k_1 \wedge \dots \wedge x_m = k_m \Rightarrow x_n = k_n$
- Activity constraints: $x_1 = k_1 \wedge \dots \wedge x_m = k_m \Rightarrow \mathrm{active}(x_n)$
- Inactive variables do not participate in the network, i.e., they do not propagate constraints
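Activity constraints can be sketched in Python as (condition, target) pairs: when every variable in the condition is active and holds the required value, the target variable becomes active. The sketch iterates to a fixpoint so activations can chain, and it lets only active variables trigger activations, matching the rule that inactive variables do not propagate. The variable names come from the travel-assistant slides. Pairing them into these particular activity constraints, and the name `active_variables`, are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

# (condition, target): if every variable in `condition` is active and has the given value,
# then `target` becomes active.
ActivityConstraint = Tuple[Dict[str, Any], str]


def active_variables(assignment: Dict[str, Any], always_active: Iterable[str],
                     activity_constraints: List[ActivityConstraint]) -> Set[str]:
    active = set(always_active)
    changed = True
    while changed:  # repeat until no new variable is activated
        changed = False
        for condition, target in activity_constraints:
            if target in active:
                continue
            # Only active variables may trigger an activity constraint.
            if all(var in active and assignment.get(var) == value for var, value in condition.items()):
                active.add(target)
                changed = True
    return active


ACTIVITY = [
    ({"ModeToAirport": "Drive"}, "ParkingRate"),
    ({"ModeToAirport": "Drive"}, "ParkingTotal"),
    ({"ModeToAirport": "Taxi"}, "TaxiFare"),
]
print(sorted(active_variables({"ModeToAirport": "Drive"}, ["ModeToAirport"], ACTIVITY)))
# ['ModeToAirport', 'ParkingRate', 'ParkingTotal']
```

A stale value left on an inactive variable (for example a TaxiFare from an earlier choice) does not activate anything, because its controlling condition no longer holds.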

Page 40

Heracles: Template Selection
- Core network:
  - Computes values of template selection variables
  - Always active
- Template selection variables:
  - Inputs to activity constraints: determine the choice of subtemplates, i.e., which additional variables are active

Page 41

Constraint Networks for Integrating Information

Components:
- Representation of the variables
- Representation of constraints
- Hierarchical template representation
- Constraint propagation

Page 42

Constraint Propagation
- Approach:
  - When a variable is assigned a value, recompute the value sets and assigned values of all dependent variables
  - Proceeds recursively until no values are changed or a cycle is detected
- Core network:
  - Propagates all variables through the core network
  - Remaining variables are computed when a template is opened
- Does not perform full CSP:
  - Less costly
  - Does not require all information in advance
  - Makes choices locally, so may fail to find the optimal assignment
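The propagation scheme described above can be sketched in Python as follows. When a variable is assigned, every dependent constraint whose inputs are all known is re-evaluated. Recursion stops when a value does not change, or when propagation would revisit a variable already on the current path, which is recorded as a cycle. This is a simplification: it tracks a single value per variable rather than value sets. The `Network` class is hypothetical.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


class Network:
    """Minimal constraint network: each constraint computes one output variable from its inputs."""

    def __init__(self) -> None:
        self.values: Dict[str, Any] = {}
        self.constraints: Dict[str, Tuple[Tuple[str, ...], Callable[..., Any]]] = {}
        self.cycles: List[Tuple[str, ...]] = []

    def add_constraint(self, output: str, inputs: Sequence[str], fn: Callable[..., Any]) -> None:
        self.constraints[output] = (tuple(inputs), fn)

    def dependents(self, var: str) -> List[str]:
        return [out for out, (ins, _) in self.constraints.items() if var in ins]

    def assign(self, var: str, value: Any) -> None:
        self.values[var] = value
        self._propagate(var, (var,))

    def _propagate(self, var: str, path: Tuple[str, ...]) -> None:
        for out in self.dependents(var):
            if out in path:  # revisiting a variable on this path: record the cycle and stop
                self.cycles.append(path + (out,))
                continue
            inputs, fn = self.constraints[out]
            if not all(name in self.values for name in inputs):
                continue  # not enough information yet
            new_value = fn(*(self.values[name] for name in inputs))
            if self.values.get(out) != new_value:  # stop when nothing changes
                self.values[out] = new_value
                self._propagate(out, path + (out,))


net = Network()
net.add_constraint("ParkingTotal", ["ParkingRate", "Duration"], lambda rate, days: rate * days)
net.add_constraint("ModeToAirport", ["ParkingTotal", "TaxiFare"],
                   lambda parking, taxi: "Drive" if parking <= taxi else "Taxi")
net.assign("Duration", 3)
net.assign("ParkingRate", 7.0)
net.assign("TaxiFare", 23.0)
print(net.values["ModeToAirport"])  # Drive
net.assign("Duration", 4)           # ParkingTotal becomes 28.0, which exceeds the 23.0 fare
print(net.values["ModeToAirport"])  # Taxi
```

The second assignment shows a change propagating through the network: updating Duration recomputes ParkingTotal, which in turn recomputes ModeToAirport, without re-solving anything else.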

Page 43

Discussion
- General framework for interleaving planning and information gathering
- Retrieves information as needed
- Gathers and integrates data in a uniform framework
- Evaluates tradeoffs and selects among alternatives
- Allows users to explore alternatives
- Supports a wide variety of information types: databases, web pages, images, video, etc.

Page 44

SmartClients [Torrens et al., 2002]
- Cast an integration problem as a Constraint Satisfaction Problem (CSP)
- Given a request, the server retrieves the required data and sends the data and the CSP to the client
- The client solves the CSP locally:
  - A large, complex problem is transmitted in a small amount of space
  - Provides fine-grained user interaction with the data
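To show what "the client solves the CSP locally" means in the smallest possible case, here is a brute-force Python sketch. It enumerates every combination of domain values and keeps those that satisfy all constraints. This swaps in plain exhaustive enumeration and is not the solver SmartClients uses. The flight/hotel data, the $400 budget, and the `solve` function are hypothetical stand-ins for what a server might ship to the client.

```python
from itertools import product
from typing import Any, Callable, Dict, List

Assignment = Dict[str, Any]


def solve(domains: Dict[str, List[Any]],
          constraints: List[Callable[[Assignment], bool]]) -> List[Assignment]:
    """Return every assignment (one value per variable) that satisfies all constraints."""
    names = list(domains)
    solutions = []
    for combo in product(*(domains[name] for name in names)):
        assignment = dict(zip(names, combo))
        if all(check(assignment) for check in constraints):
            solutions.append(assignment)
    return solutions


# Hypothetical data shipped with the CSP.
flights = [{"id": "F1", "price": 300, "arrive": "Oct 1"}, {"id": "F2", "price": 180, "arrive": "Oct 2"}]
hotels = [{"id": "H1", "price": 120, "from": "Oct 1"}, {"id": "H2", "price": 90, "from": "Oct 2"}]
domains = {"flight": flights, "hotel": hotels}
constraints = [
    lambda a: a["hotel"]["from"] == a["flight"]["arrive"],           # hotel stay starts on arrival day
    lambda a: a["flight"]["price"] + a["hotel"]["price"] <= 400,     # assumed user budget
]
for s in solve(domains, constraints):
    print(s["flight"]["id"], s["hotel"]["id"])  # F2 H2
```

Because data and constraints both live on the client, the user can tighten or relax a constraint (such as the budget) and re-solve without another round trip. The cons on the next slides follow from the same design: all the data must be fetched up front.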

Page 45

Architecture for SmartClients

Page 46

SmartClients: Pros and Cons

Pros:
- Elegant approach that exploits past work on CSPs
- Minimizes data retrieval and supports complex reasoning and integration of the data

Cons:
- Assumes that all data can be retrieved before any reasoning about the data
- In travel planning, assumes that prices are the same on any date and that there are no issues with flight availability

Page 47

Summary

Our approach for creating “web assistants”:
- Agents for accessing web data
- Record linkage for mapping between sources
- Constraint-based integration provides the glue

