
1

MASSACHUSETTS INSTITUTE OF TECHNOLOGY
SLOAN SCHOOL OF MANAGEMENT

INFORMATION TECHNOLOGIES GROUP

SEMANTIC INTEGRATION: Context Interchange (COIN) Project

Presentation & Demonstration

Stuart Madnick ([email protected]) Michael Siegel ([email protected])

2003-07-01

2

Agenda

• Introduction to the Context Interchange (COIN) Project
  – Motivation: "[A]lthough there are many private and public databases that contain information potentially relevant to counterterrorism programs, they lack the necessary context definitions (i.e., metadata) and access tools to enable interoperation with other databases and the extraction of meaningful and timely information." National Research Council (2002), "Making the Nation Safer" (emphasis added)

• Description of the COIN Context Mediation Technology

• Demonstration of the COIN Technology

3

COntext INterchange (COIN) Project

Sources: Databases, Web Pages
Receivers: Applications, Browsers

INPUT PROCESSING
* Automatic web wrapping
- Semi-structured text
- Multi-source query plan and execution

CONTEXT MEDIATION
* Automatic conflict detection and conversion
- Derived data
- Source selection
- Source attribution

OUTPUT PROCESSING
- ODBC Driver
- Web Publishing

TRUSTED AGENTS

APPLICATIONS: Financial services, electronic commerce, asset visibility, in-transit visibility.

4

Key COIN Technologies

Web Wrapper
• Extract selected information from web (HTML+XML)
• Allows web to be treated as large relational SQL database
• Handles dynamic web sites, cookies, “login”, etc.
• Performs SQL Joins & Unions involving DB’s + Web sources

Context Mediator
• Resolve semantic (meaning) differences
• Enable meaningful aggregation & comparison

5

Background on DARPA Support for Context Mediation Research

• Initial efforts funded as part of DARPA Intelligent Integration of Information (I3) Program
• Period: July 1993 - Sept 1998
• Started under: Gio Wiederhold
• then under: Dave Gunning & Bob Neches
• I3 Program discontinued …

Other related activity:
• MIT Total Data Quality Management (TDQM)
• Since 1991 (web.mit.edu/tdqm)

6

Multiple Perspectives . . . old lady or young lady ?

7

Role Of Context

CONTEXT VARIATIONS:
- GEOGRAPHIC (US vs. UK)
- FUNCTIONAL (CASH MGMT vs. LOANS)
- ORGANIZATIONAL (CITIBANK vs. CHASE)

Data: Databases, Web data, E-mail

Examples of context-dependent data:
- Currency: $, £, ¥ ?
- Date: 01-02-03, 03-02-01, 02-01-03
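As a small illustration of the date example on this slide, the sketch below reads the same string under three date-order conventions. The context labels and the `interpret` helper are assumptions made for illustration; they are not part of COIN.

```python
from datetime import datetime

# Hypothetical context labels mapped to date layouts (illustrative only).
DATE_CONTEXTS = {
    "US (month-day-year)": "%m-%d-%y",
    "UK (day-month-year)": "%d-%m-%y",
    "year-month-day": "%y-%m-%d",
}

def interpret(raw: str) -> dict:
    """Return the calendar date each context assigns to the same raw string."""
    return {
        name: datetime.strptime(raw, layout).date().isoformat()
        for name, layout in DATE_CONTEXTS.items()
    }

readings = interpret("01-02-03")
for name, iso_date in readings.items():
    print(f"{name:22} -> {iso_date}")
```

The same six characters name three different days, which is why the receiver needs the source's context before comparing or joining on dates.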

8

Types of Context

                   Example                                  Temporal
Representational   Currency: $ vs €                         Francs before 2000, € thereafter
                   Scale factor: 1 vs 1000
Ontological        Revenue: Includes vs excludes interest   Revenue: Excludes interest before 1994 but incl. thereafter

9

Example: Context Differences (from multiple web sources)

Daimler Benz (DCX) Financial Data

Source        P/E Ratio
ABC           11.6
Bloomberg     5.57
DBC           19.19
MarketGuide   7.46

10

Complementary Aggregation Example

• Q: How did CO2 emissions (total, per GDP, per capita) change over time (between 1990 and 2000) in Yugoslavia?
  – Context 1: YUG as a geographic region bounded before the breakup
  – Context 2: YUG as a legal autonomous state

Related effort: Laboratory for Information Globalization and Harmonization Technologies and Studies (LIGHTS) Project

11

Many sources needed: Meanings in sources & users might differ

GDP and population (GDP in billions local currency; Population in millions)
Sources: World Bank’s World Dev. Indicator DB; UN Statistic Division; Statistics Bureaus

Country   GDP 1990   Pop 1990   GDP 2000   Pop 2000
YUG       698.3      23.7       1627.8     10.6
BIH                             13.6       3.9
HRV                             266.9      4.5
MKD                             608.7      2.0
SVN                             7162       2.0

Countries and currencies

Country                  Code   Currency         CurCode
Yugoslavia               YUG    New Yug. Dinar   YUN
Bosnia and Herzegovina   BIH    Marka            BAM
Croatia                  HRV    Kuna             HRK
Macedonia                MKD    Denar            MKD
Slovenia                 SVN    Tolar            SIT

Exchange rates
Source: Olsen (Web)

From   To    1990   2000
USD    YUG   10.5   67.267
USD    BIH          2.086
USD    HRV          8.089
USD    MKD          64.757
USD    SVN          225.93

CO2 Emission (in 1000 tons per year)
Sources: Oak Ridge’s CDIAC DB; WRI; GSSD; EPAs

Country   1990    2000
YUG       35604   15480
BIH               1279
HRV               5405
MKD               3378
SVN               3981

Results by context
(Total CO2 in 1000 tons per year; GDP in billions USD; CO2/Capita in tons per person; CO2/GDP in tons per million USD; GDP/Capita in USD per person)

             Context 1         Context 2
             1990     2000     1990     2000
CO2          35604    29523    35604    15480
GDP          66.5     104.8    66.5     24.2
CO2/capita   1.5      1.28     1.5      1.46
CO2/GDP      535      282      535      640
GDP/Capita   2800     4560     2800     1100
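The year-2000 results for the two contexts can be retraced from the source figures on this slide. The sketch below is only a worked arithmetic check: the dictionary layout and the `indicators` function are illustrative assumptions, while the numbers are copied from the slide. GDP/Capita is left out of the sketch.

```python
# Year-2000 figures copied from the slide (layout is illustrative).
GDP_LOCAL = {"YUG": 1627.8, "BIH": 13.6, "HRV": 266.9, "MKD": 608.7, "SVN": 7162.0}   # billions, local currency
POP = {"YUG": 10.6, "BIH": 3.9, "HRV": 4.5, "MKD": 2.0, "SVN": 2.0}                    # millions of people
LOCAL_PER_USD = {"YUG": 67.267, "BIH": 2.086, "HRV": 8.089, "MKD": 64.757, "SVN": 225.93}
CO2 = {"YUG": 15480, "BIH": 1279, "HRV": 5405, "MKD": 3378, "SVN": 3981}               # 1000 tons per year

def indicators(countries):
    """Aggregate the indicators over whatever 'Yugoslavia' means in a given context."""
    co2 = sum(CO2[c] for c in countries)
    pop = sum(POP[c] for c in countries)
    # Convert each GDP to USD before adding; local currencies are not comparable.
    gdp_usd = sum(GDP_LOCAL[c] / LOCAL_PER_USD[c] for c in countries)
    return {
        "CO2": co2,                                 # 1000 tons per year
        "GDP": round(gdp_usd, 1),                   # billions USD
        "CO2/capita": round(co2 / pop / 1000, 2),   # tons per person
        "CO2/GDP": round(co2 / gdp_usd),            # tons per million USD
    }

context_1 = indicators(["YUG", "BIH", "HRV", "MKD", "SVN"])  # geographic region before the breakup
context_2 = indicators(["YUG"])                              # legal autonomous state
print(context_1)
print(context_2)
```

Under Context 1 the CO2/GDP figure falls relative to 1990, while under Context 2 it rises, so the answer to the question depends on which meaning of "Yugoslavia" the user has in mind.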

12

The 1999 Overture

Unit-of-measure mixup tied to loss of $125 Million Mars Orbiter

“NASA’s Mars Climate Orbiter was lost because engineers did not make a simple conversion from English units to metric, an embarrassing lapse that sent the $125 million craft off course. . . .

. . . The navigators ( JPL ) assumed metric units of force per second, or newtons. In fact, the numbers were in pounds of force per second as supplied by Lockheed Martin ( the contractor ).”

Source: Kathy Sawyer, Boston Globe, October 1, 1999, page 1.

13

The Context Interchange Approach

Components: Source, Source Context, Context Mediator, Receiver, Receiver Context, Shared Ontologies, Conversion Libraries, Context Transformation, Context Management Application

Example:
  Concept: Length
  Contexts: Meters, Feet, related by a conversion function f() between meters and feet
  Query:
    Select partlength
    From catalog
    Where partno = "12AY"
  Result: part length = 17
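A minimal sketch of the idea on this slide, assuming (the slide does not say) that the catalog source records lengths in meters and the receiver expects feet. The `CONVERSIONS` table and `mediate` function are hypothetical names standing in for the conversion libraries and the context mediator.

```python
# Conversion library keyed by (concept, source unit, receiver unit).
CONVERSIONS = {
    ("length", "meters", "feet"): lambda v: v / 0.3048,
    ("length", "feet", "meters"): lambda v: v * 0.3048,
}

# Contexts state how each party represents a shared concept (assumed values).
SOURCE_CONTEXT = {"length": "meters"}
RECEIVER_CONTEXT = {"length": "feet"}

def mediate(value, concept, source_ctx, receiver_ctx):
    """Convert a value from the source's representation to the receiver's."""
    src_unit, dst_unit = source_ctx[concept], receiver_ctx[concept]
    if src_unit == dst_unit:
        return value  # no conflict, nothing to convert
    return CONVERSIONS[(concept, src_unit, dst_unit)](value)

# The stored part length 17 as the receiver would see it.
in_receiver_units = mediate(17, "length", SOURCE_CONTEXT, RECEIVER_CONTEXT)
print(round(in_receiver_units, 2))
```

Neither the source nor the receiver changes its own conventions; only the context declarations and the shared conversion table are needed.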

14

COIN Conceptual Model (Ontology)

15

Another Context Example (Basis for Demo)

Source       Company Name        Net Income    Sales
Worldscope   DAIMLER-BENZ AG     346,577       56,268,168
Disclosure   DAIMLER BENZ CORP   615,000,000   97,737,000,000
Datastream   DAIMLER-BENZ        614,995       97,736,992

OANDA Web Server: O&A DEM-USD Exchange Rate, 1.00 German Mark = 0.58 US Dollar as of 12/31/93

Layers between the sources and the Users & Appl. Systems: Wrapper Services, Context Mediation Services

16

Some Context Differences

Context Definitions   Disclosure                       Worldscope                       DataStream
Currency Used         Country of Incorporation         USD                              Country of Incorporation
Currency Conversion   Money Amount As_Of_Date          Money Amount As_Of_Date          Money Amount As_Of_Date
Currency Symbols      3 Letters                        3 Letters                        2 Letters
Scale Factor          1                                1000                             1000
Company Names         Disclosure Names                 Worldscope Names                 DataStream Names
Date Style            American with ‘/’ as separator   American with ‘/’ as separator   European with ‘-’ as separator

Olsen (OANDA) Web Source uses 3 Letter Currency Symbols and European Date Style with ‘/’ as a separator
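One way to picture conflict detection is to hold each source's context definitions as data and compare them modifier by modifier. The sketch below does only that; the key names and the `conflicts` function are assumptions for illustration and do not reflect how COIN stores or reasons over contexts.

```python
# Context definitions transcribed from the slide; the key names are hypothetical.
CONTEXTS = {
    "Disclosure": {
        "currency": "country of incorporation",
        "currency_symbols": "3 letters",
        "scale_factor": 1,
        "company_names": "Disclosure names",
        "date_style": "American, '/' separator",
    },
    "Worldscope": {
        "currency": "USD",
        "currency_symbols": "3 letters",
        "scale_factor": 1000,
        "company_names": "Worldscope names",
        "date_style": "American, '/' separator",
    },
    "DataStream": {
        "currency": "country of incorporation",
        "currency_symbols": "2 letters",
        "scale_factor": 1000,
        "company_names": "DataStream names",
        "date_style": "European, '-' separator",
    },
}

def conflicts(source: str, receiver: str) -> dict:
    """List every context modifier on which the source and receiver disagree."""
    src, dst = CONTEXTS[source], CONTEXTS[receiver]
    return {key: (src[key], dst[key]) for key in src if src[key] != dst[key]}

for modifier, (src_value, dst_value) in conflicts("Disclosure", "DataStream").items():
    print(f"{modifier}: {src_value} -> {dst_value}")
```

Each detected difference corresponds to a conversion the mediator would have to apply before the two sources can be compared.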

17

Domain Model

[Diagram]
Types: number, string, exchangeRate, currencyType, companyFinancials, date, countryName, companyName
Link labels: fromCur, toCur, scaleFactor, curTypeSym, currency, fyEnding, company, countryIncorp, format, dateFmt, txnDate, officialCurrency
Legend: Inheritance, Attribute, Modifier

Some currency context possibilities:
• Currency is stated explicitly as part of record
• Currency not stated, but the same for all (e.g., US $)
• Currency not stated or constant, but inferred by country

18

COIN System Architecture

[Diagram]
Process groups: SERVER PROCESSES, MEDIATOR PROCESSES, CLIENT PROCESSES

Components shown:
- HTTPD-Daemon (several instances), Web-site, Wrapper, WWW Gateway
- COIN Repository, SQL Compiler, Context Mediator, Optimizer, Executioner, Data Store for Intermediate Results
- ODBC-compliant Apps (e.g. Microsoft Excel), ODBC-Driver, Web Client (cgi-scripts)

Data passed between components: SQL Query, Datalog Query, Mediated Query, Optimized Query Plan, Results

19

System Demonstration

Q6. Scenario: Using Context Interchange, you can look at the Disclosure data using Datastream Context.

Query: Find out from Disclosure what Net Income for DAIMLER-BENZ was. Use Datastream Context.

Capabilities Demonstrated:

Ability to perform Scale Factor Conversion, Date Format Conversion, Company Name Conversion.

Single Source Queries with Mediation
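The three conversions named in Q6 can be sketched on a single record. This is an illustration under stated assumptions, not the demo's implementation: the record layout, the name-mapping table, and the `to_datastream` function are hypothetical, and the as-of date 12/31/93 is borrowed from the exchange-rate example purely for illustration. The company names, net income, scale factors, and date styles come from the earlier slides.

```python
from datetime import datetime

# Hypothetical mapping from Disclosure company names to Datastream names.
NAME_MAP = {"DAIMLER BENZ CORP": "DAIMLER-BENZ"}

DISCLOSURE_SCALE = 1     # Disclosure reports money amounts in units
DATASTREAM_SCALE = 1000  # Datastream reports money amounts in thousands

def to_datastream(record: dict) -> dict:
    """Re-express a Disclosure-context record in Datastream context."""
    as_of = datetime.strptime(record["as_of_date"], "%m/%d/%y")  # American style, '/'
    return {
        "company_name": NAME_MAP[record["company_name"]],                            # name conversion
        "net_income": record["net_income"] * DISCLOSURE_SCALE / DATASTREAM_SCALE,    # scale factor conversion
        "as_of_date": as_of.strftime("%d-%m-%y"),                                    # European style, '-'
    }

disclosure_row = {
    "company_name": "DAIMLER BENZ CORP",
    "net_income": 615_000_000,
    "as_of_date": "12/31/93",  # illustrative date, see lead-in
}
mediated = to_datastream(disclosure_row)
print(mediated)
```

No currency conversion appears because both contexts use the currency of the country of incorporation; the mediated net income of 615,000 lands close to the 614,995 that Datastream itself reports.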

20

Demonstration @ context2.mit.edu

Context

Source

21

Context Metadata (Partial)

22

Conflict Detection and Mediation

Date convert
Scale factor convert
Name convert

Mediated Query in Datalog

23

Mediated SQL Query & Result

Adjust scale factor

Date format conversion

Name conversion

Final results – from Disclosure but in Datastream context

Mediated SQL Query

24

More Complex Example (4 sources: DB + Web)

  select WorldcAF.TOTAL_ASSETS, DiscAF.NET_SALES, DiscAF.NET_INCOME,
         DStreamAF.TOTAL_EXTRAORD_ITEMS_PRE_TAX, quotes.Last
  from WorldcAF, DiscAF, DStreamAF, quotes
  where WorldcAF.COMPANY_NAME = "DAIMLER-BENZ AG"
    and DStreamAF.AS_OF_DATE = "01/05/94"
    and WorldcAF.COMPANY_NAME = DStreamAF.NAME
    and WorldcAF.COMPANY_NAME = DiscAF.COMPANY_NAME
    and WorldcAF.COMPANY_NAME = quotes.Cname;

Databases Web source

25

Conflict Table (1st part)

26

Conflict Table (2nd part)

27

Generated SQL (1st Part)

  select worldcaf.total_assets, discaf.net_sales,
         ((discaf.net_income*0.001)*olsen.rate),
         (dstreamaf2.total_extraord_items_pre_tax*olsen2.rate),
         quotes.Last
  from (select date1, 'European Style -', '01/05/94', 'American Style /' from datexform where format1='European Style -' and date2='01/05/94' and format2='American Style /') datexform,
       (select dt_names, 'DAIMLER-BENZ AG' from name_map_dt_ws where ws_names='DAIMLER-BENZ AG') name_map_dt_ws,
       (select ds_names, 'DAIMLER-BENZ AG' from name_map_ds_ws where ws_names='DAIMLER-BENZ AG') name_map_ds_ws,
       (select 'DAIMLER-BENZ AG', ticker, exc from ticker_lookup2 where comp_name='DAIMLER-BENZ AG') ticker_lookup2,
       (select 'DAIMLER-BENZ AG', latest_annual_financial_date, current_outstanding_shares, net_income, sales, total_assets, country_of_incorp from worldcaf where company_name='DAIMLER-BENZ AG') worldcaf,
       (select country, currency from currencytypes where currency <> 'USD') currencytypes,
       (select exchanged, 'USD', rate, date from olsen where expressed='USD') olsen,
       (select company_name, latest_annual_data, current_shares_outstanding, net_income, net_sales, total_assets, location_of_incorp from discaf) discaf,

28

Generated SQL (Continued - Partial)

       (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf,
       (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf2,
       (select char3_currency, char2_currency from currency_map where char3_currency <> 'USD') currency_map,
       (select country, currency from currencytypes where currency <> 'USD') currencytypes2,
       (select exchanged, 'USD', rate, '01/05/94' from olsen where expressed='USD' and date='01/05/94') olsen2,
       (select Cname, Last from quotes) quotes
  where currencytypes.country = discaf.location_of_incorp
    and currencytypes.currency = olsen.exchanged
    and dstreamaf.currency = dstreamaf2.currency
    and dstreamaf2.currency = currency_map.char2_currency
    and olsen.date = discaf.latest_annual_data
    and currency_map.char3_currency = currencytypes2.currency
    and currencytypes2.currency = olsen2.exchanged
    and name_map_dt_ws.dt_names = dstreamaf2.name
    and name_map_ds_ws.ds_names = discaf.company_name
    and ticker_lookup2.ticker = quotes.Cname
    and datexform.date1 = dstreamaf2.as_of_date
    and currencytypes.currency <> 'USD'
    and currency_map.char3_currency <> 'USD'
  union
  select worldcaf2.total_assets, discaf2.net_sales,
         ((discaf2.net_income*0.001)*olsen3.rate),
         dstreamaf4.total_extraord_items_pre_tax,
         quotes2.Last
  from (select date1, 'European Style -', '01/05/94', 'American Style /' from datexform where format1='European Style -' and date2='01/05/94' and format2='American Style /') datexform2,
       (select dt_names, 'DAIMLER-BENZ AG' from name_map_dt_ws where ws_names='DAIMLER-BENZ AG') name_map_dt_ws2,
       (select ds_names, 'DAIMLER-BENZ AG' from name_map_ds_ws where ws_names='DAIMLER-BENZ AG') name_map_ds_ws2,
       (select 'DAIMLER-BENZ AG', ticker, exc from ticker_lookup2 where comp_name='DAIMLER-BENZ AG') ticker_lookup22,
       (select 'DAIMLER-BENZ AG', latest_annual_financial_date, current_outstanding_shares, net_income, sales, total_assets, country_of_incorp from worldcaf where company_name='DAIMLER-BENZ AG') worldcaf2,
       (select country, currency from currencytypes where currency <> 'USD') currencytypes3,
       (select exchanged, 'USD', rate, date from olsen where expressed='USD') olsen3,
       (select company_name, latest_annual_data, current_shares_outstanding, net_income, net_sales, total_assets, location_of_incorp from discaf) discaf2,
       (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf3,
       (select 'USD', char2_currency from currency_map where char3_currency='USD') currency_map2,

etc
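To see what the mediated expression `((discaf.net_income*0.001)*olsen.rate)` does, the sketch below runs a cut-down version of the generated query against an in-memory SQLite database. The table contents are assumptions for illustration: only the net income (615,000,000) and the DEM-USD rate (0.58 as of 12/31/93) come from the slides; the country value and the reduced schemas are hypothetical, and the real demo runs against the actual sources rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table discaf (company_name text, net_income real,
                         location_of_incorp text, latest_annual_data text);
    create table currencytypes (country text, currency text);
    create table olsen (exchanged text, expressed text, rate real, date text);

    insert into discaf values ('DAIMLER BENZ CORP', 615000000, 'GERMANY', '12/31/93');
    insert into currencytypes values ('GERMANY', 'DEM');
    insert into olsen values ('DEM', 'USD', 0.58, '12/31/93');
""")

# Same join pattern as the generated SQL: the country of incorporation picks
# the currency, the currency and date pick the exchange rate, and the select
# list applies the scale factor (1 -> 1000) and the currency conversion.
row = conn.execute("""
    select discaf.company_name,
           ((discaf.net_income * 0.001) * olsen.rate)
    from discaf, currencytypes, olsen
    where currencytypes.country = discaf.location_of_incorp
      and currencytypes.currency = olsen.exchanged
      and olsen.date = discaf.latest_annual_data
      and olsen.expressed = 'USD'
      and currencytypes.currency <> 'USD'
""").fetchone()

company, net_income_usd_thousands = row
print(company, round(net_income_usd_thousands, 1))
conn.close()
```

The user never writes the multiplication by 0.001 or the join to the exchange-rate source; in the generated SQL those terms are added by the mediator from the context definitions.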

29

Final Result

30

Execution Trace (1st Part - Partials)

. . .

Parallel Execution

Retrieving data from Web source

31

Execution Trace (Continued - Partials)

. . .

Another Web source used (for currency conversion)

Stock price returned from Web source

32

The 1805 Overture

In 1805, the Austrian and Russian Emperors agreed to join forces against Napoleon. The Russians promised that their forces would be in the field in Bavaria by Oct. 20.

The Austrian staff planned its campaign based on that date in the Gregorian calendar. Russia, however, still used the ancient Julian calendar, which lagged 10 days behind.

The calendar difference allowed Napoleon to surround Austrian General Mack's army at Ulm and force its surrender on Oct. 21, well before the Russian forces could reach him, ultimately setting the stage for Austerlitz.

Source: David Chandler, The Campaigns of Napoleon, New York: MacMillan 1966, pg. 390.

33

Summary

• Tremendous opportunity to gather and integrate information from many diverse sources
• But … need to overcome many context challenges
• Context-type “metadata” plays a critical role
• COIN technology can be an important aid for semantically meaningful information integration:
  - Scalable
  - Extensible
  - Application Domain Merging
  - Reuse and extension of ontologies and contexts

34

Appendix – Some Useful Reference Material

• Documents
  Overview: "Metadata Jones and the Tower of Babel: The Challenge of Large-Scale Semantic Heterogeneity" (IEEE Meta-Data Conference)
    http://computer.org/conferen/proceed/meta/1999/papers/84/smadnick.html
  Theory of COIN: “Context Interchange: New Features and Formalisms for the Intelligent Integration of Information” (ACM TOIS)
    http://web.mit.edu/smadnick/www/wp/1997-03.pdf
  Contact us for more …

• Web sites
  Main COIN web site: http://context2.mit.edu
  Miscellaneous demos: http://context2.mit.edu/coin/demos/
  Self-explanatory demo: http://interchange.mit.edu:8080/gcms_v4/airCarMergeTop.html
    (Airfare and Car rental applications, includes ontology merging. Caution: still under development)

35

Appendix: Sample Applications

• Airfare, Car Rental and Merged Travel
• Weather
• Global Price Comparison
• Airfare Aggregation
• Disaster Relief
• TASC Financial Example
• Web Services Demo
• Corporate Householding

36

Appendix: COIN Web-Wrapper Technology

SQL Side: User or Program (via SQL Query)

  Select Edgar.Net_income
  From Edgar
  Where Edgar.Ticker = 'INTC'
    and Edgar.Form = '10-Q'

Web Wrapper Generator, driven by a Web page spec file *, handles the HTML Side

Data record returned:

  Ticker   Net Income
  INTC     1,983

* Spec file contains: Schema, Navigation rules, and Extraction rules.
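The three parts of a spec file (schema, navigation rules, extraction rules) can be pictured with a toy wrapper. Everything in this sketch is an assumption made for illustration: the spec format, the URL, the canned HTML page, and the `wrap` function are hypothetical and do not reproduce the COIN wrapper's actual spec language. Only the relation name, the ticker, and the returned record come from the slide.

```python
import re

# Hypothetical spec: a schema, one navigation rule, one extraction rule.
SPEC = {
    "relation": "Edgar",
    "schema": ["Ticker", "Net_income"],
    "navigation": "http://example.invalid/filings?ticker={Ticker}&form={Form}",
    "extraction": r"<td>(?P<Ticker>[A-Z]+)</td>\s*<td>(?P<Net_income>[\d,]+)</td>",
}

# Canned page standing in for a real HTTP fetch, so the sketch runs offline.
CANNED_PAGES = {
    "http://example.invalid/filings?ticker=INTC&form=10-Q":
        "<table><tr><th>Ticker</th><th>Net Income</th></tr>"
        "<tr><td>INTC</td><td>1,983</td></tr></table>",
}

def wrap(spec: dict, **bindings) -> list:
    """Answer a selection on the wrapped relation by navigating and extracting."""
    url = spec["navigation"].format(**bindings)   # navigation rule: bindings -> page
    page = CANNED_PAGES[url]                      # a real wrapper would fetch the page here
    # Extraction rule: each regex match becomes one tuple of the relation.
    return [match.groupdict() for match in re.finditer(spec["extraction"], page)]

rows = wrap(SPEC, Ticker="INTC", Form="10-Q")
print(rows)
```

The caller sees only rows that fit the declared schema; the page layout is confined to the spec, so a site redesign would mean editing the rules rather than the queries.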

