1
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
SLOAN SCHOOL OF MANAGEMENT
INFORMATION TECHNOLOGIES GROUP
SEMANTIC INTEGRATION: Context Interchange (COIN) Project
Presentation & Demonstration
Stuart Madnick, Michael Siegel
2003-07-01
2
Agenda
• Introduction to the Context Interchange (COIN) Project – Motivation
"[A]lthough there are many private and public databases that contain information potentially relevant to counterterrorism programs, they lack the necessary context definitions (i.e., metadata) and access tools to enable interoperation with other databases and the extraction of meaningful and timely information." National Research Council (2002), "Making the Nation Safer" (emphasis added)
• Description of the COIN Context Mediation Technology
• Demonstration of the COIN Technology
3
COntext INterchange (COIN) Project
[Architecture diagram; recoverable labels follow]
Sources: Databases, Web Pages
INPUT PROCESSING
* Automatic web wrapping
- Semi-structured text
- Multi-source query plan and execution
CONTEXT MEDIATION
* Automatic conflict detection and conversion
- Derived data
- Source selection
- Source attribution
OUTPUT PROCESSING: ODBC Driver, Web Publishing
Receivers: Applications, Browsers
TRUSTED AGENTS
APPLICATIONS: Financial services, electronic commerce, asset visibility, in-transit visibility.
4
Key COIN Technologies
Web Wrapper
• Extract selected information from web (HTML + XML)
• Allows web to be treated as large relational SQL database
• Handles dynamic web sites, cookies, "login", etc.
• Performs SQL Joins & Unions involving DBs + Web sources
Context Mediator
• Resolve semantic (meaning) differences
• Enable meaningful aggregation & comparison
5
Background on DARPA Support for Context Mediation Research
• Initial efforts funded as part of DARPA Intelligent Integration of Information (I3) Program
• Period: July 1993 - Sept 1998
• Started under: Gio Wiederhold
• then under: Dave Gunning & Bob Neches
• I3 Program discontinued …
Other related activity:
• MIT Total Data Quality Management (TDQM)
• Since 1991 (web.mit.edu/tdqm)
7
Role Of Context
CONTEXT VARIATIONS:
- GEOGRAPHIC (US vs. UK)
- FUNCTIONAL (CASH MGMT vs. LOANS)
- ORGANIZATIONAL (CITIBANK vs. CHASE)
Data: Databases, Web data, E-mail
[Diagram: three Context boxes around the data, labelled with currency symbols (? $ £ ¥) and with the dates 01-02-03, 03-02-01, 02-01-03]
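The date labels on this slide can be made concrete with a small sketch. It assumes three hypothetical conventions (month-day-year, day-month-year, year-month-day); the slide shows only the date strings, not which convention each context uses, so the context names below are illustrative.

```python
from datetime import datetime, date

# Hypothetical context names mapped to strptime patterns.
# The slide does not say which convention belongs to which context.
DATE_CONTEXTS = {
    "month-day-year": "%m-%d-%y",
    "day-month-year": "%d-%m-%y",
    "year-month-day": "%y-%m-%d",
}

def interpret(text: str, context: str) -> date:
    """Parse a date string under the convention of the named context."""
    return datetime.strptime(text, DATE_CONTEXTS[context]).date()

# The same string yields three different calendar dates.
readings = {name: interpret("01-02-03", name) for name in DATE_CONTEXTS}
for name, value in readings.items():
    print(f"{name:>15}: {value.isoformat()}")
```

Without knowing the source's context, a receiver has no way to tell which of the three readings was intended.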
8
Types of Context
Representational, Ontological, Temporal

Representational
- Example: Currency: $ vs €; Scale factor: 1 vs 1000
- Temporal: Francs before 2000, € thereafter
Ontological
- Example: Revenue: Includes vs excludes interest
- Temporal: Revenue: Excludes interest before 1994 but incl. thereafter
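A temporal representational context can be sketched as a conversion whose rule depends on the year of the data. The cut-over year follows the slide ("Francs before 2000"); the rate of 6.55957 francs per euro is the official fixed conversion rate and does not appear on the slide, and the function names and the default scale factor of 1000 are assumptions made for illustration.

```python
FRF_PER_EUR = 6.55957  # official fixed franc/euro rate (not from the slide)

def source_currency(year: int) -> str:
    """Temporal context as stated on the slide: francs before 2000, euros thereafter."""
    return "FRF" if year < 2000 else "EUR"

def to_receiver(amount: float, year: int, source_scale: int = 1000) -> float:
    """Convert a reported amount into euros with scale factor 1.

    source_scale is the source's scale factor (an assumed value of 1000 here).
    """
    unscaled = amount * source_scale
    if source_currency(year) == "FRF":
        return unscaled / FRF_PER_EUR
    return unscaled

# A figure reported for 1998 is read as thousands of francs,
# the same figure reported for 2001 as thousands of euros.
print(round(to_receiver(6.55957, 1998), 2))
print(round(to_receiver(6.55957, 2001), 2))
```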
9
Example: Context Differences (from multiple web sources)
Daimler Benz (DCX) Financial Data

Source        P/E Ratio
ABC           11.6
Bloomberg     5.57
DBC           19.19
MarketGuide   7.46
10
Complementary Aggregation Example
• Q: How did CO2 emissions (total, per GDP, per capita) change over time (between 1990 and 2000) in Yugoslavia?
– Context 1: YUG as a geographic region bounded before the breakup
– Context 2: YUG as a legal autonomous state
Related effort:
- Laboratory for Information Globalization and Harmonization Technologies and Studies (LIGHTS) Project
11
GDP and Population (GDP in billions local currency; Population in millions)

Country   1990 GDP   1990 Pop   2000 GDP   2000 Pop
YUG       698.3      23.7       1627.8     10.6
BIH       -          -          13.6       3.9
HRV       -          -          266.9      4.5
MKD       -          -          608.7      2.0
SVN       -          -          7162       2.0

Countries and currencies

Country                  Code   Currency         CurCode
Yugoslavia               YUG    New Yug. Dinar   YUN
Bosnia and Herzegovina   BIH    Marka            BAM
Croatia                  HRV    Kuna             HRK
Macedonia                MKD    Denar            MKD
Slovenia                 SVN    Tolar            SIT

Exchange rates

From   To    1990   2000
USD    YUG   10.5   67.267
USD    BIH   -      2.086
USD    HRV   -      8.089
USD    MKD   -      64.757
USD    SVN   -      225.93

CO2 Emission (in 1000 tons per year)

Country   1990    2000
YUG       35604   15480
BIH       -       1279
HRV       -       5405
MKD       -       3378
SVN       -       3981

Results

              Context 1        Context 2
              1990    2000     1990    2000
CO2           35604   29523    35604   15480
GDP           66.5    104.8    66.5    24.2
CO2/capita    1.5     1.28     1.5     1.46
CO2/GDP       535     282      535     640
GDP/Capita    2800    4560     2800    1100

Total CO2 in 1000 tons per year; GDP in billions USD; CO2/Capita in tons per person; CO2/GDP in tons per million USD; GDP/Capita in USD per person

Sources named on the slide: World Bank's World Dev. Indicator DB; UN Statistic Division; Statistics Bureaus; Oak Ridge's CDIAC DB; WRI; GSSD; EPAs; Olsen (Web)

Many sources needed: Meanings in sources & users might differ
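The year-2000 figures in the results table can be recomputed from the source tables. The sketch below is only an illustration of the aggregation, not the COIN implementation: the dictionary layout and function name are assumptions, while the numbers are transcribed from the slide. Context 1 sums over all five successor states, Context 2 uses YUG alone.

```python
# Year-2000 figures from the slide tables.
# gdp: billions of local currency, pop: millions,
# co2: 1000 tons per year, rate: local currency units per USD.
DATA_2000 = {
    "YUG": {"gdp": 1627.8, "pop": 10.6, "co2": 15480, "rate": 67.267},
    "BIH": {"gdp": 13.6, "pop": 3.9, "co2": 1279, "rate": 2.086},
    "HRV": {"gdp": 266.9, "pop": 4.5, "co2": 5405, "rate": 8.089},
    "MKD": {"gdp": 608.7, "pop": 2.0, "co2": 3378, "rate": 64.757},
    "SVN": {"gdp": 7162.0, "pop": 2.0, "co2": 3981, "rate": 225.93},
}

def aggregate(codes):
    """Aggregate CO2, population and USD-converted GDP over the given country codes."""
    co2 = sum(DATA_2000[c]["co2"] for c in codes)            # 1000 tons
    pop = sum(DATA_2000[c]["pop"] for c in codes)            # millions
    gdp_usd = sum(DATA_2000[c]["gdp"] / DATA_2000[c]["rate"] for c in codes)  # billions USD
    return {
        "co2": co2,
        "gdp_usd_billions": gdp_usd,
        # (1000 tons * 1000) / (millions * 1e6) = tons per person
        "co2_per_capita": co2 / (pop * 1000),
        # (1000 tons * 1000) / (billions USD * 1000) = tons per million USD
        "co2_per_gdp": co2 / gdp_usd,
    }

context1 = aggregate(["YUG", "BIH", "HRV", "MKD", "SVN"])  # geographic region
context2 = aggregate(["YUG"])                              # legal state only

for label, result in (("Context 1", context1), ("Context 2", context2)):
    print(label, result["co2"], round(result["gdp_usd_billions"], 1),
          round(result["co2_per_capita"], 2), round(result["co2_per_gdp"]))
```

Each local-currency GDP has to be converted to USD before summing, which is why the exchange-rate source is needed in addition to the GDP and emissions sources.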
12
The 1999 Overture
Unit-of-measure mixup tied to loss of $125 million Mars Orbiter
“NASA’s Mars Climate Orbiter was lost because engineers did not make a simple conversion from English units to metric, an embarrassing lapse that sent the $125 million craft off course. . . .
. . . The navigators ( JPL ) assumed metric units of force per second, or newtons. In fact, the numbers were in pounds of force per second as supplied by Lockheed Martin ( the contractor ).”
Source: Kathy Sawyer, Boston Globe, October 1, 1999, page 1.
13
The Context Interchange Approach
[Diagram: a Context Mediator between Source and Receiver. Other labels: Source Context, Receiver Context, Shared Ontologies, Conversion Libraries, Context Transformation, Context Management Application]
Concept: Length
Units: Meters, Feet; conversion function f() between meters and feet
Source column: part length; value shown: 17
Query:
Select partlength
From catalog
Where partno="12AY"
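A minimal sketch of the idea behind this slide: the mediator compares the source and receiver contexts for the concept Length and applies a function from a conversion library when they differ. The class and function names are hypothetical, and the assignment of feet to the source and meters to the receiver is an assumption; the slide does not say which side uses which unit.

```python
from dataclasses import dataclass

METERS_PER_FOOT = 0.3048  # exact by definition of the international foot

# Conversion library keyed by (source unit, receiver unit).
CONVERSIONS = {
    ("feet", "meters"): lambda v: v * METERS_PER_FOOT,
    ("meters", "feet"): lambda v: v / METERS_PER_FOOT,
}

@dataclass(frozen=True)
class Context:
    """Context of a source or receiver for the concept Length."""
    length_unit: str

def mediate_length(value: float, source: Context, receiver: Context) -> float:
    """Return value expressed in the receiver's unit, converting only on conflict."""
    if source.length_unit == receiver.length_unit:
        return value
    convert = CONVERSIONS[(source.length_unit, receiver.length_unit)]
    return convert(value)

catalog_context = Context("feet")     # assumption: the catalog stores feet
receiver_context = Context("meters")  # assumption: the user expects meters

# The stored part length 17 as the receiver would see it.
print(mediate_length(17, catalog_context, receiver_context))
```

The query itself stays unchanged; only the contexts tell the mediator whether a conversion is required.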
15
Another Context Example (Basis for Demo)
[Diagram labels: Wrapper Services, Context Mediation Services, Users & Appl. Systems, OANDA Web Server]

Source       Company Name        Net Income     Sales
WorldScope   DAIMLER-BENZ AG     346,577        56,268,168
Disclosure   DAIMLER BENZ CORP   615,000,000    97,737,000,000
Datastream   DAIMLER-BENZ        614,995        97,736,992

O&A DEM-USD Exchange Rate: 1.00 German Mark = 0.58 US Dollar as of 12/31/93
16
Some Context Differences
Context Definitions (Disclosure / Worldscope / DataStream)
- Currency Used: Country of Incorporation / USD / Country of Incorporation
- Currency Conversion: Money Amount As_Of_Date / Money Amount As_Of_Date / Money Amount As_Of_Date
- Currency Symbols: 3 Letters / 3 Letters / 2 Letters
- Scale Factor: 1 / 1000 / 1000
- Company Names: Disclosure Names / Worldscope Names / DataStream Names
- Date Style: American with '/' as separator / American with '/' as separator / European with '-' as separator
Olsen (OANDA) Web Source uses 3 Letter Currency Symbols and European Date Style with '/' as a separator
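The scale-factor and currency rows of these context definitions can be expressed as data, with a single conversion routine driven by them. This is a sketch under assumptions: the class and function names are invented, the "Country of Incorporation" currency is pre-resolved to DEM for a German company, and the only rate used is the DEM-USD rate of 0.58 quoted in the demo example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Context:
    name: str
    currency: str      # resolved for a German company (assumption)
    scale_factor: int  # reported value * scale_factor = actual amount

DISCLOSURE = Context("Disclosure", "DEM", 1)
WORLDSCOPE = Context("Worldscope", "USD", 1000)
DATASTREAM = Context("DataStream", "DEM", 1000)

# DEM -> USD rate as of 12/31/93, as quoted in the demo example.
RATES = {("DEM", "USD"): 0.58}

def convert_amount(amount: float, source: Context, receiver: Context) -> float:
    """Re-express a money amount from the source context in the receiver context."""
    actual = amount * source.scale_factor
    if source.currency != receiver.currency:
        actual *= RATES[(source.currency, receiver.currency)]
    return actual / receiver.scale_factor

# Disclosure net income of 615,000,000 seen from the other two contexts.
in_datastream = convert_amount(615_000_000, DISCLOSURE, DATASTREAM)
in_worldscope = convert_amount(615_000_000, DISCLOSURE, WORLDSCOPE)
print(in_datastream, in_worldscope)
```

The sketch models only scale factor and currency; the date-dependent choice of rate ("Money Amount As_Of_Date") is left out.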
17
Domain Model
[Diagram; recoverable labels follow]
Types: number, string, exchangeRate, currencyType, companyFinancials, date, countryName, companyName, company
Links: fromCur, toCur, scaleFactor, curTypeSym, currency, fyEnding, countryIncorp, format, dateFmt, txnDate, officialCurrency
Legend: Inheritance, Attribute, Modifier

Some currency context possibilities:
• Currency is stated explicitly as part of record
• Currency not stated, but the same for all (e.g., US $)
• Currency not stated or constant, but inferred by country
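The three currency possibilities listed above can be sketched as one resolution function that tries them in order. The context keys (`currency_field`, `constant_currency`, `country_field`) and the record layout are hypothetical; the country-to-currency table reuses the codes from the Yugoslavia example earlier in the deck.

```python
# Official currency code by country code, from the earlier country table.
OFFICIAL_CURRENCY = {"YUG": "YUN", "BIH": "BAM", "HRV": "HRK", "MKD": "MKD", "SVN": "SIT"}

def currency_of(record: dict, context: dict) -> str:
    """Resolve the currency of a record according to its source context."""
    # Case 1: currency is stated explicitly as part of the record.
    field = context.get("currency_field")
    if field and record.get(field):
        return record[field]
    # Case 2: currency is not stated, but the same for all records.
    if "constant_currency" in context:
        return context["constant_currency"]
    # Case 3: currency is inferred from the record's country.
    return OFFICIAL_CURRENCY[record[context["country_field"]]]

explicit = currency_of({"amount": 10, "cur": "HRK"}, {"currency_field": "cur"})
constant = currency_of({"amount": 10}, {"constant_currency": "USD"})
inferred = currency_of({"amount": 10, "country": "SVN"}, {"country_field": "country"})
print(explicit, constant, inferred)
```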
18
COIN System Architecture
[Diagram; recoverable labels follow]
Columns: SERVER PROCESSES, MEDIATOR PROCESSES, CLIENT PROCESSES
Components: Web-site, Wrapper, WWW Gateway, HTTPD-Daemon (shown four times), COIN Repository, SQL Compiler, Context Mediator, Optimizer, Executioner, Data Store for Intermediate Results, ODBC-Driver, ODBC-compliant Apps (e.g. Microsoft Excel), Web Client (cgi-scripts)
Flow labels: SQL Query, Datalog Query, Mediated Query, Optimized Query Plan, Results
19
System Demonstration
Single Source Queries with Mediation
Q6. Scenario: Using Context Interchange, you can look at the Disclosure data using Datastream Context.
Query: Find out from Disclosure what Net Income for DAIMLER-BENZ was. Use Datastream Context.
Capabilities Demonstrated: Ability to perform Scale Factor Conversion, Date Format Conversion, Company Name Conversion.
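The three conversions named in this scenario can be sketched for a single row moved from the Disclosure context into the Datastream context. Everything below is illustrative: the function and table names are invented, the date styles and scale factors follow the context definitions slide, the name mapping assumes that Disclosure lists the company as DAIMLER BENZ CORP, and the as-of date in the sample row is a placeholder.

```python
from datetime import datetime

# American style with '/' vs European style with '-' (from the context definitions).
DATE_STYLE = {"Disclosure": "%m/%d/%y", "Datastream": "%d-%m-%y"}
SCALE_FACTOR = {"Disclosure": 1, "Datastream": 1000}
# Assumed name mapping between the two sources.
NAME_MAP = {("Disclosure", "Datastream"): {"DAIMLER BENZ CORP": "DAIMLER-BENZ"}}

def mediate_row(row: dict, source: str, receiver: str) -> dict:
    """Apply name, scale factor and date format conversion to one row."""
    parsed = datetime.strptime(row["as_of_date"], DATE_STYLE[source])
    names = NAME_MAP[(source, receiver)]
    return {
        "company_name": names[row["company_name"]],
        "net_income": row["net_income"] * SCALE_FACTOR[source] / SCALE_FACTOR[receiver],
        "as_of_date": parsed.strftime(DATE_STYLE[receiver]),
    }

disclosure_row = {
    "company_name": "DAIMLER BENZ CORP",
    "net_income": 615_000_000,
    "as_of_date": "12/31/93",  # placeholder date in American style
}
result = mediate_row(disclosure_row, "Disclosure", "Datastream")
print(result)
```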
22
Conflict Detection and Mediation
- Date convert
- Scale factor convert
- Name convert
Mediated Query in Datalog
23
Mediated SQL Query & Result
Adjust scale factor
Date format conversion
Name conversion
Final results – from Disclosure but in Datastream context
Mediated SQL Query
24
More Complex Example (4 sources: DB + Web)
select WorldcAF.TOTAL_ASSETS, DiscAF.NET_SALES, DiscAF.NET_INCOME,
       DStreamAF.TOTAL_EXTRAORD_ITEMS_PRE_TAX, quotes.Last
from WorldcAF, DiscAF, DStreamAF, quotes
where WorldcAF.COMPANY_NAME = "DAIMLER-BENZ AG"
  and DStreamAF.AS_OF_DATE = "01/05/94"
  and WorldcAF.COMPANY_NAME = DStreamAF.NAME
  and WorldcAF.COMPANY_NAME = DiscAF.COMPANY_NAME
  and WorldcAF.COMPANY_NAME = quotes.Cname;
[Diagram labels: Databases, Web source]
27
Generated SQL (1st Part)
select worldcaf.total_assets, discaf.net_sales,
       ((discaf.net_income*0.001)*olsen.rate),
       (dstreamaf2.total_extraord_items_pre_tax*olsen2.rate),
       quotes.Last
from (select date1, 'European Style -', '01/05/94', 'American Style /' from datexform where format1='European Style -' and date2='01/05/94' and format2='American Style /') datexform,
     (select dt_names, 'DAIMLER-BENZ AG' from name_map_dt_ws where ws_names='DAIMLER-BENZ AG') name_map_dt_ws,
     (select ds_names, 'DAIMLER-BENZ AG' from name_map_ds_ws where ws_names='DAIMLER-BENZ AG') name_map_ds_ws,
     (select 'DAIMLER-BENZ AG', ticker, exc from ticker_lookup2 where comp_name='DAIMLER-BENZ AG') ticker_lookup2,
     (select 'DAIMLER-BENZ AG', latest_annual_financial_date, current_outstanding_shares, net_income, sales, total_assets, country_of_incorp from worldcaf where company_name='DAIMLER-BENZ AG') worldcaf,
     (select country, currency from currencytypes where currency <> 'USD') currencytypes,
     (select exchanged, 'USD', rate, date from olsen where expressed='USD') olsen,
     (select company_name, latest_annual_data, current_shares_outstanding, net_income, net_sales, total_assets, location_of_incorp from discaf) discaf,
28
Generated SQL (Continued - Partial)
     (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf,
     (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf2,
     (select char3_currency, char2_currency from currency_map where char3_currency <> 'USD') currency_map,
     (select country, currency from currencytypes where currency <> 'USD') currencytypes2,
     (select exchanged, 'USD', rate, '01/05/94' from olsen where expressed='USD' and date='01/05/94') olsen2,
     (select Cname, Last from quotes) quotes
where currencytypes.country = discaf.location_of_incorp
  and currencytypes.currency = olsen.exchanged
  and dstreamaf.currency = dstreamaf2.currency
  and dstreamaf2.currency = currency_map.char2_currency
  and olsen.date = discaf.latest_annual_data
  and currency_map.char3_currency = currencytypes2.currency
  and currencytypes2.currency = olsen2.exchanged
  and name_map_dt_ws.dt_names = dstreamaf2.name
  and name_map_ds_ws.ds_names = discaf.company_name
  and ticker_lookup2.ticker = quotes.Cname
  and datexform.date1 = dstreamaf2.as_of_date
  and currencytypes.currency <> 'USD'
  and currency_map.char3_currency <> 'USD'
union
select worldcaf2.total_assets, discaf2.net_sales,
       ((discaf2.net_income*0.001)*olsen3.rate),
       dstreamaf4.total_extraord_items_pre_tax,
       quotes2.Last
from (select date1, 'European Style -', '01/05/94', 'American Style /' from datexform where format1='European Style -' and date2='01/05/94' and format2='American Style /') datexform2,
     (select dt_names, 'DAIMLER-BENZ AG' from name_map_dt_ws where ws_names='DAIMLER-BENZ AG') name_map_dt_ws2,
     (select ds_names, 'DAIMLER-BENZ AG' from name_map_ds_ws where ws_names='DAIMLER-BENZ AG') name_map_ds_ws2,
     (select 'DAIMLER-BENZ AG', ticker, exc from ticker_lookup2 where comp_name='DAIMLER-BENZ AG') ticker_lookup22,
     (select 'DAIMLER-BENZ AG', latest_annual_financial_date, current_outstanding_shares, net_income, sales, total_assets, country_of_incorp from worldcaf where company_name='DAIMLER-BENZ AG') worldcaf2,
     (select country, currency from currencytypes where currency <> 'USD') currencytypes3,
     (select exchanged, 'USD', rate, date from olsen where expressed='USD') olsen3,
     (select company_name, latest_annual_data, current_shares_outstanding, net_income, net_sales, total_assets, location_of_incorp from discaf) discaf2,
     (select as_of_date, name, total_sales, total_extraord_items_pre_tax, earned_for_ordinary, currency from dstreamaf) dstreamaf3,
     (select 'USD', char2_currency from currency_map where char3_currency='USD') currency_map2,
etc
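One effect of the generated SQL, the rewriting of a direct name join into a join through a name-mapping table, can be reproduced on a toy scale. The sketch uses Python's `sqlite3` with an in-memory database; the table and column names follow the generated query, but the table contents are made up, and the Disclosure spelling DAIMLER BENZ CORP is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE worldcaf (company_name TEXT);
    CREATE TABLE discaf (company_name TEXT, net_income REAL);
    CREATE TABLE name_map_ds_ws (ds_names TEXT, ws_names TEXT);
    INSERT INTO worldcaf VALUES ('DAIMLER-BENZ AG');
    INSERT INTO discaf VALUES ('DAIMLER BENZ CORP', 615000000);
    INSERT INTO name_map_ds_ws VALUES ('DAIMLER BENZ CORP', 'DAIMLER-BENZ AG');
""")

# Naive join: the two sources spell the company name differently, so no row matches.
naive = conn.execute("""
    SELECT discaf.net_income
    FROM worldcaf, discaf
    WHERE worldcaf.company_name = 'DAIMLER-BENZ AG'
      AND worldcaf.company_name = discaf.company_name
""").fetchall()

# Mediated join: go through the name map and adjust the scale factor (1 -> 1000).
mediated = conn.execute("""
    SELECT discaf.net_income * 0.001
    FROM worldcaf, discaf, name_map_ds_ws
    WHERE worldcaf.company_name = 'DAIMLER-BENZ AG'
      AND name_map_ds_ws.ws_names = worldcaf.company_name
      AND name_map_ds_ws.ds_names = discaf.company_name
""").fetchall()

print(naive)
print(mediated)
conn.close()
```

The user writes the naive form; in the slides, the mediator is what produces the longer form with the mapping tables and conversion factors.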
31
Execution Trace (Continued - Partials)
. . .
. . .
Another Web source used (for currency conversion)
Stock price returned from Web source
32
The 1805 Overture
In 1805, the Austrian and Russian Emperors agreed to join forces against Napoleon. The Russians promised that their forces would be in the field in Bavaria by Oct. 20.
The Austrian staff planned its campaign based on that date in the Gregorian calendar. Russia, however, still used the ancient Julian calendar, which lagged 10 days behind.
The calendar difference allowed Napoleon to surround Austrian General Mack's army at Ulm and force its surrender on Oct. 21, well before the Russian forces could reach him, ultimately setting the stage for Austerlitz.
Source: David Chandler, The Campaigns of Napoleon, New York: MacMillan 1966, pg. 390.
33
Summary
• Tremendous opportunity to gather and integrate information from many diverse sources
• But … need to overcome many context challenges
• Context-type "metadata" plays a critical role
• COIN technology can be an important aid for semantically meaningful information integration:
- Scalable
- Extensible
- Application Domain Merging
- Reuse and extension of ontologies and contexts
34
Appendix – Some Useful Reference Material
• Documents
Overview: http://computer.org/conferen/proceed/meta/1999/papers/84/smadnick.html
"Metadata Jones and the Tower of Babel: The Challenge of Large-Scale Semantic Heterogeneity" (IEEE Meta-Data Conference)
Theory of COIN: http://web.mit.edu/smadnick/www/wp/1997-03.pdf
"Context Interchange: New Features and Formalisms for the Intelligent Integration of Information" (ACM TOIS)
Contact us for more …
• Web sites
Main COIN web site: http://context2.mit.edu
Miscellaneous demos: http://context2.mit.edu/coin/demos/
Self-explanatory demo: http://interchange.mit.edu:8080/gcms_v4/airCarMergeTop.html
(Airfare and Car rental applications, includes ontology merging. Caution: still under development)
35
Appendix: Sample Applications
• Airfare, Car Rental and Merged Travel
• Weather
• Global Price Comparison
• Airfare Aggregation
• Disaster Relief
• TASC Financial Example
• Web Services Demo
• Corporate Householding
36
Appendix: COIN Web-Wrapper Technology
[Diagram labels: SQL Side, HTML Side, User or Program (via SQL Query), Web Wrapper Generator, Web page spec file *, Data record returned]
Query:
Select Edgar.Net_income
From Edgar
Where Edgar.Ticker=intc
and Edgar.Form=10-Q
Data record returned:
Ticker   Net Income
INTC     1,983
* Spec file contains: Schema, Navigation rules, and Extraction rules.
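The wrapper idea can be sketched with a spec that holds a schema and extraction rules, applied to a page so that the result can be queried like a relation. This is not the COIN spec-file format, which the slide does not show: the spec layout, the HTML snippet, and the function names are all hypothetical, and navigation rules are omitted because the sketch works on a single in-memory page.

```python
import re

# Hypothetical page content; the real page is not shown on the slide.
PAGE = (
    "<table><tr><td>Ticker</td><td>INTC</td></tr>"
    "<tr><td>Net Income</td><td>1,983</td></tr></table>"
)

# Hypothetical spec: a schema plus one extraction rule (regular expression) per column.
SPEC = {
    "schema": ["Ticker", "Net_income"],
    "extraction": {
        "Ticker": r"<td>Ticker</td><td>([A-Z]+)</td>",
        "Net_income": r"<td>Net Income</td><td>([\d,]+)</td>",
    },
}

def wrap(page: str, spec: dict) -> dict:
    """Apply the extraction rules to a page and return one relational record."""
    record = {}
    for column in spec["schema"]:
        match = re.search(spec["extraction"][column], page)
        record[column] = match.group(1) if match else None
    return record

def select(columns: list, page: str, spec: dict, where: dict):
    """Tiny stand-in for an SQL select with equality conditions over the wrapped page."""
    record = wrap(page, spec)
    if all(record.get(key) == value for key, value in where.items()):
        return {column: record[column] for column in columns}
    return None

row = select(["Net_income"], PAGE, SPEC, {"Ticker": "INTC"})
print(row)
```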