Posted on 20-Dec-2015
1
CoBase: Scalable and Extensible Cooperative Information System
Wesley W. Chu
Computer Science Department
University of California, Los Angeles
http://www.cobase.cs.ucla.edu
2
Conventional Query Answering
Need to know the detailed database schema
Cannot get approximate answers
Cannot answer conceptual queries
Cooperative Query Answering
Derive approximate answers
Answer conceptual queries
3
Cooperative Queries
Find a seaport with railway facilities in Los Angeles
Find a nearby friendly airport that can land an F-15
Find hospitals with facilities similar to St. John’s near LAX
CoBase provides: relaxation, approximation, association, explanation
(Figure: cooperative queries are answered by CoBase servers, which use domain knowledge to access heterogeneous information sources.)
4
Generalization and Specialization
(Figure: generalization turns a specific query into a more conceptual query; specialization turns a conceptual query into a more specific query.)
5
Type Abstraction Hierarchy (TAH)
Chemical-Suit Size TAH (a non-numerical TAH)
All_Sizes
  Small_Size: Very_Small, Small_to_Medium
  Large_Size: Large_to_Extra_Large, Very_Large
  Leaf values: individual suit sizes from XXXS to XXL
Provide multi-level knowledge representations
6
Type Abstraction Hierarchy (TAH) (Location Example)
CA
  N. CA, C. CA, S. CA
  Cities: San Jose, Palo Alto, Sacramento, Davis, San Diego, Long Beach, LA, SF
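The location TAH above can be held in a small tree structure. The Python sketch below is illustrative only: the grouping of cities under N. CA, C. CA and S. CA is a hypothetical assignment, and `TAHNode`, `build_index` and `relax` are names introduced here, not CoBase APIs. Relaxation is modeled as generalizing a value to an ancestor and then specializing back to every leaf value under that ancestor.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TAHNode:
    """A node in a type abstraction hierarchy (TAH)."""
    label: str
    children: List[TAHNode] = field(default_factory=list)
    # Back-pointer used for generalization; excluded from repr/eq to avoid cycles.
    parent: Optional[TAHNode] = field(default=None, repr=False, compare=False)

    def add(self, *labels: str) -> List[TAHNode]:
        """Attach child concepts and return them."""
        nodes = [TAHNode(lbl, parent=self) for lbl in labels]
        self.children.extend(nodes)
        return nodes

    def leaves(self) -> List[str]:
        """Specialization: the concrete values covered by this concept."""
        if not self.children:
            return [self.label]
        values: List[str] = []
        for child in self.children:
            values.extend(child.leaves())
        return values


def build_index(root: TAHNode) -> Dict[str, TAHNode]:
    """Map every label in the hierarchy to its node."""
    index = {root.label: root}
    for child in root.children:
        index.update(build_index(child))
    return index


def relax(index: Dict[str, TAHNode], value: str, levels: int = 1) -> List[str]:
    """Generalize `value` up `levels` steps, then specialize back to leaf values."""
    node = index[value]
    for _ in range(levels):
        if node.parent is None:
            break
        node = node.parent
    return node.leaves()


# Hypothetical grouping of the cities in the location TAH.
ca = TAHNode("CA")
north, central, south = ca.add("N. CA", "C. CA", "S. CA")
north.add("SF", "Sacramento", "Davis")
central.add("San Jose", "Palo Alto")
south.add("LA", "Long Beach", "San Diego")
INDEX = build_index(ca)
```

With this assumed grouping, relaxing "LA" one level yields LA, Long Beach and San Diego; relaxing two levels yields every city under CA.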
7
Relaxation Agent
Uses a knowledge-based approach (generalization and specialization via type abstraction hierarchies) to relax the following for matching:
query conditions
constraints
8
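Query Relaxation
(Flowchart: the query is run against the database. If answers are found, they are displayed; if not, an attribute is relaxed using the TAHs, the query is modified, and the modified query is run again.)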
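A minimal sketch of the loop in this flowchart, assuming query conditions are held as sets of allowed values and that a single attribute is relaxed. The seaport rows, the TAH links and all function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]
Conditions = Dict[str, Set[str]]

# Hypothetical location TAH stored as child -> parent links.
PARENT = {"LA": "S. CA", "Long Beach": "S. CA", "San Diego": "S. CA",
          "SF": "N. CA", "Oakland": "N. CA"}
CHILDREN: Dict[str, Set[str]] = {}
for child, parent in PARENT.items():
    CHILDREN.setdefault(parent, set()).add(child)


def generalize(values: Set[str]) -> Set[str]:
    """Replace each value with every sibling under its TAH parent."""
    relaxed: Set[str] = set()
    for value in values:
        parent = PARENT.get(value)
        relaxed |= CHILDREN[parent] if parent else {value}
    return relaxed


def run_query(rows: List[Row], cond: Conditions) -> List[Row]:
    """Exact matching: every attribute value must be in its allowed set."""
    return [r for r in rows if all(r[a] in allowed for a, allowed in cond.items())]


def cooperative_query(rows: List[Row], cond: Conditions, attr: str,
                      max_steps: int = 3) -> Tuple[List[Row], Conditions]:
    """Query; if there are no answers, relax `attr` via the TAH and retry."""
    answers = run_query(rows, cond)
    steps = 0
    while not answers and steps < max_steps:
        cond = {**cond, attr: generalize(cond[attr])}  # query modification
        answers = run_query(rows, cond)
        steps += 1
    return answers, cond


SEAPORTS = [
    {"name": "Port of Los Angeles", "city": "LA", "railway": "yes"},
    {"name": "Port of Oakland", "city": "Oakland", "railway": "yes"},
]
answers, final = cooperative_query(
    SEAPORTS, {"city": {"Long Beach"}, "railway": {"yes"}}, attr="city")
```

The exact query for a seaport in Long Beach fails, so the city condition is widened to its S. CA siblings and the Los Angeles port is returned; the railway condition is left unchanged.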
9
10
Visualization of Relaxation Process
Query: Find seaports in the given region.
(Figure: map showing the given region and the relaxed region.)
11
12
Relaxation Control Primitives
not-relaxable (e.g., runway-length)
relaxation-order (e.g., runway-length, location)
preference-list
unacceptable-list
answer-size
relaxation-level
13
Relaxation Primitives
^ (approximate), e.g., ^9 am
between
near-to (context-sensitive), e.g., Airport near-to LAX; Restaurant near-to UCLA
similar-to, e.g., Airport similar-to LAX based-on (traffic, runway)
within
14
Similar-to
Find all airports in Tunisia similar to the Bizerte airport based on runway length and (more importantly) runway width.
select aport_name, runway_length, runway_width
from runways, countries
where aport_name similar-to 'Bizerte'
based-on ((runway_length 1.0) (runway_width 2.0))
and country_state_name = 'Tunisia'
and countries.glc_cd = runways.glc_cd
15
Similar-to Result
APORT_NM   LENGTH (ft)   WIDTH (ft)   RANK
Bizerte    8000          148          0.00
El Borma   7200          144          0.09
Monastir   9700          137          0.20
Jerba      10171         148          0.24
Djedeida   6000          122          0.27
The similar-to module ranks the returned answers according to their mean-squared error relative to the target airport.
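A sketch of weighted similar-to ranking over the five airports in the table above. The weights 1.0 and 2.0 come from the based-on clause; the normalization (dividing each difference by the attribute's value range) and the weighted mean are assumptions. Under them, Jerba ranks ahead of Monastir, whereas the slide ranks Monastir first, so the slide's exact error measure evidently differs.

```python
from typing import Dict, List, Tuple

RUNWAYS = [
    {"aport_nm": "Bizerte", "runway_length": 8000, "runway_width": 148},
    {"aport_nm": "El Borma", "runway_length": 7200, "runway_width": 144},
    {"aport_nm": "Monastir", "runway_length": 9700, "runway_width": 137},
    {"aport_nm": "Jerba", "runway_length": 10171, "runway_width": 148},
    {"aport_nm": "Djedeida", "runway_length": 6000, "runway_width": 122},
]


def rank_similar(rows: List[Dict], key: str, target: str,
                 weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank rows by weighted mean-squared error against the target row."""
    ref = next(r for r in rows if r[key] == target)
    # Assumption: normalize each attribute by its value range so that
    # length differences in feet do not swamp the smaller width differences.
    span = {a: (max(r[a] for r in rows) - min(r[a] for r in rows)) or 1
            for a in weights}
    total = sum(weights.values())
    scored = []
    for r in rows:
        err = sum(w * ((r[a] - ref[a]) / span[a]) ** 2
                  for a, w in weights.items()) / total
        scored.append((r[key], round(err, 4)))
    return sorted(scored, key=lambda s: s[1])


ranking = rank_similar(RUNWAYS, "aport_nm", "Bizerte",
                       {"runway_length": 1.0, "runway_width": 2.0})
```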
16
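Unacceptable List Operator
Constraint: Avoid Northern Tunisia!
Type Abstraction Hierarchy: Tunisia: NE Tunisia, NW Tunisia, Central Tunisia, SW Tunisia (leaf values include Bizerte, El Borma, ...)
Trimmed TAH: Tunisia: Central Tunisia, SW Tunisia (leaf values include Gafsa, El Borma)
(Figure: the CoBase Relaxation Manager applies the constraint to produce the trimmed TAH; Gafsa appears as the answer.)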
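Trimming a TAH with an unacceptable list can be sketched as pruning whole subtrees before any relaxation takes place. The region membership below (including Tabarka under NW Tunisia) is hypothetical, and the dictionary representation is a choice made for this example.

```python
from typing import Dict, List, Set

# Hypothetical Tunisia location TAH: concept -> children (leaves have none).
TUNISIA: Dict[str, List[str]] = {
    "Tunisia": ["NE Tunisia", "NW Tunisia", "Central Tunisia", "SW Tunisia"],
    "NE Tunisia": ["Bizerte", "Tunis", "Djedeida"],
    "NW Tunisia": ["Tabarka"],
    "Central Tunisia": ["Gafsa"],
    "SW Tunisia": ["El Borma"],
}


def trim(tah: Dict[str, List[str]], root: str,
         unacceptable: Set[str]) -> Dict[str, List[str]]:
    """Copy the hierarchy under `root`, dropping unacceptable subtrees."""
    if root in unacceptable:
        return {}
    trimmed: Dict[str, List[str]] = {root: []}
    for child in tah.get(root, []):
        if child in unacceptable:
            continue
        trimmed[root].append(child)
        trimmed.update(trim(tah, child, unacceptable))
    return trimmed


def leaves(tah: Dict[str, List[str]], node: str) -> List[str]:
    """All leaf values reachable from `node`."""
    children = tah.get(node, [])
    if not children:
        return [node]
    return [leaf for child in children for leaf in leaves(tah, child)]


trimmed = trim(TUNISIA, "Tunisia", {"NE Tunisia", "NW Tunisia"})
```

After trimming, relaxation from the Tunisia node can only reach Gafsa and El Borma.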
17
TAH Generation for Numerical Attribute Values
Relaxation error: the difference between the exact value and the returned approximate value; the expected error is weighted by the probability of occurrence of each value
DISC (Distribution Sensitive Clustering) is based on the attribute values and the frequency distribution of the data
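One way to make the relaxation-error definition concrete is RE(C) = sum over i, j of P(x_i) P(x_j) |x_i - x_j| for the values of a cluster C, with P taken from value frequencies. The sketch below assumes that formula and uses a greedy binary split that minimizes the frequency-weighted relaxation error of the two halves. It illustrates the idea behind DISC rather than reproducing the published algorithm; the runway lengths in the usage line are made up.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


def relaxation_error(values: Sequence[float]) -> float:
    """Expected |x_i - x_j| over a cluster, weighted by value frequencies."""
    n = len(values)
    probs = {v: c / n for v, c in Counter(values).items()}
    return sum(pi * pj * abs(vi - vj)
               for vi, pi in probs.items() for vj, pj in probs.items())


def best_cut(values: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Binary cut point (and its error) minimizing weighted relaxation error."""
    xs = sorted(values)
    n = len(xs)
    best: Optional[Tuple[float, float]] = None
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # cannot cut between equal values
        left, right = xs[:i], xs[i:]
        err = ((len(left) / n) * relaxation_error(left)
               + (len(right) / n) * relaxation_error(right))
        if best is None or err < best[1]:
            best = ((xs[i - 1] + xs[i]) / 2, err)
    return best


def build_tah(values: Sequence[float], depth: int) -> Dict:
    """Recursively split values into a numeric TAH of nested ranges."""
    xs = sorted(values)
    node: Dict = {"range": (xs[0], xs[-1]), "children": []}
    cut = best_cut(xs) if depth > 0 else None
    if cut is not None:
        point = cut[0]
        node["children"] = [build_tah([x for x in xs if x < point], depth - 1),
                            build_tah([x for x in xs if x >= point], depth - 1)]
    return node


tah = build_tah([1000, 1100, 1200, 9000, 9500, 10000], depth=1)
```

On this bimodal sample the cheapest cut falls at 5100, separating the short and long runways into two child ranges.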
18
TAH Generation for Non-numerical Attribute Values
Pattern Based Knowledge Induction (PBKI)
Rule-based approach
Clusters attribute values into a TAH based on other attributes in the relation (i.e., inter-attribute relationships)
Provides an attribute correlation value (a measure of how well the rules apply to the database)
19
Type Abstraction Hierarchy (TAH)
Location Name TAH:
Tunisia
  NE Tunisia: Bizerte, Tunis, Djedeida
  Central Tunisia
  SW Tunisia: El Borma
  ...
Runway Length TAH:
All
  Short: 0 ... 700
  Medium: 700 ... 1K
  Long: 1K ... 5K
Provide multi-level knowledge representations
20
Associative Query Answering
Provide relevant information not explicitly asked for by the user
User query: List all airports with runway length between 8500 and approximately 10000 feet
Query answers:
Airport Name   Runway Length (feet)
Jerba          10171
Monastir       9700
Tunis          10500
Associated attributes and answers (user type = pilot):
Weather   Runway Quality
Sunny     Good
Rain      Good
Foggy     Damaged
Associated attributes and answers (user type = planner):
Military or Civilian Flag   Refrigerated Storage Capacity (tons)
CC                          0.00
C                           1000.00
21
CoBase and GLAD Integration
Wesley W. Chu
22
CoBase Functionality
Provide approximate matching: Find HETs with a capacity of approximately 5 tons
Provide conceptual query answering: Find “Earth Moving” equipment
Provide context-sensitive spatial queries: Find storage sites near a selected location (integration with the MATT map server)
Provide relaxation control: relaxation order, not-relaxable, at-least (answer set, quantity on hand)
23
Cooperative Operations Added to GLAD
Implicit query relaxation
Explicit query relaxation: approximate operator, similar-to/based-on, spatial relaxation
Relaxation control: relaxation-order, not-relaxable, at-least (answer-set size, quantity on hand)
24
CoBase Features Added to GLAD
Enhance GLAD queries with cooperative operators (similar-to, relaxation-order, etc.)
Display the query relaxation process: modified query conditions (value, spatial) and type abstraction hierarchies
Rank returned answers with similarity measures, e.g., spatial relaxation ranks answers according to their distance from the selected location
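Ranking spatially relaxed answers by distance from the selected location can be sketched with the haversine great-circle formula. The site names and coordinates are hypothetical, and the spherical-Earth radius of 6371 km is an approximation.

```python
import math
from typing import Dict, List


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers (spherical Earth, R = 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def rank_by_distance(sites: List[Dict], lat: float, lon: float) -> List[Dict]:
    """Order relaxed spatial answers by distance from the selected location."""
    return sorted(sites, key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]))


SITES = [  # hypothetical storage sites
    {"name": "site_b", "lat": 34.10, "lon": -118.30},
    {"name": "site_a", "lat": 33.95, "lon": -118.40},
    {"name": "site_c", "lat": 34.50, "lon": -117.90},
]
ranked = rank_by_distance(SITES, 33.94, -118.41)
```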
25
CoBase and GLAD TIE
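(Figure: GLAD components (report collection, report, query constructor, filter editor, object cache, display generator, query collection) integrated with CoBase components (CoBase query editor, CoBase relaxation manager, knowledge base, CoBase data cache); a data source manager connects both to the databases. Inputs include NSNs and spatial area selection.)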
26
GLAD Query
Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000.
select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7'
and upper(cbs_category_nomen) = 'AIRCRAFT'
and price < 700000
and pax_capacity_qty > 10
and upper(combat_type) = 'I'
and capacity_wt_ston <= 2)
27
CoGLAD Query with Relaxation Control Operators
Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000. The attribute passenger capacity is not relaxable. Relax price first and then capacity weight.
select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7'
and upper(cbs_category_nomen) = 'AIRCRAFT'
and price < 700000
and pax_capacity_qty > 10
and upper(combat_type) = 'I'
and capacity_wt_ston <= 2)
not-relaxable pax_capacity_qty
relaxation-order price capacity_wt_ston
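The relaxation-control clauses can be read as a plan for which attributes to relax and in what order, plus a filter on the answers. This Python sketch is an interpretation, not the CoGLAD implementation: `RelaxationControl` and its fields are hypothetical, and placing relaxable attributes that are not named in relaxation-order after the named ones is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class RelaxationControl:
    """Hypothetical container for relaxation-control clauses."""
    not_relaxable: List[str] = field(default_factory=list)
    relaxation_order: List[str] = field(default_factory=list)
    unacceptable: Dict[str, List[object]] = field(default_factory=dict)


def relaxation_plan(attrs: Sequence[str], ctl: RelaxationControl) -> List[str]:
    """Order in which the query's attributes may be relaxed."""
    relaxable = [a for a in attrs if a not in ctl.not_relaxable]
    first = [a for a in ctl.relaxation_order if a in relaxable]
    # Assumption: relaxable attributes not named in relaxation-order come last.
    rest = [a for a in relaxable if a not in first]
    return first + rest


def drop_unacceptable(answers: List[dict], ctl: RelaxationControl) -> List[dict]:
    """Remove answers whose values appear on an unacceptable-list."""
    return [r for r in answers
            if all(r.get(a) not in bad for a, bad in ctl.unacceptable.items())]


ctl = RelaxationControl(not_relaxable=["pax_capacity_qty"],
                        relaxation_order=["price", "capacity_wt_ston"])
plan = relaxation_plan(
    ["price", "pax_capacity_qty", "combat_type", "capacity_wt_ston"], ctl)
```

For the query above, price is relaxed first, then capacity weight, and passenger capacity never appears in the plan.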
28
CoGLAD Query with Similar-to Operator
Find aircraft similar to NSN = '0000IB0000961' based on the attributes price, passenger capacity, and air mileage. Passenger capacity has a weight of 8; price and air mileage each have a weight of 1.
select nsn
from nsn_description
where upper(nsn) similar-to '0000IB0000961'
based-on ((price 1.0) (pax_capacity_qty 8.0) (air_mileage 1.0))
at-least 4
* '0000IB0000961' is an answer from the previous query
29
CoGLAD Query with Approximate Operator
Find DLA stock report entries with NSN like '%8340%' (the FSC for tents and tarpaulins) and an on-hand quantity of approximately 150.
select nsn, ric
from dla_stock_report
where nsn like '%8340%'
and on_hand_quantity = ~150
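The approximate operator can be illustrated by rewriting `on_hand_quantity = ~150` as a range predicate taken from a numeric TAH. In this sketch the on-hand-quantity TAH is hypothetical, `level` selects how far up the hierarchy to relax, and the SQL text is only generated, not executed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RangeNode:
    """Numeric TAH node covering the half-open interval [lo, hi)."""
    lo: float
    hi: float
    children: Tuple["RangeNode", ...] = ()


def path_to(node: RangeNode, value: float) -> List[RangeNode]:
    """Nodes containing `value`, from the most specific up to the root."""
    if not (node.lo <= value < node.hi):
        return []
    for child in node.children:
        sub = path_to(child, value)
        if sub:
            return sub + [node]
    return [node]


def approximate(root: RangeNode, attr: str, value: float, level: int = 0) -> str:
    """Rewrite `attr = ~value` as a range predicate; higher level = wider range."""
    path = path_to(root, value)
    if not path:
        raise ValueError(f"{value} is outside the TAH")
    node = path[min(level, len(path) - 1)]
    # BETWEEN is inclusive, a slight widening of the half-open TAH interval.
    return f"{attr} BETWEEN {node.lo:g} AND {node.hi:g}"


# Hypothetical on-hand-quantity TAH.
QTY_TAH = RangeNode(0, 1000, (
    RangeNode(0, 300, (RangeNode(0, 100), RangeNode(100, 200), RangeNode(200, 300))),
    RangeNode(300, 1000),
))
```

At level 0 the value 150 maps to the leaf range 100 to 200; each further level substitutes the enclosing, wider range.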
30
Adding Constraints to a Query
GLAD query:
select nsn, ric
from dla_stock_report
where nsn like '%8340%'
and nomenclature like '%TARP%'
Query with added constraints:
select nsn, ric
from dla_stock_report
where nsn like '%8340%'
and nomenclature like '%TARP%'
and on_hand_quantity = ~150
and size_in_square_feet = 350
31
Example of Spatial Relaxation
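Input: NSNs, an area selected on the map, and a constraint on quantity on hand
(Flowchart: query processing returns candidate answers; the CoBase Relaxation Manager checks whether they satisfy the constraints. If yes, the answers are returned; if no, the selected area is relaxed based on the context-sensitive TAHs and the query is processed again.)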
32
Spatial Relaxation with Relaxation Control
relaxation-order: size, (latitude, longitude)
not-relaxable: price
at-least: value: size of the tarpaulin; quantity on hand: relax until enough quantity on hand (specified by the user) is obtained
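The at-least control can be sketched as widening the search area until the total quantity on hand reaches the user's requirement. This sketch swaps the context-sensitive TAHs for simple radius doubling on planar coordinates in kilometers; the depot data, `growth` and `max_steps` are assumptions.

```python
import math
from typing import Dict, List, Tuple


def within(sites: List[Dict], center: Tuple[float, float], radius: float) -> List[Dict]:
    """Sites inside a circle (planar x/y coordinates in km are assumed)."""
    cx, cy = center
    return [s for s in sites if math.hypot(s["x"] - cx, s["y"] - cy) <= radius]


def relax_until_enough(sites: List[Dict], center: Tuple[float, float],
                       radius: float, needed: int, growth: float = 2.0,
                       max_steps: int = 5) -> Tuple[List[Dict], float]:
    """Widen the search circle until the at-least quantity is on hand."""
    found = within(sites, center, radius)
    steps = 0
    while sum(s["qty"] for s in found) < needed and steps < max_steps:
        radius *= growth
        found = within(sites, center, radius)
        steps += 1
    return found, radius


DEPOTS = [  # hypothetical tarpaulin stocks
    {"name": "A", "x": 1.0, "y": 0.0, "qty": 100},
    {"name": "B", "x": 3.0, "y": 0.0, "qty": 200},
    {"name": "C", "x": 10.0, "y": 0.0, "qty": 500},
]
found, radius = relax_until_enough(DEPOTS, (0.0, 0.0), 2.0, needed=250)
```

The initial 2 km circle holds only 100 units, so the radius doubles once to 4 km, where depots A and B together meet the requirement of 250.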
33
Scalable and Extensible CoBase Architecture
34
Mediator Inter-Communications via KQML
(Figure: Mediator A and Mediator B, each wrapping a module (Module A, Module B), communicate via KQML; messages use the CoBase ontology and the CoBase content language, which carries data and actions, and module objects are accessed through APIs.)
35
36
Query Answers Without CoBase
Query: find chemical suits
37
38
39
40
41
42
43
Electronic Warfare
Identify and locate sources of radiated electromagnetic energy
Determine emitter type based on the operating parameters of observed signals: radio frequency (RF), pulse repetition frequency (PRF), pulse duration (PD), scan period (SP), and other operating parameters
Determine platform sites near the line of bearing of an emitter
This research is a joint effort between CoBase and Lockheed Martin Communication Systems (Russ
Frew, et al.), Camden, NJ
44
Performance Improvement by Using CoBase in EW
                Conventional DB        CoBase
                Case 1    Case 2       Case 1    Case 2
identified      90.00%    30.00%       100.00%   85.90%
id/ranking      100.00%   36.00%       100.00%   98.80%
relaxation      0.00%     0.00%        95.90%    99.80%
Conventional DB: parameter ranges from emitter specifications
CoBase:
DB: peak parameters (RF, PRF) and parameter ranges (PD, SP)
KB: TAHs based on RF and PRF peak parameters; TAHs based on PD and SP parameter ranges
Case 1: emitter signals without noise
Case 2: added noise: PD and SP (10%), PRF (5%), RF (2.5%)
Sample size: 1000 signals; emitter types: 75
45
Current CoBase Users and Applications
ARPI members (ISI, Unisys)
Enhance query capabilities in the transportation domain (ARPI TARGET): query relaxation, association, and explanation
UCLA KMeD Project (Medical School)
Improve search in medical images (X-rays, MRs): approximate matching of image features and contents; explanation of approximate matching quality
Hughes Research Lab
Integrate schemas in heterogeneous databases: approximate matching of attributes and views
Lockheed/Martin Marietta
Emitter and platform identification: approximate matching of observed emitter signals; relaxation of regions to identify emitter platforms
BBN
Enhance DOD Logistic Anchor Desk (GLAD): query relaxation and spatial relaxation
46
Conclusions
Provide user- and context-sensitive query relaxation (structured and unstructured data)
Provide additional information (associative query answering) based on past cases
CoSQL (Cooperative SQL): similar-to, near-to, approximate, and relaxation control operators
GUI: map server, high-level query formation
47
48
CoSent: An Active Database Technology
Natural language-like rules support conceptual and approximate terms
Decompose natural language-like rules into low-level rules via the knowledge base (TAH)
Mimic the human cognitive process and thus ease rule specification
Ease rule maintenance
49
CoSent: An Active Database Technology
Triggers with high-level rules containing conceptual terms (e.g., bad, heavy) and approximate operators (e.g., similar-to, near-to, approximate)
Allow trigger conditions to be specified with fuzzy and conceptual terms
Mimic human cognitive expression
CoSent monitors temporal composite events and executes rules with conceptual and approximate terms.
50
Key Features of CoSent
User-defined rules are transformed into low-level range values via the knowledge base (type abstraction hierarchies, TAHs)
TAHs are typically generated automatically from data sources
Leverages conventional DBMS (e.g., Oracle, Sybase, Teradata) triggering systems
Rules are either specified by domain experts or derived by data mining technologies
51
Example of Rule Definitions with Data Mining Technology
Find attributes that frequently appear together for a given target attribute. For example:
If road conditions are bad and the weather is also bad, then traffic congestion results.
If a person has written many bad checks and also has a past eviction, then this person is a poor credit risk.
Based on the frequency of occurrence, the derived rules can be ranked according to a certain information measure.
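A sketch of deriving such rules by counting how often condition sets co-occur with a target attribute. The observations are hypothetical, and ranking by confidence and then support stands in for the information measure, which the slide does not specify.

```python
from itertools import combinations
from typing import List, Set, Tuple

Rule = Tuple[Tuple[str, ...], float, float]  # (conditions, support, confidence)


def mine_rules(records: List[Set[str]], target: str,
               min_support: float, max_size: int = 2) -> List[Rule]:
    """Find condition sets that frequently co-occur with `target`."""
    n = len(records)
    attrs = sorted(set().union(*records) - {target})
    rules: List[Rule] = []
    for size in range(1, max_size + 1):
        for conds in combinations(attrs, size):
            matching = [r for r in records if set(conds) <= r]
            if not matching:
                continue
            hits = sum(1 for r in matching if target in r)
            support = hits / n
            if support >= min_support:
                rules.append((conds, support, hits / len(matching)))
    # Assumption: rank by confidence, then support.
    return sorted(rules, key=lambda rule: (-rule[2], -rule[1]))


DAYS = [  # hypothetical daily observations
    {"bad_road", "bad_weather", "congestion"},
    {"bad_road", "bad_weather", "congestion"},
    {"bad_road"},
    {"bad_weather"},
    {"bad_weather", "congestion"},
]
rules = mine_rules(DAYS, "congestion", min_support=0.2)
```

In this toy data the combined rule "bad road and bad weather implies congestion" holds every time it applies, so it ranks above either condition alone.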
52
Conventional vs. Natural Language-Like Rules
Natural language-like rule: If the weather turns bad, then notify all affected units in that region and all those near that region.
Conventional rule: If wind_speed > MAX_WIND_SPEED and wave_height > MAX_WAVE_HEIGHT, then notify affected units in the regions.
53
Natural Language-Like Rule Specifications
Example 1: If the number of departures of large cargo carriers (e.g., C-5, C-141) becomes significantly low in the past seven days, notify the Air Mobility Command.
Example 2: If the aircraft has a fuel contamination problem and the aircraft type is similar-to 'C-5' based on the fuel type and fueling method, then notify the authority.
54
Example
DoD Transportation Planning Weather Report Table
Wind Speed (m/s)   Wave Height (m)   |   Wind Speed (m/s)   Wave Height (m)
14.9               3.3               |   7.4                1.9
13.5               3.1               |   7.7                1.7
12.2               3.1               |   7                  1.6
12                 2.6               |   6.5                1.5
11.8               2.8               |   6.6                1.6
10.6               2.3               |   6.5                1.4
10.5               2.7               |   6.6                1.4
10                 2.5               |   6.4                1.5
10                 2.5               |   5.9                1.5
8.3                2.3               |   5.7                1.4
7.9                2.2               |   6                  1.6
8.1                2                 |   4.5                1.4
7.7                2                 |   4                  1.3
7.1                1.8               |   3.7                1.2
Wind speed is the hourly average over an eight-minute period for buoys and a two-minute period for land stations.
Wave height is sampled in a 20-minute period.
55
TAH Example
Wave Height [0.6, 7.2]
VERY LOW [0.6, 1.25]
LOW [1.25, 1.75]
HIGH [1.75, 2.45]
VERY HIGH [2.45, 7.2]
56
A Portion of the Wave Height TAH
57
Triggering Based on Temporal Composite Events
Notify the commander if, within the past seven days, the total number of C-5 departures is significantly low and the number of C-5 filter problems is extremely high.
C-5 Departure TAH
Low [9, 134.5]: Significantly Low [9, 53], Very Low [53, 134.5]
High [134.5, 208]: Very High [134.5, 162], Significantly High [162, 208]
C-5 Filter Problem TAH
Low [0, 53]: Extremely Low [0, 36], Very Low [36, 53]
High [53, 79]: Very High [53, 60], Extremely High [60, 79]
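The composite event on this slide can be sketched as two windowed totals, each checked against a TAH range. The ranges are copied from the slide; treating them as seven-day totals, using inclusive bounds, and the sample data are assumptions.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Leaf ranges from the C-5 TAHs (assumed to be seven-day totals).
C5_DEPARTURE = {"significantly low": (9, 53), "very low": (53, 134.5),
                "very high": (134.5, 162), "significantly high": (162, 208)}
C5_FILTER_PROBLEM = {"extremely low": (0, 36), "very low": (36, 53),
                     "very high": (53, 60), "extremely high": (60, 79)}


def window_total(events: List[Tuple[date, int]], today: date, days: int = 7) -> int:
    """Sum event counts dated within the last `days` days, including today."""
    start = today - timedelta(days=days)
    return sum(n for d, n in events if start < d <= today)


def is_term(value: float, tah: Dict[str, Tuple[float, float]], term: str) -> bool:
    """True if `value` falls inside the range of a conceptual term."""
    lo, hi = tah[term]
    return lo <= value <= hi


def should_notify(departures: List[Tuple[date, int]],
                  filter_problems: List[Tuple[date, int]], today: date) -> bool:
    """Composite event: low departures AND many filter problems in 7 days."""
    return (is_term(window_total(departures, today), C5_DEPARTURE, "significantly low")
            and is_term(window_total(filter_problems, today),
                        C5_FILTER_PROBLEM, "extremely high"))
```

Events older than the seven-day window are ignored, so a burst of departures ten days ago does not suppress the notification.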
58
Natural Language-Like Rule Translations
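(Figure: natural language-like rules pass through the rule parser, which produces a rule representation; the rule decomposer and rule translator use the rule definitions and TAHs (rule translation/relaxation) to produce low-level rules for a conventional triggering system (e.g., Oracle, Sybase, Teradata).)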
59
CoSent Architecture
(Figure: the CoSent server. A natural language-like rule (input) goes through the rule parser and the relaxation engine, which uses the TAHs for rule translation/relaxation; the rule manager maintains the rule base; the event manager handles composite event specification and notification (input/output); the action manager issues trigger actions (output). CoSent runs on commercial relational database systems (e.g., Oracle, Sybase, Teradata).)
60
CoSent Demo
Natural language-like rule with conceptual terms: “very high wave height” and “very strong wind speed”
Natural language-like rule with the approximate term “nearby” and the conceptual term “bad weather”
Install a trigger by drag-and-drop on the desired location on the map
61
Natural Language-Like Rule
Natural language-like rules containing conceptual terms, such as wave_height = “very-high” and wind_speed = “very-strong”, can be translated into range values using domain knowledge, for instance, a type abstraction hierarchy. Natural language-like rules reduce the number of rules, thus easing rule maintenance.
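Translating conceptual terms into range values can be sketched as a lookup in each attribute's TAH followed by generation of a low-level predicate. The wave height ranges come from the wave height TAH example; the wind speed ranges are hypothetical placeholders, and `translate` is an illustrative name. Because adjacent TAH ranges share endpoints, the inclusive BETWEEN would match a boundary value under two terms.

```python
from typing import Dict, List, Tuple

TAHS: Dict[str, Dict[str, Tuple[float, float]]] = {
    # Leaf ranges from the wave height TAH (meters).
    "wave_height": {"very-low": (0.6, 1.25), "low": (1.25, 1.75),
                    "high": (1.75, 2.45), "very-high": (2.45, 7.2)},
    # Hypothetical wind speed ranges (meters/second); not from the source.
    "wind_speed": {"strong": (7.0, 10.0), "very-strong": (10.0, 15.0)},
}


def translate(conditions: List[Tuple[str, str]]) -> str:
    """Turn (attribute, conceptual term) pairs into a low-level predicate."""
    parts = []
    for attr, term in conditions:
        lo, hi = TAHS[attr][term]
        parts.append(f"{attr} BETWEEN {lo} AND {hi}")
    return " AND ".join(parts)


rule = translate([("wave_height", "very-high"), ("wind_speed", "very-strong")])
```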
62
63
64
65
66
67
Rules With Approximate Terms
Rules can contain approximate terms, such as near-by and approximate, thus easing rule specification
The trigger can be installed on the desired location on a map by drag-and-drop
The nearby region affected by the bad weather condition is specified by the trigger condition, shown as a red circle
68
69
70
71
72
73
74
75
Map Server Architecture
76
Current Capabilities of Map Server
Visualization of query answers: icons, paths
Enter query constraints graphically
Visualization of the query relaxation process
77
Visualization of Relaxation Process
Query: Find seaports in the given region.
(Figure: map showing the given region and the relaxed region.)
78
Explanation Agent
Based on process traces and invocation rules, generate English-like explanations of:
the relaxation process
the quality of approximate matching
further explanation of definitions and terms in the explanation
79
Explanation of Relaxation Process
80
Relaxation Primitive: within
81
Extending the near-to Primitive from Points to Regions
82
Dynamic Nearness
Uses transaction history to identify nearness between tuples and values
If two tuples (or attribute values) appear together in a query answer, then that is a piece of evidence that they should be clustered together.
Gather evidence over time
Evolve the hierarchy
83
The BOOKS Relation
84
Schematic of a Browsing System
85
Schematic of a Query Modification System
86
The Links Between Tuples in BOOKS
87
Dynamic Links After Two Queries
88
Links with Counts
89
Number of Links with Threshold Value
90
Number of Links is determined by Maximum Answer Set Size
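The dynamic nearness idea can be sketched with a counter of co-occurring tuple pairs: each query answer adds evidence, a threshold filters out weak links, and the maximum answer set size caps how many linked tuples are returned. The tuple identifiers are hypothetical stand-ins for rows of the BOOKS relation, and the function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def record_answer(links: Counter, answer_ids: Iterable[str]) -> None:
    """Each pair of tuples returned together gains one unit of evidence."""
    for pair in combinations(sorted(set(answer_ids)), 2):
        links[pair] += 1


def links_above(links: Counter, threshold: int) -> Set[Tuple[str, str]]:
    """Keep only links whose count reaches the threshold."""
    return {pair for pair, count in links.items() if count >= threshold}


def nearest(links: Counter, tuple_id: str, max_answers: int) -> List[str]:
    """Most strongly linked tuples, capped by the maximum answer set size."""
    scored = [(count, b if a == tuple_id else a)
              for (a, b), count in links.items() if tuple_id in (a, b)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [other for _, other in scored[:max_answers]]


links: Counter = Counter()
for answer in (["b1", "b2", "b3"], ["b1", "b2"], ["b2", "b4"]):  # hypothetical answers
    record_answer(links, answer)
```

After three queries, b1 and b2 have appeared together twice, so theirs is the only link that survives a threshold of 2.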
91
Query Formation from High-Level Concepts for Relational Databases
Guogen Zhang, Wesley Chu, Frank Meng, Gladys Kong
92
Outline
Overview
Semantic Graph Model
High-Level Query Formation for SPJ Queries
Incremental Query Formation for Complex Queries
Conclusions
93
Overview: Query Formation
Based on a semantic graph model, including user-defined relationships
The user specifies requests and constraints
Formulate simple queries by graph search techniques
Candidates are ranked by information measure
English-like query description
A complex query can be formulated by a series of simple queries
94
Related Work
Query formulation as a Steiner tree problem (Wald and Sorenson, 1984): limited to partial 2-tree graphs
Formulating simple select-project-join (SPJ) queries via the Universal Relation Model: no need to specify natural joins (Ullman, 1988; Vardi, 1988)
Object-oriented query path expression completion: partial order relationships between different paths for ranking (Ioannidis and Lashkari, 1994)
Query-by-Icon (QBI) (Massari and Chrysanthis, 1995)
Natural language interfaces (text/voice): logical form to query
95
Semantic Graph Model
Weighted graph G = (V, E):
Nodes: entities (strong, weak, user-defined)
Links: relationships (ISA, HAS, simple, complex, user-defined)
For relational databases: nodes are relations; links are natural and user-defined joins
Weight: information measure of a node or link
96
Query Feature
Query expression in a semantic graph:
Query topic, T: a set of joins represented by links
Query constraints, C: query conditions
Query aspect, A: attribute list
97
A query topic for “aircraft can land on airports at geographical locations of countries”
(Figure: the nodes airfield_chars, runways, airports, geoloc, and country, connected by the links can land, have, located, and is a.)
98
Semi-Automatic Generation of Semantic Model
Find natural joins through keys and foreign keys between nodes.
User-defined links can be added to the graph model.
Designers need to specify link types and assign names to all the elements in the graph.
99
Example of Semantic Model Generation
AIRPORT: APORT_NM, GEOLOC_TYPE, GLC_CD, ELEV_FT, …;key: APORT_NM.
RUNWAY: APORT_NM, RUNWAY_NM, GLC_CD, RUNWAY_LENGTH_FT,RUNWAY_WIDTH_FT, …; key: RUNWAY_NM.
GEOLOC: GLC_CD, GLC_NM, CY_CD, LATITUDE, LONGITUDE, …;key: GLC_CD.
COUNTRY: CY_CD, CY_NM, …; key: CY_CD.
Links:
AIRPORT--RUNWAY: APORT_NM
AIRPORT--GEOLOC: GLC_CD
RUNWAY--GEOLOC: GLC_CD
GEOLOC--COUNTRY: CY_CD
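Finding natural joins through keys and foreign keys can be sketched as follows: a link is proposed whenever two relations share an attribute that is the key of one of them. Only the attributes listed in this example are used (the elided ones are omitted), and `natural_join_links` is a hypothetical name.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Relations from the example (attribute lists truncated as on the slide).
RELATIONS: Dict[str, Tuple[Set[str], str]] = {
    "AIRPORT": ({"APORT_NM", "GEOLOC_TYPE", "GLC_CD", "ELEV_FT"}, "APORT_NM"),
    "RUNWAY": ({"APORT_NM", "RUNWAY_NM", "GLC_CD", "RUNWAY_LENGTH_FT",
                "RUNWAY_WIDTH_FT"}, "RUNWAY_NM"),
    "GEOLOC": ({"GLC_CD", "GLC_NM", "CY_CD", "LATITUDE", "LONGITUDE"}, "GLC_CD"),
    "COUNTRY": ({"CY_CD", "CY_NM"}, "CY_CD"),
}


def natural_join_links(
        relations: Dict[str, Tuple[Set[str], str]]) -> List[Tuple[str, str, str]]:
    """Propose a link wherever a shared attribute is the key of one relation."""
    links = []
    for (r1, (attrs1, key1)), (r2, (attrs2, key2)) in combinations(relations.items(), 2):
        for attr in sorted(attrs1 & attrs2):
            if attr in (key1, key2):  # key on one side, foreign key on the other
                links.append((r1, r2, attr))
    return links
```

With this rule the sketch yields exactly the four links listed on the slide; GLC_CD is shared by AIRPORT and RUNWAY but is the key of neither, so no GLC_CD link is proposed between them.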
100
Information Measure
Information measure of a node or link a: $I(a) = -\log P(a)$, where $P(a)$ is the probability of $a$ being used in queries.
Assuming nodes and links are independent, for a subgraph with a set of elements $A = \{a_i \mid i = 1, \ldots, n\}$, the information measure is additive:
$$I(A) = \sum_{i=1}^{n} I(a_i)$$
101
Information Measure (cont.)
Initial information measure:
all the nodes = 1
different nodes have a different value
The information measure is normalized and converted into counts
Probability of a node or a link: P(a_i) = c_i / c
Update the information measure
Ranking is based on the information measure, thus adapting to user feedback
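A sketch of keeping the information measure as usage counts, so that P(a_i) = c_i / c and I(a) = -log P(a), with c taken as the total count. The logarithm base (2 here), the starting counts and the class name are assumptions. Recording the use of an element raises its count and lowers its information measure, which is how the ranking adapts to feedback.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Optional


class InformationMeasure:
    """I(a) = -log P(a), with P(a) = c_a / c estimated from usage counts."""

    def __init__(self, initial_counts: Optional[Dict[str, int]] = None) -> None:
        # Every element needs a positive count, or its probability is zero.
        self.counts: Counter = Counter(initial_counts or {})

    def probability(self, elem: str) -> float:
        return self.counts[elem] / sum(self.counts.values())

    def info(self, elem: str) -> float:
        # Base 2 (bits) is an assumption; the slides do not fix the base.
        return -math.log2(self.probability(elem))

    def info_of(self, elems: Iterable[str]) -> float:
        """Additive measure for a subgraph, assuming independent elements."""
        return sum(self.info(e) for e in elems)

    def record_use(self, elems: Iterable[str]) -> None:
        """User feedback: elements used in a chosen query become cheaper."""
        self.counts.update(elems)


im = InformationMeasure({"airports": 2, "runways": 1, "geoloc": 1})
```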
102
Query Formulation
To formulate (simple) queries without knowledge of the query language or database schema
Example: Find airports in Tunisia that can land a C-5 cargo plane
User input:
Query aspect: AIRPORTS.APORT_NM
Constraints: AIRCRAFT_AIRFIELD_CHARS.AC_TYPE_NAME = 'C-5'; COUNTRY_STATE.CY_NM = 'Tunisia'
Links: CAN LAND
103
Formulated Query
SELECT R3.APORT_NM
FROM AIRCRAFT_AIRFIELD_CHARS R0, AIRPORTS R3, COUNTRY_STATE R11, GEOLOC R12, RUNWAYS R16
WHERE R0.AC_TYPE_NM = 'C-5'
AND R11.CY_NM = 'Tunisia'
AND R0.WT_MIN_AVG_LAND_DIST_FT <= R16.RUNWAY_LENGTH_FT
AND R0.WT_MIN_RUNWAY_WIDTH_FT <= R16.RUNWAY_WIDTH_FT
AND R12.GLC_CD = R3.GLC_CD
AND R3.APORT_NM = R16.APORT_NM
AND R11.CY_CD = R12.CY_CD
104
Query Completion as Graph Search Problem
Given: an incomplete input query topic T_i
Find: a set of links to complete the topic (to make T_i connected)
Minimum Missing Information principle: the query completion candidate T_c (the missing links and nodes) for an incomplete input topic T_i contains the minimum information
105
Query Formulation Algorithm
Input: subgraph T of the semantic graph G
Find candidates with the minimum information measure
Two methods are used to limit the search scope:
L-step-bound paths: paths that connect two components with at most L links, which limits the search to the neighborhood of the input subgraph
k-minimum completion candidates: only the k candidates with the minimum information measure are kept (alpha-beta pruning)
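The two pruning methods can be sketched for the case of joining two components: enumerate simple paths of at most L links, score each by the information measure of its missing links and intermediate nodes, and keep the k cheapest. The link weights are hypothetical, and `heapq.nsmallest` replaces the alpha-beta pruning mentioned on the slide.

```python
import heapq
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def link_key(u: str, v: str) -> Link:
    """Undirected link identifier."""
    return (u, v) if u <= v else (v, u)


# Hypothetical information measures for part of the transportation graph.
LINK_INFO: Dict[Link, float] = {link_key(u, v): w for u, v, w in [
    ("airfield_chars", "runways", 1.0), ("airports", "runways", 1.0),
    ("airports", "geoloc", 1.0), ("runways", "geoloc", 2.0),
    ("geoloc", "country", 1.0), ("airports", "weather", 2.0)]}
NODE_INFO = {"geoloc": 1.0, "weather": 1.0}  # charged only when intermediate

ADJ: Dict[str, List[str]] = {}
for a, b in LINK_INFO:
    ADJ.setdefault(a, []).append(b)
    ADJ.setdefault(b, []).append(a)


def bounded_paths(comp_a: Set[str], comp_b: Set[str], max_links: int) -> List[List[str]]:
    """Simple paths from comp_a to comp_b using at most `max_links` links."""
    paths: List[List[str]] = []

    def dfs(path: List[str]) -> None:
        node = path[-1]
        if node in comp_b:
            paths.append(list(path))
            return
        if len(path) - 1 == max_links:
            return
        for nxt in ADJ.get(node, []):
            if nxt not in path and nxt not in comp_a:
                path.append(nxt)
                dfs(path)
                path.pop()

    for start in sorted(comp_a):
        dfs([start])
    return paths


def cost(path: List[str]) -> float:
    """Information measure of the missing links and intermediate nodes."""
    links = sum(LINK_INFO[link_key(u, v)] for u, v in zip(path, path[1:]))
    return links + sum(NODE_INFO.get(n, 1.0) for n in path[1:-1])


def k_min_completions(comp_a: Set[str], comp_b: Set[str],
                      max_links: int = 2, k: int = 2) -> List[Tuple[float, List[str]]]:
    """Keep the k completion candidates with the minimum information measure."""
    scored = [(cost(p), p) for p in bounded_paths(comp_a, comp_b, max_links)]
    return heapq.nsmallest(k, scored, key=lambda s: s[0])


best = k_min_completions({"airfield_chars", "runways", "airports"}, {"country"})
```

Both 2-step paths to country pass through geoloc; with these assumed weights the path from airports is cheaper than the one from runways.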
106
Initial Components and 2-Step-Bound Paths for the “CAN LAND” Query
(Figure: (a) the initial components of the query topic; (b) the 2-step-bound paths (1) to (6) that connect them through the nodes aircrafts, airfield_chars, runways, airports, geoloc, and country, via links such as can land, have, authorize, repair, located, and is a, each annotated with its information measure.)
107
The Semantic Graph for the Transportation Domain
(Figure: relation nodes airports, runways, airfield_chars, weather, geoloc, and country, connected by the links can land, have, at, located, and is a, with information-measure weights of 1 or 2.)
108
Incremental Query Formulation
To assist the user in reaching a complex query goal with a series of simple queries
Subsequent queries may depend on the results of preceding queries (derived relations)
Issues:
Incorporate derived relations into the semantic graph
Suggest missing attributes to link isolated derived nodes to the graph
109
Incremental Query Examples
Find airports in Tunisia.
Which of these airports can land a C-5?
What is the weather at these airports?
110
Incorporating Derived Relations
Source relation: contributes attributes to the derived relations
Derived relation: inherits properties of attributes from its source relations
Deriving link: links to the source relations through inherited keys
Inherited link: inherits links from the source relations
111
Extended semantic graph showing derived nodes, derived links, and inherited links
(Figure: the transportation semantic graph (airports, runways, airfield_chars, weather, geoloc, country) extended with the derived nodes airporttunisia, airporttunisiacanland, and airporttunisiacanlandweather, connected by derived and inherited links.)
112
Suggesting Key Attributes for a Query
Find the source relations of the isolated derived relation.
Suggest the keys of the source relations as attributes to include.
113
Concept and Attribute Specification Interface
114
Query Constraint Specification
115
Action Specification
116
English-Like Query Description and the Formulated Query
117
Conclusions
Semantic graph model provides a basis for query formulation search
Ranking of query candidates by information measure provides adaptive behavior
Incremental query formulation is effective for complex queries
GUI and voice interfaces can be built for query formulation from high-level concepts
118