1
Dr. Virendra C. Bhavsar
Professor and Director, Advanced Computational Res. Lab.
Faculty of Computer Science
University of New Brunswick (UNB)
Fredericton, Canada
BCS Student: Marcel Ball
MCS Students: Anurag Singh, Jin Jing, Sebastien Mathieu, Jie Li
PhD Student: Lu Yang
Post-Doctoral Fellows: Dr. Biplab Sarker and Dr. Manish Joshi
Collaborators: Dr. Riyanarto Sarno and Dr. Harold Boley
June 14, 2010
Semantic Matching
2
Virendra C. Bhavsar
• UNB: since 1983; > 35 years of software research
and development experience
• Interests: real-time embedded systems, computer
graphics, software engineering, natural language
processing, databases, bioinformatics, parallel computing,
artificial intelligence, …
• Bioinformatics - Canadian Potato Genomics Project
• Atlantic Computational Excellence Network (ACEnet):
~30 million Atlantic Canada project in high performance
computing
• Semantic Matching
3
Outline
• Syntactic Matching
• Semantic Matching
• Semantic Matching: Taxonomy, Ontology and
Partonomy
• UNB Semantic Matching Engines – Applications
• Conclusion
4
Exact String Matching
• Binary result 0.0 or 1.0
Permutation of strings
“Java Programming” versus “Programming in Java”
$\text{similarity} = \dfrac{\text{number of identical words}}{\text{maximum length of the two strings (in words)}}$
Example 1
For two node labels “a b c” and “a b d e”, their similarity is $\frac{2}{4} = 0.5$
Syntactic Matching
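The word-overlap measure of Example 1 can be sketched in a few lines of Python. This is a minimal illustration rather than the UNB implementation: the function name `word_overlap_similarity`, whitespace tokenization, and counting repeated words as a multiset are all assumptions made here.

```python
from collections import Counter


def word_overlap_similarity(label_a: str, label_b: str) -> float:
    """Shared words divided by the word count of the longer label."""
    words_a = label_a.split()
    words_b = label_b.split()
    if not words_a and not words_b:
        return 1.0  # assumption: two empty labels count as identical
    # Multiset intersection: a repeated word is matched once per occurrence.
    shared = sum((Counter(words_a) & Counter(words_b)).values())
    return shared / max(len(words_a), len(words_b))


print(word_overlap_similarity("a b c", "a b d e"))  # 0.5
print(word_overlap_similarity("Java Programming", "Programming in Java"))
```

Because the measure ignores word order, the permuted labels “Java Programming” and “Programming in Java” share two of at most three words.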
5
Example 2
Node labels “electric chair” and “committee chair”
Similarity: $\frac{1}{2} = 0.5$. Is this meaningful?
• Syntactic Matching does not consider additional
domain knowledge
• Semantic matching techniques are needed for the
above problems
Syntactic Matching
6
Semantic Matching Applications
• Semantic searching, e.g. Google
• e-Business
• e-Learning
• Matchmaking portals
• Information Retrieval
• Web Services
• Information Integration
• Semantic Web
7
Semantic Matching
• Examples
{Car : Truck} {Toyota Corolla : Toyota Camry}
{Car : Automobile} {Car : Apple}
• Semantic Similarity versus Semantic Distance
• Matching of: words, short texts, documents,
schemas/structures, pictures, videos
• Taxonomy
• Partonomy
• Ontology
8
Taxonomy
• Practice and science of classification
9
Ontology
• Domain Ontology: Explicit formal specifications of the terms in a domain and relations among them
• Upper Ontology: Across domains
10
Concept Similarity in a Taxonomy
Given a taxonomy and two
concepts (e.g., A and B),
find the semantic similarity
of the two concepts
[Figure: a taxonomy tree with the two concepts A and B marked]
11
{Produce, Green goods} 3.034
{Fruit} 3.374
{Apple} 3.945 {Berry} 4.907
{Banana} 5.267
{Boxberry} 7.576 {Cranberry} 6.285
Concept Similarity in a Taxonomy
12
• More and more on-line transactions (e.g. eBay, Kijiji)
• Buyers and sellers input key words and/or specify values
for some product features
• A list of recommended sellers (with product advertisements)
and/or buyers (with product requests) is presented
• Flat representation of products cannot represent the
hierarchical “part-of” relationship of product parts
• Match-making is not precise
• Negotiation space is large
Motivation
13
[Figure: e-Market architecture. Labels: User, Web Browser, Web, Main Server (User Info, User Profiles, User Agents), Agents, Matcher 1 … Matcher n, To other sites (network), e-Market.]
• e-business, e-learning …
• Buyer-Seller matching
• Metadata for buyers and sellers
• Keywords/keyphrases
e-Business Applications
14
[Figure: taxonomy tree rooted at Programming Techniques. Nodes: General, Applicative Programming, Automatic Programming, Concurrent Programming, Sequential Programming, Object-Oriented Programming, Distributed Programming, Parallel Programming. Arc weights shown: 0.6, 0.5, 0.8, 0.5, 0.9, 0.7, 0.7, 0.5.]
• The taxonomy tree of “Programming Techniques” according
to the ACM Computing Classification System
• Arc weights
Semantic Matching ─ A Taxonomy Tree
15
Partonomy
• Tree representation for product/service descriptions
• Weights
[Figure: partonomy tree for a Car with arcs Make (Ford), Color (Black), and Year (2002); arc weights shown: 0.3, 0.2, 0.5.]
16
Similarity of Buyer and Sellers
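[Figure: four weighted Car trees with arcs Make, Color, and Year; arc weights are listed in the order they appear in the figure.]
buyer: Car (Make: Ford, Color: Black, Year: 2002); arc weights 0.1, 0.1, 0.8
seller1: Car (Make: Ford, Color: Red, Year: 2002); arc weights 0.05, 0.05, 0.9; similarity to buyer: 0.925
seller2: Car (Make: Ford, Color: Red, Year: 2002); arc weights 0.2, 0.2, 0.6; similarity to buyer: 0.85
seller3: Car (Make: Ford, Color: Red, Year: 2002); arc weights 0.1, 0.6, 0.3; similarity to buyer: 0.65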
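The three buyer-seller similarities on this slide can be reproduced with a short sketch, assuming the similarity of two one-level trees is the sum over arcs of $s_i (w_i + w'_i)/2$, where $s_i$ is 1 for identical leaves and 0 otherwise. The figure layout does not survive, so the assignment of weights to arcs below is an assumption, chosen because it yields the slide's values. In particular, the Color arc is taken to carry 0.1 for the buyer and 0.05, 0.2, and 0.6 for the three sellers. The function name and the dictionary layout are illustrative only.

```python
def arc_weighted_similarity(tree_a, tree_b):
    """Sum s_i * (w_i + w'_i) / 2 over arcs; s_i is 1 for equal leaves, else 0."""
    total = 0.0
    for arc, (value_a, weight_a) in tree_a.items():
        value_b, weight_b = tree_b[arc]
        leaf_similarity = 1.0 if value_a == value_b else 0.0
        total += leaf_similarity * (weight_a + weight_b) / 2
    return total


# Assumed arc-weight assignment (the figure layout was lost in extraction).
buyer = {"Make": ("Ford", 0.1), "Color": ("Black", 0.1), "Year": (2002, 0.8)}
sellers = {
    "seller1": {"Make": ("Ford", 0.05), "Color": ("Red", 0.05), "Year": (2002, 0.9)},
    "seller2": {"Make": ("Ford", 0.2), "Color": ("Red", 0.2), "Year": (2002, 0.6)},
    "seller3": {"Make": ("Ford", 0.1), "Color": ("Red", 0.6), "Year": (2002, 0.3)},
}

for name, seller in sellers.items():
    print(name, round(arc_weighted_similarity(buyer, seller), 3))
```

Under this assignment the sketch gives 0.925, 0.85, and 0.65. A seller that puts more weight on the mismatched Color arc ranks lower, even though all three sellers offer the same car.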
17
Semantic Matching ─ Local Similarity
• Local similarity measures for leaf nodes
• “Price” type
• “Date” type
• . . .
18
• Pseudo code of the price-range similarity algorithm
PriceRangeSim ([Bpref, Bmax], [Smin, Spref])
Begin
If Spref <= Bpref similarity = 1.0
else if Bmax < Smin similarity = 0.0
else if Bmax = Smin
similarity = $\frac{0.005}{MAX - MIN}$
else
{ MIN = min{MIN, Smin}
MAX = max{MAX, Bmax}
similarity = $\frac{B_{max} - S_{min}}{MAX - MIN}$
}
return similarity
End.
• This algorithm can be easily adapted to other “price”-typed attributes,
e.g. “salary range” in a job seeking and recruiting e-Market
Semantic Matching ─ Price Matching Algorithm
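A runnable sketch of the price-range pseudocode follows. Two points are assumptions, not statements from the slide. First, the two similarity expressions are read from the surviving formula fragments as 0.005 / (MAX − MIN) and (Bmax − Smin) / (MAX − MIN). Second, the slide never says how MIN and MAX are initialized, so the hypothetical parameters `range_min` and `range_max` let the caller supply them, with `range_max > range_min` expected.

```python
def price_range_sim(b_pref, b_max, s_min, s_pref, range_min, range_max):
    """Similarity of a buyer range [b_pref, b_max] and a seller range [s_min, s_pref].

    range_min and range_max stand in for the slide's MIN and MAX (assumed inputs).
    """
    if s_pref <= b_pref:
        return 1.0  # seller's preferred price is within the buyer's preference
    if b_max < s_min:
        return 0.0  # the ranges do not overlap
    if b_max == s_min:
        # The ranges touch at a single point: small, non-zero similarity.
        return 0.005 / (range_max - range_min)
    range_min = min(range_min, s_min)
    range_max = max(range_max, b_max)
    # Overlap width relative to the overall price span.
    return (b_max - s_min) / (range_max - range_min)


print(price_range_sim(800, 1000, 900, 1200, 800, 1200))  # 0.25
```

In the printed case the buyer accepts 800 to 1000 and the seller 900 to 1200, so the overlap of 100 over a span of 400 gives 0.25.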
19
UNB Similarity Engines - Implementation
• Java Implementation
• Testing on systematically varied cases
20
• eduSource e-Learning project
• Learning Object Metadata Generator: LOMGen
Partonomy Tree Similarity Engine ─ eLearning Application
[Figure: architecture of the eLearning application. Components: UI (Java), SimilarityEngine (Java), Translator (XSLT), CANLOM (XML), Prefilter (SQL), LOMGen (Java), LOR (HTML), DATABASE (Access) with Keyword Table. Actors: End user, Administrator. Labeled flows: user input, administrator input, prefilter parameters (Query URI), WOO RuleML files, CanCore files, partial CanCore files, prefiltered CanCore files, HTML files, recommended results, search results. Steps are numbered (1) to (8) and (a) to (c).]
21
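$\sum_i \left( s_i \, (w_i + w'_i)/2 \right)$
$\sum_i \left( A(s_i) \, (w_i + w'_i)/2 \right)$, with $A(s_i) \ge s_i$
[Figure: two learning object trees, t and t´, both rooted at lom. Tree t has branches educational (edu-set), general (gen-set), and technical (tec-set) with arc weights 0.3334, 0.3333, 0.3333; general has language “en” and title “Introduction to Oracle”; technical has format “HTML” and platform “WinXP”; the four lower arc weights are all 0.5. Tree t´ has branches general (gen-set) and technical (tec-set) with arc weights 0.7 and 0.3; general has language “en” and title “Basic Oracle”; technical has format “*” and platform “WinXP”; the lower arc weights shown are 0.1, 0.9, 0.8, 0.2.]
* : Don’t Care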
• Partonomy similarity [Bhavsar et al. 2004]
Fragments of learning object trees [Boley et al. 2005] for learning object
matching (http://www.cs.unb.ca/agentmatcher)
Partonomy Tree Similarity Algorithm
─ Similarity Algorithm
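The recursive form of the measure, $\sum_i A(s_i)(w_i + w'_i)/2$ with $A(s_i) \ge s_i$, can be sketched as below. This is a simplified illustration, not the published algorithm, and it rests on several assumptions: a node is modelled as a `(label, arcs)` tuple, an arc present in only one tree contributes 0, “*” matches anything with similarity 1, and the square root is used as one example of an adjustment function with $A(s) \ge s$ on [0, 1]. The lower arc weights of t´ are assigned hypothetically, because the figure layout does not survive.

```python
import math

WILDCARD = "*"  # the slide's "Don't Care" marker


def leaf(label):
    return (label, {})


def tree_similarity(node_a, node_b, adjust=lambda s: s):
    """Simplified weighted-tree similarity over (label, {arc: (weight, child)}) nodes."""
    label_a, arcs_a = node_a
    label_b, arcs_b = node_b
    if WILDCARD in (label_a, label_b):
        return 1.0
    if not arcs_a and not arcs_b:  # two leaves: exact label match
        return 1.0 if label_a == label_b else 0.0
    if label_a != label_b:
        return 0.0
    total = 0.0
    for arc in arcs_a.keys() & arcs_b.keys():  # arcs missing on one side add 0
        weight_a, child_a = arcs_a[arc]
        weight_b, child_b = arcs_b[arc]
        s = tree_similarity(child_a, child_b, adjust)
        total += adjust(s) * (weight_a + weight_b) / 2
    return total


t = ("lom", {
    "educational": (0.3334, leaf("edu-set")),
    "general": (0.3333, ("gen-set", {
        "language": (0.5, leaf("en")),
        "title": (0.5, leaf("Introduction to Oracle"))})),
    "technical": (0.3333, ("tec-set", {
        "format": (0.5, leaf("HTML")),
        "platform": (0.5, leaf("WinXP"))})),
})
# Lower arc weights of t_prime are a hypothetical assignment.
t_prime = ("lom", {
    "general": (0.7, ("gen-set", {
        "language": (0.2, leaf("en")),
        "title": (0.8, leaf("Basic Oracle"))})),
    "technical": (0.3, ("tec-set", {
        "format": (0.9, leaf(WILDCARD)),
        "platform": (0.1, leaf("WinXP"))})),
})

print(tree_similarity(t, t_prime))
print(tree_similarity(t, t_prime, adjust=math.sqrt))
```

With the square-root adjustment, the partial match in the general subtree is discounted less on its way up the tree, so the second value is never below the first.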
22
• Teclantic portal http://www.teclantic.ca
Partonomy Tree Similarity Engine
─ Matchmaking Application
23
Current Work
• Weighted Tree Semantic Tree Similarity Engines
• Semantic searching
• Weighted Graph Similarity Engines
• Multi-core and cluster implementations
• Matchmaking portals
24
Conclusion
• UNB Weighted Tree Similarity Engines
• Semantic Global and Local Matching
• Applications: e-Learning, e-Business, Matchmaking portals, …
• Seeking commercial partners to license and adapt the UNB technology
25
Publications
5 Journal papers
10 Conference papers
1 Book Chapter
4 MCS Theses
1 PhD Thesis
26
Looking for a Post-doctoral Fellow
to start working right now!
Thank you!