Linked Data-based Concept Recommendation: Comparison of Different Methods in Open Innovation Scenario
Danica Damljanovic, Milan Stankovic, Philippe Laublet
Innovation
Innovation Platforms
Challenge: Promote innovation problems to an audience of solvers who can propose relevant innovative solutions
Finding Meaningful Connections
Clay mining …
Kaolinite extraction from rocks …
Different communities use different terms and concepts to speak about semantically related things. Such “language” defines communities and separates them. Being able to find meaningful connections between concepts would enable us to build bridges between people and content.
http://bit.ly/hyProximity
Concept recommendation
• Concepts you might not know but might want to use: to annotate your content, to search for content, to search for people…
• Help problem promoters discover relevant concepts (problem promoters are sometimes not field experts)
• Discovery = relevance + unexpectedness
• Structure-based similarity: hyProximity
• Statistical semantics similarity: Random Indexing, a well-known statistical semantics method, brought from Information Retrieval to RDF
Discovering Direct and Lateral Concepts
Linked Data-based Concept Recommendation
[Figure: processing pipeline. Labels: Textual input; Zemanta; DBpedia concepts found in the text; DBpedia exploration; suggestions.]
hyProximity
• We start from several seed concepts found directly in the text, and search the DBpedia graph
• The concepts found in the proximity of several seed concepts are considered more “in context” for the given input
• Concepts found at a shorter distance from the seed concepts have higher hyProximity
• Hierarchical: exploring skos:broader relations
• Transversal: exploring transversal links
• Mixed: a linear combination of hierarchical and transversal
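The idea that concepts close to several seeds score higher can be illustrated with a minimal sketch. It is not the published hyProximity formula: the `1/distance` weighting, the `max_depth` cut-off, and the edges of the toy graph are all assumptions made for illustration (only the node names echo the slides).

```python
from collections import deque

# Toy undirected graph. Node names echo the slides; the edges are assumed.
GRAPH = {
    "Paris": ["Cities in France", "Seine", "Chanel"],
    "Seine": ["Paris", "Rivers in France"],
    "Marne": ["Rivers in France"],
    "Chanel": ["Paris", "Products of France"],
    "Peugeot": ["Products of France"],
    "Cities in France": ["Paris", "Things in France"],
    "Rivers in France": ["Seine", "Marne", "Things in France"],
    "Products of France": ["Chanel", "Peugeot", "Things in France"],
    "Things in France": ["Cities in France", "Rivers in France",
                         "Products of France"],
}


def distances_from(seed, graph, max_depth):
    """Breadth-first search: hop distance from seed to nodes within max_depth."""
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def proximity_scores(seeds, graph, max_depth=3):
    """Sum 1/distance over all seeds (assumed weighting, for illustration only)."""
    scores = {}
    for seed in seeds:
        for node, d in distances_from(seed, graph, max_depth).items():
            if node in seeds:
                continue  # seeds themselves are not suggested
            scores[node] = scores.get(node, 0.0) + 1.0 / d
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


ranking = proximity_scores({"Seine", "Chanel"}, GRAPH)
```

With the seeds "Seine" and "Chanel", the node "Paris" is one hop from both and therefore ranks first. A hierarchical or transversal variant would restrict which edge types the search may follow.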
Different Distance Functions
[Figure: example DBpedia subgraph. Nodes: Paris, Seine, Marne, Chanel, BMW, Peugeot. Categories: Rivers in France, Cities in France, Things in France, Products of France, Car Industry. Legend: skos:broader, other property. Distance labels: 2, 2, 2, 2+1.]
research.hypios.com/hyproximity
Different Distance Functions
[Figure: example DBpedia subgraph with labelled property links. Nodes: Paris, Seine, Marne, Chanel, BMW, Peugeot. Categories: Rivers in France, Cities in France, Things in France, Products of France, Car Industry. Edge labels: flows through, competitor, famous for. Annotation: “fashion”. Legend: skos:broader, other property. Distance labels: 1, 1, 1.]
Random Indexing
• Words that appear in similar contexts (with the same set of other words) are contextually related, e.g. synonyms.
• Synonyms tend not to co-occur with one another directly, so indirect inference is required to draw associations between words used to express the same idea.
Two steps to Random Indexing
• Indexing
  o For an RDF graph, generate virtual documents
  o Prepare the corpus (pre-processing)
  o Generate the semantic index
• Search: given a term X, calculate the cosine similarity between the vector of that term and the other vectors in the semantic space
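The search step can be sketched in a few lines. The vectors below are made-up toy values, and `most_similar` is a hypothetical helper name; a real semantic index would have far more dimensions and terms.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(term, vectors, top_n=3):
    """Rank all other terms by cosine similarity to the query term's vector."""
    query = vectors[term]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != term]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Toy semantic space: values are invented for illustration.
VECTORS = {
    "clay": [2, 1, 0, -1],
    "kaolinite": [4, 2, 0, -2],
    "fashion": [0, -1, 3, 0],
}

neighbours = most_similar("clay", VECTORS)
```

Here "kaolinite" points in the same direction as "clay" and so comes out as the nearest neighbour, while "fashion" has a negative similarity.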
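Building context vectors
Random index vectors for the documents (dimensionality = n, with a fixed seed length):
d1:  0  0 -1  1 -1  1
d2: -1  1  0  0  1 -1
…
dp:  0  1  0 -1 -1  1

Term-document matrix:
      d1  d2  ..  dp
t1     1   2  ..   0
t2     3   0  ..   0
..    ..  ..  ..  ..
tq     0   1  ..  10

Context vectors for t1, t2, …, tq = term-document matrix × document index vectors
(Matrix labels in the figure: M, D, T)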
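The construction on this slide can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the experiments: "seed length" is taken to mean the number of non-zero entries in each index vector (half +1, half -1, as in the example vectors on the slide), and the function names are hypothetical. The toy counts reuse the visible entries of the term-document matrix.

```python
import random


def index_vector(dimension, seed_length, rng):
    """Sparse ternary vector with seed_length non-zero entries (half +1, half -1)."""
    vec = [0] * dimension
    positions = rng.sample(range(dimension), seed_length)
    for i, pos in enumerate(positions):
        vec[pos] = 1 if i % 2 == 0 else -1
    return vec


def context_vectors(term_doc_counts, dimension=6, seed_length=4, seed=0):
    """Each term vector is the count-weighted sum of its documents' index vectors."""
    rng = random.Random(seed)
    n_docs = len(next(iter(term_doc_counts.values())))
    doc_vectors = [index_vector(dimension, seed_length, rng)
                   for _ in range(n_docs)]
    term_vectors = {}
    for term, counts in term_doc_counts.items():
        acc = [0] * dimension
        for count, dvec in zip(counts, doc_vectors):
            for k in range(dimension):
                acc[k] += count * dvec[k]
        term_vectors[term] = acc
    return doc_vectors, term_vectors


# Rows are terms, columns are documents (toy counts).
COUNTS = {"t1": [1, 2, 0], "t2": [3, 0, 0], "tq": [0, 1, 10]}
doc_vectors, term_vectors = context_vectors(COUNTS)
```

Because each term vector has only n dimensions regardless of how many documents exist, the semantic space stays small even for a large corpus.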
Indexing: virtual documents
[Figure: a representative subgraph for URI=S (neighbouring resources O1 and O2, properties P1 to P10, literals L1 to L8) is lexicalised into a virtual document for URI=S, which contains statements such as “S P3 L3”, “S P4 O1” and “S P7 O2”.]
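A simplified sketch of the lexicalisation step is given below. It only uses triples that directly involve the URI, whereas the representative subgraph in the figure also appears to include statements about neighbouring resources; the `virtual_document` helper, the triples, and the labels are all hypothetical toy data.

```python
def virtual_document(uri, triples, labels):
    """Join the lexicalised triples in which uri is the subject or the object."""
    def lexicalise(node):
        # Fall back to the raw identifier when no human-readable label is known.
        return labels.get(node, node)

    lines = []
    for s, p, o in triples:
        if s == uri or o == uri:
            lines.append(" ".join(lexicalise(x) for x in (s, p, o)))
    return "\n".join(lines)


# Toy triples using the placeholder names from the figure.
TRIPLES = [
    ("S", "P3", "L3"),
    ("S", "P4", "O1"),
    ("S", "P7", "O2"),
    ("O1", "P6", "L6"),  # does not mention S, so it is skipped here
]
# Hypothetical labels, invented for illustration.
LABELS = {"S": "kaolinite", "P4": "product", "O1": "clay"}

doc = virtual_document("S", TRIPLES, LABELS)
```

The resulting text has one line per statement and can be fed to the pre-processing and indexing steps like any ordinary document.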
Experiments
• 26 real innovation problems from Hypios
• Measure of success: the suggested concepts appear in the actual solutions (precision, recall, F-measure)
  (+) reasonable list of concepts from real scenarios
  (-) not complete:
    o User study: measure discovery = relevance + unexpectedness
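The success measures named above are the standard set-based ones. A minimal sketch, with invented concept names standing in for the suggested concepts and for the concepts extracted from a solution:

```python
def precision_recall_f1(suggested, gold):
    """Set-based precision, recall and F1 of suggested concepts against a gold set."""
    suggested, gold = set(suggested), set(gold)
    hits = len(suggested & gold)
    precision = hits / len(suggested) if suggested else 0.0
    recall = hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data: 4 suggestions, 5 gold concepts, 2 in common.
suggested = ["clay", "kaolinite", "mining", "fashion"]
gold = ["clay", "kaolinite", "flotation", "sedimentation", "quartz"]

p, r, f = precision_recall_f1(suggested, gold)
```

In this toy case precision is 2/4, recall is 2/5, and F1 is their harmonic mean.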
DBpedia Dataset
• Select a number of properties relevant to the Open Innovation-related scenario
• dbo:product, dbp:products, dbo:industry, dbo:service, dbo:genre, and properties serving to establish a hierarchical categorization of concepts, namely dc:subject and skos:broader
Evaluation
• “Gold standard”
  o Extract problem URIs
  o Extract solution URIs
• Baseline:
  o Google AdWords Keyword Tool: finds similar topics based on their distribution in textual corpora and the corpora of search queries.
  o Suggesting up to 600 concepts, which are then used for Web crawling to find experts.
Evaluation: Results
User Study
• Suggestions that are both relevant and unexpected
  o the most valuable discoveries for the user
• 12 users
• 34 problem evaluations
  o 3060 suggested concepts/keywords
• For the chosen innovation problem, the evaluators were presented with lists of the 30 top-ranked suggestions generated by AdWords, hyProximity (mixed approach) and Random Indexing.
Example
User Study: Results
Conclusion
• Linked Data is a valuable source of knowledge for concept recommendation
• Our two methods are complementary
  o hyProximity is better for precision
  o Random Indexing is better for recall
• User study: unexpectedness is higher with our methods than with the baseline
• Subjective user comments:
  o Random Indexing: generic
  o hyProximity: granular
  o AdWords: redundant
Thank You!
• Find out more: http://research.hypios.com/?page_id=165
• Contact us:
  o Danica Damljanovic: @dancheeee
  o Milan Stankovic: @milstan