Post on 19-Jun-2015
Finding Co-solvers on Twitter, with a Little Help from Linked Data
Milan Stankovic, Hypios, Université Paris-Sorbonne, France
Matthew Rowe, KMi, Open University, UK
Philippe Laublet, Université Paris-Sorbonne, France
Outline
• Context
• Problem
• Our Approach
• Evaluation
• Example of use
• Conclusion and questions
Context: Innovation on the Web
Innovation Seekers
Solvers from industry, research, etc.
Academia
Problem: Find Collaborators
Innovation Seeker
Problem
???
solver
• How to find collaborators who complement the solver's competence with regard to the problem?
• How to find collaborators who are compatible with the solver in terms of teamwork?
Problem: Find Collaborators
Problem
solver
Complementary Competence
Interest Similarity
Social Similarity
inspired by social studies on team composition, and factors that influence good teamwork
Our Approach
profiling >> profile extension >> calculation of similarities >> ranking
Implementation and tests performed using data from Twitter
Our Approach: Profiling
solver
candidate collaborators
problem
conceptual
social
• Conceptual Profiles
– users: Zemanta is used to extract DBPedia concepts from textual elements that the user created on Twitter (tweets, bio, etc.). Profiles contain concepts and the frequency of their occurrence.
– problem: the text of the innovation problem is processed with Zemanta to extract concepts
• Social Profiles
– contain all the contacts of a given user on Twitter
• Both types of profiles are in vector form.
• Simple on purpose: the aim is to capture most topics, not to specialize in topics of highest expertise
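The profile construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `extract_concepts` is a hypothetical stand-in for a concept extractor such as Zemanta (its real API is not shown here), and the tiny `lexicon` and the `dbpedia:` identifiers are made-up example data.

```python
from collections import Counter


def extract_concepts(text, lexicon):
    """Hypothetical stand-in for a concept extractor such as Zemanta.

    Returns the concept identifier of every lexicon label found in the text.
    """
    lowered = text.lower()
    return [uri for label, uri in lexicon.items() if label in lowered]


def conceptual_profile(texts, lexicon):
    """Map each concept to the number of texts (tweets, bio, ...) mentioning it."""
    profile = Counter()
    for text in texts:
        profile.update(extract_concepts(text, lexicon))
    return profile


def social_profile(contacts):
    """Binary vector over contacts: 1 for every account the user is connected to."""
    return {contact: 1 for contact in contacts}


# Made-up example data (assumption, for illustration only).
lexicon = {
    "linked data": "dbpedia:Linked_data",
    "semantic web": "dbpedia:Semantic_Web",
}
tweets = ["Reading about Linked Data", "Semantic Web and linked data meetup"]
profile = conceptual_profile(tweets, lexicon)
contacts = social_profile(["alice", "bob"])
```

Both profiles end up as sparse vectors (dictionaries), so the same similarity functions can later be applied to either of them.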
Our Approach: Profile Extension
• Why extend profiles:
– imperfection of source data (tweets)
– incompleteness of coverage (due to difference in vocabulary some concepts may stay unnoticed)
– to perform broader/lateral match
• How
– hyProximity (HPSR): a graph measure using Linked Data (tested on DBPedia)
– DMSR: distributional measure inspired by Normalized Google Distance
– PRF: Pseudo Relevance Feedback
• HPSR (hyProximity)
$$\mathrm{HPSR}(c_1, c_2) = \sum_{K_i \in K(c_1, c_2)} ic(K_i) + \sum_{p \in P} link(p, c_1, c_2) \cdot pond(p, c_1)$$
skos:broader, dct:subject
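A rough sketch of how the two sums in the HPSR formula could be computed on a toy graph. The slide does not define the terms, so everything below is an assumption made for illustration: K(c1, c2) is taken to be the set of categories shared by both concepts, ic(K) an information-content weight computed as -log of the category's share of all concepts, link(p, c1, c2) as 1 when c1 is linked to c2 through property p (0 otherwise), and pond(p, c1) as -log(n / M), where n is the number of concepts c1 reaches through p and M the number of concepts using p. The data structures and numbers are hypothetical.

```python
import math


def ic(category, category_sizes, total_concepts):
    """Assumed information content: rarer categories weigh more."""
    return -math.log(category_sizes[category] / total_concepts)


def pond(prop, concept, links, property_usage):
    """Assumed property weight: -log(n / M), as described in the lead-in."""
    n = len(links.get((concept, prop), set()))
    return -math.log(n / property_usage[prop])


def hpsr(c1, c2, categories, category_sizes, total_concepts, links, property_usage):
    """Sum of shared-category weights plus weights of direct property links."""
    shared = categories[c1] & categories[c2]
    hierarchical = sum(ic(k, category_sizes, total_concepts) for k in shared)
    transversal = 0.0
    for (source, prop), targets in links.items():
        if source == c1 and c2 in targets:  # link(p, c1, c2) == 1
            transversal += pond(prop, c1, links, property_usage)
    return hierarchical + transversal


# Toy graph (all values invented for the example).
categories = {"a": {"K1", "K2"}, "b": {"K1"}, "c": {"K3"}}
category_sizes = {"K1": 10, "K2": 50, "K3": 5}
total_concepts = 1000
links = {("a", "p"): {"b"}}
property_usage = {"p": 100}

score_ab = hpsr("a", "b", categories, category_sizes, total_concepts, links, property_usage)
score_ac = hpsr("a", "c", categories, category_sizes, total_concepts, links, property_usage)
```

In this toy graph, concepts `a` and `b` share a small category and a direct link, so they score well above `a` and `c`, which share nothing.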
• DMSR – Distributional Measure of Semantic Relatedness
c1 c2 c16 c18 c32
c1 c2 c15 c43 c56
c1 c3 c4 c10 c13
c1 and c2 are more related than c1 and c3
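The three concept lists above can be treated as three documents, and the co-occurrence idea behind DMSR illustrated by counting how many documents contain both concepts. This is a deliberately simplified sketch: it uses raw co-occurrence counts and omits the Normalized Google Distance style normalization that the measure is said to be inspired by.

```python
from collections import Counter
from itertools import combinations

# The three example documents from the slide, as sets of concepts.
documents = [
    {"c1", "c2", "c16", "c18", "c32"},
    {"c1", "c2", "c15", "c43", "c56"},
    {"c1", "c3", "c4", "c10", "c13"},
]


def cooccurrence_counts(docs):
    """Count, for every unordered concept pair, the documents containing both."""
    counts = Counter()
    for doc in docs:
        for a, b in combinations(sorted(doc), 2):
            counts[(a, b)] += 1
    return counts


def relatedness(a, b, counts):
    """Raw co-occurrence count of a pair (simplified, unnormalized)."""
    return counts[tuple(sorted((a, b)))]


counts = cooccurrence_counts(documents)
```

Here `relatedness("c1", "c2", counts)` is larger than `relatedness("c1", "c3", counts)`, matching the statement on the slide.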
• PRF: Pseudo Relevance Feedback
– Distributional measure based on the profiles appearing in the n best ranked solutions.
– The same measure of co-occurrence as DMSR, applied to the set of first 10 suggestions
– This method can be applied with any ranking technique
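One way to picture pseudo relevance feedback is sketched below, under stated assumptions: the initial ranking is given as a list of candidate profiles (sets of concepts), only the top n are inspected, and the problem profile is extended with the k concepts that co-occur most often with the problem's own concepts. The function name, the parameter `k`, and the tie-breaking rule are illustrative choices, not taken from the slides.

```python
from collections import Counter


def prf_expand(problem_concepts, ranked_profiles, n=10, k=2):
    """Extend the problem profile using the top-n profiles of an initial ranking."""
    problem = set(problem_concepts)
    counts = Counter()
    for profile in ranked_profiles[:n]:
        concepts = set(profile)
        overlap = len(concepts & problem)
        if overlap == 0:
            continue  # profile shares nothing with the problem, no evidence
        for concept in concepts - problem:
            counts[concept] += overlap  # co-occurrences with problem concepts
    # Highest count first; ties broken alphabetically for determinism.
    best = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:k]
    return problem | {concept for concept, _ in best}


ranked = [{"a", "x"}, {"a", "x", "y"}, {"z"}]
expanded = prf_expand({"a"}, ranked, n=10, k=1)
```

Because the function only needs an ordered list of profiles, it can sit on top of any ranking technique, and the expanded profile can then be used to rank again.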
Our Approach: Similarities
Complementarity (Similarity with difference topics)
Conceptual Similarity (Similarity of conceptual profiles)
Social Similarity (Similarity of Social Profiles)
• Vector Similarity Measures
– Weighted Overlap
– Cosine Similarity
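The two vector measures can be sketched over sparse profiles stored as dictionaries. Cosine similarity follows its standard definition. The slides do not spell out weighted overlap, so the version below is an assumption: the sum of the seed profile's weights for concepts that also appear in the candidate profile.

```python
import math


def cosine_similarity(p, q):
    """Standard cosine similarity between two sparse weight vectors."""
    dot = sum(weight * q.get(concept, 0.0) for concept, weight in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an empty profile is similar to nothing
    return dot / (norm_p * norm_q)


def weighted_overlap(seed, candidate):
    """Assumed definition: sum of seed weights of the concepts both profiles share."""
    return sum(weight for concept, weight in seed.items() if concept in candidate)


seed = {"a": 2.0, "b": 1.0}
candidate = {"a": 5.0, "c": 1.0}
overlap = weighted_overlap(seed, candidate)
cos = cosine_similarity({"a": 1.0, "b": 1.0}, {"a": 1.0})
```

Note that this weighted overlap is asymmetric (it depends on which profile is the seed), whereas cosine similarity is symmetric and normalized to profile length.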
Ranking
• By one similarity measure
– complementarity
– conceptual similarity
– social similarity
• By a linear combination of measures: a*Comp+b*ConcSim+c*SocSim
• By a product of measures: Comp*ConcSim*SocSim
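The two composite ranking functions can be compared on a small example. The candidate names and scores below are invented, and the coefficients a, b, c default to 1 only for illustration; the slides do not state which values were used.

```python
def linear_score(comp, conc_sim, soc_sim, a=1.0, b=1.0, c=1.0):
    """a*Comp + b*ConcSim + c*SocSim"""
    return a * comp + b * conc_sim + c * soc_sim


def product_score(comp, conc_sim, soc_sim):
    """Comp*ConcSim*SocSim"""
    return comp * conc_sim * soc_sim


def rank(candidates, score):
    """Order candidate names by descending score."""
    return sorted(candidates, key=lambda name: score(*candidates[name]), reverse=True)


# (Comp, ConcSim, SocSim) per candidate; made-up values.
candidates = {
    "alice": (0.8, 0.1, 0.3),
    "bob": (0.5, 0.5, 0.5),
    "carol": (0.0, 0.9, 0.9),
}
by_sum = rank(candidates, linear_score)
by_product = rank(candidates, product_score)
```

The product drops `carol` to the bottom because a zero on any one measure zeroes the whole score, while the linear combination still ranks her first on the strength of the other two measures.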
Evaluation
• Evaluation 1
– recommending a collaborator to a group of solvers
– a group of 3 solvers (experts in Semantic Web) is trying to solve 3 cross-disciplinary problems
– problems inspired by real challenges (workshops, calls for papers, etc.)
• Evaluation 2
– recommending collaborators to individual solvers
– 12 Twitter users, experts in Semantic Web, look for collaborators for the same 3 problems
Evaluation: Metrics
• Discounted Cumulative Gain
– what is the value of considering the first 10 suggestions, and what is the quality of their ordering
• Average Precision
– what is the cumulative benefit of considering each next suggestion in a particular ranking
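Both metrics can be sketched in a few lines. The slides do not give the exact formulas used in the evaluation, so the code below assumes one common formulation of each: DCG with a log2 discount over the first k positions, and average precision as the mean of precision values at the ranks where a relevant suggestion appears.

```python
import math


def dcg(relevances, k=10):
    """Discounted Cumulative Gain over the first k suggestions.

    The suggestion at 1-based rank r contributes rel / log2(r + 1).
    """
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def average_precision(relevant_flags):
    """Mean of precision@r taken at every rank r holding a relevant suggestion."""
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


gain = dcg([1, 0, 1])
ap = average_precision([True, False, True])
```

DCG rewards placing good suggestions early, so swapping a relevant suggestion to a later rank lowers the score even though the set of suggestions is unchanged.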
Evaluation 1
• Discounted Cumulative Gain: compatibility
• Discounted Cumulative Gain: conceptual similarity
Evaluation 2
• Composite Ranking Functions: Product
– Comp*ConcSim*SocSim
– PRF(Comp*ConcSim*SocSim): PRF problem profile expansion with composite similarity.
– HPSR(Comp)*ConcSim*SocSim: HPSR expansion performed on difference topics prior to calculating the complementarity (similarity with difference topics)
– Comp*DMSR(ConcSim)*SocSim: DMSR expansion performed over the seed user profile prior to calculating interest similarity.
– HPSR(Comp)*DMSR(ConcSim)*SocSim: composite function in which HPSR is used to expand profile topics and DMSR to expand the seed user topic profile prior to calculating the similarities.
Evaluation 2
• Discounted Cumulative Gain
Comp*ConcSim*SocSim
PRF(Comp*ConcSim*SocSim)
HPSR(Comp)*ConcSim*SocSim
Comp*DMSR(ConcSim)*SocSim
HPSR(Comp)*DMSR(ConcSim)*SocSim
Evaluation 2
• Average Precision (Cumulative)
Comp*ConcSim*SocSim
PRF(Comp*ConcSim*SocSim)
HPSR(Comp)*ConcSim*SocSim
Comp*DMSR(ConcSim)*SocSim
HPSR(Comp)*DMSR(ConcSim)*SocSim
Conclusions
• The Linked Data-based concept expansion technique (hyProximity) gives the best results when expanding topics for Compatibility measures. A distributional one works slightly better for Conceptual Similarity measures.
• In a composite ranking function, expanding profiles with hyProximity is beneficial if applied only to Compatibility. Expansion in both Compatibility and Conceptual Similarity has negative effects.
• All profile expansion techniques, applied individually, have positive effects compared with direct similarity calculation with no expansion.
Take Away
Problem
Compatibility: expansion with hyProximity, a Linked Data-based measure
Conceptual Similarity: expansion with DMSR, a distributional measure
Example
Problem: Semantic Web representation of start-up history for start-up performance indicators
davidsrose, fundingpost, ECVentureCapita, BVCA, vc20, AndySack, CVCACanada, Austin_Startups, tgmtgm, davidblerner
User: Milan Stankovic (@milstan)
Suggestions:
• Angel investor specialized in technology startups
• Entrepreneur, Social Networks (KLOUT), Metrics
• Investors and Entrepreneurs, Information technology
• Investors and Entrepreneurs, Information technology