Mapping Tweets to Conference Talks: A Goldmine for Semantics
Milan Stankovic, Hypios, Paris-Sorbonne, FR & Matthew Rowe, KMI, Open University, UK
On Conference We Tweet
Is there a Correspondence?
Why?
[Diagram: user -> made -> tweet -> is about -> talk -> has topic -> Topic 1, Topic 2, Topic 3]
[Diagram, continued: user -> interest? -> topics of the talk]
[Diagram: two users each made a tweet that is about the same talk: were at the same talk?]
Potential Benefits
• Digital memory
• Conference feedback
– number of tweets for a talk
– conversational aspects
– sentiment analysis
• User profiling and expert finding
• Trending topics
Rich Activity Twitter Event Data
• We take Twitter archives from TwapperKeeper
• We enrich Tweets with relevant DBPedia concepts using Zemanta
• We rely on existing Linked Data about talks to perform the mappings.
ESWC Dataset
• Collected during the Extended Semantic Web Conference 2010
– Any tweets tagged with “eswc”
• 1082 tweets
• 213 tweets enriched with concepts
Aligning Tweets with Talks
• Goal: Label tweets with talks
• Method:
– Induce a labelling function to perform alignment: $f : X \rightarrow Y$
– Labelled data = events from Web of Data: $\{(x_i, y_i)\}_{i=1}^{L}$
– Unlabelled data = tweets: $\{x_i\}_{i=1}^{U}$
Aligning Tweets with Talks
1. Feature Extraction:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix swrc: <http://swrc.ontoware.org/ontology#> .
@prefix swc: <http://data.semanticweb.org/ns/swc/ontology#> .
@prefix dog: <http://data.semanticweb.org> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://data.semanticweb.org/conference/eswc/2010/paper/phd_symposium/23>
    rdf:type swrc:InProceedings ;
    dc:subject "Knowledge Acquisition" ;
    dc:subject "Semantic Analysis" ;
    dc:subject "Social Web" ;
    dc:subject "Microblogs" ;
    dc:title "Exploring the Wisdom of the Tweets: Knowledge Acquisition from Social Awareness Streams" ;
    swrc:abstract "Although one might argue that little wisdom can be conveyed in messages of 140 ..." ;
    swrc:author <http://data.semanticweb.org/person/claudia-wagner> .

<http://data.semanticweb.org/person/claudia-wagner>
    rdf:type foaf:Person ;
    foaf:name "Claudia Wagner" ;
    swrc:affiliation <http://data.semanticweb.org/organization/joanneum-research> ;
    foaf:based_near <http://dbpedia.org/resource/Austria> .
Aligning Tweets with Talks
1. Feature Extraction: F1 - Immediate Resource Leaves
Knowledge Acquisition
Semantic Analysis
Social Web
Microblogs
Exploring the Wisdom of the Tweets: Knowledge Acquisition from Social Awareness Streams
Although one might argue that little wisdom can be conveyed in messages of 140 ...
http://data.semanticweb.org/person/claudia-wagner
Aligning Tweets with Talks
1. Feature Extraction: F2 – 1-step Resource Leaves
http://data.semanticweb.org/person/claudia-wagner
Claudia Wagner
http://data.semanticweb.org/organization/joanneum-research
http://dbpedia.org/resource/Austria
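The two leaf sets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples are held as plain tuples rather than in an RDF library, `rdf:type` values are skipped because the slides do not list them among the leaves, and the helper names (`immediate_leaves`, `one_step_leaves`, `is_resource`) are hypothetical.

```python
# Hypothetical, simplified triple store: (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"
TALK = "http://data.semanticweb.org/conference/eswc/2010/paper/phd_symposium/23"
AUTHOR = "http://data.semanticweb.org/person/claudia-wagner"
ORG = "http://data.semanticweb.org/organization/joanneum-research"
AUSTRIA = "http://dbpedia.org/resource/Austria"

TRIPLES = [
    (TALK, RDF_TYPE, "swrc:InProceedings"),
    (TALK, "dc:subject", "Knowledge Acquisition"),
    (TALK, "dc:subject", "Microblogs"),
    (TALK, "swrc:author", AUTHOR),
    (AUTHOR, RDF_TYPE, "foaf:Person"),
    (AUTHOR, "foaf:name", "Claudia Wagner"),
    (AUTHOR, "swrc:affiliation", ORG),
    (AUTHOR, "foaf:based_near", AUSTRIA),
]


def is_resource(value):
    """Assumption: anything that looks like an HTTP URI is a resource."""
    return value.startswith("http://")


def immediate_leaves(triples, resource):
    """F1: objects attached directly to the resource (rdf:type skipped)."""
    return [o for s, p, o in triples if s == resource and p != RDF_TYPE]


def one_step_leaves(triples, resource):
    """F2: each linked resource plus the leaves one step further out."""
    leaves = []
    for neighbour in immediate_leaves(triples, resource):
        if is_resource(neighbour):
            leaves.append(neighbour)
            leaves.extend(immediate_leaves(triples, neighbour))
    return leaves


f1 = immediate_leaves(TRIPLES, TALK)
f2 = one_step_leaves(TRIPLES, TALK)
```

Here `f1` holds the talk's subjects and author URI, and `f2` holds the author URI followed by the author's name, affiliation and location.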
Aligning Tweets with Talks
1. Feature Extraction: F3 – DBPedia Concepts
http://dbpedia.org/resource/Twitter
http://dbpedia.org/resource/Social_Web
http://dbpedia.org/resource/Microblogs
Aligning Tweets with Talks
2. Feature Vector Composition
Knowledge Acquisition Semantic Analysis Social Web Microblogs Exploring the Wisdom of the Tweets: Knowledge Acquisition from Social Awareness Streams Although one might argue that little wisdom can be conveyed in messages of 140 http://data.semanticweb.org/person/claudia-wagner
knowledge acquisition semantic analysis social web microblogs exploring wisdom tweets knowledge acquisition social awareness streams wisdom messages
Indexer
knowledge 2
acquisition 2
semantic 1
analysis 1
social 2
web 1
microblogs 1
exploring 1
wisdom 2
tweets 1
awareness 1
streams 1
messages 1
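The indexing step can be illustrated with a minimal Python sketch. It assumes a simple lowercase, letters-only tokenizer and a small hand-picked stopword list; the slides do not say which tokenizer or stopword list the indexer actually used, so both are assumptions, as is the function name `term_frequencies`.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not the one used in the study.
STOPWORDS = {"the", "of", "from", "that", "can", "be", "in", "a", "an", "and"}


def term_frequencies(leaves, stopwords=STOPWORDS):
    """Turn a list of leaf strings into a term -> count feature vector."""
    counts = Counter()
    for leaf in leaves:
        for token in re.findall(r"[a-z]+", leaf.lower()):
            if token not in stopwords:
                counts[token] += 1
    return counts


vector = term_frequencies([
    "Knowledge Acquisition",
    "Social Web",
    "Knowledge Acquisition from Social Awareness Streams",
])
```

Tweets would be passed through the same function so that tweet and event vectors share one vocabulary.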
Aligning Tweets with Talks
3. Inducing the Labelling Function
– Both tweets and events are provided as feature vectors
– Induce a labelling function: $f : X \rightarrow Y$
– Choose the most likely event (y) given the tweet (x)
Aligning Tweets with Talks
3. Inducing the Labelling Function: Proximity-based Clustering
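– Build a centroid vector for each event
• From event feature vectors
– Compare each tweet vector with each centroid
• Choose event (y) which is closest

$$\hat{y} = \arg\min_{y \in Y} d(x, c_y)$$

where $c_y$ is the centroid of event $y$ and $d$ is one of:

$$manhat(x, c) = \sum_{i=1}^{n} |x_i - c_i|$$

$$eucl(x, c) = \sqrt{\sum_{i=1}^{n} (x_i - c_i)^2}$$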
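A minimal Python sketch of the proximity-based step, assuming feature vectors are sparse dictionaries of term counts and that a centroid is the per-term mean of an event's vectors. The function names and the toy data are illustrative only.

```python
import math


def manhattan(a, b):
    """Sum of absolute per-term differences between two sparse vectors."""
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in set(a) | set(b))


def euclidean(a, b):
    """Square root of the summed squared per-term differences."""
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in set(a) | set(b)))


def centroid(vectors):
    """Per-term mean of a non-empty list of sparse vectors."""
    keys = set().union(*vectors)
    return {k: sum(v.get(k, 0) for v in vectors) / len(vectors) for k in keys}


def closest_event(tweet, centroids, distance=euclidean):
    """Return the event label whose centroid is nearest to the tweet vector."""
    return min(centroids, key=lambda y: distance(tweet, centroids[y]))


centroids = {
    "talk_a": centroid([{"semantic": 1, "web": 1}, {"semantic": 1, "web": 3}]),
    "talk_b": centroid([{"twitter": 2}]),
}
label = closest_event({"web": 1}, centroids)
```

Passing `distance=manhattan` switches the metric without changing anything else.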
Aligning Tweets with Talks
3. Inducing the Labelling Function: Naive Bayes Classification
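– Assigns the most probable event label given the tweet features:

$$\hat{y} = \arg\max_{y \in Y} P(y \mid x_1, x_2, \ldots, x_n)$$

– Using Bayes' theorem, we write this as:

$$\hat{y} = \arg\max_{y \in Y} \frac{P(x_1, x_2, \ldots, x_n \mid y)\, P(y)}{P(x_1, x_2, \ldots, x_n)}$$

$$\hat{y} = \arg\max_{y \in Y} P(x_1, x_2, \ldots, x_n \mid y)\, P(y)$$

$$\hat{y} = \arg\max_{y \in Y} P(y) \prod_{i} P(x_i \mid y)$$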
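The final line of the derivation, the prior times a product of per-feature likelihoods, can be sketched in Python as follows. This is a hedged illustration and not the authors' implementation: it assumes a multinomial model over term counts, a uniform prior over events, add-one smoothing, and log-probabilities to avoid underflow. None of these choices are stated on the slides.

```python
import math


def train(event_vectors):
    """event_vectors: {event label: {term: count}}. Returns (model, log_prior)."""
    vocabulary = set().union(*event_vectors.values())
    log_prior = math.log(1 / len(event_vectors))  # assumption: uniform P(y)
    model = {}
    for event, counts in event_vectors.items():
        total = sum(counts.values()) + len(vocabulary)  # add-one smoothing
        model[event] = {
            term: math.log((counts.get(term, 0) + 1) / total)
            for term in vocabulary
        }
    return model, log_prior


def classify(tweet_tokens, model, log_prior):
    """argmax over events of log P(y) + sum_i log P(x_i | y)."""
    def score(event):
        likelihoods = model[event]
        # Terms outside the shared vocabulary are ignored for every event.
        return log_prior + sum(likelihoods[t] for t in tweet_tokens if t in likelihoods)
    return max(model, key=score)


model, log_prior = train({
    "talk_a": {"semantic": 3, "web": 2},
    "talk_b": {"twitter": 3, "microblogs": 2},
})
label = classify(["twitter", "web", "microblogs"], model, log_prior)
```

Dropping the denominator, as in the derivation, is safe here because it is the same for every candidate event.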
Experiments
• Dataset
– Corpus of Tweets collected during ESWC 2010
• Gold Standard Construction
– Used 3 raters to label a portion of the tweet corpus
• 200 tweets labelled
– Measured inter-rater agreement
• Using the Kappa statistic
– Initial agreement was too low: 0.328
– Utilised the Delphi method to improve agreement
– Second round of labelling produced: 0.820
Experiments
• Evaluation Measures
– Precision: proportion of event tweets correctly labelled
– Recall: proportion of an event's tweets successfully returned
– F-measure: harmonic mean of precision and recall
• Placed emphasis on precision over recall

$$F_\beta = \frac{(1 + \beta^2) \cdot P \cdot R}{\beta^2 \cdot P + R}, \qquad \beta \in \{0.2, 0.5, 1\}$$
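A short Python sketch of the weighted F-measure, using the standard form (1 + β²)·P·R / (β²·P + R). Values of β below 1 weight precision more heavily, which matches the emphasis stated above. The precision and recall numbers in the example are made up for illustration and are not results from the study.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F-beta)."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # undefined case: treat as zero
    return (1 + b2) * precision * recall / denominator


# Illustrative values only: a precise but low-recall labeller.
scores = {beta: f_measure(0.8, 0.4, beta) for beta in (0.2, 0.5, 1.0)}
```

With these made-up inputs the score rises as β shrinks, because the higher precision dominates.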
Results
Imagine…
Imagine user profiling
ESWC dataset, user Matthew Rowe
Imagine conference feedback
ESWC dataset
directly from Tweets
from mappings (Talks)
We Challenge You
We Challenge You!
• Beat us in mappings!
• We provide the human-generated gold standard mappings
• Can you find a more precise way to do tweet-talk mappings?
• Can you find other uses? Let us know!
We Challenge You!
• You can find the gold standard data here:
http://research.hypios.com/?page_id=131
• You can find all the data (and automated mappings) here:
http://data.hypios.com/tweets/sparql
We Challenge You!
http://data.hypios.com/tweets/sparql
SELECT ?tweet ?talk WHERE {
?tweet <http://linkedevents.org/ontology/illustrate> ?talk.
}
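One way to send the query above from a script is as an HTTP GET with a `query` parameter, as defined by the SPARQL Protocol. The Python sketch below only builds the request URL. It assumes the endpoint accepts GET requests and is still reachable, neither of which is confirmed here, and `build_query_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

ENDPOINT = "http://data.hypios.com/tweets/sparql"

QUERY = """SELECT ?tweet ?talk WHERE {
  ?tweet <http://linkedevents.org/ontology/illustrate> ?talk .
}"""


def build_query_url(endpoint, query):
    """Return a GET URL carrying the percent-encoded SPARQL query."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
```

The resulting `url` can then be fetched with any HTTP client, ideally with an `Accept` header naming the desired result format.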