Delft University of Technology
Link, Like, Follow, Friend: The Social Element in User Modeling and Adaptation UMAP, Rome, June, 2013
Geert-Jan Houben Web Information Systems, TU Delft
3
Social Web & UMAP
We observe, reflect, speculate, and raise discussion about evolutions and opportunities for UMAP to make a difference.
Triggered by the social element in UMAP and other conferences
& our own experience in the field.
4
Social Web in UMAP: a number of mentions of ‘social’, and a small number of ‘social web’ in the papers.
New U (in UMAP), new users
And we see more. We see how the Social Web mirrors people, mirrors users.
What we learn from the Social Web, we learn (more) about users, and for user modeling and adaptation.
5
UMAP in the new Web world
What we learn from the Social Web allows us to reconsider UMAP on the Web.
It brings new opportunities for us as researchers.
Perhaps it brings new needs.
Surely, these are opportunities that we can position within our
UMAP research agenda and UMAP application portfolio.
6
SWUMAP: 1 + 1 = 3
Experience tells us to combine:
• Understanding & Creating
• UM & AP
• Machines & Humans
and so arrive at a body of knowledge for turning insights about users and usage into added value in society and economy.
7
UMAP systems are Web systems
Lessons tell us to reconsider our system concept. On the Web, systems are ‘in vivo’: open and dynamic.
• Users & data are no longer ‘inside the system’.
• Users & data change and move (more) quickly.
This impacts understanding and creating of systems. This also impacts the systems’ architecture. With the (Social) Web as our laboratory, this also impacts our research discipline.
8
APPLICATION
HUMANS FOR AUGMENTATION
USERS DOMAIN
DOMAIN Augmented with Web Semantics
USERS Augmented with Web Semantics
REAL DOMAIN
REAL USERS
11
Domain: Incidents and emergencies
In the literature we see a fair amount of attention for the domain of incidents and emergencies. Our own experience over several years is situated in that domain. It has given us a good feeling for what is needed and how UMAP research can be part of a bigger effort to solve real-world problems.
12
Domain: Incidents and emergencies
In the literature, most attention is directed towards understanding and detecting. Sometimes we see further objectives in responding, creating situational awareness (especially in massive incidents), and prevention. The source most used in these studies is Twitter.
15
With Twitter, we have a whole new reflection of what is happening in the world. A whole new source of digital data that reflects the (real) world. We need to understand that reflection to understand the world and help the world. Two challenges: 1. Understand the world, and 2. Understand its reflection in the Social Web.
16
CrowdSense BV
http://twitcident.org http://tno.nl/twitcident http://twitcident.com
Twitcident spin-off collaboration
Our real-world lab
17
400+ million tweets per day
• Netherlands ranks #1 in Twitter penetration
Twitter users publish about “anything”
• Work/private life
• Interesting events
• Etc.
Twitter tells us a lot about the world. And its users can be seen to act as social sensors and citizen journalists.
Monitoring Twitter
24
A new source of knowledge
An example of the speed and the nature of knowledge that Twitter provides and what it does to provide knowledge about what really happened. Also, it shows what we need to know and understand to use and interpret this effectively.
25
1. Early warning
• Twitter users publish early signals that might indicate an increased risk or potential incident.
2. Crisis management
• (Eye-witness) Twitter users disseminate information about incidents which can support operational emergency services.
3. Post evaluation
• Post analyzing incident data (in retrospect) to measure the effectiveness of emergency services.
Twitcident goals
26
• Emergency services
  - Law enforcement, fire fighters, governments
• Big event organizers
  - Festival security companies
• Utility organizations
  - Public transport, energy supply, other vital infrastructures
Stakeholders
30
Could we see this impact coming?
Semantics 25 minutes before incident
1. Weather: storm, cloud-burst, wind, …
2. Locations: Brussel, Gent, Hasselt, …
3. Intensity: heavy, crazy, massive, …
4. Impact: hail balls, falling trees, …
Impact storm
Why is there a peak?
33
Example festival disaster
The research into this example created a lot of knowledge about what is possible and what is desired. It was also a good example to follow and approach new use cases to build more general understanding and theory.
36
Twitcident processes 100k tweets/day
The social weather map provides ProRail with a timely and accurate overview of citizen observations. In addition to other sources of knowledge.
Value
37
Big Events
• New Year’s Eve
• Serious Request
• Elections
• Lowlands
• Summer Carnaval
• Fantasy Island
• Queen’s Day
39
Social media monitoring was done with 1-3 security officers
Violence, riots, fires, fireworks, crowds, …
40
Not only monitoring
The previous examples are not only about monitoring Twitter to know what is happening out there.
53
Recommendations by Cohen
• Clear communication strategy • Planning & organizing in advance • Social media monitoring • Clear intervention policy
56
Recommendation from experience
Let us go and find the needle that tells us what appears to be happening out there. But let us also think about how to support the action to make the world out there a better one.
57
Meaningful and actionable
Twitcident has taught us that information obtained from Twitter needs to be meaningful and actionable.
58
“Polling meaningful information”
“Sifting thousands of tweets during hurricane Irene”
“Getting situational awareness”
“Finding the eyes on the ground”
“Finding actionable information”
“Providing timely reaction”
“Volunteers are great”
“But we need hybrid approaches to monitor social media”
Patrick Meier
Today’s challenges
59
Hybrid approach
Twitcident has also shown us how these problems ask for a hybrid approach with humans in the loop that handle and interpret the knowledge derived from the Social Web. Big Data is available from the Social Web, but Small Interpretations are needed, to get it right!
60
Human interpretation inside
The nature of these problems means that solutions are not fully automatic.
They involve users of systems who help with interpretation and decision making.
This is a special kind of user that we (as UMAP) can consider: a group that is growing fast and in urgent need of support.
61
Take home from experience
Learn from concrete cases: • Case-based experimental approaches bring specific understanding and
experience necessary for general understanding and theory.
• Cases can have great value for stakeholders.
It is all about correct and actionable interpretation:
• Make information meaningful and actionable in the context.
• Employ hybrid, human-enhanced approaches for the context.
62
65
Challenge: Making sense of Twitter
Inspired by different applications and domains, researchers have given attention to underlying technology for making sense of Twitter. ‘Finding the needle’ as the research challenge.
66
Technology for making sense
The sense-making usually relies on application and domain specific knowledge and researchers investigate how to do it effectively. Semantics and interactivity prove to be important ingredients. In fact, it turns out that sense-making, i.e. finding the needle, is a combination of many things that need to be coming together.
69
Semantics for filtering and search
In [HT2012] we considered what is needed as first steps in processing tweets, before we can ‘analyze’ them.
70
1. (Automatic) Filtering: Given an incident, how can one automatically identify those tweets that are relevant to the incident?
2. Search & Analytics: How can one improve search and analytical capabilities so that users can explore information in the streams of tweets?
Twitter streams
Challenges
Filtering
topic
Search & Analytics
information need
71
Dataset
• Twitter corpus (TREC Microblog Track 2011)
  - 16 million tweets (Jan. 24th – Feb. 8th, 2011)
  - 4,766,901 tweets classified as English
  - 6.2 million entity-extractions
• News (same time period)
  - 62 RSS News Feeds
  - 13,959 News Articles
  - 357,559 entity-extractions
72
Filtering evaluation
[Figure: bar chart comparing Semantic Filtering, Semantic Filtering with News Contextualization, and the keyword-filtering baseline on MAP, P@10, P@30, and Recall.]
Semantic strategies outperform the keyword-based filtering regarding all metrics.
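The contrast between the two filtering styles can be illustrated with a minimal sketch. It is not the Twitcident implementation: the `Tweet` class, the entity labels, and both filter functions are hypothetical, and entity extraction is assumed to have been done by an external service.

```python
from dataclasses import dataclass, field


@dataclass
class Tweet:
    text: str
    # Assumption: entities were attached beforehand by an entity extractor.
    entities: set = field(default_factory=set)


def keyword_filter(tweets, keywords):
    """Baseline: keep tweets whose text contains any topic keyword."""
    lowered = [k.lower() for k in keywords]
    return [t for t in tweets if any(k in t.text.lower() for k in lowered)]


def semantic_filter(tweets, topic_entities):
    """Keep tweets that share at least one entity with the topic description."""
    return [t for t in tweets if t.entities & topic_entities]


# Illustrative tweets with made-up entity labels.
tweets = [
    Tweet("Huge storm over Hasselt, tents are down", {"place:Hasselt", "weather:Storm"}),
    Tweet("Cloud-burst at the festival site!", {"weather:Storm"}),
    Tweet("Storm of applause for the headliner"),
]

by_keyword = keyword_filter(tweets, ["storm"])
by_entity = semantic_filter(tweets, {"weather:Storm"})
```

In this toy data the keyword filter misses the cloud-burst tweet and keeps the applause tweet, while the entity filter does the opposite, which is the kind of difference the evaluation measures.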
73
Filtering evaluation
The semantic strategy is more robust and achieves higher precisions for complex topics.
[Figure: Precision@30 and Recall versus the number of entities extracted from the initial topic description (1 to 4).]
[Figure: Precision@30 and Recall versus the number of words in the initial topic description (1 to 5).]
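For readers less familiar with the metrics in these plots, the following sketch shows how Precision@k and Recall are commonly computed for a ranked list of tweets. The function names and the tweet identifiers are illustrative assumptions; the exact evaluation scripts used in [HT2012] are not given in the slides.

```python
def precision_at_k(ranked_ids, relevant_ids, k=30):
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for item in ranked_ids[:k] if item in relevant_ids)
    return hits / k


def recall(ranked_ids, relevant_ids):
    """Fraction of all relevant items that appear anywhere in the ranking."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for item in set(ranked_ids) if item in relevant_ids)
    return hits / len(relevant_ids)


# Hypothetical ranking and relevance judgments.
ranked = ["t1", "t7", "t3", "t9"]
relevant = {"t1", "t3", "t5"}

p_at_2 = precision_at_k(ranked, relevant, k=2)
r = recall(ranked, relevant)
```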
74
Faceted search evaluation
[Figure: bar chart comparing frequency-based faceted search, hashtag-based faceted search, and hashtag-based keyword search, with and without semantic enrichment.]
The semantic faceted search strategy improves the search performance by 34.8% and 22.4%.
75
Faceted search evaluation
Strategies with semantic enrichment outperform those without in predicting appropriate facet-values.
[Figure: bar chart comparing the personalized, time-sensitive, frequency, and hashtag-based strategies on three metrics, with and without semantic enrichment.]
76
Lessons
The context: a (Twitcident-inspired) framework for filtering, searching, and analyzing information about incidents that people publish on Twitter.
We have seen how to obtain
• better filtering of Twitter messages for a given incident,
• better search for relevant information about an incident within the filtered messages.
For these first steps in processing Twitter messages, the semantic interpretation is the key element that we need to understand for the given context.
78
Semantics for enrichment and linkage
In [ESWC2011] we focused more on the semantics for enrichment and linkage to connect the tweets to background knowledge and thus enhance what we can learn from them.
79
SI Sportsman of the year: Surprise French Open champ Francesca Schiavone Thirty in women's tennis is primordially old, … news article
topic:Sports topic:Sports
topic:Tennis
person:Francesca_Schiavone
oc:SportsGame
event:FrenchOpen
francesca is becoming #sport idol of the year!
microblog post
user
enrichment enrichment
user modeling
linkage
Profile Topics of interest: - topic:Tennis - topic:Sports People of interest: - person:Francesca_Schiavone Events of interest: - event:FrenchOpen
Example: Semantic enrichment of Twitter posts
80
SI Sportsman of the year: Surprise French Open champ Francesca Schiavone Thirty in women's tennis is primordially old, … news article
francesca is becoming #sport idol of the year!
microblog post
user linkage
How?
Goal in this linkage discovery is to identify news resources that are related to a given Twitter message:
1. Web resource has to be related to the given tweet
2. Web resource has to be related to news
Linkage discovery
81
Francesca Schiavone is sportsman of the year #sport #tennis
Content-based
SI Sportsman of the year: Surprise French Open champ Francesca Schiavone Thirty in women's tennis is primordially old…
Francesca Schiavone is sportsman of the year #sport #tennis
Hashtag-based Petkovic & Goerges leading German tennis revival there are signs that German tennis is…
Linkage discovery strategies
82
nice! http://bit.ly/eiU33c URL-based
SI Sportsman of the year: Surprise French Open champ Francesca Schiavone Thirty in women's tennis is primordially old…
news article URL
Entity-based
Olympic champion and world number nine Elena Dementieva announced her retirement The 29-year-old Russian delivered the shock news after losing to Francesca Schiavone in the group stages of the season-ending tournamen …
news article
Entity-based
Francesca Schiavone is sportsman of the year #sport #tennis temporal constraint
Old news ☹ publish date
publish date
• URL-based (Strict): only consider content of the Twitter message
• URL-based (Lenient): also consider reply or re-tweet messages
Linkage discovery strategies
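The entity-based strategy with a temporal constraint can be sketched as follows. This is an illustration under assumptions, not the method from [ESWC2011]: the function name, the two-day window, the dictionary layout, and the publication dates are all made up; only the article titles and entities echo the slides.

```python
from datetime import datetime, timedelta


def entity_based_links(tweet_entities, tweet_time, articles, window=timedelta(days=2)):
    """Return titles of articles that share an entity with the tweet.

    If `window` is given, the article must also be published within that
    time span of the tweet (the temporal constraint); pass None to drop it.
    """
    links = []
    for article in articles:
        shares_entity = bool(tweet_entities & article["entities"])
        in_window = window is None or abs(tweet_time - article["published"]) <= window
        if shares_entity and in_window:
            links.append(article["title"])
    return links


# Illustrative data: titles follow the slides, dates are made up.
articles = [
    {"title": "SI Sportsman of the year", "published": datetime(2010, 12, 1),
     "entities": {"person:Francesca_Schiavone", "topic:Tennis"}},
    {"title": "Dementieva announced her retirement", "published": datetime(2010, 10, 29),
     "entities": {"person:Francesca_Schiavone", "person:Elena_Dementieva"}},
]
tweet_entities = {"person:Francesca_Schiavone"}
tweet_time = datetime(2010, 12, 1, 12, 0)

with_constraint = entity_based_links(tweet_entities, tweet_time, articles)
without_constraint = entity_based_links(tweet_entities, tweet_time, articles, window=None)
```

Without the window, the older retirement article is linked as well, which is the "old news" problem the temporal constraint is meant to avoid.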
83
Evaluation on linkage discovery
Precision per strategy (values in chart order):
• Content-based (bag-of-words): 0.1013
• Hashtag-based: 0.4023
• Entity-based (without temporal constraints): 0.4257
• Entity-based: 0.7133
• URL-based (lenient): 0.786
• URL-based (strict): 0.8059
URL-based strategies offer good linkage.
84
Analysis on linkage discovery and semantic enrichment
• URL-based strategies: more than 10 tweet-news relations for c.a. more than 1000
• Entity-based strategy: found a far higher number of tweet-news relations
• Hashtag-based strategy failed for more than 79% of the users because of the limited usage of hashtags
• Combination of all strategies: more than 10 tweet-news relations found for more than 20% of the users
Entity-based URL-based
Hashtag-based
Combination
Combined strategies perform better.
85
Lessons
There is good background knowledge out there, if we are able to understand how it connects to the domain and context we are considering. Many applications can share the same enrichment and linking, but not all. With common descriptions of the problem, we can share enrichment and linking (more) effectively.
87
Challenge: Social web for profiles
An ambition often seen in conferences like this one is to exploit semantically enriched Social Web knowledge for the purpose of creating or enhancing user profiles. These profiles can then be used for adaptation and personalization.
88
Components for profiling
For applications such as personalized news recommendation, like in our [UMAP2011] work, components for profiling can be carefully selected and assembled. It can also help the development of the deeper understanding and theory about how to link the data to background knowledge and thus make sense of the data.
89
Library
GeniUS [JIST2011] is a topic and user modeling software library that
• produces semantically meaningful profiles, to enhance the interoperability of profiles between applications;
• provides functionality for aggregating relevant information about a user from the Social Web;
• generates domain-specific user profiles according to the information needs of different applications;
• is flexible and extensible to serve different applications.
90
GeniUS: Generic Topic and User Modeling Library for the Social Semantic Web
Item Fetcher Enrichment Weighting
Function
RDF Repository
Filter
Modeling Configuration
RDF Serialization
Social Web
Semantic Web
user data items
enriched items
semantic data
user profiles
interested in:
location product
91
1. Temporal Constraints: (a) time period (b) temporal patterns
2. Profile Type: (a) hashtag-based (b) entity-based (c) topic-based
3. Semantic Enrichment: (a) tweet-based (b) further enrichment
4. Weighting Scheme: (a) concept frequency
User Modeling Building Blocks
92
User modeling with rich semantics: interested in:
people topics events … linkage user profile construction
#sport
person:Francesca_Schiavone
topic:Sports
event:FrenchOpen
topic:Tennis
time
weekday weekend
Profile types
• hashtag-based
• topic-based
• entity-based
Enrichment
• tweet-only
• exploitation of external news resources
Temporal patterns
• specific time period
• temporal pattern
• no constraints
User profile construction
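The building blocks above (profile type, temporal constraint, concept-frequency weighting) can be combined in a small sketch. It is only an illustration: `build_profile`, the post dictionaries, and the weekend switch are hypothetical names, not the GeniUS API, and enrichment is assumed to have already attached entities and topics to each post.

```python
from collections import Counter
from datetime import datetime


def build_profile(posts, profile_type="entity", weekend_only=None):
    """Build a concept-frequency profile from already enriched posts.

    profile_type: "hashtag", "entity", or "topic".
    weekend_only: None (no temporal constraint), True (weekend posts only),
                  or False (weekday posts only).
    """
    key = {"hashtag": "hashtags", "entity": "entities", "topic": "topics"}[profile_type]
    counts = Counter()
    for post in posts:
        is_weekend = post["time"].weekday() >= 5  # Saturday = 5, Sunday = 6
        if weekend_only is not None and is_weekend != weekend_only:
            continue
        counts.update(post[key])
    total = sum(counts.values())
    # Normalize counts so that the weights of a profile sum to 1.
    return {concept: n / total for concept, n in counts.items()} if total else {}


# Illustrative posts; concept names follow the example in the slides.
posts = [
    {"time": datetime(2011, 1, 29, 15, 0),  # a Saturday
     "hashtags": ["#sport"],
     "entities": ["person:Francesca_Schiavone", "event:FrenchOpen"],
     "topics": ["topic:Tennis", "topic:Sports"]},
    {"time": datetime(2011, 1, 31, 9, 0),   # a Monday
     "hashtags": [],
     "entities": ["person:Francesca_Schiavone"],
     "topics": ["topic:Sports"]},
]

entity_profile = build_profile(posts, "entity")
weekend_profile = build_profile(posts, "entity", weekend_only=True)
```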
95
[Figure: two plots over user profiles (1 to 1000, log scale), each comparing News-based and Tweet-based enrichment. Panel "Entity-based profiles": entities per user profile. Panel "Topic-based profiles": distinct topics per user profile. Both panels are annotated "profiles enriched with external news resource".]
By exploiting the linkage between tweets and news articles, we get more distinct entities / topics (semantics)!
Richer semantics through linking strategies.
Analysis of profile characteristics
96
Lessons
For profiles, we observed:
• Semantic enrichment allows for richer user profiles.
• Profiles change over time (hashtag-based more): fresh profiles seem to better reflect current user demands.
• Temporal patterns: weekend profiles differ significantly from weekday profiles (more than day/night).
For personalized news recommendation, we learned:
• Best user modeling strategy: entity-based > topic-based > hashtag-based.
• Semantic enrichment improves recommendation quality.
• Adapting to temporal context helps for topic-based strategy.
98
Augment with what is there
Systems can use technology to augment their knowledge with data from the Social Web. Lessons learned show that for adaptive systems on the Social Web there is a lot of knowledge (easily) available, from other systems and other domains. Understanding how to leverage it, even to a basic level, can bring a lot.
100
Cross-system profiles
An example to show the added value of ‘cross-system’ on the Social Web is the work in [UMUAI 2013] where interweaving of public profiles is studied.
102
Input: Google Profile URI http://google.com/profile/XY
1. Get other accounts of user (Account Mapping, SocialGraph API)
2. Aggregate public profile data (Social Web Aggregator): social networking profiles, blog posts, bookmarks, other media
3. Map profiles to target user model (Profile Alignment, FOAF, vCard)
4. Enrich data with semantics (Semantic Enhancement, WordNet®)
5. Generate user profiles (Analysis and user modeling)
Output: aggregated, enriched profile (e.g., in RDF or vCard)
Interweaving public user data with Mypes
103
1. Characteristics of distributed tag-based profiles:
• Overlap of tag-based profiles, which an individual user creates at different services, is low
• Aggregated profiles reveal significantly more information (regarding entropy) than service-specific profiles
2. Performance of cross-system user modeling for cold-start recommendations:
• Cross-system UM leads to tremendous (and significant) improvements of the tag and bookmark recommendation quality
• To optimize the performance one has to adapt the cross-system strategies to the concrete application setting
http://persweb.org
Lessons
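The two profile characteristics named in the first lesson, overlap and entropy, can be made concrete with a short sketch. The tag counts and service names are invented, and Jaccard overlap is an assumption here; the slides do not state which overlap measure the study in [UMUAI 2013] used.

```python
import math
from collections import Counter


def entropy(tag_counts):
    """Shannon entropy (in bits) of a tag-based profile."""
    total = sum(tag_counts.values())
    return -sum((n / total) * math.log2(n / total) for n in tag_counts.values() if n)


def overlap(profile_a, profile_b):
    """Jaccard overlap between the tag sets of two profiles (an assumed measure)."""
    a, b = set(profile_a), set(profile_b)
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical tag profiles of one user at two services.
bookmarking_profile = Counter({"semanticweb": 2, "rdf": 2})
photo_profile = Counter({"rome": 2, "rdf": 2})
aggregated = bookmarking_profile + photo_profile

service_entropy = entropy(bookmarking_profile)
aggregated_entropy = entropy(aggregated)
tag_overlap = overlap(bookmarking_profile, photo_profile)
```

In this toy case the aggregated profile has higher entropy than either service-specific profile, and only one of three distinct tags is shared between the services.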
104
Location estimation
Another nice example follows from our work in the ImREAL project on augmentation (of adaptation) with the Social Web.
105
Improved location estimation by mixing Social Web streams
Enriching the image’s textual meta-data with the user’s tweets improves the accuracy of the location estimation.
106
Accuracy of social web metadata
This work has also drawn attention to the accuracy of Social Web metadata. There are many reasons why this data cannot be taken as the universal truth. In application- and domain-specific contexts, we need to understand the accuracy of social metadata. The work of [Rout et al. 2013] on location estimation based on social ties also shows both the feasibility and the context-dependency.
108
LOD and cross-system
With these results in hand, in our [ICWE2012] work we considered cross-system modeling with Linked Open Data, with the aim of understanding how Linked Open Data background knowledge can be leveraged for cross-system and cross-domain augmentation.
109
Johannes Vermeer
dbpedia:Louvre Looking forward to visit Paris next week!
dbpedia:Paris
The lacemaker
The astronomer
Recommending Points of Interest
110
c1
c4
c5
c6
weighting strategies
Application that demands user
interest profile regarding concepts
c2
c3 cx
cy
c9
User Profile concept weight
0.4
0.1
0.2
c1
c2
c3 … …
concepts that can be extracted from the user data
user data
Social Web
background knowledge (graph structures)
Linked Data
LOD-based User Modeling
111
tags: girl with pearl earring geo: The Hague
dbpedia:Girl_with_pearl_earring
A
Artifact
B
The lacemaker
C
The astronomer
…
rdf:type
Johannes Vermeer foaf:maker
foaf:maker
Strategies for exploiting the RDF-based background knowledge graph
dbpedia:The_Hague
dbpedia:Louvre dbpprop:location locatedIn
112
Lessons
With LOD-based user modeling on the Social Web, different strategies for exploiting RDF-based background knowledge are possible.
Findings:
• Combination of different user data sources (Flickr & Twitter) is beneficial for the user modeling performance.
• User modeling quality increases the more background knowledge one considers.
• Combination of strategies achieves the best performance.
To investigate further: dependency of strategies on entities and relationships, and temporal effects (e.g. temporal relationships or upcoming trends).
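One possible way to exploit a background knowledge graph is to pass part of a concept's weight on to its direct neighbours. The sketch below shows that single idea and is not one of the strategies evaluated in [ICWE2012]: the function name, the `decay` factor, the undirected one-hop simplification, and the exact DBpedia-style identifiers are assumptions made for illustration.

```python
from collections import defaultdict


def lod_profile(mentions, graph, decay=0.5):
    """Weight concepts by direct mentions plus one-hop graph propagation.

    mentions: concept -> number of mentions in the user data.
    graph:    concept -> list of related concepts in the background graph.
    decay:    share of a mention's weight passed on to each neighbour
              (a made-up parameter, not taken from the slides).
    """
    weights = defaultdict(float)
    for concept, count in mentions.items():
        weights[concept] += count                # direct evidence from user data
        for neighbour in graph.get(concept, ()):
            weights[neighbour] += decay * count  # indirect evidence via one link
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()} if total else {}


# Simplified, illustrative background graph and user data.
graph = {
    "dbpedia:Girl_with_pearl_earring": ["dbpedia:Johannes_Vermeer"],
    "dbpedia:Paris": ["dbpedia:Louvre"],
}
mentions = {"dbpedia:Girl_with_pearl_earring": 1, "dbpedia:Paris": 1}

profile = lod_profile(mentions, graph)
```

The Louvre ends up in the profile although the user never mentioned it, which mirrors the point-of-interest example in the slides.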
113
Interlinked online society
If you take a semantic technology perspective, then strong interlinking could be the direction to go. [Passant et al. 2009] study applying semantic technologies to social media, creating a Web where data is socially created and maintained through end-user interactions, but is also machine-readable and therefore open to sophisticated queries and large-scale information integration. The result is “Social Semantic Information Spaces”, where any social data is a component in a worldwide collective intelligence ecosystem.
114
Origin of semantics
These social semantic spaces can trigger us in UMAP to articulate where we see the role and origin of semantics. Making all social data available ‘with semantics’ or observing that a lot of semantics is (only) effective in a specific domain or application? Experience showing the fine-grained nature of effects suggests the latter.
116
Humans & adaptive faceted search
An important element in the process of sense-making is its hybrid nature: humans involved in the sense-making. The control rooms have shown us that the human aspect in search is crucial, for judgment and interpretation. In our [ISWC2011] work, we looked at adaptive faceted search.
117
Adaptive faceted search framework
Adaptive Faceted Search
Twitter posts
Semantic Enrichment
User and Context Modeling
user
How to adapt the facet-value pair ranking to the
current demands of the user?
How to represent the content of a
tweet? facet extraction
118
Facet extraction and semantic enrichment
@bob: Julian Assange got arrested
Julian Assange
Julian Assange Tweet-based enrichment
Julian Assange arrested Julian Assange, the founder of WikiLeaks, is under arrest in London…
Link-based enrichment
Julian Assange
London
WikiLeaks
Julian Assange Julian Assange
London WikiLeaks
119
Impact of Link-based enrichment
Representation of tweets:
significantly more facets per tweet with link-based
enrichment
120
Faceted search strategies
Goal: most relevant facet-value pair (FVP) should appear at the top of the ranking
Faceted Search Strategies:
1. Occurrence frequency: count occurrence frequencies of FVP
2. Personalization: adapt ranking to user profile (e.g. user tweeting history)
3. Diversification: increase variety among the top-ranked FVPs
4. Time-sensitivity: adapt FVP ranking to temporal context
Semantic enrichment: (i) tweet-based and (ii) link-based enrichment
Locations 1. Aachen 2. Aalborg 3. Aalesund 4. Aarhus … 2145. Eindhoven
Locations 1. Eindhoven 2. Delft 3. Amsterdam 4. Rotterdam 5. London …
Link-based enrichment and occurrence-based and personalized rankings have large effect.
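The first two strategies, occurrence frequency and personalization, can be sketched together. This is a simplified illustration rather than the ranking from [ISWC2011]: `rank_fvps`, the linear mix controlled by `alpha`, and the profile weights are assumptions.

```python
from collections import Counter


def rank_fvps(tweets, user_profile=None, alpha=0.5):
    """Rank facet-value pairs (FVPs) by frequency, optionally personalized.

    tweets:       list of sets of (facet, value) tuples, one set per tweet.
    user_profile: optional mapping FVP -> interest weight in [0, 1].
    alpha:        weight of the personal score in the final mix (hypothetical).
    """
    freq = Counter(fvp for tweet in tweets for fvp in tweet)
    max_freq = max(freq.values(), default=1)
    user_profile = user_profile or {}

    def score(fvp):
        frequency_score = freq[fvp] / max_freq
        if not user_profile:
            return frequency_score
        return (1 - alpha) * frequency_score + alpha * user_profile.get(fvp, 0.0)

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(freq, key=lambda fvp: (-score(fvp), fvp))


tweets = [
    {("location", "Aachen")},
    {("location", "Aachen")},
    {("location", "Eindhoven")},
    {("location", "Delft")},
]
profile = {("location", "Eindhoven"): 1.0, ("location", "Delft"): 0.6}

by_frequency = rank_fvps(tweets)
personalized = rank_fvps(tweets, profile)
```

With the made-up profile, Eindhoven and Delft move ahead of Aachen, similar in spirit to the two location lists shown on the slide.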
121
Twitcident.com Twitter-based crisis management system
Semantic enrichment allows for:
1. Grouping tweets into incidents
2. Faceted search
3. Thematic Views
4. Analysis
122
Lessons
Semantic enrichment allows for structured representation of the content of tweets: a good basis for faceted search.
Faceted search performs significantly better than hashtag-based keyword search.
Different building blocks for making faceted search on Twitter adaptive improve the search quality:
• Link-based enrichment: more discoverable tweets, better search performance.
• Personalization leads to significant improvements.
• Time-sensitivity improves performance as well.
124
Duplicate detection
To reduce the volume of social data, it is important to categorize the social chatter and reduce redundancy in information. In our [WWW2013] work we have considered duplicate detection.
125
Twitter is more like a news medium. How do people search on Twitter? [Teevan et al. 2011] have shown that this search is characterized by repeated queries & monitoring for new content.
Problems:
• Short tweets → lots of similar information.
• Few people produce content → many retweets, copied content.
Search and retrieval on Twitter
126
Near-duplicates in Twitter search
Analysis of the Tweets2011 corpus (TREC microblog track) [WWW2013]
• Exact copy: 1.89%
• Nearly exact copy: 9.51%
• Strong near-duplicate: 21.09%
• Weak near-duplicate: 48.71%
• Low overlapping: 18.80%
• For the 49 topics (queries), 2,825 topic-tweet pairs are relevant.
• We manually labeled 55,362 tweet pairs
• We found 2,745 pairs of duplicates at different levels.
127
Twinder Framework
Search infrastructure
Social Web Streams → messages → Twinder Search Engine
Twinder Search Engine components:
• Feature Extraction: Keyword-based Relevance, Semantic-based Relevance, Syntactical Features, Semantic Features, Contextual Features, Further Enrichment
• Feature Extraction Task Broker (feature extraction tasks → Cloud Computing Infrastructure)
• Index
• Relevance Estimation
• Duplicate Detection and Diversification
• Search User Interface (users: query, results, feedback)
128
Lessons
Analyzing duplicate content on Twitter, we inferred a model for categorizing different levels of duplication. We developed a near-duplicate detection framework for microposts that categorizes the duplication level of tweet pairs. Using this framework, we performed extensive evaluations and analyses of different duplicate detection strategies. Our approach enables search result diversification, which also helps to avoid ‘bubble effects’, and we analyzed the impact of the diversification on search quality.
Follow Twinder progress: http://wis.ewi.tudelft.nl/twinder/
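To give a feel for what categorizing tweet pairs by duplication level involves, here is a deliberately simple sketch based on word overlap. It swaps in plain Jaccard similarity with made-up thresholds; the framework in [WWW2013] relies on syntactical, semantic, and contextual features instead, so neither the thresholds nor the function names should be read as the authors' method.

```python
import re


def tokens(text):
    """Lowercase word tokens; URLs, @mentions, and the 'RT' marker are dropped."""
    text = re.sub(r"https?://\S+|@\w+|\brt\b", " ", text.lower())
    return set(re.findall(r"[a-z0-9']+", text))


def jaccard(a, b):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def duplicate_level(a, b):
    """Map a tweet pair to a duplication level (thresholds are hypothetical)."""
    if a.strip() == b.strip():
        return "exact copy"
    score = jaccard(a, b)
    if score >= 0.9:
        return "nearly exact copy"
    if score >= 0.6:
        return "strong near-duplicate"
    if score >= 0.3:
        return "weak near-duplicate"
    if score > 0:
        return "low overlapping"
    return "not a duplicate"


original = "Julian Assange arrested in London"
retweet = "RT @bob: Julian Assange arrested in London http://bit.ly/eiU33c"
rephrased = "Assange is under arrest in London"
unrelated = "Storm over Hasselt"

levels = [duplicate_level(original, other)
          for other in (original, retweet, rephrased, unrelated)]
```

A level assignment like this could then drive diversification, for example by showing only one tweet per group of exact and nearly exact copies.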
129
Take home from technology research
With semantics and humans, Social Web can help:
• Semantics beneficial for filtering & search and enrichment & linking.
• Semantic-enriched tweets beneficial for profiles and adaptation.
• Social Web & Linked Data beneficial for cross-system augmentation.
• Adaptive faceted search and duplicate detection beneficial for human-enhanced processing.
For adaptive systems that rely on profiling, Social Web is a fertile source for more knowledge.
ImREAL research & experiences elegantly show principles, as well as the detailed work in domain & application:
• Social Web & LOD usage is context-specific.
• Big Data in need of Small Interpretations.
130
131
Take home from technology research
The human intelligence is to be arranged differently:
• We have moved from a priori understanding the system, to on the fly understanding the system.
• We have moved from careful manual analysis before, to machines doing the analysis on the fly.
• The critical and context-specific approach to (small) data, about domain and users, is a part of process and system we now need to (re-)include.
• This task of the designer has now shifted to a task for the human interpretation inside the hybrid system: human monitoring inside.
135
In reality, not one truth
In the beginning, social systems like Twitter were used as ‘the’ semantic source of knowledge, with an implicit assumption that Twitter is one voice. Over time, researchers have begun to investigate how to identify and interpret different voices and viewpoints in such a source. Differences in viewpoints and opinions are a subject of study, but so far they have been leveraged only to a limited extent.
136
Diversity and beliefs
[Flock et al. 2011] study the different backgrounds, mindsets and biases of Wikipedia contributors, to understand the effects (positive and negative) of this diversity on the quality of the Wikipedia content, and on the sustainability of the overall project.
• Analysis and approach for diversity-minded content management within Wikipedia.
[Bhattacharya et al. 2012] estimate beliefs from posts made on social media, to monitor the level of belief, disbelief and doubt related to specific propositions.
137
Include the negative
Diversity of viewpoints and opinions also suggests including negative links in the approach. [Symeonidis et al. 2010] give an example of how to include negative links in friend recommendation approaches, but this goes much further. The accuracy improvement they observe can be taken as a general principle: accuracy can be gained by using information about both positive and negative edges.
138
ViewS
Modelling Viewpoints in User Generated Content
Text processing
Semantic enrichment
Ontology (activity aspects to analyse)
Viewpoint extraction (attention focus)
Viewpoint exploration
139
Viewpoints in YouTube
Example viewpoints in user comments on job interview videos
Comparing the viewpoints around ‘anger’ of young users (left) and old users (right)
141
Truth is not always truth
Just as this source of knowledge does not speak with a single voice, it may not consist of ‘true’ knowledge alone.
142
Malicious profiles
For example, profiles can be suspicious and made for the wrong reasons. In the context of online dating, [Pizzato et al. 2012] have observed the need to understand how sensitive recommender algorithms are to scammers. With people being the items to recommend, fraudulent profiles can have a serious impact on recommender algorithms. Identifying and detecting fraudulent profiles is a new challenge for us.
143
Identity theft
Another aspect to ‘wrong profiles’ relates to identity disambiguation and theft.
[Rowe et al. 2010] consider malevolent web practices such as identity theft and lateral surveillance. They study techniques for web users to identify all web resources which cite them and if necessary, remove the sensitive information.
144
Credibility of social content
The credibility of messages in social networks is studied, for example, in [Seth et al. 2010] on stories from Digg. Their model is based on theories developed in sociology, political science and information science. [Cramer et al. 2008] have drawn attention to trust. Studies of social content credibility and trust are important and call for cross-disciplinary effort.
145
Privacy
A lot can be said about privacy in these networks, for example Facebook. [Bachrach et al. 2012] show how users’ activity on Facebook (related to privacy) relates to their personality, as measured by the standard Five Factor Model. This is a nice example of understanding how Facebook features relate to interesting aspects of users and usage.
147
Cultural diversity
Studying diversity is not just relevant for understanding how Twitter content is to be interpreted. It is also relevant for understanding how the Social Web is used and can be used with a purpose. Cultural diversity is here one of the most interesting aspects and perhaps also one of the most challenging ones.
148
Cultural diversity
A subject addressed in ImREAL. Components are made available as services in ImREAL for augmented user modeling, e.g. for simulation designers.
150
Hofstede’s cultural dimensions
Describes stereotypical cultural characteristics of nationalities, with scores relative to other nationalities.
Five core dimensions:
• Individualism versus Collectivism (IDV)
• Power Distance (PDI)
• Masculinity versus Femininity (MAS)
• Uncertainty Avoidance (UAI)
• Long-Term Orientation (LTO)
geert-hofstede.com
151
Analysis
• Datasets
  • Microblog data collected over a period of three months
  • 22 million microposts from Sina Weibo and 24 million from Twitter
  • a sample of 2616 Sina Weibo users and 1200 Twitter users
• Analyze and compare user behavior
  • on two levels: (i) the entire user population and (ii) individual users
  • from different angles: (i) syntactic, (ii) semantic, (iii) sentiment and (iv) temporal analysis
152
Syntactic analysis
[Figure: average number of hashtags/URLs per post (axis ticks 0, 0.01, 0.1, 1) against the percentage of users (0% to 100%); series: Hashtag-Weibo, URL-Weibo, Hashtag-Twitter, URL-Twitter]
Hashtags and URLs are less frequently applied on Sina Weibo than on Twitter.
Users on Twitter are more triggered by hashtags and URLs when propagating information than users on Sina Weibo.
High collectivism on Sina Weibo, high individualism on Twitter.
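The per-user measure behind the syntactic analysis (average number of hashtags and URLs per post) can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `avg_markers_per_post`, the input format, and the two regular expressions are assumptions, and real Sina Weibo hashtags (written as `#topic#`) or shortened URLs may need more careful handling.

```python
import re
from collections import defaultdict

# Assumed, simplified patterns; not taken from the original study.
HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")


def avg_markers_per_post(posts):
    """Return {user_id: (avg_hashtags_per_post, avg_urls_per_post)}.

    `posts` is an iterable of (user_id, text) pairs (a hypothetical format).
    """
    totals = defaultdict(lambda: [0, 0, 0])  # hashtags, urls, post count
    for user_id, text in posts:
        urls = URL.findall(text)
        # Remove URLs first so a '#fragment' inside a URL is not counted as a hashtag.
        text_without_urls = URL.sub(" ", text)
        entry = totals[user_id]
        entry[0] += len(HASHTAG.findall(text_without_urls))
        entry[1] += len(urls)
        entry[2] += 1
    return {user: (h / n, u / n) for user, (h, u, n) in totals.items()}
```

Sorting users by these averages and plotting them against the percentage of users would give a curve of the kind shown on the slide.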
153
Semantic analysis
The topics that users discuss on Sina Weibo are to a large extent related to locations and persons. In contrast to Sina Weibo, users on Twitter are talking more about organizations (such as companies, political parties).
[Figure: average number of entities per post (axis ticks 0, 0.001, 0.01, 0.1, 1, 10) against the percentage of users (0% to 100%); series: Weibo, Twitter]
Low employee commitment to an organization in China; high long-term orientation.
154
Sentiment analysis
Sina Weibo users have a stronger tendency to publish positive messages than Twitter users.
[Figure: ratio of positive posts (0% to 100%) against the percentage of users (0% to 100%); legend: Weibo; annotations: ‘more negative posts’, ‘more positive posts’]
High long-term orientation.
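The ‘ratio of positive posts’ per user can be computed along the following lines. This is only a sketch under stated assumptions: posts are assumed to be already labeled by some sentiment classifier (not shown), the ratio is assumed to be positive / (positive + negative), and the function name and label strings are hypothetical.

```python
from collections import Counter


def positive_ratio_per_user(labeled_posts):
    """Return {user_id: share of positive posts among positive and negative posts}.

    `labeled_posts` is an iterable of (user_id, label) pairs (a hypothetical format).
    """
    positive = Counter()
    total = Counter()
    for user_id, label in labeled_posts:
        if label not in ("positive", "negative"):
            continue  # assumption: neutral or unlabeled posts are ignored
        total[user_id] += 1
        if label == "positive":
            positive[user_id] += 1
    return {user: positive[user] / total[user] for user in total}
```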
155
Combined semantic sentiment analysis
The difference is amplified when discussing ‘people’ or ‘location’, with Sina Weibo users even more positive and Twitter users more negative.
More long-term orientation on Sina Weibo, more short-term orientation on Twitter.
156
Temporal analysis
Twitter users repost messages faster than Sina Weibo users.
time distance = $t_{\text{repost}} - t_{\text{original post}}$
[Figure: time distance in hours (axis ticks 0, 0.1, 1, 10, 100, 1000) against the percentage of users (0% to 100%); series: Weibo, Twitter]
Large degree of power distance on Sina Weibo, small one on Twitter.
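The time distance between a repost and its original post, averaged per user, can be sketched as below. The function names and the input format are assumptions made for illustration; the slide only defines the difference between the two timestamps and reports it in hours.

```python
from collections import defaultdict
from datetime import datetime


def time_distance_hours(t_original, t_repost):
    """Time distance in hours between an original post and its repost."""
    seconds = (t_repost - t_original).total_seconds()
    if seconds < 0:
        raise ValueError("a repost cannot precede its original post")
    return seconds / 3600.0


def avg_time_distance_per_user(reposts):
    """Return {user_id: average time distance in hours}.

    `reposts` is an iterable of (user_id, t_original, t_repost) triples
    (a hypothetical format), with datetime values.
    """
    distances = defaultdict(list)
    for user_id, t_original, t_repost in reposts:
        distances[user_id].append(time_distance_hours(t_original, t_repost))
    return {user: sum(values) / len(values) for user, values in distances.items()}
```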
157
Cultural differences in tagging
Other work confirms these findings and their consistency with theories of cultural differences between Asian and Western cultures. [Dong et al. 2011] look at cultural differences in a tagging system and find that American and Chinese subjects differed in many ways:
• the number and types of tags they applied;
• the extent to which they applied suggested tags or entered new tags of their own; and
• how often they applied tags that originated from a different culture.
158
Cultural variations for Social Q&A
Another example is given by [Yang et al. 2011], who look at cultural differences in people’s social question asking behaviors across the United States, the United Kingdom, China, and India. They analyzed the questions people ask via social networking tools, and their motivations for asking and answering questions online. Results reveal culture as a consistently significant factor in predicting people’s social question and answer behavior.
160
Understand the source
When using the knowledge from Twitter as a semantic source, especially if it is the only semantic source, there are a few things one needs to consider that relate to the real-time nature of social contributions. The ‘knowledge’ is not unambiguous: inconsistency, moods, etc. Real-time knowledge spreads and evolves fast.
161
Inconsistency & moods
Twitter is used as a semantic sensor, sometimes as the only semantic sensor, but consistency in user contributions like ratings is a concern. [Said et al. 2012] show how users are inconsistent in their ratings and tend to be more consistent for above-average ratings. [De Choudhury et al. 2012] report on the relation between moods and social activity, social relations and participatory patterns like link sharing and conversational engagement.
162
Understanding over time
While Twitter and the like were used in the beginning as ‘fixed’ sources of knowledge, researchers have become interested in the evolution over time. The nature and speed of the flow of content over time have become great objects of study. Two domains that have received fair attention in this light are diseases and (political) news.
163
Flow in disease information
The domain of diseases and outbreaks is getting fair attention. Works by [Gomide et al. 2011] on Dengue and [Diaz-Aviles et al. 2012] on EHEC show how people’s behavior on Twitter can be used for surveillance and tasks such as early warning and outbreak investigation.
164
Flow of news
From [Naveed et al. 2011] we learn how retweets reflect what the Twitter community considers interesting on a global scale. In [Backstrom et al. 2011] we see the differences between communication and observation in Facebook: communication involves a much higher focus of attention than observation activities. We see in [Lerman et al. 2010] how network structure affects the dynamics of how interest in news stories spreads among social networks in Digg and Twitter.
165
Flow in political news
Coming back to our observation of the multiple truths, political news is a great domain to look at. For the context of political speech, [Metaxas et al. 2010] discuss how the real-time nature of Twitter provides disproportionate exposure to personal opinions, fabricated content, unverified events, lies and misrepresentations, with viral spread as a consequence. To act upon that, [Lumezanu et al. 2012] identify extreme tweeting patterns that could characterize users who spread propaganda (political propagandists), e.g. sending high volumes of near-duplicate messages.
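One of the patterns mentioned, a high volume of near-duplicate messages, can be illustrated with a small sketch. This is not the method of [Lumezanu et al. 2012]; it swaps in a generic technique (Jaccard similarity over word shingles), and the function names, the shingle size `k`, and the `threshold` value are assumptions chosen for illustration.

```python
import re


def shingles(text, k=3):
    """Set of k-word shingles of a message (lowercased, punctuation dropped)."""
    tokens = re.findall(r"\w+", text.lower())
    if len(tokens) < k:
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicate_fraction(messages, threshold=0.8, k=3):
    """Fraction of a user's messages that nearly duplicate an earlier message."""
    seen = []
    duplicates = 0
    for message in messages:
        current = shingles(message, k)
        if any(jaccard(current, previous) >= threshold for previous in seen):
            duplicates += 1
        seen.append(current)
    return duplicates / len(messages) if messages else 0.0
```

The pairwise comparison is quadratic in the number of messages per user, which is fine for a sketch; a large-scale analysis would more likely rely on hashing techniques such as MinHash.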
166
Temporal effects
In our [WebSci2011] work, we have considered how user interests manifest over time. Most users who become interested in a news topic do so within a few days. Lifespan of users’ interest:
• Long-term adopters - continuously interested
• Short-term adopters - interested only for a short period of time (and influenced by “global trends”)
High overlap between early adopters and long-term adopters.
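The distinction between long-term and short-term adopters can be made concrete with a rough sketch. The rule below is an assumption, not the definition used in the [WebSci2011] work: it measures the lifespan of interest as the number of days between a user's first and last mention of a topic and applies a hypothetical cut-off `long_term_days`. It does not check that interest was continuous in between.

```python
from datetime import date


def interest_lifespan_days(mention_dates):
    """Days between a user's first and last mention of a topic."""
    return (max(mention_dates) - min(mention_dates)).days


def classify_adopter(mention_dates, long_term_days=30):
    """Label a user 'long-term' or 'short-term' for one topic.

    `long_term_days` is a hypothetical threshold chosen for illustration.
    """
    if not mention_dates:
        raise ValueError("at least one mention is required")
    lifespan = interest_lifespan_days(mention_dates)
    return "long-term" if lifespan >= long_term_days else "short-term"
```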
167
Temporal effects
On Twitter the importance of entities for a topic varies over time (long-term vs. short-term entities). In terms of user interests over time, the majority of users become interested in a topic quickly (within a few days). When using Twitter-based profiles for personalization, time-sensitive user modeling improves recommendation quality. Also, the selection of the user modeling strategy should take the type of user into account:
• Long-term adopters: hashtag-based
• Short-term adopters: entity-based
168
Twitter-based Trend and User Modeling Framework
[Diagram labels: Twitter posts; current tweets of the Twitter community; Profile Type; Semantic Enrichment; Weighting Scheme; Aggregation; trends and user’s interests over time; news recommender]
169
Temporal effects with trends
For the domain of personalized news recommendations, we have combined trend and user modeling in our framework.
• We have seen how user profiles change over time, under the influence of trends.
• Appropriate concept weighting strategies allow for the discovery of local trends.
• A time-sensitive weighting function is best for generating trend profiles.
Aggregating trend and user profiles can improve the performance of recommendations.
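A time-sensitive weighting function and the aggregation of a trend profile with a user profile can be sketched as follows. The slides do not specify the actual functions, so everything concrete here is an assumption: exponential decay with a hypothetical `half_life_days`, a linear combination with a hypothetical mixing parameter `alpha`, and the function names themselves.

```python
from collections import defaultdict


def time_sensitive_profile(mentions, now, half_life_days=7.0):
    """Build a normalized concept profile in which recent mentions weigh more.

    `mentions` is an iterable of (concept, day) pairs, with `day` and `now`
    expressed in days on a common time axis (a hypothetical format).
    """
    weights = defaultdict(float)
    for concept, day in mentions:
        age = max(0.0, now - day)
        # Assumed decay: a mention loses half its weight every `half_life_days`.
        weights[concept] += 0.5 ** (age / half_life_days)
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()} if total else {}


def aggregate(user_profile, trend_profile, alpha=0.5):
    """Linear mix of a user profile and a trend profile (assumed scheme)."""
    concepts = set(user_profile) | set(trend_profile)
    return {
        c: alpha * user_profile.get(c, 0.0) + (1 - alpha) * trend_profile.get(c, 0.0)
        for c in concepts
    }
```

With `alpha` close to 1 the recommendations follow the individual user; with `alpha` close to 0 they follow the trends of the community. Which setting works best is an empirical question.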
171
Check with the user
With all profiles based on augmentation, it becomes (even more) vital to follow the lessons of checking with the user, by engaging with the user in a common process of validating the profile and the assumptions based on it.
172
Perico
Dialogue for Modelling Cultural Exposure using Linked Data
Initial User Model:
• Visited Countries
• Estimated Cultural Exposure
Components: Social Web Sensors, Perico Dialogue Agent, Cultural Fact Extractor, Quiz Generator, User Profile Generator, Dialogue Planner
Updated User Model:
• Verified Visited Countries
• Enhanced Cultural Exposure Score
174
Inspect and control
[Knijnenburg et al. 2012] consider how users of social recommender systems may want to inspect and control how their social relationships influence the recommendations they receive: friends are not always “nearest neighbors”. The results show that high inspectability and control indeed increase users’ perceived understanding of and control over the system, their rating of the recommendation quality, and their satisfaction with the system, and thus an overall better user experience.
176
Understanding communities
Attention is given to communities and their dynamics. [Chan et al. 2010] propose a method for analysing user communication roles in discussion forums.
[Schwagereit et al. 2011] study governance in web communities.
[Karnstedt et al. 2011] consider the relation between a user's value within a community - constituted from various user features - and the probability of a user churning.
[Yang et al. 2010] analyze users’ activity lifespan in online knowledge sharing communities: acknowledgement of contributions leads to user survival.
177
Involvement in communities
In order to understand how people behave on the Social Web and in communities, it is relevant to understand their engagement and involvement in more detail. [Lehmann et al. 2012] study how users engage with online services, and how to measure this engagement. [Freyne et al. 2009] look at how social networking sites rely on the contribution and participation of their members, with a focus on early interventions for engagement.
178
Communities and expertise
Understanding communities is also relevant as these communities can act as an additional resource. We have seen recent attention shift from finding evidence for profiles towards finding people and expertise, for example to enable active engagement of people. For using expertise in UMAP, it is also important to be able to specify expertise, to enable reasoning about the expertise’s quality and fit.
179
Take home from challenges
The (Social) Web tells many stories:
• Acknowledge multiple truths, opposing truths, and bad intentions.
• Acknowledge multiple audiences and viewpoints.
• Acknowledge cultural variations.
The (Social) Web moves fast:
• Acknowledge the real-time nature of Web and applications.
• Analyze and understand the flow of information.
• Analyze and understand the nature of communities.
The (Social) Web includes people:
• Involve the users actively in validation.
• Involve (communities of) users in interpretation.
182
Social & UMAP
Huge economic and societal potential for added value. Social Web is a fertile source of knowledge for augmentation.
• Semantics can be beneficial for social-based augmentation.
• Hybrid, human-enhanced approaches can be beneficial.
• Technological feasibility of augmentation.
Research from specific cases towards general theory. Next on the agenda:
• Describe added value for stakeholders, describe goals.
• Share and compare research challenges and evaluations.
183
Web & UMAP
UMAP systems are Web systems:
• The (Social) Web tells many stories.
• The (Social) Web moves fast.
• The (Social) Web includes people.
The Web is the real laboratory for UMAP systems. Next on the agenda:
• Share and compare solutions, components, and systems.
• Support more uniformity in methods and practices.
184
UMAP & Web
On the (Social) Web, systems are being made:
• Take positions or prepare to take positions about bad intentions.
• Take responsibility and recommend about future architectures.
On the (Social) Web, many systems are small:
• Do (also) consider the specific problems of small and medium sized stakeholders: bring UMAP into practice.
185
UMAP & Social
In SWUMAP, human intelligence is arranged differently:
• From careful manual analysis a priori, to machine analysis on the fly.
• Critical and context-specific approach to data is part of the ‘in vivo’ system.
• Human interpretation of data is inside the hybrid system.
It makes for a new type of system, and one of great value. And plenty of fun and diverse challenges for UMAP.
186
APPLICATION
HUMANS FOR AUGMENTATION
USERS DOMAIN
DOMAIN Augmented with Web Semantics
USERS Augmented with Web Semantics
REAL DOMAIN
REAL USERS
187
APPLICATION
HUMANS FOR AUGMENTATION
USERS DOMAIN
DOMAIN Augmented with Web Semantics
USERS Augmented with Web Semantics
SWUMAP