© Copyright IBM Corporation 2010
IBM Research
On the Quality of Inferring Interests From Social Neighbors
Zhen Wen, Ching-Yung Lin
IBM T. J. Watson Research Center
|
Motivation
Modeling user interests enables personalized services
– More relevant search/recommendation results
– More targeted advertising
Data about users are sparse
– Many user profiles are static, incomplete and/or outdated
– Fewer than 10% of employees actively participate in social software [Brzozowski2009]
Inferring user interests from neighbors can be a solution
– It also raises the concern of exposing users’ private information
How true are “You are who you know” and “Birds of a feather flock together”?
|
Challenges in Observing Users
Diverse types of media
– Public social media (friending, blogs, etc.)
Data are public but limited (esp. in enterprises)
– Private communication media (email, instant messaging, face-to-face meetings, etc.)
Much more data
Privacy is a major issue
|
Example of Diverse Types of Media
Number of people who participated in the top 3 media in an enterprise with 400K employees
Number of entries:
• Social bookmarking: 400K
• Electronic communication: 20M
• File sharing: 140K
|
Our Goals
How well can a user’s interests be inferred from his/her social neighbors?
Can the diverse types of media be combined to improve the inference of user interests from social neighbors?
Can the quality of the inference be predicted based on features of social neighbors?
– Only sufficiently accurate inference may help personalized services
|
Our Approach
Infer user interests from social neighbors
– Model user interests based on multiple types of information they accessed
– Construct employee social network from communication data
– Infer using a social influence model
Study the relationship between inference quality and network characteristics
– Identify effective factors to ensure high quality results for applications
|
SmallBlue: Unlock the Power of Business Networks & Protect Privacy
Expertise: Search for people who know “xyz” in my networks.
Ego: Show my personal network evolution and social capital.
Net: See how experts or communities connect.
Reach: Helps me understand a person and my formal and informal paths to reach him or her.
Whisper: Social-network-enabled personalized live recommender.
Productivity: Social Network Analysis Service helps a company understand how to enhance productivity.
Synergy: Personalized Search
Distributed crawling: Streams, DBs & Feeds
20,000,000 emails & SameTime messages
1,000,000 Learning click data
14,000,000 KnowledgeView, SalesOne, …, access data
1,000,000 Lotus Connections (blogs, file sharing, bookmarks) data
200,000 people’s consulting financial databases
400,000 IBMers organization/demographic data
400,000 webpages and knowledge assets
Social Network Analysis & Visualization, Expertise Mining,
and Multi-Channel Human Network/Behavior Analysis
Live Data
|
Privacy as Fundamental Human Rights and Global Privacy Laws
(United Nations) Universal Declaration of Human Rights [1948]
Article 12: No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honor and reputation. Everyone has the right to the protection of the law against such interference or attacks.
Map legend: Existing Private Sector Privacy Laws; Emerging Private Sector Privacy Laws
European Union
• European Data Protection Directive (1995)
Canada
• PIPEDA (2001-2004)
U.S. – Sectoral
• Children’s Privacy; COPPA (1999)
• Financial Sector; GLB (2001)
• Health Sector; HIPAA (2002)
• California Privacy (2005)
Taiwan
• Computer-Processed PD Protection Law (1995)
South Korea
• Info & Comm Network Util. & Info Protection Law (2000)
Japan
• Personal Data Protection Act (2005)
APEC
• Guidelines (2004)
Russia
• Federal Law on Pers Data (January 2007)
Australia
• Privacy Amendment Act (2001)
New Zealand
• Privacy Act (1993)
Chile
• Protection of Private Life Law (1999)
Argentina
• Protection of PD Law (2000)
Dubai
• Data Protection Law (January 2007)
EU Directive 95/46/EC Article 2 (a):
– Personal data shall mean any information relating to an identified or identifiable natural person
EU Directive 95/46/EC Article 7:
– Personal data may be processed only if:
The data subject has unambiguously given his consent; or
for the performance of a contract to which the data subject is party or in order to take steps at the request of the data subject prior to entering into a contract; or
for compliance with a legal obligation to which the controller is subject; or…
|
Dataset
25315 users’ contributed content
– 20M email/chats
– 400K social bookmarks
– 20K shared public files
– Profile information
Job role, division, news categories of interests, etc
Infer social network based on email/chats
X’: number of emails
|
User Interests Model – Implicit Interests
Model users’ interests implicitly indicated by their contributed content
– Extract latent topics from the multiple types of content using LDA
– Select top-N distinct topics as the implicit interests model of a user, considering:
The degree to which the user is interested in each topic
The similarity between topics
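The selection step can be sketched roughly as follows. This is only an illustration, not the authors' procedure: it assumes each user already has a per-topic weight (for example from LDA) and each topic has a word distribution, and it greedily keeps the highest-weight topics while skipping any topic that is too similar to one already kept. The cosine measure, the `max_sim` threshold, and all function names are assumptions introduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_n_distinct_topics(user_topic_weights, topic_word_dists, n, max_sim=0.8):
    """Greedily pick up to n topics, highest user weight first,
    skipping any topic too similar to one already selected.
    max_sim is an assumed threshold, not a value from the slides."""
    ranked = sorted(range(len(user_topic_weights)),
                    key=lambda t: user_topic_weights[t], reverse=True)
    selected = []
    for t in ranked:
        if all(cosine(topic_word_dists[t], topic_word_dists[s]) <= max_sim
               for s in selected):
            selected.append(t)
        if len(selected) == n:
            break
    return selected


# Toy data: 4 topics over a 3-word vocabulary; topics 0 and 1 are near-duplicates.
topic_word_dists = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
user_topic_weights = [0.4, 0.3, 0.2, 0.1]
selected_topics = top_n_distinct_topics(user_topic_weights, topic_word_dists, n=2)
print(selected_topics)  # [0, 2]: topic 1 is skipped as a near-duplicate of topic 0
```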
|
User Interests Model – Explicit Interests
29% of users manually specify interests in their profiles
– A list of selected terms
From a static 1120-term taxonomy related to work
Compare implicit and explicit interests
– Explicit interests models are more limited
Implicit interests cover 60.4% of explicit interests
Explicit interests cover 2.2% of implicit interests
|
Infer Interests Based on Social Influence
Social influence model
– Network autocorrelation model [Leenders02]
Social influence represented as a weighted combination of neighbors’ attributes
The weight is an exponential function of the social distance
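A minimal sketch of this kind of distance-weighted inference is shown below. It is not the exact model from [Leenders02] or from this study: it assumes an unweighted graph, measures social distance in hops via breadth-first search, uses exp(-distance) as the decay, and normalizes the weights. The function names, the decay form, and the `max_hops` cutoff are all assumptions made for illustration.

```python
import math
from collections import deque


def hop_distances(graph, source, max_hops):
    """Breadth-first search: hop distance from source to each node within max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    del dist[source]
    return dist


def infer_interests(graph, interests, user, max_hops=2):
    """Estimate a user's topic scores as a normalized, distance-weighted
    combination of neighbors' topic scores (assumed weight: exp(-distance))."""
    totals, weight_sum = {}, 0.0
    for nbr, d in hop_distances(graph, user, max_hops).items():
        if nbr not in interests:
            continue
        w = math.exp(-d)
        weight_sum += w
        for topic, score in interests[nbr].items():
            totals[topic] = totals.get(topic, 0.0) + w * score
    if weight_sum == 0:
        return {}
    return {topic: s / weight_sum for topic, s in totals.items()}


# Toy network: ann - bob - cat. Ann's interests are unknown and get inferred.
graph = {"ann": ["bob"], "bob": ["ann", "cat"], "cat": ["bob"]}
interests = {"bob": {"cloud": 1.0}, "cat": {"analytics": 1.0}}
inferred = infer_interests(graph, interests, "ann")
print(inferred)
```

In the toy network, Bob is one hop from Ann and Cat is two hops away, so Bob's topic receives the larger share of the inferred score.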
|
Inference Quality
Implicit interests: how close the inferred top-20 topics are to the ground truth
Condition                         Max     Mean    St. Deviation
Using social bookmark data only   59.4%   19.2%   10.7%
Using file sharing data only      44.9%   12.7%   7.2%
Using email/IM data only          62.1%   29.6%   14.1%
Using all three data sources      100%    45.1%   21.7%
– Significant advantage in combining multiple sources
– Large variance can affect practical applications, so we need to predict when to infer interests
Explicit interests: precision and recall of inferred terms
Measure     Mean    St. Deviation
Precision   30.1%   26.9%
Recall      61.5%   27.6%
– Much better recall than precision
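For reference, precision and recall of inferred terms against a user's profile terms can be computed as in the sketch below. The term lists are made-up examples, not data from the study, and the function name is introduced here for illustration.

```python
def precision_recall(inferred, ground_truth):
    """Precision and recall of a set of inferred terms against the true terms."""
    inferred, ground_truth = set(inferred), set(ground_truth)
    hits = len(inferred & ground_truth)
    precision = hits / len(inferred) if inferred else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical example: 4 inferred terms, 3 terms in the user's profile, 2 in common.
inferred_terms = ["cloud", "analytics", "security", "storage"]
profile_terms = ["cloud", "analytics", "mobile"]
p, r = precision_recall(inferred_terms, profile_terms)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Inferring many terms tends to raise recall at the cost of precision, which is one way a recall figure can sit well above the precision figure.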
|
Can Inference Quality be Predicted?
Hypothesis: inference quality can be predicted from social network properties
– User activeness: the amount of contribution
– In-degree
– Out-degree
– Betweenness
– User management role
Use Support Vector Regression to perform prediction
Evaluate prediction
– Precision/recall of the prediction (10-fold cross validation)
– Use prediction to improve inference
Only infer when we predict that the quality is high
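The "only infer when predicted quality is high" idea can be sketched as a simple gate on a predicted quality score. The sketch below does not train a regressor; it assumes predicted scores are already available (for example from a Support Vector Regression model, as on this slide) and evaluates the gate against measured quality. The threshold, the scores, and the function name are hypothetical.

```python
def evaluate_gate(predicted, actual, threshold):
    """Treat 'quality >= threshold' as the positive class and score the predictor.

    Returns (precision, recall, mean actual quality of the inferences we keep).
    """
    kept = [a for p, a in zip(predicted, actual) if p >= threshold]
    true_pos = sum(1 for a in kept if a >= threshold)
    actual_pos = sum(1 for a in actual if a >= threshold)
    precision = true_pos / len(kept) if kept else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    mean_kept = sum(kept) / len(kept) if kept else 0.0
    return precision, recall, mean_kept


# Hypothetical per-user scores: predicted quality vs. measured quality.
predicted = [0.9, 0.7, 0.4, 0.6, 0.2]
actual = [0.8, 0.3, 0.5, 0.7, 0.1]
gate_precision, gate_recall, mean_kept = evaluate_gate(predicted, actual, threshold=0.5)
mean_all = sum(actual) / len(actual)
print(gate_precision, gate_recall, mean_kept, mean_all)
```

In this toy example the mean quality of the kept inferences is higher than the mean over all users, which is the effect the gate is meant to achieve.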
|
Quality Prediction Results
Precision/recall of prediction
[Figure: precision/recall of prediction; panels: Implicit Interests, Explicit Interests]
Improve inference
Measure     Improved to   Improvement (%)
Precision   60.5%         101%
Recall      85.7%         39.3%
|
Feature Comparison
“Leave-one-feature-out” comparisons of prediction results
Most social influences are from 1- and 2-degree neighbors
Your neighbors decide how well you can be inferred
Your neighbors’ network positions may be even more important than how active they are
– Formal organizational properties
Manager neighbors are more important in inference
– i.e., more social influence (about 5% more)
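A leave-one-feature-out comparison can be sketched as a loop that withholds each feature in turn and records how much the prediction score drops. In the sketch below, `toy_score` is a hypothetical stand-in for "train the predictor on these features and return its cross-validated score", and the per-feature numbers are arbitrary placeholders, not results from the study.

```python
def leave_one_feature_out(features, score_fn):
    """Return, per feature, the drop in score when that feature is withheld."""
    baseline = score_fn(features)
    return {f: baseline - score_fn([g for g in features if g != f])
            for f in features}


# Arbitrary placeholder contributions, used only to make the example runnable.
TOY_CONTRIBUTION = {"activeness": 0.05, "in_degree": 0.10, "out_degree": 0.08,
                    "betweenness": 0.12, "manager_role": 0.04}


def toy_score(features):
    """Hypothetical scoring function; a real one would retrain and evaluate a model."""
    return 0.5 + sum(TOY_CONTRIBUTION[f] for f in features)


drops = leave_one_feature_out(list(TOY_CONTRIBUTION), toy_score)
most_important = max(drops, key=drops.get)
print(most_important)
```

The feature whose removal causes the largest drop is the one the predictor relies on most under this comparison.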
|
Related Work
User modeling
– Use behavioral data of the Ego
[Shepitsen08, Song05, Stoyanovic08, Teevan05]
– Use data of 1-degree neighbors
Issued the same query ([Piwowarski07, White09])
Collaborative filtering ([Goldberg92])
Social influence and correlation
– Correlation and related factors in social networks
[Singla08, Blei03, Crandall08, Anagnostopoulos08, Tang09]
– Infer user profiles in online communities
[Mislove2010]
|
Conclusion
There is a large variance in the quality of inferring user interests from social neighbors
The “recall” of the inference is much better than “precision”
The inference quality can be predicted from social network properties
|
Questions?