
Involuntary Information Leakage in Social Network Services

Date post: 20-Jun-2015
Upload: academia-sinica
Description:
Disclosing personal information in online social network services is a double-edged sword. Information exposure is usually a plus, even a must, if people want to participate in social communities; however, leakage of personal information, especially one's identity, may invite malicious attacks from the real world and cyberspace, such as stalking, reputation slander, personalized spamming and phishing. Even if people do not reveal their personal information online, others may do so. In this paper, we consider the problem of involuntary information leakage in social network services and demonstrate its seriousness with a case study of Wretch, the biggest social network site in Taiwan. Wretch allows users to annotate their friends' profiles with a one-line description, from which a friend's private information, such as real name, age, and school attendance records, may be inferred without the information owner's knowledge. Our analysis results show that users' efforts to protect their privacy cannot prevent their personal information from being revealed online. Of the 592,548 effective profiles that we collected, the first name of 72% of the accounts and the full name of 30% of the accounts could be easily inferred by using a number of heuristics. The age of 15% of the account holders and at least one school attended by 42% of the holders could also be inferred. We discuss several potential means of mitigating the identified involuntary information leakage problem.
Transcript
Page 1: Involuntary Information Leakage in Social Network Services

2009/2/2

Ieng-Fat Lam, Kuan-Ta Chen, and Ling-Jyh Chen
Institute of Information Science, Academia Sinica

Presenter: Ieng-Fat Lam


Page 2: Outline

Introduction

Motivation

Research Method

Results

Discussion

Conclusion

Page 3: Social Networking Services (SNSs)

For example:
• Myspace, Facebook, Orkut, Yahoo! 360
• Mixi, GREE (Japan)
• Wretch (Taiwan)

Have become very popular

Host millions of profiles

Introduction ::

Page 4: Users in SNSs

Social activities:
• Meet new friends, contact existing friends
• Share resources over the Internet

Personal information is usually published:
• Photos
• Identity information
• Contact information

Introduction ::

Page 5: Disclosing personal information

A double-edged sword:
• Lets other people know about and search for you
• But some people may not respond nicely
• Risk of personal information being used by malicious people

I am Lee-Da Nu! I love movies. I am 23 years old, single!!

Introduction ::

Page 6: Not revealing personal information?

I never disclose my info to the Internet!

Introduction ::

Page 7: Information revealed by friends

Introduction ::

Page 8: Information revealed

Introduction ::

[I got it!]
Real name: Andrew Richman
Gender: Male
Age: 20 ~ 22
Education record: Sunrise Elementary School, St. John Secondary School, St. Paul University

Page 9: Involuntary Information Leakage

A user may want to protect his/her identity:
• But it may be unintentionally revealed by friends
• Such leakage is hard to detect, due to the distributed nature of the Internet
• It is becoming a serious threat to privacy

Motivation ::

Page 10: In this study

We would like to:
• Investigate the extent of involuntary information leakage
• Gather data from Wretch (http://www.wretch.cc), the most popular SNS in Taiwan, with about 4 million user profiles
• Quantify the degree of such leakage: real name, age, and education records
• Discuss potential means to mitigate the problem

Motivation ::

Page 11: Data Collection

User ID list (crawl queue), e.g. john123, Aron, …

1. Pick an ID at random from the ID list

2. Obtain the user profile and friend list (HTML)

3. Parse and save the crawled user data

4. Add the user IDs in the friend list (e.g. Andy, Orange, …) to the ID list

5. Update the ID list

Research Method ::
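The five crawl steps above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' crawler: `FAKE_SNS` and `fetch_friend_list` are hypothetical stand-ins for downloading and parsing Wretch profile pages, and `max_profiles` is an assumed stopping condition.

```python
import random

# Hypothetical in-memory stand-in for the SNS: user ID -> friend IDs.
# A real crawler would download and parse each profile's HTML instead.
FAKE_SNS = {
    "john123": ["Andy", "Orange"],
    "Andy": ["john123", "Aron"],
    "Orange": [],
    "Aron": ["Andy"],
}

def fetch_friend_list(user_id):
    """Hypothetical fetcher: returns the friend IDs listed on a profile."""
    return FAKE_SNS.get(user_id, [])

def crawl(seed_ids, max_profiles=1000, rng=None):
    rng = rng or random.Random(0)
    pending = list(seed_ids)   # IDs in the ID list that are not yet crawled
    known = set(seed_ids)      # every ID ever added to the ID list
    crawled = {}               # user ID -> saved friend list
    while pending and len(crawled) < max_profiles:
        # Step 1: pick a pending ID at random.
        user_id = pending.pop(rng.randrange(len(pending)))
        # Step 2: obtain the profile and friend list.
        friends = fetch_friend_list(user_id)
        # Step 3: parse and save the crawled data.
        crawled[user_id] = friends
        # Steps 4-5: add newly seen friend IDs and update the ID list.
        for fid in friends:
            if fid not in known:
                known.add(fid)
                pending.append(fid)
    return crawled

print(sorted(crawl(["john123"])))
```

Because each ID enters the list only once, every reachable profile is crawled at most once, regardless of the random pick order.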

Page 12: An example

User profile

Friend list

Research Method ::

Page 13: Overview of Crawled Data

Wretch data

Number of users: 766,972 (20%)

Number of effective users*: 592,548 (15%)

Number of connections: 7,619,212

Avg. connections per user: 11.5

*An effective user has at least one “outgoing” friend connection.

Research Method ::

Page 14: Analysis of Name Leakage

Friend annotations in Wretch:
• Free-form text that describes a friend
• Used for:
  Classification
  The real name or nickname of a friend
  Features of a friend
• For example:
  *Beauty Cathy Brown – The hottest girl of Nightingale High School
  [[ School Mate ]] Tony MY BUDDY

Research Method ::

Page 15: Name Inference Process

Research Method ::

1. Obtain friend annotations (for each profile)

2. GenerateName Candidates

Infer First Name

Page 16: Generate name candidates

To infer the real name of a profile:
• Collect all of its incoming annotations
• Extract name candidates from the annotations

Research Method ::

Andy

Aron

Andrew!!

Yo~ Bros. Andrew!!

Sammy

Old Mr. Richman!!

Cool~~ Andrew Richman!!

Page 17: Generate name candidates (cont.)

Extraction method:
• Break the text into tokens at
  Symbols: <space>, <tab>, ‘#’, ‘@’, etc.
  Punctuation marks: ‘ ” , . () []
  Connective words (in Chinese)
• Apply Chinese-specific naming rules, e.g. 陳寬達 (Chen Kuan-Ta):
  Two-character tokens as first name candidates
  Three-character tokens as full name candidates
• Associate a duplication count with each candidate

Research Method ::
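A minimal sketch of candidate extraction, assuming the delimiter set listed above plus `~` and `!` (an assumption, added so that the next slide's example splits as shown). Chinese connective words are not handled. The heuristic targets Chinese text, so Latin tokens such as "Andrew" are not counted. The function name `name_candidates` is illustrative only.

```python
import re
from collections import Counter

# Delimiters from the slide (whitespace, '#', '@', quotes, commas, periods,
# parentheses, brackets). '~' and '!' (including full-width forms) are an
# assumption of this sketch. Chinese connective words are not handled.
DELIMITERS = re.compile(r"[\s#@'\"‘’“”,.()\[\]~!！，。]+")

def name_candidates(annotations):
    """Count two-character tokens (first name candidates) and three-character
    tokens (full name candidates) over a profile's incoming annotations."""
    first_names, full_names = Counter(), Counter()
    for text in annotations:
        for token in DELIMITERS.split(text):
            if len(token) == 2:
                first_names[token] += 1
            elif len(token) == 3:
                full_names[token] += 1
    return first_names, full_names

first, full = name_candidates(["德榮!!", "喔~德榮~德榮兄!!", "老劉~!!", "超帥~~ 劉德榮!!"])
print(first)  # 德榮 is seen twice; 老劉 and 超帥 once each
print(full)   # 德榮兄 and 劉德榮 once each
```

The sketch keeps raw occurrence counts. On the example slide, the bracketed duplication counts appear to equal these counts minus one, with 德榮 at [1] and the others at [0].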

Page 18: An example

Andy

德榮!! (Andrew!!)

喔~德榮~德榮兄!! (Yo~ Andrew~ Bros Andrew!!)

老劉~!! (Old Mr. Richman~!!)

超帥~~ 劉德榮!! (Cool~~ Andrew Richman!!)

Name candidates, with duplication counts in brackets (three-character tokens are full name candidates; two-character tokens are first name candidates):

德榮 (Andrew) [1]
超帥 (Cool) [0]
劉德榮 (Andrew Richman) [0]
德榮兄 (Bros Andrew) [0]
喔 (Yo) [0]
老劉 (Old Mr. Richman) [0]

Research Method ::

Page 19: Inference of full name (1/5)

Common family name:
• The family name part is a common family name
• The duplication count is greater than 1
• For example, for the full name candidate “Andrew Richman”:
  if “Andrew Richman” appears in more than one annotation, and
  if “Richman” is a common family name

Research Method ::

[1] Chih-Hao Tsai, “Common Chinese Names”, http://technology.chtsai.org/namefreq/

Page 20: Inference of full name (2/5)

First name as a substring of the full name:
• A first name candidate appears as a substring of the full name candidate, in the right position
• The duplication count is greater than 1
• For example, for the full name candidate “Andrew Richman”:
  if “Andrew Richman” appears in more than one annotation, and
  if “Andrew” is also a first name candidate

Research Method ::
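The first two full-name rules can be sketched as filters over candidate counts. This is a minimal sketch, assuming three-character names with a one-character family name. `min_count=2` encodes "appears in more than one annotation" from the examples. The tiny family-name set stands in for the common-name list cited in [1], and all counts are hypothetical.

```python
def rule_common_family_name(full_counts, common_family_names, min_count=2):
    """Rule 1: the family name (first character) is a common family name and
    the candidate appears at least min_count times."""
    return [name for name, n in full_counts.items()
            if n >= min_count and name[:1] in common_family_names]

def rule_first_name_substring(full_counts, first_counts, min_count=2):
    """Rule 2: the given-name part (the characters after a one-character
    family name) is itself a first name candidate, i.e. the substring sits
    in the right position, and the candidate appears at least min_count times."""
    return [name for name, n in full_counts.items()
            if n >= min_count and name[1:] in first_counts]

# Hypothetical counts and family-name list, for illustration only.
full_counts = {"劉德榮": 2, "德榮兄": 2, "陳超帥": 1}
first_counts = {"德榮": 3, "超帥": 1}
common_family_names = {"劉", "陳", "林"}

print(rule_common_family_name(full_counts, common_family_names))  # ['劉德榮']
print(rule_first_name_substring(full_counts, first_counts))       # ['劉德榮']
```

Rule 2 rejects 德榮兄 because its last two characters (榮兄) are not a first name candidate, even though 德榮 occurs inside it at the wrong position.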

Page 21: Inference of full name (3/5)

Common full name:
• Compare with an existing full name list
• National college exam enrollment lists, maintained from 1994 to 2007
  574,010 distinct full names

Research Method ::

[2] Chih-Hao Tsai, “A list of Chinese Names”, http://technology.chtsai.org/namelist/

Page 22: Inference of full name (4/5)

Nickname decomposition:
• A Chinese name has the form
  FN GN1-GN2 (陳寬達): family name FN followed by the given name GN1-GN2
• Possible nicknames:
  Prefix + X
  Prefix + X + X
  X + postfix
  where X can be FN, GN1, or GN2

Research Method ::

For “Andrew Richman”:
We also have “Bros Andrew”
“Bros” is a predefined prefix
Removing “Bros” gives “Andrew”
“Andrew” is in “Andrew Richman”
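A minimal sketch of nickname decomposition for three-character Chinese names, implementing only the three patterns listed above. The affix lists are hypothetical, since the slides only say they are predefined. Reading "Prefix + X + X" as a prefix plus a doubled character is also an assumption. A full name candidate is kept when another token in the annotations matches one of its generated nicknames.

```python
# Hypothetical affix lists; the slides only say they are predefined.
PREFIXES = ("老", "小", "阿")
POSTFIXES = ("哥", "姐", "兄")

def nicknames_of(full_name):
    """Nicknames of a name FN GN1 GN2 under the three listed patterns.
    "Prefix + X + X" is read as prefix plus a doubled character (assumption)."""
    nicknames = set()
    for x in full_name:                 # X ranges over FN, GN1 and GN2
        for p in PREFIXES:
            nicknames.add(p + x)        # Prefix + X
            nicknames.add(p + x + x)    # Prefix + X + X
        for s in POSTFIXES:
            nicknames.add(x + s)        # X + postfix
    return nicknames

def rule_nickname_decomposition(full_candidates, all_tokens):
    """Keep full name candidates for which some other token in the
    annotations decomposes into one of their nicknames."""
    tokens = set(all_tokens)
    return [name for name in full_candidates
            if (nicknames_of(name) - {name}) & tokens]

print(rule_nickname_decomposition(
    ["劉德榮", "德榮兄"], ["德榮", "超帥", "老劉", "劉德榮", "德榮兄"]))
# ['劉德榮']  (老劉 = prefix 老 + FN 劉)
```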

Page 23: Inference of full name (5/5)

Common word removal:
• If no matching candidate is found by the rules above
• If the duplication count is greater than 1
• If the full name candidate is not a nickname, i.e., it does not contain any nickname prefix or postfix
• If it is not (and is not based on) a common word, compared against 100,511 common words

• Select the one with the highest duplication count

Research Method ::
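The fallback rule can be sketched as a filter followed by an arg-max. This is a minimal sketch under stated assumptions. "Contains a nickname prefix or postfix" is checked at the start or end of the candidate. "Based on a common word" is read as containing one as a substring. The word list, affixes, and counts below are hypothetical.

```python
def rule_common_word_removal(counts, common_words, prefixes, postfixes, min_count=2):
    """Fallback rule: among candidates seen at least min_count times that carry
    no nickname affix and neither are nor contain a common word, return the
    one with the highest count, or None if nothing qualifies."""
    eligible = {
        name: n for name, n in counts.items()
        if n >= min_count
        and not name.startswith(tuple(prefixes))
        and not name.endswith(tuple(postfixes))
        and not any(word in name for word in common_words)
    }
    return max(eligible, key=eligible.get) if eligible else None

# Hypothetical inputs: 超帥 ("cool") as a common word, Chinese nickname affixes.
counts = {"劉德榮": 2, "德榮兄": 3, "超帥哥": 5, "陳大明": 1}
print(rule_common_word_removal(counts, {"超帥"}, ("老", "小", "阿"), ("哥", "姐", "兄")))
```

Here 超帥哥 and 德榮兄 are dropped despite higher counts, because they carry a nickname postfix or contain a common word. 陳大明 is dropped because it appears only once, which leaves 劉德榮.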

Page 24: Inference of First Name

Uses the same method as the inference of full names:
• Common first name
  Compare with a list of 208,581 first names
  The duplication count must be greater than 1

• Nickname decomposition

• Common word removal

Research Method ::

Page 25: Name Inference Results

Ratio of inferred names

Type of name: ratio of name inference

Nickname: 60%

Real name (full name): 30%

First name: 72%

Real name or first name: 78%

Results ::

Page 26: Validation

Manual examination of inferred real names:
• 1,000 profiles selected at random
• 738 of them were unique and correct; further examination gave similar results
• Wrong cases: the user's nickname
• Sufficient to support the conjecture: involuntary real-name leakage occurs in real-life social network systems, and the degree of leakage is significant

Results ::

Page 27: Ratio of Name Leakage

Figure 2: Ratio of name leakage based on users’ gender

Figure 3: Relation of users’ age and ratio of name leakage

Results ::

Page 28: Risk Analysis

To confirm that the identity leakage is involuntary:
• We check the inferred name against the user's profile
  Fewer than 0.1% of users reveal their real names

To quantify the tendency to use real names:
• Degree of Using Real name (DUR)
  Ratio of a user's outgoing annotations that contain the real name of the annotation target
• Degree of being Called by Real name (DCR)
  Ratio of incoming annotations that contain the user's real name

Results ::
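Both metrics are ratios over annotations, so one helper can compute either. This is a minimal sketch, assuming an annotation "uses the real name" when it contains the described person's full name or, failing that, the first name. First-name-only and full-name cases are counted separately, as in the slides' DUR/DCR example. The user and annotations below are hypothetical.

```python
from fractions import Fraction

def classify(text, full_name, first_name):
    """'full' if the annotation contains the full name, 'first' if it contains
    only the first name, otherwise None."""
    if full_name in text:
        return "full"
    if first_name in text:
        return "first"
    return None

def name_usage_ratios(annotations):
    """annotations: list of (text, full_name, first_name), where the names
    belong to the person the annotation describes.
    Returns (first_name_only, full_name, either) as exact fractions."""
    kinds = [classify(text, full, first) for text, full, first in annotations]
    if not kinds:
        return Fraction(0), Fraction(0), Fraction(0)
    first_only = Fraction(kinds.count("first"), len(kinds))
    full = Fraction(kinds.count("full"), len(kinds))
    return first_only, full, first_only + full

# DCR of a hypothetical user: ratios over annotations written about him.
me = ("Andrew Richman", "Andrew")
incoming = [("Cool~ Andrew Richman", *me), ("Bros Andrew", *me),
            ("Our King!", *me), ("Yo~ What's up man", *me),
            ("See you at school", *me)]
print(name_usage_ratios(incoming))
```

Feeding the same function a user's outgoing annotations, each paired with its target's real name, gives DUR instead.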

Page 29: Example of DUR and DCR

DUR and DCR

“Andrew”

[Friend] Raymond Aron

Our King!

[Friend] John Lennon

Yo~ What's up man

[Friend] Jay Leno

[Friend] David Jones

Cool~ Andrew Richman

[Friend] Sammy Hagar

Bros Andrew

Criterion: DUR
First name: 4/5
Full name: 1/5
Either: 5/5

Criterion: DCR
First name: 1/5
Full name: 1/5
Either: 2/5

Results ::

Page 30: Positive relation between DUR and DCR

Figure 4: Relation of DUR and DCR

Results ::

Page 31: Involuntary leakage of age and education records

Inferring age, in a round-based manner:
• If X has disclosed his/her age and has a friend Y,
• and X and Y have a relation such as “classmate” or “same class”, …
• assign X's age to Y
• Then check Y's “classmates” in the next round

Research Method ::
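The round-based propagation above can be sketched as a breadth-first spread over a "classmate" graph. This is a minimal sketch under several assumptions. The keyword list is a hypothetical subset. An annotation in either direction counts as a relation. When two classmates would assign different ages in the same round, the first one processed wins. `max_rounds` is an assumed cap.

```python
# Hypothetical keyword subset; the slides list "classmate", "same class", ...
RELATION_KEYWORDS = ("classmate", "same class")

def infer_ages(disclosed_ages, annotations, max_rounds=10):
    """disclosed_ages: user -> self-disclosed age.
    annotations: list of (annotator, target, text) friend annotations.
    Returns user -> inferred age for users who did not disclose an age."""
    # Build an undirected classmate graph from keyword-bearing annotations.
    classmates = {}
    for a, b, text in annotations:
        if any(k in text.lower() for k in RELATION_KEYWORDS):
            classmates.setdefault(a, set()).add(b)
            classmates.setdefault(b, set()).add(a)

    known = dict(disclosed_ages)
    frontier = list(known)
    for _ in range(max_rounds):
        newly_inferred = {}
        for x in frontier:
            for y in sorted(classmates.get(x, ())):
                if y not in known and y not in newly_inferred:
                    newly_inferred[y] = known[x]   # assign X's age to Y
        if not newly_inferred:
            break
        known.update(newly_inferred)
        frontier = list(newly_inferred)   # next round: check Y's classmates
    return {u: age for u, age in known.items() if u not in disclosed_ages}

annotations = [("A", "B", "My classmate!"), ("C", "B", "same class 3-2"),
               ("C", "D", "neighbor")]
print(infer_ages({"A": 21}, annotations))
```

In this toy input, A's age reaches B in the first round and C in the second. D stays unknown because the C-D annotation carries no relation keyword.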

Page 32: Involuntary leakage of age and education records

Inferring education records:
• Same method as inferring age
• Divided into four education levels, each inferred separately:
  Elementary school
  Junior high school
  Senior high school
  College
• Relations are defined by keywords such as “same school”, “same college”, etc.

Research Method ::

Page 33: Inference results

Figure 5: Inference results of users' ages

Results ::

Figure 6: Inference results of users' education records

Page 34: Validation

Cross-validation:
• Verify inferred ages based on self-disclosed education records
• Verify inferred education records based on self-disclosed ages
• If our inference results are accurate, the age differences should be small

Results ::
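The cross-check reduces to a histogram of age differences over schoolmate pairs. This is a minimal sketch, assuming the caller supplies the pairs (for example, users who self-disclosed the same school at the same level) and an age lookup (for example, inferred ages). Pairs with a missing age are skipped. The same function covers the reverse check, which uses inferred schoolmates and self-disclosed ages. All inputs below are hypothetical.

```python
from collections import Counter

def age_difference_histogram(pairs, ages):
    """Count |age(a) - age(b)| over pairs whose ages are both known."""
    hist = Counter()
    for a, b in pairs:
        if a in ages and b in ages:
            hist[abs(ages[a] - ages[b])] += 1
    return hist

pairs = [("A", "B"), ("A", "C"), ("B", "D")]
ages = {"A": 21, "B": 21, "C": 23}
print(age_difference_histogram(pairs, ages))
```

If the inferences are accurate, most of the mass should sit at differences of zero or one year.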

Page 35: Validation Results

Results ::

Figure 7: The inferred age differences between pairs of self-disclosed schoolmates in the four education levels

Figure 8: The self-disclosed age differences between pairs of inferred schoolmates in the four education levels

Page 36: Threats caused by identity leakage

Stalking

Spamming:
• In our data set, 46% of users disclosed a valid email address
• Spam sent with friends' (spoofed) email addresses

Phishing:
• Spear phishing / social phishing
  Includes personal information in the phishing email
  Spoofs a friend's email address

Discussion ::

Page 37: Spear Phishing or Spam

Dear Mr. Richman, We are eBay customer service, we concern about your security, please update your personal information.

Dear Mr. Andrew Richman, You win 100,000,000 USD!! Which from lottery of St. Paul University fund.

Discussion ::

Page 38: Social Phishing or Spam

Hey, Andrew, I am Sammy, I recommend you a cool site!! http://spam.com

Bros, I am David! St. Paul University student association have a party on next month, you need to transfer the registration fee ASAP, see you there.

[email protected]

Discussion ::

Page 39: Potential Solutions

Possible ways to mitigate the problem:
A. Personal privacy settings
B. Browsing scope settings
C. Owner's confirmation
D. Applying Disclosure Control of Natural Language information (DNCL), proposed by Haruno Kataoka et al.

Discussion ::

Page 40: Personal Privacy Settings

1. Hide personal information

2. Hide social connections (in level)

3. Deny annotations using certain words

4. Limit specific users to access friend relations or annotations

Don't call my real name, call me 007!

Profile

Discussion ::

Page 41: Browsing Scope Settings

Prevent large-scale downloads of user profiles:
• Including via third-party APIs

Limit browsing scope:
• Group partitioning / “invitation letter” mechanism

Malicious man

Discussion ::

Page 42: Owner's Confirmation

Confirmation for every operation related to friend relations

At least prevents unintentional personal information leakage

I want to use “Cool Andrew Richman”, may I?

Sure!!!

Malicious man

Hey Mr. Richman, you are the lucky winner!

My name is public, everyone knows me!!

Discussion ::

Page 43: Applying DNCL (Haruno Kataoka et al.)

An ideal way to preserve:
• Searchability
• Availability
• Connectedness
• While no sensitive information is disclosed
• Rather than being “insecure” or “unenjoyable”

An implementation is expected:
• Ideally with support for different languages

Discussion ::

Page 44: Conclusion

We quantified the extent of name leakage:
• Using the Wretch data set
• 78% of users are at risk of involuntary name leakage
• Users' ages and education records are also at risk, since they can be inferred from friends' disclosed information

Beware of Internet scams and phishing

Conclusion ::

Page 45: Questions?

Thank you!

Page 46: Ratio of self-disclosure

Research Method ::

Figure 1: Ratio of Self-disclosure

