Exploring Linkability of User Reviews

Date post: 24-Feb-2016
Page 1: Exploring Linkability of User Reviews

Exploring Linkability of User Reviews

Mishari Almishari and Gene Tsudik, Computer Science Department, University of California, Irvine

malmisha,[email protected]

Page 2

Increasing Popularity of Reviewing Sites

Yelp: more than 39M visitors and 15M reviews in 2010

Page 3

[Figure labels: Category, Rating]

Page 4

Rising Awareness of Privacy

Page 5

How Does Privacy Apply to Reviews?

• Traceability
• Linkability of Ad Hoc Reviews
• Linkability of Several Accounts

Page 6

Contribution

Extensive study to measure privacy/linkability in user reviews

Proposed models that adequately identify review authors

Page 7

Settings & Problem Formulation

Page 8
Page 9
Page 10

[Figure: several Identified Records (IR) and Anonymous Records (AR)]

IR: Identified Record

AR: Anonymous Record

Page 11

Anonymous Record Size (AR)

Identified Record Size (IR)

Matching Model

TOP-X Linkability, X: 1 and 10

1, 5, 10, 20, …60
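Top-X linkability can be read as the fraction of ARs whose true author's IR appears among the X highest-ranked IRs. The following is a minimal Python sketch under that reading; the function name `top_x_linkability` and the input layout (one AR per user, mapped to a ranked list of candidate users) are assumptions for illustration.

```python
from typing import Dict, List


def top_x_linkability(rankings: Dict[str, List[str]], x: int) -> float:
    """Fraction of ARs whose true author is among the first x ranked IR owners.

    rankings maps each AR's true author (one AR per user, an assumed layout)
    to that AR's candidate IR owners, best match first.
    """
    if not rankings:
        return 0.0
    hits = sum(1 for author, ranked in rankings.items() if author in ranked[:x])
    return hits / len(rankings)


# Hypothetical rankings for three users.
rankings = {
    "alice": ["alice", "bob", "carol"],
    "bob": ["carol", "bob", "alice"],
    "carol": ["alice", "bob", "carol"],
}
print(top_x_linkability(rankings, 1))  # 1 of 3 ARs linked at Top-1
print(top_x_linkability(rankings, 2))  # 2 of 3 ARs linked at Top-2
```

With X = 1 only an exact first-place match counts; larger X tolerates the correct IR appearing slightly lower in the list.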

Page 12

Dataset

1 million reviews from 2,000 users, each with more than 300 reviews

Page 13

Methodology

• Naïve Bayesian (NB) model
• Kullback-Leibler (KLD) model, symmetric version

Page 14

Naïve Bayesian (NB)

Identified Record (IR)

Anonymous Record (AR)

Decreasing Sorted List of IRs
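The NB model ranks IRs by how likely each one is to have produced the AR's tokens, in decreasing order. Below is a minimal Python sketch, assuming a bag-of-tokens model and add-one (Laplace) smoothing so that tokens unseen in an IR do not yield log 0; the smoothing choice and the helper names are assumptions, not taken from the slides.

```python
import math
from collections import Counter
from typing import Dict, List


def nb_log_likelihood(ar: Counter, ir: Counter, vocab_size: int, alpha: float = 1.0) -> float:
    """log P(AR | IR): sum over AR tokens of count * log P(token | IR).

    P(token | IR) uses additive smoothing with pseudo-count alpha
    (an assumption) so tokens missing from the IR do not give log(0).
    """
    denom = sum(ir.values()) + alpha * vocab_size
    return sum(count * math.log((ir[token] + alpha) / denom) for token, count in ar.items())


def rank_irs_nb(ar: Counter, irs: Dict[str, Counter], vocab_size: int) -> List[str]:
    """Return IR owners sorted by decreasing likelihood of having written the AR."""
    return sorted(irs, key=lambda user: nb_log_likelihood(ar, irs[user], vocab_size), reverse=True)


# Usage with a toy 3-letter vocabulary (hypothetical data).
irs = {
    "alice": Counter({"a": 8, "b": 1, "c": 1}),
    "bob": Counter({"a": 1, "b": 1, "c": 8}),
}
ar = Counter({"c": 5, "a": 1})
print(rank_irs_nb(ar, irs, vocab_size=3))  # ['bob', 'alice']
```

Working in log space avoids numeric underflow when an AR contains thousands of tokens.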

Page 15

Kullback-Leibler Divergence (KLD)

Identified Record (IR)

Anonymous Record (AR)

Increasing Sorted List of IRs
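The KLD model instead ranks IRs by increasing divergence between the AR's and each IR's token distributions. This Python sketch assumes additive smoothing over a fixed vocabulary and uses D(P||Q) + D(Q||P) as the symmetric version; the slides do not show which symmetrization or smoothing was used, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def smoothed_dist(counts: Counter, vocab: List[str], alpha: float = 1.0) -> Dict[str, float]:
    """Token distribution over a fixed vocabulary with additive smoothing (alpha is assumed)."""
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return {t: (counts[t] + alpha) / total for t in vocab}


def kld(p: Dict[str, float], q: Dict[str, float]) -> float:
    """D(P || Q) = sum_t P(t) * log(P(t) / Q(t)); q must be nonzero wherever p is."""
    return sum(pt * math.log(pt / q[t]) for t, pt in p.items() if pt > 0)


def symmetric_kld(p: Dict[str, float], q: Dict[str, float]) -> float:
    """One common symmetrization: D(P || Q) + D(Q || P)."""
    return kld(p, q) + kld(q, p)


def rank_irs_kld(ar: Counter, irs: Dict[str, Counter], vocab: List[str]) -> List[str]:
    """Return IR owners sorted by increasing divergence from the AR."""
    p_ar = smoothed_dist(ar, vocab)
    return sorted(irs, key=lambda user: symmetric_kld(p_ar, smoothed_dist(irs[user], vocab)))


# Hypothetical toy data over a 3-letter vocabulary.
irs = {
    "alice": Counter({"a": 8, "b": 1, "c": 1}),
    "bob": Counter({"a": 1, "b": 1, "c": 8}),
}
ar = Counter({"c": 5, "a": 1})
print(rank_irs_kld(ar, irs, vocab=["a", "b", "c"]))  # ['bob', 'alice']
```

Plain KL divergence is asymmetric; summing both directions gives a score that does not depend on which record is treated as the reference.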

Page 16

Maximum Likelihood Estimation
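Both models need per-record token probabilities. The plain maximum likelihood estimate is the relative frequency P(t | R) = count(t, R) / (total token count of R). A short Python sketch follows; the function name is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict


def mle_distribution(counts: Counter) -> Dict[str, Fraction]:
    """Maximum likelihood estimate: P(t | record) = count(t) / total token count."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("record has no tokens")
    return {token: Fraction(c, total) for token, c in counts.items()}


record = Counter("banana")  # unigram counts: a=3, n=2, b=1
dist = mle_distribution(record)
print(dist["a"], dist["n"], dist["b"])  # 1/2 1/3 1/6
```

Under plain MLE any token absent from a record gets probability 0, which breaks log-likelihoods and KL divergence; the slides do not show how this was handled, so any smoothing is an added assumption.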

Page 17

Tokens

• Unigram: 'a', …, 'z'
• Digram: 'aa', 'ab', …, 'zz'
• Rating: 1, 2, 3, 4, 5
• Category: Restaurant, Beauty and Spa, Education
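Lexical tokens can be extracted with a few lines of Python. This sketch assumes text is lowercased, only the ASCII letters 'a' to 'z' count, and a digram is two adjacent letters (pairs broken by a space or punctuation are skipped); the slides do not state these rules. Rating and category would each contribute one token per review.

```python
from collections import Counter
from string import ascii_lowercase


def unigram_counts(text: str) -> Counter:
    """Count occurrences of the 26 letters 'a'..'z' (case-insensitive)."""
    return Counter(ch for ch in text.lower() if ch in ascii_lowercase)


def digram_counts(text: str) -> Counter:
    """Count adjacent letter pairs 'aa'..'zz'; pairs touching a non-letter are skipped (assumption)."""
    t = text.lower()
    return Counter(
        t[i:i + 2]
        for i in range(len(t) - 1)
        if t[i] in ascii_lowercase and t[i + 1] in ascii_lowercase
    )


review = "Great food!"
print(unigram_counts(review)["o"])  # 2
print(digram_counts(review)["oo"], digram_counts(review)["tf"])  # 1 0
print(len(ascii_lowercase), len(ascii_lowercase) ** 2)  # 26 676
```

The last line shows the size of each lexical vocabulary: 26 unigrams versus 676 possible digrams.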

Page 18

Lexical Token Results

Page 19

NB - Unigram

Size 60: LR 83% / Top-1, LR 96% / Top-10

Page 20

KLD - Unigram

Size 60: LR 83% / Top-1, LR 96% / Top-10

Page 21

NB - Digram

Size 20: LR 97% / Top-1

Size 10: LR 88% / Top-1

Page 22

KLD - Digram

Size 60: LR 99% / Top-1

Size 30: LR 75% / Top-1

Page 23

Improvement (1): Combining Lexical and Non-Lexical Tokens

Page 24

Combining in the NB model: straightforward, include P(Rating|IR) and P(Category|IR)

But for KLD? Weighted average
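Under the naive Bayes independence assumption, adding the non-lexical evidence amounts to adding more log-probability terms. A minimal Python sketch, assuming add-one smoothing and a hypothetical per-type dictionary layout (`"unigram"`, `"rating"`, `"category"`) for ARs and IRs; the category vocabulary size in the usage is made up.

```python
import math
from collections import Counter
from typing import Dict


def log_likelihood(observed: Counter, model: Counter, vocab_size: int, alpha: float = 1.0) -> float:
    """Sum of count * log P(token | IR) with additive smoothing (alpha is assumed)."""
    denom = sum(model.values()) + alpha * vocab_size
    return sum(c * math.log((model[t] + alpha) / denom) for t, c in observed.items())


def combined_nb_score(ar: Dict[str, Counter], ir: Dict[str, Counter],
                      vocab_sizes: Dict[str, int]) -> float:
    """log P(AR | IR) when token types are treated as independent: per-type terms add."""
    return sum(log_likelihood(ar[kind], ir[kind], size) for kind, size in vocab_sizes.items())


# Hypothetical vocabulary sizes: 26 letters, 5 star ratings, 3 made-up categories.
sizes = {"unigram": 26, "rating": 5, "category": 3}
ar = {"unigram": Counter({"a": 2, "b": 2}), "rating": Counter({5: 3}),
      "category": Counter({"Restaurants": 3})}
ir_a = {"unigram": Counter({"a": 5, "b": 5}), "rating": Counter({5: 9, 1: 1}),
        "category": Counter({"Restaurants": 10})}
ir_b = {"unigram": Counter({"a": 5, "b": 5}), "rating": Counter({1: 9, 5: 1}),
        "category": Counter({"Restaurants": 10})}
# Same letters and category; only the ratings favor ir_a.
print(combined_nb_score(ar, ir_a, sizes) > combined_nb_score(ar, ir_b, sizes))  # True
```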

Page 25

First, combine Rating and Category (weight: 0.5)

Second, combine non-lexical and lexical (weight: 0.997/0.97 for Unigram/Digram)
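The two-step weighted average can be sketched as follows. The weights 0.5 and 0.997/0.97 come from the slide; which term each weight multiplies is an assumption here (0.5 splits rating vs. category, and the 0.997/0.97 weight goes to the lexical divergence), as is the function name.

```python
def combined_kld(d_lexical: float, d_rating: float, d_category: float,
                 lexical_weight: float, rating_weight: float = 0.5) -> float:
    """Two-step weighted average of per-token-type divergences.

    Step 1: mix the rating and category divergences (0.5 each by default).
    Step 2: mix the lexical divergence with that non-lexical value.
    Assumption: lexical_weight (0.997 unigram / 0.97 digram) multiplies the lexical term.
    """
    for w in (lexical_weight, rating_weight):
        if not 0.0 <= w <= 1.0:
            raise ValueError("weights must lie in [0, 1]")
    non_lexical = rating_weight * d_rating + (1 - rating_weight) * d_category
    return lexical_weight * d_lexical + (1 - lexical_weight) * non_lexical


# Hypothetical divergences for one (AR, IR) pair, unigram setting.
print(round(combined_kld(0.02, 0.4, 0.6, lexical_weight=0.997), 5))  # 0.02144
```

Because the result is still a divergence, IRs would be ranked in increasing order of this combined value, as in the lexical-only KLD model.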

Page 26

Token Combining Results

Page 27

Rating, Category, and Unigram - NB

Gain: up to 20%

Size 30: 60% to 80%

Size 60: 83% to 96%

Page 28

Rating, Category, and Unigram - KLD

Gain: up to 12%

Size 40: 68% to 80%

Size 60: 83% to 92%

Page 29

Rating, Category, and Digram - NB

Page 30

Rating, Category, and Digram - KLD

Page 31

What about Restricting Identified Record (IR) Size?

Page 32

Anonymous Record Size (AR)

Identified Record Size (IR)

Matching Model

TOP-X Linkability, X: 1 and 10

Page 33

Anonymous Record Size (AR)

Identified Record Size (IR)

Matching Model

TOP-X Linkability, X: 1 and 10

Page 34

Restricted IR - NB

Affected by IR size

Page 35

Restricted IR - KLD

Performed better for smaller IRs: improved for size 20 or less

The rest: comparable

Page 36

What about Matching All ARs at Once?

Page 37

Anonymous Record Size (AR)

Identified Record Size (IR)

Matching Model

TOP-X Linkability, X: 1 and 10

Page 38

Anonymous Records (ARs)

Identified Records (IRs)

Matching Model

Page 39

Improvement (2): Matching All IRs at Once
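The slides do not say how all ARs are matched at once. One simple way to keep two ARs from claiming the same IR is a greedy one-to-one assignment: take the best-scoring (AR, IR) pair, fix it, drop both, and repeat. The Python sketch below uses that greedy scheme as an assumed stand-in, not necessarily the authors' method; an optimal assignment could instead use the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`). The function name and score layout are hypothetical.

```python
from typing import Dict, Set, Tuple


def greedy_match_all(scores: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    """One-to-one matching of ARs to IRs, taking the highest-scoring pairs first.

    scores[(ar_id, ir_id)] is a similarity where higher is better
    (e.g. an NB log-likelihood). Each AR and each IR is used at most once.
    """
    matched: Dict[str, str] = {}
    used_irs: Set[str] = set()
    for (ar_id, ir_id), _ in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if ar_id not in matched and ir_id not in used_irs:
            matched[ar_id] = ir_id
            used_irs.add(ir_id)
    return matched


# Hypothetical log-likelihood scores for two ARs against two IRs.
scores = {
    ("ar1", "alice"): -10.0,
    ("ar1", "bob"): -11.0,
    ("ar2", "alice"): -12.0,
    ("ar2", "bob"): -20.0,
}
print(greedy_match_all(scores))  # {'ar1': 'alice', 'ar2': 'bob'}
```

In this usage, matching each AR independently would send both ARs to `alice`; the one-to-one constraint moves `ar2` to `bob`.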

Page 40


Page 41

MatchAll - Restricted

Gain: up to 16%

Size 30: from 74% to 90%

Page 42

MatchAll - Full

Gain: up to 23%

Size 20: from 35% to 55%

Page 43

Improvement (3): For Small IR Size

Page 44

Changing it to: 0.5 + Review Length

Page 45

Results – Improvement (3)

Size 10: 89% to 92%

Size 7: 79% to 84%

Gain: up to 5%

Page 46

Discussion: Implications

• Cross-referencing
• Review spam

Non-prolific users: users gradually become prolific; with an IR of 20, linkability is around 70%

Anonymous record size: linkability is high even for small ARs (92% for an AR of 10); an AR of 60 is only 20% of the minimum user contribution

Page 47

Discussion (cont.): Unigram Tokens

Very comparable for larger ARs; entail fewer resources in the attack (26 vs. 676 tokens)

Page 48

Future Directions

• Further improvement for small ARs
• Other probabilistic models
• Using stylometry
• Exploring linkability in other preference databases
• More than one AR for different users: explore further

Page 49

Conclusion

Extensive study to assess linkability of user reviews
• For a large set of users
• Using very simple features

Users are highly exposed, even with simple features and a large number of authors

Page 50

Thank you all!

