Department of Computer Science and Engineering, Texas A&M University
CIKM 2019
Parisa Kaghazgaran, Majid Alfifi, James Caverlee
Wide-Ranging Review Manipulation Attacks:
Model, Empirical Study and Countermeasure
User Reviews are Everywhere
Online Retailers
Business Review Forums
Media Platforms
… and so are targets of manipulation.
Amazon Headphones, 2019
Crowd-based Manipulation Campaigns
Crowdsourcing websites
✓ Read the product description before writing down a review.
✓ Go to https://goo.gl/7QfW0h
✓ Leave a relevant 5-star review with at least 40 words.
✓ Provide the proof that you left the review your self.
Target Review Platforms
Realistic Reviews: Written by Humans
Typical detection approaches:
• Reviews arrive synchronized in time, e.g., Mukherjee et al. 2013, Akoglu et al. 2015, Kaghazgaran et al. 2019
• Dense communities over the co-review graph, e.g., Shin et al. 2017, Kaghazgaran et al. 2018
Manipulators can launch scalable, difficult-to-detect attacks by removing these manipulation traces.
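To make the co-review signal concrete, here is a minimal sketch of how a dense reviewer community could show up in a co-review graph. It is an illustration only, not the method of any of the cited papers; the function names, reviewer ids, and product ids are all hypothetical.

```python
from itertools import combinations

def co_review_edges(reviews):
    """Link two reviewers whenever they reviewed the same product."""
    by_product = {}
    for reviewer, product in reviews:
        by_product.setdefault(product, set()).add(reviewer)
    edges = set()
    for reviewers in by_product.values():
        # sorted() gives each pair a canonical (a, b) order
        for a, b in combinations(sorted(reviewers), 2):
            edges.add((a, b))
    return edges

def density(nodes, edges):
    """Fraction of possible reviewer pairs in `nodes` that are linked."""
    nodes = set(nodes)
    possible = len(nodes) * (len(nodes) - 1) // 2
    inside = [e for e in edges if e[0] in nodes and e[1] in nodes]
    return len(inside) / possible if possible else 0.0

# Hypothetical (reviewer, product) pairs: u1-u3 always review together.
reviews = [
    ("u1", "p1"), ("u2", "p1"), ("u3", "p1"),
    ("u1", "p2"), ("u2", "p2"), ("u3", "p2"),
    ("u4", "p3"), ("u5", "p3"), ("u4", "p4"),
]
edges = co_review_edges(reviews)
suspicious = density({"u1", "u2", "u3"}, edges)
ordinary = density({"u1", "u4", "u5"}, edges)
print(suspicious, ordinary)
```

A group whose members are all pairwise linked scores a density of 1, which is the kind of trace a machine-driven attack can avoid leaving.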
Crowd vs. Machine
Review Platforms
Crowdsourcing websites
Crowdworkers
Attack Strengths
• Scalability: machine-generated attacks do not rely on paying workers.
• Increased deception: they obfuscate the signals left by crowd campaigns.
How Can a Machine Produce Human-Readable Reviews?
Seminal work: Yao, Yuanshun, et al. "Automated crowdturfing attacks and defenses in online review systems." CCS, ACM, 2017.
Leverages neural language models for the specific domain of restaurant reviews on Yelp.
• Domain-dependent reviews, i.e., the training process needs to be replicated for every domain and needs large training data for each domain.
• Character-level language model, i.e., it must capture longer dependencies and learn spelling in addition to semantics, and it is more prone to grammatical errors.
Our Proposed DIOR Framework
1. We propose DIOR for Domain Independent Online Review Generation.
2. Empirical study to evaluate the quality of machine-generated reviews.
3. Embedding-based classifier to detect such fake reviews.
Goal: validate the wide-ranging attacks on review platforms and propose how to detect them.
Neural Language Models (A quick refresher)
• Recurrent Neural Networks have shown success in generating meaningful text.
• They learn from a sequence of words to predict the next word.
[Figure: unrolled RNN over the example "I ate at this restaurant". At each time-step t, the input x_t is a word in the review; the hidden state H_t learns the information from the sequence x_(<=t) up to time-step t; the output o_t predicts the next word in the review, P(x_{t+1} | x_1, …, x_t).]
In the generation step, the predicted word at time-step t is fed back to the model as input, along with the hidden state, to predict the next word.
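The feedback loop described above can be sketched without a neural network. This example swaps the RNN for a simple bigram count table, so it keeps no hidden state beyond the last word. It only illustrates how each predicted word becomes the next input; the corpus and function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(reviews):
    """Count word -> next-word transitions (a stand-in for a trained RNN)."""
    counts = defaultdict(Counter)
    for review in reviews:
        words = ["<s>"] + review.split() + ["</s>"]
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def generate(counts, max_words=20):
    """Greedy generation: each predicted word is fed back as the next input."""
    word, output = "<s>", []
    for _ in range(max_words):
        if not counts[word]:
            break  # no known continuation for this word
        word = counts[word].most_common(1)[0][0]  # most frequent next word
        if word == "</s>":
            break
        output.append(word)
    return " ".join(output)

# Hypothetical training reviews.
corpus = [
    "i ate at this restaurant",
    "i ate at this place",
    "i ate at this restaurant",
]
model = train_bigram(corpus)
generated = generate(model)
print(generated)
```

Greedy decoding always picks the most frequent continuation, so it repeats itself easily. That is one reason generators usually sample from the predicted distribution instead.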
RQ1: Can we generate reviews across different domains?
I ’ve eaten here about 8 times . I ’ve been introduced to this place . Its always busy and their food is consistently great . I LOVE their food , hence the name . It is so clean , the staff is so friendly , and the food is great . I especially like the chicken pad thai , volcano roll , and the yellow curry .
the case works great ! it has a soft rubber insert that goes over the hard shell . The hard plastic shell has a soft inner shell and the hard case is hard plastic . It is very sticky and has not fallen out or dropped or fallen apart .
this app is a great tool for discovering new things : being able to search for films and putting reviews on particular items as well as having a way to download stories from the app .
Transfer Learning to the Rescue!
[Figure: model architecture. Input word w_t → encoder (word to id) → embedding of input word (400) → LSTM1 (h_t^1, 1150) → LSTM2 (h_t^2, 1150) → LSTM3 (h_t^3, 400) → embedding of output word → decoder (id to word) → next word w_{t+1}.]
[Figure: transfer setup. Universal model (Yelp), with parameters θ_Yelp; transferred models (Amazon, App Store), with parameters θ_Amazon and θ_AppStore.]
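As a rough sense of scale, the layer sizes in the architecture figure allow a back-of-the-envelope parameter count for the three LSTM layers. This sketch assumes the layers chain as 400 → 1150 → 1150 → 400 and that each gate has a single bias vector. Some libraries keep two, which would raise the totals slightly. Embedding and decoder weights are left out because the vocabulary size is not given.

```python
def lstm_layer_params(input_size, hidden_size):
    """Weights and biases of one LSTM layer with four gates.

    Assumes one bias vector per gate.
    """
    per_gate = (
        input_size * hidden_size      # input-to-hidden weights
        + hidden_size * hidden_size   # hidden-to-hidden weights
        + hidden_size                 # bias
    )
    return 4 * per_gate

# (input size, hidden size) per layer, read off the architecture figure.
layer_sizes = [(400, 1150), (1150, 1150), (1150, 400)]
per_layer = [lstm_layer_params(i, h) for i, h in layer_sizes]
total = sum(per_layer)

for (i, h), n in zip(layer_sizes, per_layer):
    print(f"LSTM {i} -> {h}: {n:,} parameters")
print(f"Total recurrent parameters: {total:,}")
```

Under these assumptions the recurrent stack holds about 20 million parameters. That makes clear why reusing the universal model's parameters is attractive compared with training each domain from scratch.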
Example of Synthetic Reviews
I ’ve eaten here about 8 times . I ’ve been introduced to this place . Its always busy and their food is consistently great . I LOVE their food , hence the name . It is so clean , the staff is so friendly , and the food is great . I especially like the chicken pad thai , volcano roll , and the yellow curry .
this is a nice case . It ’s a little difficult to remove , but that ’s to be expected . The case is slightly thicker than a regular screen protector , but that is to be expected . It ’s a great phone case and I highly recommend it .
this app is great for learning the basics of math ! I love that it has a different function that can help you learn the words that you understand . I wish all apps were this simple .
[Figure: percentage of reviews labeled "Real" (y-axis) versus sampling temperature from 0.2 to 1.0 (x-axis), with one panel each for Yelp, Amazon, and App Store.]
RQ2: Can Model-generated Reviews Pass Human Test?
Takeaway 1: Reviews generated at temperature 0.8 can fool human readers and go undetected.
Takeaway 2: Human readers are more sensitive to repetition errors than they are to small grammar mistakes.
AMT Guidelines
• 95% approval rate
• Dwell time >= 7 minutes
• Located in US
• Ask for a trivial question
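The temperature in Takeaway 1 controls how sharply the model's next-word distribution is peaked before a word is sampled. Here is a minimal sketch of temperature-scaled sampling. The vocabulary, scores, and function names are hypothetical and do not come from the DIOR implementation.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_word(vocab, logits, temperature, rng):
    """Draw one word from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical next-word candidates and model scores.
vocab = ["great", "good", "sticky", "volcano"]
logits = [2.0, 1.5, 0.5, 0.1]

for t in (0.2, 0.8, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

rng = random.Random(0)  # fixed seed so the sketch is repeatable
print(sample_word(vocab, logits, 0.8, rng))
```

Low temperatures concentrate probability on the top word, which tends to produce repetition. Higher temperatures spread it out, which adds variety at the cost of more mistakes.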
RQ3: Can Spam Detector Catch Model-generated Reviews?
Takeaway: The text-based spam detector does not reliably distinguish synthetic reviews from real reviews.
               Yelp  Amazon  App Store
Accuracy (%)     64      61         62
Precision (%)    65      64         62
Recall (%)       65      61         62
F1 score (%)     65      60         62
Textual Features
• Similarity
• Structural
• Syntactic
• Semantic
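For readers less familiar with the four metrics reported for the spam detector, the standard binary-classification definitions can be sketched as follows. The confusion counts are hypothetical and are not the study's data; "positive" here means "flagged as synthetic".

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for 100 reviews.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

On a balanced set of real and synthetic reviews, random guessing gives an accuracy near 50%.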
RQ4: How Does DIOR Perform versus Crowd Manipulators?
Takeaway: Users find reviews generated by DIOR as reliable as fake reviews written by manipulation campaigns.
DIOR: 31%
Neither: 37%
Crowd: 32%
RQ5: How Does DIOR Perform versus Individual Models?
Takeaway: Using transfer learning not only facilitates the domain shift but also improves the performance significantly.
[Figure: preference (%) on a 0 to 100 scale for the transferred model, the individual model, and both, with one panel each for Amazon and App Store.]
RQ6: How Much Training Data Does the Transferred Model Need?
Takeaway: The transferred models need a reasonably low number of samples, compared to the universal model, to reach stable performance.
[Figure: validation loss (y-axis, 3 to 3.6) versus training size (x-axis, 25k to 200k) for App Store and Amazon.]
RQ7: How Can We Detect Model-generated Reviews?
Takeaway: Model-generated reviews are detectable in the embedding space with high accuracy.
• Embedding-based classifier
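The slides do not spell out the classifier, so the following is only a sketch of one simple way to classify in an embedding space: a nearest-centroid rule over review embeddings. It is not the authors' discriminator. The two-dimensional vectors are hypothetical, and how the embeddings are produced is outside the sketch.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def fit_centroids(embeddings, labels):
    """Compute one centroid per label from labeled review embeddings."""
    grouped = {}
    for vector, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(vector)
    return {label: mean_vector(vecs) for label, vecs in grouped.items()}

def predict(vector, centroids):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))

# Hypothetical 2-d review embeddings; generated reviews sit close together.
train_vectors = [[0.90, 0.10], [0.95, 0.05], [0.20, 0.80], [0.40, 0.60]]
train_labels = ["generated", "generated", "real", "real"]

centroids = fit_centroids(train_vectors, train_labels)
print(predict([0.85, 0.15], centroids))
print(predict([0.30, 0.75], centroids))
```

A rule like this only works when one class forms a tight cluster in the embedding space, which is the property the takeaway above points to.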
Conclusion and Next Steps
Explored how transfer learning could enable wide-ranging review manipulation attacks.
The proposed DIOR framework demonstrates:
(1) Model-generated reviews can be perceived as real by human examiners, pass traditional text-based spam detectors, and beat crowd-based review manipulators.
(2) Fake reviews tend to cluster together in the embedding space, which provides the intuition for our proposed discriminator.
Next steps: study the performance of other neural network architectures to develop a more powerful discriminator.