Department of Computer Science and Engineering Texas A&M University ICWSM 2019 Parisa Kaghazgaran, Majid Alfifi and James Caverlee TOmCAT: Target-Oriented Crowd Review Attacks and Countermeasures
Page 1: TOmCAT: Target-Oriented Crowd Review Attacks and Countermeasures (people.tamu.edu/~kaghazgaran/papers/parisa_icwsm...)

Department of Computer Science and EngineeringTexas A&M University

ICWSM 2019

Parisa Kaghazgaran, Majid Alfifi and James Caverlee

TOmCAT: Target-Oriented Crowd Review Attacks and

Countermeasures

Page 2

User Reviews are Everywhere

Online Retailers

Business Review Forums Government Agencies

… and so are targets of manipulation.

Media Platforms

Page 3

What Happened? 2017

Amazon Deletes Negative Clinton Book Reviews

[Screenshots: the book's rating before and after Amazon removed reviews]

Page 4

FCC Comments on Net Neutrality, 2018

"In the matter of restoring Internet freedom. I'd like to recommend the commission to undo The Obama/Wheeler power grab to control Internet access. Americans, as opposed to Washington bureaucrats, deserve to enjoy the services they desire. The Obama/Wheeler power grab to control Internet access is a distortion of the open Internet. It ended a hands-off policy that worked exceptionally successfully for many years with bipartisan support.”

"Chairman Pai: With respect to Title 2 and net neutrality. I want to encourage the FCC to rescind Barack Obama's scheme to take over Internet access. Individual citizens, as opposed to Washington bureaucrats, should be able to select whichever services they desire. Barack Obama's scheme to take over Internet access is a corruption of net neutrality. It ended a free-market approach that performed remarkably smoothly for many years with bipartisan consensus."

"FCC: My comments re: net neutrality regulations. I want to suggest the commission to overturn Obama's plan to take over the Internet. People like me, as opposed to so-called experts, should be free to buy whatever products they choose. Obama's plan to take over the Internet is a corruption of net neutrality. It broke a pro-consumer system that performed fabulously successfully for two decades with Republican and Democrat support."

"Mr Pai: I'm very worried about restoring Internet freedom. I'd like to ask the FCC to overturn The Obama/Wheeler policy to regulate the Internet. Citizens, rather than the FCC, deserve to use whichever services we prefer. The Obama/Wheeler policy to regulate the Internet is a perversion of the open Internet. It disrupted a market-based approach that functioned very, very smoothly for decades with Republican and Democrat consensus."

"FCC: In reference to net neutrality. I would like to suggest Chairman Pai to reverse Obama's scheme to control the web. Citizens, as opposed to Washington bureaucrats, should be empowered to buy whatever products they prefer. Obama's scheme to control the web is a betrayal of the open Internet. It undid a hands-off approach that functioned very, very successfully for decades with broad."
https://hackernoon.com/more-than-a-million-pro-repeal-net-neutrality-comments-were-likely-faked

The majority of the "real" comments favored keeping net neutrality.

Page 5

Amazon Headphones, 2019

Page 6

Amazon Headphones, 2019

Page 7

This Paper: Focus on Targets of Manipulation

• Target = Product, Service, Issue, …
  • e.g., Clinton's What Happened, Net Neutrality on FCC.gov, headphones on Amazon
• Many previous efforts focus on detecting review writers or reviews directly

Page 8

This Paper: Focus on Targets of Manipulation

• If we know which products are targeted, then we can:
  • Deploy more resources to defend the target from ongoing threats, e.g., require additional user verification or enlist more human moderators
  • Develop more robust defensive countermeasures for future threats, e.g., by learning patterns in the types of issues targeted

Page 9

Key Challenge: How Can We Find a Ground Truth of Targeted Products?

Prior efforts mainly focus on review writers or reviews:

• Manual labeling of fake reviews (e.g., Mukherjee et al. 2013; Akoglu et al. 2015): uncertainty over capturing actual manipulation behaviors
• Simulation of bad behavior (e.g., Ott et al. 2013)
• Ex-post analysis of outliers (e.g., Liu et al. 2011; Akoglu et al. 2013, 2015): focuses on highly-visible fake behaviors

This paper focuses on crowdsourced review manipulation to build a ground truth of targeted products.

Page 10

Crowdsourced Review Manipulation

Crowdsourcing websites ➜ Target Review Platforms

✓ Read the product description before writing down a review.
✓ Go to https://goo.gl/7QfW0h
✓ Leave a relevant 5-star review with at least 40 words.
✓ Provide proof that you left the review yourself.

Page 11

[Diagram: a crowdsourcing task and a worker's user profile, linking crowd workers to the reviews they post on the target platform]

This Paper: RapidWorkers ➜ Amazon

4. We show that TOmCAT outperforms six unsupervised and supervised baselines that originally were proposed to detect manipulation at the user/reviewer level. We find that review manipulation behaves differently at user and product levels. (Section 4.4)

5. We validate TOmCAT over three additional domains – Yelp, the App Store, and an alternative Amazon dataset – where we find that TOmCAT can effectively detect manipulation patterns in other domains. (Section 4.5)

6. Although TOmCAT can uncover target products with high accuracy by addressing existing attacks, strategic attackers can potentially create hard-to-detect behavioral patterns by undermining timing-based footprints. Inspired by recent advances in recurrent neural networks, we further propose a complementary approach to TOmCAT called TOmCATSeq.1 We believe this is the first work to leverage RNN models on rating sequences for review manipulation detection. The initial evaluation of TOmCATSeq shows promising results against strategic manipulation and opens new doors for future study. (Section 5)

Note that we restrict our approach from taking advantage of historical reviewer behavior, reviews, and network properties, to emulate a scenario in which a powerful attacker can nullify the discriminatory power of these signals. For example, reviewers may have little or no history, say by using multiple user accounts to write fake reviews, degrading the impact of network characteristics on uncovering manipulation. Also, fake reviews may imitate real (non-fake) reviews, limiting the power of linguistic detection. And as advances in AI continue, machine-generated fake reviews will be increasingly difficult to detect (Yao et al. 2017).

2 Related Work

Many methods have been proposed to identify individual fake reviews. These methods often focus on the content of the reviews themselves, from the perspective of bag-of-words features (Ott, Cardie, and Hancock 2013; Jindal and Liu 2008; Sandulescu and Ester 2015) and structural features (Li et al. 2011; Piskorski, Sydow, and Weiss 2008; Li et al. 2014; Harris 2012) like length of reviews, part-of-speech tagging, and proportion of certain pronouns.

Another direction aims to identify users who engage in manipulation tasks. This has been extensively studied by researchers mainly by leveraging behavioral signals such as skewed rating distributions (Hooi et al. 2016; Shah et al. 2016) and dense inter-arrival times between successive reviews (Shah et al. 2016; Hooi et al. 2016; Ye, Kumar, and Akoglu 2016).

In a related direction, some efforts try to find groups of fraudulent reviewers, often by finding synchronized behaviors. One approach is graph-based, where users are nodes and their relationships are edges (Jiang et al. 2014; Prakash et al. 2010; Kumar et al. 2018; Akoglu, Chandy, and Faloutsos 2013). An embedding-based method is proposed in (Kaghazgaran, Caverlee, and Squicciarini 2018) to identify fraudulent users who are distant in a local graph but close in an embedding space. A separate direction detects dense blocks in a ratings matrix (Shin et al. 2017), to find clusters of coordinating raters.

1 The naming is inspired by the sequential nature of RNN models.

Table 1: Summary of Our Dataset

                    # Products   # Reviews
Primary Targets          533        33k
Secondary Targets      3,467       2.6M
Randomly Sampled       4,000       0.7M

Finally, crowd attacks have been explored on Facebook (Cao et al. 2014) with the SynchroTrap system to deal with coordinated Likes on selected pages, and on Twitter (Viswanath et al. 2015) with the Stamper method to detect targeted hashtags. These recent trends in similar domains for detecting malicious crowds motivate us to tackle this problem in online review platforms, wherein a crowd seeks to compromise the rating of a product or service.

3 Target-Oriented Crowd Attacks

We say that a coordinated attack that aims to manipulate a specific target is a target-oriented crowd attack. Such coordinated attacks have historically been difficult to identify in order to build a solid ground truth and evaluate corresponding countermeasures. How can we be sure that a product has actually been targeted?

3.1 Building Ground Truth

Here, we adopt an approach that samples evidence of manipulation launched from crowdsourcing platforms, where a paymaster has tasked crowd workers to write a few fake reviews on a specific product on Amazon (Kaghazgaran, Caverlee, and Alfifi 2017). We refer to such products as target products. Similar efforts (Wang et al. 2012; Song, Lee, and Kim 2015; Lee, Tamilarasan, and Caverlee 2013) have been used to uncover other crowd-based attacks. In particular, we monitored the crowdsourcing platform RapidWorkers beginning in July 2016 for evidence of Amazon-related attacks. In contrast to our previous work, which focuses on fake review writers and studies their behaviors (Kaghazgaran, Caverlee, and Squicciarini 2018), this work provides a new dataset of products that have been targeted for attack. The previous work is based on a seed set of 300 products to label fake review writers. Here, we extend the initial seed set and develop an expansion methodology to gain a rich set of target products.

Primary Target Products. In total, we were able to identify 900 tasks targeting 533 unique products. We call these initial products the primary target products. By linking those products to Amazon, we then collected their associated reviews, i.e., ~33k reviews. Next, we explored the activity history of ~14k reviewers and crawled their reviews, i.e., ~580k reviews.

Identifying Suspicious Reviewers. Not all reviewers on a target product are suspicious. To identify suspicious reviewers we use a methodology similar to that introduced in (Jindal and Liu 2008), in which users with a certain fraction

Page 12

Examples of Crowd Attacks

Promotion Attack
• Targets a newly released product with fake reviews to promote its rating.

***** best shower head I have gotten, August 30, 2017: This multi-functional head helps us relax ourselves during the bath and gives the feel of taking a spa. I am using it everyday…
***** Best product….., August 30, 2017: The design of the product is amazing and I absolutely love the water flow combinations. The water flow settings can be …
***** very high quality product, August 30, 2017: The water flow settings can be changed easily and a very high quality product. It has a setting for a very full vigorous…
***** Multifunctions shower ….. All in one, August 31, 2017: Multifunctions shower is very useful because Three Settings Water Flow Control For Your Pleasure. In this product …
* Don't waste your Money, March 10, 2018: Very cheap quality. After a few days, it became loose and detached from the hose. The plumbing tape even didn't help…

Restoration Attack
• In response to a low-rated review, a series of fake reviews to restore the overall rating.

* DON'T DO IT, February 15, 2018: This is an inferior product. They are miniature, at least half the size they should be. I don't know anyone with eyes this small…
***** Very Very good Product, February 16, 2018: Very Nice product. Love it. Looking very cute. Best deal at best price. Very must satisfied with product and service.
***** They really last for a good …, February 17, 2018: These lashes are so easy to use and they really last for a good while. There's a learning curve involved to apply them, …
***** Eyelashes look very natural, February 17, 2018: I bought it for my girlfriend and she loves it. its a little difficult to learn how is the right way to put it on, but when you got …
***** Looks extremely real!, February 17, 2018: I bought this for my girlfriend and I told her to test whether i could tell if they are her real eyelashes or fake one, but …

Page 13

Timing and Sequence of Ratings is Important

[Plot: percentage of products (y) receiving n = 4..10 reviews (x) within windows of 3, 5, and 7 days, for target vs. random products]
Figure 3: Target products tend to receive a higher number of reviews in short time intervals.

Initial Observations: Given crowd attack characteristics and types, we analyze the rating behavior of target products versus randomly selected products along three dimensions. Products are described by their time-series ratings. More formally, for each product p, we have an ordered sequence of ratings Rp = (r1, …, rn), where ri happens earlier than ri+1.

1. Dense Review Behavior: First, we investigate how many reviews may turn up during a specific window of time w under the two classes of products. By sliding w over a sequence of reviews, we measure the number of reviews written in this interval. Referring to Figure 1, which gives an example of a restoration attack, fake reviews were written within a time window of 3 days. Therefore, we set the value of w to be 3, 5 and 7 days. Also, since we observe that a campaign size n is in the range of 5 to 10 reviews, we examine what portion of products receive 5 to 10 reviews within w days. Figure 3 summarizes the review behavior across different values of w and n. Interestingly, a large portion of target products demonstrate dense review behavior. For example, 78% of target products have received 7 reviews within 5 days, while only 40% of random products display such behavior.
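The sliding-window check described above can be sketched as follows (a minimal sketch; the helper name and signature are our own, not the paper's):

```python
from datetime import datetime, timedelta

def has_dense_window(timestamps, n, w_days):
    """Return True if some n consecutive reviews arrive within w_days.

    timestamps must be sorted ascending. This mirrors the check for
    campaign sizes n = 5..10 and windows w = 3, 5, 7 days.
    """
    window = timedelta(days=w_days)
    for i in range(len(timestamps) - n + 1):
        # Reviews i .. i+n-1 span at most w_days -> dense behavior.
        if timestamps[i + n - 1] - timestamps[i] <= window:
            return True
    return False
```

For example, a product whose first five reviews land within three days triggers the check for (n=5, w=3), while a slow trickle of reviews does not.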

2. Low-High Rate Behavior: The purpose of restoration attacks is to rebuild the trust in a product that has been shaken by a negative review. Thus, we investigate low-high rate events, e.g., a 5-star review showing up immediately after a 1/2-star review. In total, we found 111,870 such events in target products and 29,205 in random products. That is, target products are almost four times more vulnerable to this event. However, the existence of this kind of event is not a strong indicator of anomalous behavior, as it could happen naturally due to consumers with different tastes evaluating a single product significantly differently (Hu, Pavlou, and Zhang 2006).

To control for this, we measure how fast different products react to a negative review by gauging the inter-arrival time between sequential low and high ratings. Figure 4 shows what portion of these events happens in less than a specific time window w. We set w to be between 0 and 7 days. Interestingly, in target products 71% of these events occur on the same day the negative review is written, while in random products only 13% of such events happen on the same day.

[Plot: percentage of low-high events (y) occurring within a time interval of 0..7 days (x), for target vs. random products]
Figure 4: Target products react to low-rate reviews faster.

[Plot: number of events (y) by length of sequential high-rated review blocks, 5..10 (x), for target vs. random products]
Figure 5: Target products tend to have more series of high-rating reviews immediately following a low-rating review.
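The low-high event count and its same-day fraction can be sketched as follows (a hypothetical helper, not the paper's code; integer day offsets stand in for full timestamps):

```python
def low_high_events(ratings, days, w_days=0):
    """Count low->high rate events: a 5-star review immediately
    following a 1- or 2-star review. Returns (total, within_window),
    where within_window counts events whose inter-arrival time is at
    most w_days (w_days=0 corresponds to same-day events).
    """
    total = within = 0
    for i in range(len(ratings) - 1):
        if ratings[i] in (1, 2) and ratings[i + 1] == 5:
            total += 1
            if days[i + 1] - days[i] <= w_days:
                within += 1
    return total, within
```

Dividing `within` by `total` gives the portion of events occurring inside window w, the quantity plotted in Figure 4.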

Rough Estimation of Attack Types: 3,258 products experience low-high rate events. Taking the same-day occurrence of such an event as the baseline, we can say 2,660 out of 3,467 target products are targeted by restoration attacks. Similarly, 211 of the target products do not experience low-high rate events at all, so we can say they are only targeted by promotion attacks.

3. Sequential High Ratings Behavior: In this analysis, we count the number of 5-star reviews immediately following a low-rated review. We set the size of these blocks of 5-star reviews to be between 5 and 10, with respect to the crowd campaign size. We can see from Figure 5 that the existence of such blocks in target products is about 5 times more likely than among their randomly selected peers.
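Counting such blocks of sequential 5-star reviews can be sketched as follows (a hypothetical helper; the block-size bounds follow the campaign sizes reported above):

```python
def high_blocks_after_low(ratings, min_len=5, max_len=10):
    """Lengths of consecutive 5-star runs that immediately follow a
    1/2-star review, keeping only runs of min_len..max_len reviews
    (matching the observed crowd campaign sizes).
    """
    blocks = []
    for i in range(len(ratings) - 1):
        if ratings[i] in (1, 2):
            # Measure the run of 5-star reviews starting right after i.
            k = 0
            while i + 1 + k < len(ratings) and ratings[i + 1 + k] == 5:
                k += 1
            if min_len <= k <= max_len:
                blocks.append(k)
    return blocks
```

Comparing the number of such blocks between target and random products reproduces the kind of count shown in Figure 5.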

Page 14

Proposed Framework: Target-Oriented Crowd Attack (TOmCAT)

• Modeling review behavior at the product level based only on timing and ratings.

• Micro features codify the individual rating behaviors of the products.

• Macro features measure the deviation from normal behavior.

[Diagram: a rating matrix of products p1..pm, each with a time-stamped rating sequence; feature extraction produces features f1..fk, which feed a neural-based classifier]

Page 15

Attack Footprints: Micro Features

• Speed of Low-High Rate Events (SLH): inter-arrival times (IATs) between sequential low and high rate reviews.
  SLH(p) = avg(IAT(ri, ri+1) | ri ∈ {1,2} ∧ ri+1 ∈ {5})

• Sequential High Ratings (SHR): number of sequential 5-star reviews following a low-star review.
  SHR(p) = avg(k | ri ∈ {1,2} ∧ ri+1, …, ri+k ∈ {5})

• Ratio of High Rating Reviews (RHR): fake reviews are generated rapidly.
  RHR(p) = SHR(p) / IAT(ri, ri+k)

• Variance of Inter-arrival Times (VIT): target products receive fake reviews rapidly and then reach an equilibrium state.
  VIT(p) = STD(median(IATs), max(IATs))
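Three of these micro features can be sketched on a single product's ordered rating sequence as follows (helper names and exact aggregations are our own; RHR is omitted here, since it additionally divides SHR by the time span of the run):

```python
from statistics import median, pstdev

def micro_features(ratings, days):
    """Sketch of SLH, SHR, and VIT on one product's ordered rating
    sequence; days[i] is the day offset of review i.
    """
    iats = [days[i + 1] - days[i] for i in range(len(days) - 1)]
    # SLH: average inter-arrival time of low->high rate events.
    lh = [iats[i] for i in range(len(iats))
          if ratings[i] in (1, 2) and ratings[i + 1] == 5]
    slh = sum(lh) / len(lh) if lh else 0.0
    # SHR: average length of 5-star runs following a low rating.
    runs = []
    for i in range(len(ratings) - 1):
        if ratings[i] in (1, 2):
            k = 0
            while i + 1 + k < len(ratings) and ratings[i + 1 + k] == 5:
                k += 1
            if k:
                runs.append(k)
    shr = sum(runs) / len(runs) if runs else 0.0
    # VIT: spread between the median and the maximum inter-arrival time.
    vit = pstdev([median(iats), max(iats)]) if iats else 0.0
    return slh, shr, vit
```

A product attacked with same-day restoration bursts yields a small SLH and a large VIT, since a few long gaps dominate the otherwise tight inter-arrival times.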

Page 16

Attack Footprints: Macro Features

KL-divergence between normal behavior (M) and a given product's attribute distribution (P):

  KL(P, M) = ∑i=1..n Pi × log2(Pi / Mi)

An Amazon dataset with 1.5M items and 83M reviews is used to model normal behavior; e.g., the average rating distribution forms the base rating distribution.

Do all products follow a similar distribution? We use clustering to model the normal distribution as K clusters. The anomaly score would be dominated by large clusters, so we use the notion of Inverse Distance Weighting to reinforce the impact of distance:

  σ(p) = ∑k=1..K ρk × KL(P, Mk)
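The KL term above can be sketched directly (a minimal sketch; zero-probability bins are skipped, a common convention, since the slide does not specify any smoothing):

```python
from math import log2

def kl_divergence(p, m):
    """KL(P || M) = sum_i P_i * log2(P_i / M_i) over rating bins,
    comparing a product's distribution P to the base distribution M.
    """
    return sum(pi * log2(pi / mi)
               for pi, mi in zip(p, m)
               if pi > 0 and mi > 0)
```

A product whose rating distribution matches the base distribution scores 0; a product with all mass on one star rating diverges sharply.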

Page 17

Experiments: Ablation Study

                 Target Products           Random Products
                 Recall Precision F1       Recall Precision F1    Accuracy
Macro only         83      81    82          81      83    82        82
Micro only         69      83    75          86      74    79        77
Macro + Micro      86      83    84          82      85    84        84

• Do macro features work well? Or micro features?

• Macro features are strongest in isolation
• But Macro + Micro results in the best performance.

Page 18

Experiments: Supervised Baselines

[Bar chart: accuracy (%) of spamBehvr, Spamicity, ICWSM'13, and TOmCAT]

• TOmCAT outperforms alternatives with 81% accuracy
• Alternatives are all originally designed for detecting anomalous reviewers; in our scenario, reviewers show a mix of legitimate and suspicious behaviors

Page 19: TOmCAT: Target-Oriented Crowd Review Attacks and ...people.tamu.edu/~kaghazgaran/papers/parisa_icwsm...nipulation launched from crowdsourcing platforms where a paymaster has tasked

Experiments: Unsupervised Baselines

[Line chart: Precision@k for k = 100 to 3000, comparing TOmCAT, BIRDNEST, edgeCentric, and Helpfulness Vote]

• For small k, edgeCentric is slightly better than TOmCAT
• As k increases, TOmCAT does the best job of identifying targeted products.

Page 20

Can TOmCAT (trained on one Amazon dataset) work on other domains?

[Bar chart: Precision@100 on Amazon (Alt), the App Store, and Yelp]

• Recall: we don’t have ground truth of other domains (at the target level)

• So we consider proxies for whether the product/app/business has been targeted

Page 21

Is TOmCAT Resistant against Strategic Attackers?

[Histograms: inter-arrival time distributions (1 to 13 days) for target vs. random products]

• Example: Inter-arrival time distribution significantly differs from normal behavior in target products.

                   Target Products           Random Products
                   Recall Precision F1       Recall Precision F1    Accuracy
TOmCAT               86      83    84          82      85    84        84
w/o Temporal ftrs    79      69    70          69      70    70        69

Page 22

From TOmCAT to TOmCATSeq

Advantages
• Learns only from the sequence of ratings, without temporal features.
• No need for hand-crafted features.

[Diagram: each product's rating sequence (r0..rt) from the rating matrix is one-hot encoded (e.g., 0 0 1 0 0), fed through LSTM layers, concatenated, and passed to an FC layer with a sigmoid producing a 0/1 prediction]
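The input encoding sketched in the architecture can be illustrated as follows (a minimal sketch; the rating-to-index mapping is our assumption, not specified on the slide):

```python
def one_hot_sequence(ratings, num_classes=5):
    """Encode a product's star-rating sequence as one-hot vectors,
    mapping star rating r to index r-1, the input format suggested
    by the TOmCATSeq diagram.
    """
    return [[1 if r - 1 == j else 0 for j in range(num_classes)]
            for r in ratings]
```

The resulting sequence of 5-dimensional vectors is what the LSTM layers would consume, one review per time step, with no temporal features attached.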

Page 23

The Impact of Strategic Attackers

[Bar chart: accuracy (%) of TOmCAT (84), attacked TOmCAT without temporal features (69), and TOmCATSeq (79)]

• TOmCATSeq can perceive hidden patterns left by crowd attacks in the absence of explicit footprints.

Page 24

Conclusion and Next Steps

• We propose to focus on targets of attack (e.g., products, services) rather than individual reviews or reviewers

• Sample a large collection of targeted products on Amazon

• Model product review behavior through a set of micro and macro features inspired by the presence of restoration and promotion attacks

• Show how to exploit hidden patterns of manipulation attacks to defeat strategic attackers

• In our ongoing work, we are:
  • Expanding our coverage of crowd campaigns
  • Exploring how linguistic features can complement our focus on ratings and temporal behaviors

