Jialong Zhang, Chao Yang, Zhaoyan Xu, Guofei Gu (SUCCESS Lab, Texas A&M University)

PoisonAmplifier: A Guided Approach of Discovering Compromised Websites through Reversing Search Poisoning Attacks

Jialong Zhang, Chao Yang, Zhaoyan Xu, Guofei Gu

SUCCESS Lab, Texas A&M University

Published in RAID 2012

A.C. Chen @ ADL 2

Outline

Introduction
– SEO
– Search Poisoning Attacks
System Design
Evaluation Result
Conclusion


Search Engine Optimization (SEO)

White hat SEO
– a legitimate means of making websites appear at the top of search results pages
Black hat SEO
– the malicious way of using SEO
– widely used by attackers to make their spam/malicious websites come up in the top search results of popular search engines


Search Poisoning Attacks

Mislead victims to malicious websites by taking advantage of users’ trust in search results
– If the requests are referred from specific search engines, the malicious content will show
– If the requests come directly from users, the compromised websites will return normal content


Search Poisoning Attacks Workflow



Malicious Contents

[Figure: User View vs. Searcher View]


In this Paper…

Given a small seed set, identify (amplify) more websites compromised by search poisoning attacks
– unlike most existing studies, which try to understand or detect search poisoning attacks
– Ex: SURF: Detecting and Measuring Search Poisoning (CCS 2011)


Main Idea

Attackers tend to use a similar set of keywords in multiple compromised websites
Attackers tend to insert links in Bot View to promote other compromised websites
Attackers tend to compromise multiple websites by exploiting similar vulnerabilities


System Architecture of PoisonAmplifier

[Figure: system architecture. Components: Initial Terms, Seed Compromised Websites, Seed Collector, Promoted Content Extractor, Promoted Content, Term Amplifier, Link Amplifier, Vulnerability Amplifier]

Figure annotations:
– compares the User View and Searcher View
– extracts and analyzes the content in Bot View but not in User View to find more compromised websites


Seed Collector

Send HTTP requests with customized values in the HTTP header
– Searcher View: pretend to visit from a search engine by using a customized HTTP Referer
– User View: customized User-Agent
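The two views above can be emulated with ordinary HTTP header overrides. The following is a minimal sketch, not the authors' implementation: the Referer URL, the function names, and the exact-match comparison in `views_differ` are assumptions for illustration, and a real collector would need a comparison that tolerates dynamic page content.

```python
import urllib.request

# Assumed values for illustration; the slides do not give exact header strings.
GOOGLE_REFERER = "https://www.google.com/search?q=example"
BROWSER_UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 "
              "(KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11")


def headers_for_view(view):
    """Return HTTP headers that emulate the given view of a page."""
    if view == "user":
        # Direct visit from an ordinary browser: no Referer header.
        return {"User-Agent": BROWSER_UA}
    if view == "searcher":
        # Same browser, but the request claims to come from a search results page.
        return {"User-Agent": BROWSER_UA, "Referer": GOOGLE_REFERER}
    raise ValueError(f"unknown view: {view!r}")


def fetch(url, view, timeout=10):
    """Fetch url as the given view and return the body as text (network I/O)."""
    request = urllib.request.Request(url, headers=headers_for_view(view))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def views_differ(url):
    """Naive check: flag a URL when its Searcher View and User View are not identical."""
    return fetch(url, "searcher") != fetch(url, "user")
```

Only `fetch` and `views_differ` touch the network; `headers_for_view` can be inspected offline.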


Promoted Content Extractor

Extracts HTML content that appears in the Bot View but not in the User View
– filters out web content that is only used for display, such as HTML tags, CSS code…

GoogleBot User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Chrome User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11
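As a rough illustration of the extraction step, the sketch below strips tags, CSS, and scripts from both views and keeps the text chunks that appear only in the Bot View. The chunk-level set difference and all names here are assumptions; the slides do not describe the actual diffing method.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text chunks, skipping <style> and <script> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the list of text chunks left after removing markup and CSS/JS."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def promoted_content(bot_html, user_html):
    """Text chunks present in the Bot View but absent from the User View."""
    user_chunks = set(visible_text(user_html))
    return [chunk for chunk in visible_text(bot_html) if chunk not in user_chunks]
```

For example, a Bot View that contains an extra paragraph and link next to the normal page text yields just those two chunks.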


Term Amplifier

Extract effective query terms
– so that we can obtain as many compromised websites as possible by searching for those terms
Tokenize the promoted content into phrases {Pi | i = 1, 2, …, N}
Search each phrase Pi on the search engine
If the number of search results is lower than a threshold TD = 1,000,000, check the returned websites by comparing their Searcher Views and User Views
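The phrase-filtering step can be sketched as follows. This is a hedged illustration: splitting on punctuation is an assumed tokenization rule, and `result_count` is a hypothetical callable standing in for a real search engine query; only the threshold TD = 1,000,000 comes from the slide.

```python
import re

TD = 1_000_000  # threshold on the number of search results, from the slide


def tokenize(promoted_text):
    """Split promoted content into phrases (splitting on punctuation is an assumption)."""
    parts = re.split(r"[.,;:!?|\n]+", promoted_text)
    return [part.strip() for part in parts if part.strip()]


def select_query_terms(promoted_text, result_count, threshold=TD):
    """Keep phrases whose search result count is below the threshold.

    result_count is a hypothetical callable (phrase -> int) that would wrap
    a search engine lookup in a real system.
    """
    return [p for p in tokenize(promoted_text) if result_count(p) < threshold]
```

The result counts used to exercise this function would be supplied by the caller; phrases that return too many results are dropped as not distinctive enough.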


Link Amplifier (1/3)

Extracts inner-links and outer-links
– inner-links refer to those links/URLs in the promoted web content
– outer-links refer to those links/URLs in the web content of 3rd-party websites
– Ref. http://kerash.comli.com/po/Olink.php
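Collecting inner-links amounts to pulling the URLs out of the promoted content. A minimal sketch using only the standard library is shown below; the function name and the choice to keep only absolute http(s) URLs are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class _LinkCollector(HTMLParser):
    """Records the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_inner_links(promoted_html):
    """Return absolute http(s) URLs found in a fragment of promoted content."""
    collector = _LinkCollector()
    collector.feed(promoted_html)
    collector.close()
    return [u for u in collector.links if urlparse(u).scheme in ("http", "https")]
```

Fragment-only links such as `#top` and anchors without an `href` are ignored.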


Link Amplifier (2/3)

For each inner-link and outer-link, Link Amplifier considers the linking website a compromised website if its Searcher View and User View are different
– utilize Google dorks to locate the outer-links
– Ex: intext:seed.com intext:seedTerm
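The dork in the example can be generated per seed website and seed term. The helper below is a small sketch; its name and the quoting of multi-word terms are assumptions, and only the `intext:<domain> intext:<term>` shape comes from the slide.

```python
def outer_link_dork(seed_domain, seed_term):
    """Build a query in the intext:<domain> intext:<term> form shown on the slide."""
    # Quoting multi-word terms is an assumption, not something the slide states.
    term = f'"{seed_term}"' if " " in seed_term else seed_term
    return f"intext:{seed_domain} intext:{term}"
```

Calling `outer_link_dork("seed.com", "seedTerm")` reproduces the example query from the slide.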


Link Amplifier (3/3)

We can find more categories of compromised websites through analyzing those outer-links


Vulnerability Amplifier

Analyze possible system/software vulnerabilities of those compromised websites (collected from Term/Link Amplifier)
In our preliminary work, we only focus on analyzing the vulnerabilities of WordPress
– still requires some manual work to extract search signatures
– Vulnerability Amplifier examines whether a website is compromised or not by comparing its Searcher View and User View


Evaluation of PoisonAmplifier

Stage I (1 week)
– dataset: seed compromised websites
– evaluate effectiveness, efficiency, and diversity
Stage II (1 month)
– dataset: the amplified terms and compromised websites from Stage I
– evaluate constancy


Stage I: Dataset --- Seed Terms

Google Trends
– 103 unique Google Trends topics in total
Twitter Trends
– 64 unique Twitter Trends topics in total
Customized keywords
– specific to pharmacy scam words
– 165 unique pharmacy words in total, from existing work, manual selection, and the Google Suggest API
332 unique seed terms in total


Stage I: Dataset --- Seed Compromised Websites

Googled each unique seed term and collected the top 200 search results

Examined the Searcher View and User View of each search result

In total, 252 unique seed compromised websites were found


Effectiveness

How many new compromised websites can be found
Amplifying Rate (AR) of compromised sites
– # of newly found compromised websites / # of seed compromised websites
Starting from only 252 seed compromised websites, around 75,000 unique compromised websites were discovered in total


Efficiency

Whether the websites visited by PoisonAmplifier are more likely to be compromised websites
Hit Rate (HR) of PoisonAmplifier
– # of newly found compromised websites / total # of websites it visited
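Both AR and HR are plain ratios, so they can be checked with a few lines of code. The sketch below plugs in the rounded figures from the Effectiveness slide (about 75,000 newly found sites from 252 seeds); because 75,000 is approximate, the resulting AR is only a rough value. The function names are illustrative, and no visit counts are given on the slides, so HR is shown without numbers.

```python
def amplifying_rate(new_found, seeds):
    """AR = number of newly found compromised websites / number of seed websites."""
    return new_found / seeds


def hit_rate(new_found, visited):
    """HR = number of newly found compromised websites / total websites visited."""
    return new_found / visited


# Rounded figures from the Effectiveness slide; the true count is "around" 75,000.
print(round(amplifying_rate(75_000, 252), 1))  # prints 297.6
```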


Diversity

How many of the compromised websites found by each component are exclusive, i.e., cannot be found by other components
Exclusive Ratio (ER)
– # of compromised websites found only by this component / total # of compromised websites found by this component

Component                 ER
Term Amplifier            99.56%
Inner-link Amplifier      96.11%
Outer-link Amplifier      89.09%
Vulnerability Amplifier   88.77%
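The Exclusive Ratio is a set computation: for one component, count the sites no other component found and divide by everything that component found. A minimal sketch follows; the dictionary layout and names are assumptions, and any data passed in is the caller's, not the paper's.

```python
def exclusive_ratio(found_by, component):
    """ER for one component.

    found_by maps a component name to the set of compromised websites it found.
    """
    own = found_by[component]
    if not own:
        return 0.0
    # Union of everything found by the other components.
    others = set().union(*(sites for name, sites in found_by.items() if name != component))
    return len(own - others) / len(own)
```

With toy input such as `{"term": {"a", "b", "c", "d"}, "link": {"d", "e"}}`, three of the four sites found by `"term"` are exclusive to it.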


Constancy

Whether PoisonAmplifier can continue to find new compromised websites over time
– Link Amplifier and Vulnerability Amplifier can keep finding new terms and compromised websites
– the number of newly found compromised websites per day decreases quickly due to the exhaustion of terms


Distribution based on TLD


Conclusion

Starting from a small seed set of known compromised websites, PoisonAmplifier can recursively find more compromised websites by analyzing poisoned webpages’ special terms and links, and by exploring compromised websites’ vulnerabilities
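The recursive amplification described above can be pictured as a worklist loop in which every newly found site is fed back into the amplifiers. This is a simplified sketch rather than the system's actual control flow: `amplifiers` is a list of hypothetical callables (site -> iterable of sites already confirmed as compromised), and `max_sites` is an assumed soft cap to keep the loop bounded.

```python
from collections import deque


def amplify(seeds, amplifiers, max_sites=1000):
    """Expand a seed set by repeatedly running every amplifier on each new site.

    Returns the set of newly found sites (seeds excluded). Expansion stops
    once roughly max_sites sites are known (soft cap, checked per dequeued site).
    """
    found = set(seeds)
    queue = deque(seeds)
    while queue and len(found) < max_sites:
        site = queue.popleft()
        for amplifier in amplifiers:
            for candidate in amplifier(site):
                if candidate not in found:
                    found.add(candidate)
                    queue.append(candidate)
    return found - set(seeds)
```

In a full pipeline, the Term, Link, and Vulnerability Amplifiers would each be wrapped as one such callable.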

