PoisonAmplifier: A Guided Approach of Discovering Compromised Websites through Reversing Search Poisoning Attacks
Jialong Zhang, Chao Yang, Zhaoyan Xu, Guofei Gu
SUCCESS Lab, Texas A&M University
Published in RAID 2012
A.C. Chen @ ADL 2

Outline
Introduction
– SEO
– Search Poisoning Attacks
System Design
Evaluation Result
Conclusion

Search Engine Optimization (SEO)
White hat SEO
– a legitimate means of making websites appear at the top of search results pages
Black hat SEO
– the malicious way of using SEO
– widely used by attackers to make their spam/malicious websites come up in the top search results of popular search engines

Search Poisoning Attacks
Mislead victims to malicious websites by taking advantage of users’ trust in search results
– If the requests are referred from specific search engines, the malicious content will show
– If the requests come directly from users, the compromised websites will return normal content
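
The referrer-dependent behavior described above can be illustrated with a toy model of a compromised server. This is only a sketch, not code from the paper: the host list, the function name `cloaked_response`, and the returned strings are assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical list of search engine hosts the attacker checks for.
SEARCH_ENGINE_HOSTS = {"www.google.com", "www.bing.com", "search.yahoo.com"}


def cloaked_response(headers):
    """Return the content a cloaking site might serve for these request headers."""
    referer = headers.get("Referer", "")
    # urlparse("").hostname is None, so fall back to an empty string.
    host = urlparse(referer).hostname or ""
    if host in SEARCH_ENGINE_HOSTS:
        # Visitor arrived from a search results page: show the malicious content.
        return "MALICIOUS: redirect to scam page"
    # Direct visit: show the normal page so the owner notices nothing.
    return "NORMAL: original page content"
```

Calling `cloaked_response({})` models a direct visit, while passing a `Referer` that points at one of the listed hosts models a visit from a search results page.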

Search Poisoning Attacks Workflow
[Figure: workflow diagram]

Malicious Contents
[Figure: the same page as seen in the User View and in the Searcher View]

In this Paper…
Given a small seed set, identify (amplify) more websites compromised by search poisoning attacks
– Unlike most existing studies, which try to understand or detect search poisoning attacks
– Ex: SURF: Detecting and Measuring Search Poisoning (CCS 2011)

Main Idea
Attackers tend to use a similar set of keywords in multiple compromised websites
Attackers tend to insert links in the Bot View to promote other compromised websites
Attackers tend to compromise multiple websites by exploiting similar vulnerabilities

System Architecture of PoisonAmplifier
[Figure: architecture diagram with Initial Terms, Seed Collector, Seed Compromised Websites, Promoted Content Extractor, Promoted Content, Term Amplifier, Link Amplifier, Vulnerability Amplifier]
– Seed Collector: compares the User View and Searcher View
– Promoted Content Extractor: extracts and analyzes the content that is in the Bot View but not in the User View
– Term Amplifier, Link Amplifier, Vulnerability Amplifier: find more compromised websites

Seed Collector
Send HTTP requests with customized values in the HTTP header
– Searcher View: pretend to visit from a search engine by using a customized HTTP Referer header
– User View: customized User-Agent
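
A minimal sketch of how the two views could be requested with Python's standard library follows. The Chrome User-Agent string is the one quoted elsewhere in these slides; the Referer URL, the view names, and the helper names `build_request` and `fetch` are assumptions for illustration, not the authors' implementation.

```python
from urllib.request import Request, urlopen

# Browser User-Agent used for both views in this sketch.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 "
              "(KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11")
# Assumed referrer value that makes the visit look like a click on a search result.
SEARCH_REFERER = "http://www.google.com/search?q=example"


def build_request(url, view):
    """Build the request for the 'user' or 'searcher' view of a URL."""
    headers = {"User-Agent": BROWSER_UA}
    if view == "searcher":
        headers["Referer"] = SEARCH_REFERER
    elif view != "user":
        raise ValueError(f"unknown view: {view!r}")
    return Request(url, headers=headers)


def fetch(url, view, timeout=10):
    """Fetch the page body for one view (performs a real network request)."""
    with urlopen(build_request(url, view), timeout=timeout) as resp:
        return resp.read()
```

Comparing `fetch(url, "user")` with `fetch(url, "searcher")` then gives a first signal of cloaking, although dynamic pages may also differ between two fetches for harmless reasons.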

Promoted Content Extractor
Extracts HTML content that appears in the Bot View but not in the User View
– filters out web content that is used only for display, such as HTML tags and CSS code
GoogleBot User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Chrome User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11
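
The extraction step can be sketched as a text difference between the two views. The version below is an assumption-laden illustration: it treats each visible text node as one unit, skips `<script>` and `<style>` content, and uses hypothetical helper names (`visible_text`, `promoted_content`); the slides do not specify the exact comparison.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text nodes, ignoring script and style content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def visible_text(html):
    """Return the visible text nodes of an HTML document, in order."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


def promoted_content(bot_html, user_html):
    """Return text that appears in the Bot View but not in the User View."""
    user_chunks = set(visible_text(user_html))
    return [chunk for chunk in visible_text(bot_html) if chunk not in user_chunks]
```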

Term Amplifier
Extract effective query terms
– so that we can obtain as many compromised websites as possible by searching those terms
Tokenize the promoted content into phrases {Pi | i = 1, 2, …, N}
Search each phrase Pi on the search engine
If the number of search results is lower than the threshold TD = 1,000,000, check the results by comparing their Searcher Views and User Views
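
The tokenize-and-filter steps above might look like the following sketch. Splitting on punctuation and newlines is an assumed tokenization rule, and `result_count` is a hypothetical callable standing in for a search engine query; only the threshold value comes from the slide.

```python
import re

T_D = 1_000_000  # threshold on the number of search results, from the slide


def tokenize(promoted_text):
    """Split promoted content into unique, normalized phrases (assumed rule)."""
    parts = re.split(r"[,.;:!?|\n]+", promoted_text)
    seen = set()
    phrases = []
    for part in parts:
        phrase = " ".join(part.split()).lower()
        if phrase and phrase not in seen:
            seen.add(phrase)
            phrases.append(phrase)
    return phrases


def select_terms(phrases, result_count, threshold=T_D):
    """Keep phrases whose number of search results is below the threshold."""
    return [p for p in phrases if result_count(p) < threshold]
```

The search results of each selected phrase would then be checked by comparing their Searcher Views and User Views, as in the Seed Collector.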

Link Amplifier (1/3)
Extracts inner-links and outer-links
– inner-links refer to those links/URLs in the promoted web content
– outer-links refer to those links/URLs in the web content of 3rd-party websites
– Ref.

Link Amplifier (2/3)
For each inner-link and outer-link, Link Amplifier considers the linking website compromised if its Searcher View and User View are different
– Utilize a Google dork to locate the outer-links
– Ex: intext:seed.com intext:seedTerm
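
As a rough sketch of the two link sources, the code below builds the dork query in the form shown on the slide and pulls candidate hosts out of promoted HTML. The helper names and the choice to deduplicate by host are assumptions; submitting the query to a search engine is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


def outer_link_dork(seed_host, seed_term):
    """Build a Google dork in the form used in the slide's example."""
    return f"intext:{seed_host} intext:{seed_term}"


class _LinkCollector(HTMLParser):
    """Collects href values of anchor tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def inner_link_hosts(promoted_html, seed_host):
    """Return hosts linked from the promoted content, excluding the seed itself."""
    collector = _LinkCollector()
    collector.feed(promoted_html)
    collector.close()
    hosts = []
    for href in collector.hrefs:
        host = urlparse(href).hostname  # None for relative links
        if host and host != seed_host and host not in hosts:
            hosts.append(host)
    return hosts
```

Each returned host is only a candidate; it counts as compromised only if its Searcher View and User View differ.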

Link Amplifier (3/3)
We can find more categories of compromised websites by analyzing those outer-links

Vulnerability Amplifier
Analyze possible system/software vulnerabilities of those compromised websites (collected from Term/Link Amplifier)
In our preliminary work, we only focus on analyzing the vulnerabilities of WordPress
– still requires some manual work to extract search signatures
– Vulnerability Amplifier examines whether a website is compromised by comparing its Searcher View and User View

Evaluation of PoisonAmplifier
Stage I (1 week)
– dataset: seed compromised websites
– evaluate effectiveness, efficiency, and diversity
Stage II (1 month)
– dataset: the amplified terms and compromised websites from Stage I
– evaluate constancy

Stage I: Dataset --- Seed Terms
Google Trends
– 103 unique Google Trends topics in total
Twitter Trends
– 64 unique Twitter Trends topics in total
Customized keywords
– specific to pharmacy scam words
– 165 unique pharmacy words in total, from existing work, manual selection, and the Google Suggest API
332 unique seed terms in total

Stage I: Dataset --- Seed Compromised Websites
Google each unique seed term and collect the top 200 search results
Examine the Searcher View and User View of each search result
252 unique seed compromised websites were found in total
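
The seed collection loop can be sketched with the search engine and the view comparison passed in as callables. Both `search` and `views_differ` are hypothetical placeholders, and deduplicating by URL rather than by website is a simplifying assumption.

```python
def collect_seeds(seed_terms, search, views_differ, top_n=200):
    """Return unique result URLs whose Searcher View and User View differ.

    search(term) is assumed to return an ordered list of result URLs;
    views_differ(url) is assumed to return True for a cloaking page.
    """
    seeds = []
    seen = set()
    for term in seed_terms:
        for url in search(term)[:top_n]:
            if url in seen:
                continue  # each URL is examined only once
            seen.add(url)
            if views_differ(url):
                seeds.append(url)
    return seeds
```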

Effectiveness
How many new compromised websites can be found
Amplifying Rate (AR) of compromised sites
– # newly found / # seeds
Starting from only 252 seed compromised websites, around 75,000 unique compromised websites were discovered in total
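
As a quick check of the metric, the ratio can be computed from the two figures on this slide. Because 75,000 is stated only as an approximate total, the result (roughly 300 newly found sites per seed) is a ballpark figure and not the paper's exact AR; the function name is an assumption.

```python
def amplifying_rate(new_found, seeds):
    """Amplifying Rate: newly found compromised sites per seed site."""
    if seeds <= 0:
        raise ValueError("seeds must be positive")
    return new_found / seeds


# Approximate figures from the slide: ~75,000 discovered from 252 seeds.
approx_ar = amplifying_rate(75_000, 252)
```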

Efficiency
Whether the websites visited by PoisonAmplifier are more likely to be compromised websites
Hit Rate (HR) of PoisonAmplifier
– # newly found / total # of websites it visited
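
The Hit Rate definition translates directly into a small helper. The counts in the usage line are made-up numbers chosen only to show the call, not results from the evaluation.

```python
def hit_rate(new_found, visited):
    """Hit Rate: fraction of visited websites that turned out to be compromised."""
    if visited <= 0:
        raise ValueError("visited must be positive")
    return new_found / visited


# Hypothetical counts: 30 compromised sites among 120 visited sites.
example_hr = hit_rate(30, 120)
```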

Diversity
How many of the compromised websites found by each component are exclusive, i.e., cannot be found by the other components
Exclusive ratio (ER)
– # of compromised websites found only by this component / total # of compromised websites found by this component
Component                ER
Term Amplifier           99.56%
Inner-link Amplifier     96.11%
Outer-link Amplifier     89.09%
Vulnerability Amplifier  88.77%
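
The exclusive ratio can be computed with set operations, as in this sketch. The mapping from component name to the set of sites it found, and the tiny example data, are illustrative assumptions and not the evaluation data.

```python
def exclusive_ratio(found_by, component):
    """Share of a component's findings that no other component found.

    found_by maps a component name to the set of websites it discovered.
    """
    own = found_by[component]
    if not own:
        return 0.0
    # Union of everything found by the other components.
    others = set().union(*(sites for name, sites in found_by.items()
                           if name != component))
    return len(own - others) / len(own)


# Hypothetical findings for two components.
example = {"term": {"a", "b", "c", "d"}, "link": {"d", "e"}}
```

With this data, `exclusive_ratio(example, "term")` counts three of the four sites as exclusive, since only `"d"` is shared with the other component.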

Constancy
Whether PoisonAmplifier can continue to find new compromised websites over time
– Link Amplifier and Vulnerability Amplifier can keep finding new terms and compromised websites
– The number of newly found compromised websites per day decreases quickly due to the exhaustion of terms

Distribution based on TLD
[Figure: distribution of compromised websites by top-level domain]

Conclusion
Starting from a small seed set of known compromised websites, PoisonAmplifier can recursively find more compromised websites by analyzing poisoned webpages’ special terms and links, and by exploring compromised websites’ vulnerabilities