Journal of Computing and Security
July 2019, Volume 6, Number 2 (pp. 65–80)
http://www.jcomsec.org

BLProM: A Black-Box Approach for Detecting Business-Layer Processes in the Web Applications

Mitra Alidoosti a, Alireza Nowroozi b,*, Ahmad Nickabadi c
a Malek-Ashtar University of Technology, Tehran, Iran.
b Malek-Ashtar University of Technology, Tehran, Iran.
c Amirkabir University of Tehran, Tehran, Iran.

* Corresponding author.
Email addresses: Alidoosti@mut.ac.ir (M. Alidoosti), Nowroozi@mut.ac.ir (A. Nowroozi), Nickabadi@aut.ac.ir (A. Nickabadi)
https://dx.doi.org/10.22108/jcs.2020.117223.1028
ISSN: 2322-4460 © 2019 JComSec. All rights reserved.

ARTICLE INFO

Article history:
Received: 24 May 2019
Revised: 22 December 2019
Accepted: 9 February 2020
Published Online: 3 May 2020

Keywords: Business Layer, Business Process, Navigation Graph.

ABSTRACT

Web application vulnerability scanners cannot detect business logic vulnerabilities (vulnerabilities related to logic) because they are not able to understand the business logic of the web application. To identify the business logic of the web application, this paper presents BLProM (Business-Layer Process Miner), a black-box approach that identifies the business processes of the web application. Detecting the business processes of a web application can be used in dynamic security testing to identify business logic vulnerabilities. BLProM first extracts the navigation graph of the web application and then identifies business processes from the navigation graph. The evaluation conducted on three well-known open-source web applications shows that BLProM can detect business logic processes. Experimental results show that BLProM improves web application scanning because it clusters web application pages and prevents the scanning of similar pages. The proposed approach is compared to OWASP ZAP, an open-source web scanner. We show that BLProM improves web application scanning by about 96%.

© 2019 JComSec. All rights reserved.

1 Introduction

Most of the vulnerabilities reported in the Common Vulnerabilities and Exposures database [1] are related to web applications. The number of security breaches increased by 35.5% in 2015 compared to the previous year [2]. Business logic vulnerabilities are the most potent vulnerabilities affecting web application security.

So far, there are subtle vulnerabilities related to the web application logic that are still discovered manually. Automated scanners cannot detect business logic flaws in applications because scanners are not able to understand the context [3]. Such vulnerabilities can only be detected through manual testing, and their detection relies on the tester's creativity and skills [4].

There is no formal definition of business logic vulnerabilities [3]. It is very difficult to detect business logic vulnerabilities, and this type of vulnerability causes serious damage when exploited [3]. Understanding context is difficult for automated tools, so penetration testers are responsible for detecting business logic vulnerabilities. Because business logic vulnerabilities are application-specific, they are hard to detect.

Web applications do not have any formal documentation describing their internal states and expected
user behavior. The lack of such a document makes it difficult to detect business logic vulnerabilities. For example, adding a specific item several times to a shopping cart is a common feature, but repeated use of a discount code is a business logic vulnerability. A human easily understands the difference between these two scenarios, while a scanner without an appropriate model of the web application cannot distinguish between them [4]. Studies [5–7] have attempted to detect business logic vulnerabilities automatically, but they apply only to small applications. Also, the application source code is required to generate the appropriate model of the application [4]. An automatic tool that can find business logic vulnerabilities is therefore required.

In this paper, we propose BLProM, a black-box technique for detecting the business layer of a web application. By identifying the business processes in the web application, BLProM makes it possible to identify business-layer vulnerabilities. In other words, dynamic security testing of the business layer first requires identifying the business processes of the web application (that is, detecting its business logic). The business-layer vulnerabilities are then identified by analyzing these processes. The output of BLProM is the set of web application business processes, which serves as the input to dynamic security testing in the business layer. The proposed approach is independent of the technology used in the web application and finds business processes automatically. We also show that BLProM improves web application scanning because it detects similar pages in the web application and prevents them from being scanned repeatedly. Comparing the scanning results of BLProM and OWASP ZAP (an open-source web application scanner) shows that BLProM improves web application scanning by about 96%. In summary, this paper makes the following contributions:

• We present BLProM, a black-box technique for detecting the business layer of a web application.
• We present a new black-box approach for clustering web pages.
• We show that web application scanning improves by about 96% when the business processes of the web application are identified.

2 Background and Related Work

The business layer determines the business logic of a web application. It is responsible for data processing and data management, and it specifies business logic policies and rules. This layer also validates the input data. Figure 1 shows the three-layer architecture of a web application and the position of the business layer within it. The presentation layer is the user interface that displays data to the user and receives the user's inputs. In a web application, this is the part that receives the HTTP request and returns the HTML response.

The business layer handles data validation and business rules. The data access layer communicates with the database by constructing SQL queries.

After data is received from the user, it becomes available to the business layer. The web application uses the data to run business processes. Every business process has several steps that must be executed in order, and processes may interact with each other.

The business layer specifies the logic of the web application. A business logic vulnerability is a defect in the business layer. A business logic attack vector is a legitimate request (usually multiple requests) with legitimate input values that abuses a module's functionality to inflict direct damage on the business.

2.1 Business Logic Attacks

There are two approaches to preventing business logic attacks: 1) identifying attacks at runtime (the defense approach) and 2) identifying logic vulnerabilities in the web application (the prevention approach). In the defense approach, the behavior of the web application is monitored and an attack is reported whenever the web application leaves its normal state. In the prevention approach, attack vectors are used to identify business logic vulnerabilities.

BLOCK [9] and Swaddler [10] use a defense approach to prevent business logic attacks. BLOCK first obtains the behavioral model of the web application by observing the interaction between the clients and the web application. It extracts a set of invariants from the request/response sequence and the session variables. BLOCK identifies any request or response that violates the identified invariants as an attack.

Swaddler provides an anomaly detection method for detecting attacks. It uses anomalies in the internal state of the web application to detect vulnerabilities. In other words, the web application's internal state is monitored in the learning phase, and the normal values of the web application state are extracted; these define the web application profile. Then, in the detection phase, abnormal states are identified.

MiMoSA [5] uses a prevention approach, in the form of a white-box approach, to identify business logic vulnerabilities. MiMoSA first builds a model of the web application based on its state and workflow. It detects multi-step attacks by analyzing the relationship between the web application and the database, as well as the connections within the web application.

Figure 1. The Three-Layer Architecture of a Web Application [8]

SENTINEL [11] and Pellegrino et al. [3] use a prevention approach, in the form of a black-box approach, to identify business logic vulnerabilities.

SENTINEL [11] is a black-box approach for detecting logical weaknesses in database access. SENTINEL generates a state machine of the web application and extracts a set of invariants from the observed SQL queries, responses, and session variables as the application specification. Any SQL query that violates the defined invariants is identified as an attack.

Pellegrino et al. [3] propose a black-box technique to detect logic vulnerabilities in web applications. This technique extracts behavioral patterns from network traces in which the user interacts with a certain functionality of the application. First the web application is modeled, and then attack vectors are applied to the model.

Our previous works, BLDAST [12, 13] and BLTOCTTOU [14], use a prevention approach, in the form of a black-box approach, to identify business logic vulnerabilities. BLProM [15] can be used as an input for BLTOCTTOU and BLDAST.

BLDAST [12, 13] is a dynamic, black-box vulnerability analysis approach that identifies business logic vulnerabilities of a web application to flooding DoS attacks. BLDAST assesses the resiliency of the web application against flooding DoS attacks and takes the business processes of the web application into account. BLDAST selects the critical pages in business processes. A critical page has a considerable response time. A critical process can therefore impose a heavy load on the target and cause the web server to become unresponsive. The goal of BLDAST is to find these critical processes within the web application.

BLTOCTTOU [14] is a black-box dynamic application security tester for detecting business logic vulnerabilities to race condition attacks. BLTOCTTOU identifies vulnerabilities by finding the business processes of the web application. It detects business processes that interact with each other: one process sets the value of a variable, and the other reads or writes that variable. To identify a race condition, BLTOCTTOU first executes the identified processes sequentially and then executes them in reverse order. Finally, it compares the outputs of these two modes. If they differ, the web application is vulnerable to a race condition.

2.2 Clustering Web Pages

Crescenzi [16] presented an approach to clustering web pages based on the page structure. The structural similarity between web pages is defined by the DOM trees of their hyperlinks. The final clusters are used to build a model that describes the structure of the site in terms of classes of pages and their connectivity.

3 Business-Layer Process Miner

In this paper, BLProM is proposed to identify the business processes of a web application, and its output is used as the input to web application security testing in the business layer. Business-layer vulnerabilities can then be detected by analyzing the interaction between business processes.

BLProM first preprocesses the HTTP traffic of a normal user and then extracts the web application pages from the traffic. BLProM clusters similar pages to prevent the infinite growth of the user navigation graph. The detected clusters are the graph nodes, and the graph edges are the relations between the detected clusters. The user navigation graph is built from the detected nodes and edges. BLProM then extracts the business processes from the navigation graph. BLProM has two main steps:

1. Extracting the user navigation graph
2. Detecting business processes in the web application

Figure 2 shows the proposed steps for identifying the business processes in the web application. In the following, we explain each of these steps in detail.

3.1 Extracting the User Navigation Graph

First, a normal user crawls the web application. The traffic of the normal user is captured and stored. Note that the permission level can differ according to the role of the user, and consequently the user navigation graph varies with the permission level. In this paper, the normal user has user-level permission and browses the pages permitted at that level. The normal user crawls all parts of the application that they are allowed to access.

Figure 2. Black-Box Approach to Detect Web Application Business Process

BLProM initially extracts the user navigation graph from the stored traffic. This is performed through the following steps:

1. Preprocessing of raw input data
2. Identifying existing web application pages in the stored traffic
3. Clustering the web application pages
4. Extracting the user navigation graph

1) Preprocessing of raw input data
In the preprocessing step, BLProM cleans the data and removes irrelevant samples. In this paper, only HTTP requests and responses are used. Of the responses, only those with the successful status codes 200 and 209 are kept. BLProM removes responses with failure status codes together with their corresponding requests. In addition, only GET and POST requests are needed, and all other requests are discarded.
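The preprocessing filter can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionary layout of a request and response, the function name `preprocess`, and the `ok_statuses` parameter are assumptions made here, and the set of accepted status codes is left to the caller.

```python
from typing import Iterable

ALLOWED_METHODS = {"GET", "POST"}  # the only request methods the text keeps


def preprocess(exchanges: Iterable[tuple[dict, dict]],
               ok_statuses: set[int]) -> list[tuple[dict, dict]]:
    """Keep only the (request, response) pairs that pass both filters.

    The dict keys "method" and "status" are a hypothetical traffic format.
    """
    kept = []
    for request, response in exchanges:
        if request["method"].upper() not in ALLOWED_METHODS:
            continue  # discard every method other than GET and POST
        if response["status"] not in ok_statuses:
            continue  # a failed response is dropped together with its request
        kept.append((request, response))
    return kept


traffic = [
    ({"method": "GET", "url": "/"}, {"status": 200}),
    ({"method": "POST", "url": "/login"}, {"status": 200}),
    ({"method": "PUT", "url": "/profile"}, {"status": 200}),
    ({"method": "GET", "url": "/missing"}, {"status": 404}),
]
clean = preprocess(traffic, ok_statuses={200})
print([req["url"] for req, _ in clean])  # ['/', '/login']
```

Filtering pairs rather than separate request and response lists keeps each request aligned with its response, which the later page-extraction step relies on.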

2) Identifying the web application pages in the stored traffic
Each page of the application can be represented as a pair (main request, corresponding response to the main request). To identify the web application pages, it is first necessary to detect the main requests in the traffic. After the main requests have been identified, the corresponding responses must also be identified.

Identifying the main HTTP requests in the traffic. The user's stored traffic contains both main HTTP requests, which load the web application pages, and secondary HTTP requests, which load a file, image, etc. belonging to a page. BLProM must distinguish between the two: once the main requests are identified, the remaining ones are considered secondary requests. Whenever the Referer header of a request differs from the Referer header of the previous request, the previous request is considered a main request and the current one a secondary request. In addition, the first request in the traffic is considered a main request because it does not include a Referer header. Note that the Referer header field of an HTTP request indicates the URL of the previous page the user visited.

Identifying the corresponding responses to the main requests. To identify the responses that correspond to the main requests, it suffices to select the responses whose Content-Type field is text/html. The response to a secondary request is usually a file, photo, etc., whereas the response to a main request is HTML text. Such responses are the main responses in the HTTP traffic. After the main requests and responses have been identified, each pair (main request, corresponding response to the main request) represents one page of the web application. Note that the response to a main request references all the secondary requests needed to load the web page.

Identifying whether the last request in the traffic is a main request. If the last response in the traffic is a main response, the last HTTP request is a main request as well.

For example, in Figure 3, requests 1, 2, 6, and 8 are the main requests, which are shown in red.

Figure 3. Number of Consecutive Requests

The pseudocode of the algorithm used by BLProM to identify web application pages is shown in Algorithm 1. As mentioned before, each web application page is represented by a pair (main request, corresponding response to the main request). The pseudocode in Algorithm 1 can be divided into four main parts:

1. Extracting the main requests in the traffic (lines 13–25)
2. Extracting the corresponding responses to the main requests (lines 26–31)
3. Identifying whether the last request in the traffic is a main request (lines 32–35)
4. Extracting the web application pages (lines 36–40)

In line 9, all requests in the traffic, both main and secondary, are extracted and stored in the HttpRequest variable. In line 10, the responses in the traffic are added to the HttpResponse variable. In lines 11 and 12, the total numbers of requests and responses are calculated. In line 15, it is checked whether the current request is the first request; if so, it is considered a main request and added to the MainRequest variable. In line 21, it is checked whether the Referer field of the current request differs from the Referer field of the previous request; if so, the previous request is considered a main request.

In line 28, it is checked whether the content-type field of the current response is text/html. If the condition is met, the current response is considered a main response.

In line 33, it is checked whether the last HTTP response is a main response; if so, the last request is considered a main request as well.

In line 39, the main requests and main responses are combined into the web application pages, which are represented as pairs (request, response).
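The four parts of Algorithm 1 can be sketched in Python as follows. This is an illustrative reading of the pseudocode under stated assumptions, not the authors' code: requests and responses are plain dictionaries with hypothetical keys `url`, `referer`, and `content_type`, and the traffic is assumed to be preprocessed so that the i-th response answers the i-th request.

```python
from typing import Optional


def extract_pages(requests: list[dict], responses: list[dict]) -> list[tuple[dict, dict]]:
    """Return the web pages as (main request, main response) pairs."""
    if not requests:
        return []

    # Part 1: main requests. The first request is always a main request;
    # afterwards, a change of Referer marks the previous request as main.
    main_req = [0]
    last_referer: Optional[str] = requests[0].get("referer")
    for i in range(1, len(requests)):
        new_referer = requests[i].get("referer")
        if new_referer != last_referer:
            if main_req[-1] != i - 1:
                main_req.append(i - 1)
            last_referer = new_referer

    # Part 2: main responses are those served as text/html.
    main_resp = [k for k, resp in enumerate(responses)
                 if (resp.get("content_type") or "").startswith("text/html")]

    # Part 3: if the last response is a main response, so is the last request.
    last = len(responses) - 1
    if main_resp and main_resp[-1] == last and last < len(requests) and main_req[-1] != last:
        main_req.append(last)

    # Part 4: pair the j-th main request with the j-th main response.
    return [(requests[i], responses[k]) for i, k in zip(main_req, main_resp)]


# Eight requests in which requests 1, 2, 6 and 8 (1-based) load pages.
urls = ["/a", "/b", "/b.css", "/b.js", "/b.png", "/c", "/c.png", "/d"]
referers = [None, "/a", "/b", "/b", "/b", "/b", "/c", "/c"]
types = ["text/html", "text/html", "text/css", "text/javascript",
         "image/png", "text/html", "image/png", "text/html"]
reqs = [{"url": u, "referer": r} for u, r in zip(urls, referers)]
resps = [{"content_type": t} for t in types]
print([req["url"] for req, _ in extract_pages(reqs, resps)])  # ['/a', '/b', '/c', '/d']
```

In this trace the Referer rule alone finds the first three pages; the last page is only recovered by the check on the final response, which is why Algorithm 1 treats it as a separate part.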

3) Clustering the web application pages
In this step, BLProM clusters the web application pages. Clustering aims to put similar pages in the same cluster, which prevents the infinite growth of the user navigation graph.

At this stage, all pages in the user's stored traffic have been extracted as pairs (main request, corresponding response to the main request), where each pair represents one page of the application. To extract the optimal user navigation graph, the similar pages among those extracted must be identified and clustered. In the user navigation graph, the nodes are the unique pages of the application and the edges represent the links between the pages. To identify similar pages, a criterion must be chosen that suits the purpose of the clustering. The type of operations that the user can perform on a page is used as the measure for separating pages. In other words, two pages are similar if the user can perform the same operations on them. For example, consider two pages that each contain only a button, where the title of the button is “continue” on the first page and “save” on the second. These two pages are different because the user performs different operations on them. According to this criterion, the following pages are considered similar in web applications:

• In an online shop application, the shopping cart pages are considered similar even if they contain different items.
• In an online shop application, the profile pages of the products are similar because they have the same HTML structure in terms of the important HTML elements.
• Pages that display search results for different keywords are similar.
• If the structure of a page is a subset of the structure of another page in terms of the important HTML elements, the two pages are considered similar.
• Two pages that both contain user comments are considered similar even if the comments are about different products.


Algorithm 1. The Pseudocode for Extracting Web Application Pages From the Traffic

INPUT: HTTPtraffic = {HttpMessage_1, HttpMessage_2, HttpMessage_3, ..., HttpMessage_n}
OUTPUT: WebPages as a set of (Request_i, Response_i)

 1  Begin
 2  let WebPages = set of web pages
 3  let HttpRequest = set of HTTP Requests
 4  let HttpResponse = set of HTTP Responses
 5  let MainRequest = set of main HTTP Requests
 6  let MainResponse = set of main HTTP Responses
 7  let i, k = 1    // counters for the current HttpMessage
 8  let LastReferer = ∅, NewReferer = ∅
 9  HttpRequest = ExtractReq(HTTPtraffic)    // extract HTTP Requests from HTTP traffic
10  HttpResponse = ExtractResp(HTTPtraffic)    // extract HTTP Responses from HTTP traffic
11  n = extractNumber(HttpRequest)    // total number of HTTP Requests
12  m = extractNumber(HttpResponse)    // total number of HTTP Responses
13  // extract Main HTTP Requests from HttpRequest
14  for i = 1 to n do
15      if (i = 1) then
16          add MainRequest ← HttpRequest_i    // the first Request is a Main Request
17          LastReferer ← Referer of HttpRequest_i
18      else
19          NewReferer ← Referer of HttpRequest_i
20      end if
21      if (NewReferer ≠ LastReferer) then
22          add MainRequest ← HttpRequest_(i−1)
23          LastReferer ← NewReferer
24      end if
25  end for
26  // extract Main HTTP Responses from HttpResponse
27  for k = 1 to m do
28      if (content-type of HttpResponse_k = text/html) then
29          add MainResponse ← HttpResponse_k
30      end if
31  end for
32  // check whether the last request is a main request
33  if (HttpResponse_m ∈ MainResponse) then
34      add MainRequest ← HttpRequest_m
35  end if
36  // extract the set of web pages
37  size = extractNumber(MainRequest)    // total number of Main HTTP Requests
38  for j = 1 to size do
39      WebPages ← (MainRequest_j, MainResponse_j)
40  end for
41  return WebPages
42  end


Table 1. Attribute Vector of a Page in the osCommerce Web Application

Element    Value
inputs     null
buttons    html/body/button/Reviews
           html/body/div/div/div/div/form/div/div/span/span/button/Add to Cart
anchors    null
image      html/body/div/div/div/a/img

Figure 4. An Example of HTML Code

Definition of the Document Object Model (DOM) path of an HTML element. The DOM path of an element is the position of the element in the HTML code.

For example, in Figure 4, the DOM path of the button (DOM_button) is Document/Html/Body/P/Button.

Definition 1 [Similar Pages]. Similar pages are pages on which the user can perform the same operations and which are identical in terms of the position of the important HTML elements. The important HTML elements of a page are buttons, images, inputs, and anchors.

The clustering process includes three steps:

1. Extracting the attribute vectors of the pages
2. Identifying the subset pages
3. Clustering the pages

In the following, these steps are discussed in detail.

1. Extracting the attribute vectors of the pages
BLProM represents each page of the application as a pair (main request, response). In this step, BLProM extracts the attribute vector of each page by applying a data mining operation to this pair. BLProM models each page using the following attribute vector:

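\[
\begin{aligned}
&\mathit{WebPages} = \text{the total pages in an application} \\
&\forall w \in \mathit{WebPages}:\quad w = (\mathit{DOM}_{inputs},\ \mathit{DOM}_{buttons},\ \mathit{DOM}_{anchors},\ \mathit{DOM}_{imgs}) \\
&\mathit{DOM}_{inputs} = \prod_{i}^{n} \mathit{DOM}(input_i) \\
&\mathit{DOM}_{buttons} = \prod_{i}^{n} \mathit{DOM}(button_i) \\
&\mathit{DOM}_{anchors} = \prod_{i}^{n} \mathit{DOM}(anchor_i) \\
&\mathit{DOM}_{imgs} = \prod_{i}^{n} \mathit{DOM}(img_i)
\end{aligned}
\]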

• DOM(input): the DOM path of the <input> tag in the page + the value of the type attribute of the <input> tag + the value of the name attribute of the <input> tag (in the absence of a name attribute, the value attribute is used).
• DOM(button): the DOM path of the button in the page + the title of the button.
• DOM(anchor): the DOM path of the <a> tag in the page.
• DOM(img): the DOM path of the image in the page.

Suppose the web application page contains several buttons. In this case, the second element of the page attribute vector is the set of DOM paths of the buttons in the page, separated by a delimiter. Figure 5 shows one of the pages of the osCommerce¹ web application, and Table 1 shows the attribute vector of that page. As shown, the input element and the anchor element are null, which means the page does not contain these tags.
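The attribute vector described above can be approximated with Python's standard `html.parser` module. The sketch below is an assumption-laden illustration rather than BLProM's extractor: the class and function names are hypothetical, DOM paths are written with `/` and without a leading Document component, and the parts of each entry are kept as tuples instead of being concatenated with `+`.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the path stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class AttributeVectorParser(HTMLParser):
    """Collects DOM paths of inputs, buttons, anchors and images (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags, outermost first
        self.vector = {"inputs": set(), "buttons": set(), "anchors": set(), "imgs": set()}
        self._button_path = None
        self._button_text = []

    def handle_starttag(self, tag, attrs):
        path = "/".join(self.stack + [tag])
        attributes = dict(attrs)
        if tag == "input":
            # name attribute, falling back to value when name is absent
            name = attributes.get("name") or attributes.get("value") or ""
            self.vector["inputs"].add((path, attributes.get("type") or "", name))
        elif tag == "a":
            self.vector["anchors"].add(path)
        elif tag == "img":
            self.vector["imgs"].add(path)
        elif tag == "button":
            self._button_path, self._button_text = path, []
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_data(self, data):
        if self._button_path is not None:
            self._button_text.append(data)  # text inside the button is its title

    def handle_endtag(self, tag):
        if tag == "button" and self._button_path is not None:
            title = "".join(self._button_text).strip()
            self.vector["buttons"].add((self._button_path, title))
            self._button_path = None
        if tag in self.stack:  # pop back to the matching tag, tolerating unclosed ones
            while self.stack.pop() != tag:
                pass


def attribute_vector(html: str) -> dict:
    parser = AttributeVectorParser()
    parser.feed(html)
    parser.close()
    return {key: frozenset(values) for key, values in parser.vector.items()}


page = ('<html><body><p><button>Save</button></p>'
        '<form><input type="text" name="q"><a href="/x"><img src="i.png"></a></form>'
        '</body></html>')
vector = attribute_vector(page)
print(sorted(vector["buttons"]))  # [('html/body/p/button', 'Save')]
```

Storing each element as a frozenset means an absent element type is simply the empty set, which plays the role of the null entries in Table 1.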

Figure 5. A Page of the osCommerce Application

¹ https://www.oscommerce.com

2. Identifying similar pages
After the attribute vector of each page has been extracted, the similar pages must be identified. Pages whose attribute vectors are a subset of the attribute vector of another page, or whose attribute vectors are fully identical, are considered similar pages.

According to Definition 1, the attribute vector of each web application page has four elements. The attribute vector of page 1 is considered the same as the attribute vector of page 2 if:

• All vector elements of page 1 are equal to the corresponding elements in the vector of page 2.
• All vector elements of page 1 are subsets of the corresponding elements in the vector of page 2.
• All vector elements of page 2 are subsets of the corresponding elements in the vector of page 1.
• If one or more vector elements of page 1 are subsets of their corresponding elements in the vector of page 2, the rest of the vector elements of page 1 must be the same as their corresponding elements in the vector of page 2.
• The null element is a subset of every element.

Similar pages are identified according to the above-mentioned attributes. Algorithm 2 illustrates the pseudocode for identifying similar pages.

Algorithm 2. The Pseudocode for Identifying the Similar Pages

INPUT: w1 = (DOM_w1(input), DOM_w1(button), DOM_w1(anchor), DOM_w1(img)),
       w2 = (DOM_w2(input), DOM_w2(button), DOM_w2(anchor), DOM_w2(img))
OUTPUT: Boolean flag; true means the two pages are the same

 1  Begin
 2  let flag, input, button, anchor, img = false
 3  if DOM_w1(input) ⊆ DOM_w2(input) or DOM_w2(input) ⊆ DOM_w1(input) then
 4      input = true
 5  end if
 6  if DOM_w1(button) ⊆ DOM_w2(button) or DOM_w2(button) ⊆ DOM_w1(button) then
 7      button = true
 8  end if
 9  if DOM_w1(anchor) ⊆ DOM_w2(anchor) or DOM_w2(anchor) ⊆ DOM_w1(anchor) then
10      anchor = true
11  end if
12  if DOM_w1(img) ⊆ DOM_w2(img) or DOM_w2(img) ⊆ DOM_w1(img) then
13      img = true
14  end if
15  if input and button and anchor and img then
16      flag = true
17  end if
18  return flag
19  end
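The subset test of Algorithm 2 maps directly onto Python's set operators. The following sketch assumes each attribute vector is a dictionary of frozensets with the hypothetical keys `inputs`, `buttons`, `anchors`, and `imgs`; the helper `make_page` exists only to build example data. It follows the pseudocode literally, so the direction of the subset relation may differ from one element to the next.

```python
FIELDS = ("inputs", "buttons", "anchors", "imgs")


def make_page(inputs=(), buttons=(), anchors=(), imgs=()) -> dict:
    """Build an attribute vector; an omitted element is the empty (null) set."""
    return {"inputs": frozenset(inputs), "buttons": frozenset(buttons),
            "anchors": frozenset(anchors), "imgs": frozenset(imgs)}


def similar_pages(w1: dict, w2: dict) -> bool:
    """True when, for every element, one page's set is a subset of the other's."""
    # "<=" on sets is the subset test; the empty set is a subset of every set.
    return all(w1[f] <= w2[f] or w2[f] <= w1[f] for f in FIELDS)


cart_one_item = make_page(buttons=[("html/body/button", "Checkout")],
                          imgs=["html/body/div/img"])
cart_empty = make_page(buttons=[("html/body/button", "Checkout")])
save_form = make_page(buttons=[("html/body/button", "save")])
continue_form = make_page(buttons=[("html/body/button", "continue")])

print(similar_pages(cart_one_item, cart_empty))  # True
print(similar_pages(save_form, continue_form))   # False
```

The second comparison reproduces the “continue” versus “save” example from the text: the two button sets are disjoint, so neither is a subset of the other and the pages are kept apart.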

3. Clustering web application pages
After the similar pages have been identified, they are put in the same cluster. The pages in a cluster are similar to each other and refer to a unique page of the application. Algorithm 3 shows the pseudocode for clustering web pages. In line 7, it is checked whether two pages w_i and w_j are the same; if so, they are put in the same cluster.
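One way to read the clustering step is as a greedy pass in which each unassigned page seeds a cluster and absorbs every later page that is similar to it. The sketch below is such a reading, not a transcription of Algorithm 3: instead of shrinking the page list while iterating, it marks pages as assigned, and the similarity test is passed in as a parameter (`similar`, a hypothetical name) so the function stays self-contained.

```python
from typing import Callable, Sequence, TypeVar

Page = TypeVar("Page")


def cluster_pages(pages: Sequence[Page],
                  similar: Callable[[Page, Page], bool]) -> list[list[Page]]:
    """Greedy clustering: every page is compared with the seed of a cluster."""
    clusters: list[list[Page]] = []
    assigned = [False] * len(pages)
    for i, seed in enumerate(pages):
        if assigned[i]:
            continue  # already absorbed by an earlier cluster
        cluster = [seed]
        assigned[i] = True
        for j in range(i + 1, len(pages)):
            if not assigned[j] and similar(seed, pages[j]):
                cluster.append(pages[j])
                assigned[j] = True
        clusters.append(cluster)
    return clusters


# Toy stand-in for pages: URLs are "similar" when they share the first path segment.
def same_section(a: str, b: str) -> bool:
    return a.split("/")[1] == b.split("/")[1]


urls = ["/product/1", "/cart", "/product/2", "/search?q=a", "/product/3"]
print(cluster_pages(urls, same_section))
# [['/product/1', '/product/2', '/product/3'], ['/cart'], ['/search?q=a']]
```

Because pages are compared only with the seed of each cluster, the result can depend on the order of the pages when similarity is not transitive, which is the case for a subset-based test.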

Algorithm 3. The Pseudocode for Clustering the Web Application Pages

INPUT: WebPages = {w1, w2, w3, ..., wn}
OUTPUT: the web application model M as a set of web page clusters C;
        the web page clusters C as a set of web pages

 1  Begin
 2  let M = ∅    // set of page clusters
 3  let k = 1    // number of web page clusters
 4  for i = 1 to n do
 5      C_k ← w_i
 6      for j = i + 1 to n do
 7          if SimilarWebPages(w_i, w_j) then
 8              WebPages ← WebPages − w_i
 9              n = length of WebPages
10              C_k ← w_j
11          end if
12      end for
13      k++
14  end for
15  return C
16  end

4) Extracting user navigation graph
In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster is a set of similar pages, and each of these pages has a URI and a Referer field. Thus, each cluster has a set of URIs and a set of Referers for the pages in that cluster. Note that the Referer field is the URI of the previous web application page that the user visited. To extract the edges of the user navigation graph, the clusters are checked to find those whose Referer set intersects the URI set of a given cluster. When such a cluster is found, the two clusters are connected. Suppose that the URI set of cluster C1 intersects the Referer set of cluster C2; then the path from cluster C1 to cluster C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges of the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained according to lines 5 and 6 of the pseudocode in Algorithm 3, respectively. In line 10, if the intersection of the URI set of a cluster with the Referer set of another cluster is not null, the edge C_iC_j is added to the edge set of the graph.

The user navigation graph is then created according to Algorithm 5, where the nodes are the set of clusters and the identified edges are the paths between clusters. In the created graph, each node represents a unique page and the edges show the paths between pages.
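The edge rule above (connect Ci to Cj when the URI set of Ci intersects the Referer set of Cj) can be sketched as follows. The dictionary layout, the field names `uri` and `referer`, and the function name `extract_edges` are assumptions made for illustration. Unlike the pseudocode, this sketch checks every ordered pair of distinct clusters so that backward edges are not missed; that choice is also an assumption.

```python
from typing import Dict, List, Optional, Set, Tuple

PageRecord = Dict[str, Optional[str]]


def extract_edges(clusters: Dict[str, List[PageRecord]]) -> Set[Tuple[str, str]]:
    """Return directed edges (src, dst) where some URI of `src`
    appears as a Referer of a page in `dst`."""
    uris = {name: {p["uri"] for p in pages}
            for name, pages in clusters.items()}
    referers = {name: {p["referer"] for p in pages if p.get("referer")}
                for name, pages in clusters.items()}
    edges: Set[Tuple[str, str]] = set()
    for src in clusters:
        for dst in clusters:
            # Non-empty intersection means the user moved from src to dst.
            if src != dst and uris[src] & referers[dst]:
                edges.add((src, dst))
    return edges


clusters = {
    "home": [{"uri": "/", "referer": None}],
    "list": [{"uri": "/items", "referer": "/"}],
    "item": [{"uri": "/item?id=1", "referer": "/items"},
             {"uri": "/item?id=2", "referer": "/items"}],
}
edges = extract_edges(clusters)
```

Together with the cluster that contains the first page, the returned edge set gives the tuple of initial node, nodes, and edges that makes up the navigation graph.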

Definition 2 [The User Navigation Graph]. This graph is represented by the tuple <C0, C, E>, where C is the set of nodes in the graph, C0 ∈ C is the first (initial) node in the graph, and E ⊆ C × C is the set of edges in the graph.

Algorithm 5 shows the pseudocode for extracting the user navigation graph. Lines 5 and 6 identify the cluster that contains the initial page of the application. This cluster is considered the initial node of the graph.

3.2 Identifying Business Processes in the Application

To identify the web application business processes in the user navigation graph, it is necessary to first define the process and the final node, and then define the business process.

Definition 3 [The Application Process (P)]. The process P in the application is a sequence of nodes and edges in the user navigation graph, like E1, E2, ..., Ek, where Ei ∈ E and Ei = Ci−1Ci.

Algorithm 4 The Pseudocode for Extracting Edges From the User Navigation Graph

INPUT: C = {C1, C2, C3, ..., Ck}, web page clusters as graph nodes; WebPages = {w1, w2, w3, ..., wn}, set of web pages
OUTPUT: the web application graph edges E as a set of edges

 1: Begin
 2: Let E = ∅ // set of web application graph edges
 3: for j = 1, ..., k do
 4:     Let URIcj, Referercj = ∅
 5:     URIcj = URIcj ∪ ExtractURI(w) for any w ∈ Cj
 6:     Referercj = Referercj ∪ ExtractReferer(w) for any w ∈ Cj
 7: end for
 8: for i = 1, ..., k do
 9:     for j = i+1, ..., k do
10:         if (URIci ∩ Referercj ≠ null) then
11:             E ← E + CiCj
12:         end if
13:     end for
14: end for
15: return E
16: end

Algorithm 5 The Pseudocode for Extracting the User Navigation Graph

INPUT: C = {C1, C2, C3, ..., Ck}, web page clusters as graph nodes; w1, first web page
OUTPUT: the web application navigation graph <C0, C, E>

 1: Begin
 2: Let C0 = ∅
 3: E = ExtractGraphEdges // set of web application graph edges
 4: for j = 1, ..., k do
 5:     if Cj contains w1 then
 6:         C0 = C0 + Cj
 7:     end if
 8: end for
 9: return <C0, C, E>
10: end

Algorithm 6 shows the pseudocode for extracting processes.
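As a rough illustration of process extraction, the sketch below enumerates paths that start at the initial node. The stopping rule (a path ends at a node without outgoing edges, or as soon as it returns to a node it has already visited) is an assumption of this sketch, and the name `extract_processes` is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def extract_processes(start: str,
                      edges: Set[Tuple[str, str]]) -> List[List[str]]:
    """Enumerate processes as node sequences beginning at `start`."""
    successors: Dict[str, List[str]] = {}
    for src, dst in sorted(edges):          # sorted for a stable order
        successors.setdefault(src, []).append(dst)

    processes: List[List[str]] = []

    def walk(path: List[str]) -> None:
        nexts = successors.get(path[-1], [])
        if not nexts:
            if len(path) > 1:
                processes.append(path)      # dead end closes the process
            return
        for nxt in nexts:
            if nxt in path:
                processes.append(path + [nxt])  # loop closes the process
            else:
                walk(path + [nxt])

    walk([start])
    return processes


edges = {("home", "list"), ("list", "item"), ("item", "cart"),
         ("cart", "home"), ("list", "search")}
processes = extract_processes("home", edges)
```

Because a path is never extended past a repeated node, the enumeration terminates even when the navigation graph contains cycles.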

Definition 4 [The final nodes in the user navigation graph (F)]. The final nodes mark the completion of a business process: a business process is complete when the application reaches one of these nodes.

The final nodes can be detected by examining the HTTP responses. For example, in the process of buying a product, a phrase like "Thank you for your purchase" is displayed after completion of the purchase. By specifying a set of these phrases and searching for them in the responses, the final nodes can be identified. Some keywords used for identifying the final node are Thank, Congratulations, Successfully, Log Off, and Search Results. Additionally, some buttons in the web application page are good signs that help identify the final node in the graph. Examples of final nodes include the page after clicking the "save" button, the page after clicking the "creation" button, and the page after clicking the "submit" button.
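The keyword search described above might look like the following sketch. The keyword list is taken from the text; the case-insensitive substring match and the names `is_final_response` and `final_nodes` are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set

# Keywords listed in the text, lower-cased for case-insensitive matching.
FINAL_KEYWORDS = ("thank", "congratulations", "successfully",
                  "log off", "search results")


def is_final_response(body: str,
                      keywords: Iterable[str] = FINAL_KEYWORDS) -> bool:
    """True if the response body contains any completion keyword."""
    text = body.lower()
    return any(keyword in text for keyword in keywords)


def final_nodes(cluster_bodies: Dict[str, List[str]]) -> Set[str]:
    """Clusters with at least one response body that signals completion."""
    return {name for name, bodies in cluster_bodies.items()
            if any(is_final_response(body) for body in bodies)}


bodies = {
    "checkout_done": ["<h1>Thank you for your purchase</h1>"],
    "cart": ["<h1>Your cart</h1>"],
    "search": ["<h2>Search Results</h2>"],
}
finals = final_nodes(bodies)
```

Plain substring matching can produce false positives (for example, a product description containing "thank"), so in practice the keyword set would need tuning per application.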

Definition 5 [The application business process]. The business process BP in the application is a process that satisfies at least one of the following conditions:

1. The first node of the process is the initial node in the user navigation graph (C0), and the end node of the process is a final node in the user navigation graph (F).

2. The process passes through its first node again; that is, the first node and the end node of the process are the same and the process length is greater than two. (If the process returns to an already visited node and the length of the created loop is more than two, this is a business process.)

Algorithm 6 The Pseudocode for Extracting Processes

INPUT: the web application first node C0; the web application graph edges E
OUTPUT: the web application graph processes P as a set of web application processes

 1: Begin
 2: Let P0 = ∅ // set of web application processes
 3: Let StartEdge = ∅ // set of graph edges that begin from C0
 4: StartEdge = ExtractGraphEdges(C0) // extract all edges from C0
 5: EndPoint = ExtractEndPoint(StartEdge) // extract the end point of the edges
 6: if (ExtractProcess(EndPoint, E) ≠ null) then
 7:     return P = E + ExtractProcess(EndPoint, E) for any edge E ∈ StartEdge
 8: else
 9:     return E
10: end if
11: end

All processes in the graph that run from the initial node (the application's initial page) to one of the identified final nodes, as well as the processes whose initial node and final node are the same, are the application business processes. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the application processes are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start with the initial node and end with a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, those detected processes whose initial node and final node are the same and whose length is greater than two are added to the variable BP as business processes.
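The two conditions of Definition 5 can be expressed as a simple filter, sketched below. Measuring the process length in edges is an assumption of this sketch (the text does not say whether length counts nodes or edges), and the function name `business_processes` is hypothetical.

```python
from typing import List, Set


def business_processes(processes: List[List[str]],
                       initial: str,
                       finals: Set[str]) -> List[List[str]]:
    """Keep processes that satisfy at least one condition of Definition 5."""
    selected = []
    for process in processes:
        # Condition 1: starts at the initial node and ends at a final node.
        reaches_final = process[0] == initial and process[-1] in finals
        # Condition 2: returns to its first node through a loop of more
        # than two edges (length counted in edges, an assumption).
        is_loop = process[0] == process[-1] and len(process) - 1 > 2
        if reaches_final or is_loop:
            selected.append(process)
    return selected


candidates = [
    ["home", "list", "item", "cart", "home"],   # loop of four edges
    ["home", "list", "search"],                 # ends at a final node
    ["home", "about"],                          # neither condition
    ["home", "list", "home"],                   # loop of only two edges
]
selected = business_processes(candidates, "home", {"search"})
```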

4 EXPERIMENTAL RESULTS

The testbed used in this section is a network consisting of a web server (test target) and two clients (the BLProM system and a legal user). The web server and the clients each run on a virtual machine. The profiles of the web server and the clients are shown in Table 2.

The web applications listed in Table 3 are installed on the web server (test target), and we then identify the business layer of these web applications.

The legal user first starts using the selected web applications. The user crawls all permitted parts of the web application. The HTTP traffic of the legal user is given to BLProM as its input.

5 EVALUATION

BLProM's goal is to identify the business layer of the web application. Once the business layer is identified, business logic vulnerabilities can be identified as well. BLProM detects the business processes of the web application. Identifying business processes is the main step in dynamic security testing of the web application in the business layer.

We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner. It scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner that has an API for extracting the web application graph. In this graph, the web application pages are the nodes and the relations among pages are the edges. The main difference between BLProM and ZAP is in detecting similar pages: ZAP cannot detect similar pages in the web application, but BLProM can. ZAP's graph only shows the relations among scanned pages, whereas BLProM generates a more compact graph in which similar pages are merged. In terms of the accuracy of the generated graph, BLProM and ZAP are the same.

To evaluate the proposed approach, we first show the accuracy of clustering using the following criteria:

• True Positive: samples that are placed in their correct clusters.

• False Positive: samples that are placed in a cluster to which they do not belong.

• False Negative: samples that are not placed in a cluster although they belong to it.

• Recall: calculated by the following formula:

$\text{recall} = \dfrac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}}$

• Precision: calculated by the following formula:


Table 2 Testbed Profiles

Property      Web server (test target)       Client (BLProM machine)    Client (legal user)
CPU           Pentium dual core, 2.20 GHz    Intel Core i7, 2.20 GHz    Pentium dual core i3-3210 GHz
OS            Windows 8.1                    Windows 8.1                Windows 7
VMware CPU    1 GHz                          1 GHz                      1 GHz
VMware RAM    1 GB                           1 GB                       1 GB
VMware OS     Windows 7                      Windows 7                  Windows 7

Table 3 Selected Web Applications for Evaluation

Web application        Description
TomatoCart-1.1.8.6.1   e-commerce
osCommerce-2.3.4       e-commerce
WackoPicko             Web application for sharing pictures

Algorithm 7 The Pseudocode for Identifying the Application Business Processes

INPUT: the web application navigation graph <C0, C, E>; the web application graph edges E
OUTPUT: the web application business processes BP as a set of web application processes

 1: Begin
 2: Let P = ∅ // set of web application processes
 3: Let F = ∅ // set of web application final nodes
 4: P = ExtractProcess // extract processes in the web application
 5: F = ExtractFinalNodes // extract final nodes in the web application
 6: for i = 1, ..., k do
 7:     if Pi starts with C0 and ends with a node in F then
 8:         BP = BP + Pi
 9:     end if
10: end for
11: R = ExtractProcessWithRepeatedNodes(P) // extract processes with repeated nodes
12: for j = 1, ..., m do
13:     if Rj has the same initial and end node and length(Rj) > 2 then
14:         BP = BP + Rj
15:     end if
16: end for
17: return BP
18: end


Table 4 The Clusters of Selected Web Application Pages Evaluation

Criteria          WackoPicko    Tomatocart    osCommerce
samples           89            150           210
clusters          29            66            40
true positive     65            146           205
false positive    24            4             5
false negative    23            3             4
recall            0.74          0.98          0.98
precision         0.73          0.97          0.98
f-measure         0.73          0.98          0.98

Table 5 Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko

Criteria                               Proposed approach    OWASP ZAP    Percentage of improvement compared to OWASP ZAP
HTTP Message                           89                   89           –
Graph Nodes                            22                   89           75.2
Graph Edges                            48                   270          82.2
process (P)                            12                   48           75
Average edge in each process (E)       4                    46           91.3
Average edges in all processes (PE)    48                   2208         97.8
business processes                     10                   N/A          –

$\text{precision} = \dfrac{\text{TruePositive}}{\text{TruePositive} + \text{FalsePositive}}$

• F-Measure: calculated by the following formula:

$\text{F-Measure} = \dfrac{2 \cdot \text{recall} \cdot \text{precision}}{\text{recall} + \text{precision}}$
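As a worked check of these criteria, the sketch below computes recall, precision, and F-measure (taken here as the harmonic mean of recall and precision) from the WackoPicko counts reported in Table 4. The function name `clustering_metrics` is an illustrative assumption.

```python
from typing import Tuple


def clustering_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (recall, precision, f_measure) from raw counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# WackoPicko counts from Table 4: 65 true positives,
# 24 false positives, 23 false negatives.
recall, precision, f_measure = clustering_metrics(tp=65, fp=24, fn=23)
rounded = (round(recall, 2), round(precision, 2), round(f_measure, 2))
```

Rounded to two decimals, the results agree with the WackoPicko column of Table 4.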

The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, samples is the total number of web pages in the HTTP traffic, extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, clusters is the total number of clusters extracted by the ClusteringWebPages algorithm (Algorithm 3). In the following rows, the criteria listed above are calculated for each web application.

To evaluate the proposed approach, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. The BLProM output for WackoPicko is shown in Table 5. Graph nodes is the total number of clusters extracted by the ClusteringWebPages algorithm (Algorithm 3). Graph edges is the total number of edges extracted by the ExtractGraphEdges algorithm (Algorithm 4). Process is the total number of paths from the first node, extracted by the ExtractProcess algorithm (Algorithm 6). Average edge in each process (E) is the sum of the edges of all processes divided by the total number of processes. Average edges in all processes (PE) is the product of the total number of processes (P) and the average edges in each process (E). Business processes is the total number of business processes in the web application.

The values in the first column of Table 5 are the output of the proposed approach. The values in the second column are the scanning output of ZAP. The third column shows the percentage of scanning improvement of the proposed approach compared to the ZAP scan. The results in the table indicate that the ZAP scan is not an intelligent scan. As a result, ZAP is not able to identify business-layer vulnerabilities.
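The paper does not state how the improvement percentage is computed. The sketch below uses the relative reduction with respect to ZAP, which is an inferred assumption that appears consistent with the WackoPicko values reported in Table 5. The name `improvement_percent` is hypothetical.

```python
def improvement_percent(proposed: float, zap: float) -> float:
    """Relative reduction of a measurement with respect to ZAP, in percent.

    Assumed formula: (zap - proposed) / zap * 100.
    """
    return (zap - proposed) / zap * 100


# WackoPicko measurements taken from Table 5.
nodes = improvement_percent(22, 89)        # graph nodes
edges = improvement_percent(48, 270)       # graph edges
all_edges = improvement_percent(48, 2208)  # average edges in all processes
```

Under this assumption the three values come out close to the 75.2, 82.2, and 97.8 listed in the table; the small differences are consistent with truncation to one decimal.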

The BLProM output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to ZAP scanning. It can be observed that web application scanning is improved by identifying the web application business layer.


Table 6 Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce

Criteria                               Proposed approach    OWASP ZAP    Percentage of improvement compared to OWASP ZAP
HTTP Message                           170                  170          –
Graph Nodes                            40                   170          76.4
Graph Edges                            66                   379          82.5
process (P)                            23                   17           26
Average edge in each process (E)       3                    113          97.3
Average edges in all processes (PE)    69                   1921         96.4
business processes                     18                   N/A          –

Table 7 Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart

Criteria                               Proposed approach    OWASP ZAP    Percentage of improvement compared to OWASP ZAP
HTTP Message                           150                  150          –
Graph Nodes                            66                   150          56
Graph Edges                            87                   410          78.7
process (P)                            31                   39           20.5
Average edge in each process (E)       4                    101          96
Average edges in all processes (PE)    156                  3131         95
business processes                     30                   N/A          –

The BLProM output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of our proposed approach compared to ZAP scanning.

Table 8 shows the average of the proposed approach, the average of OWASP ZAP, and the average percentage of improvement in the scanning of the selected web applications. For example, in the "Average edges in all processes" criterion, our approach improves on OWASP ZAP by about 96 percent.

According to the results presented in this table, it can be observed that BLProM has improved web application scanning. BLProM is aware of web application business processes. By identifying the web application business layer, web scanners can detect business-layer vulnerabilities.

6 CONCLUSION

Business logic vulnerabilities are severe vulnerabilities that compromise web application security. Web scanners cannot detect business logic vulnerabilities because they are unable to understand the business logic of the web application. To detect business logic vulnerabilities, the business logic of the web application needs to be understood. These vulnerabilities are therefore specific to the application and difficult to identify.

In this paper, we proposed BLProM, a black-box approach for detecting the business processes of the web application. BLProM aims to ease the detection of business logic vulnerabilities. The BLProM output is used as input in dynamic security testing of web applications in the business layer in order to detect business logic vulnerabilities. BLProM consists of two main steps:

1- Extracting the user navigation graph
2- Detecting web application business processes

In the lab, we scanned three web applications with BLProM and with OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared to OWASP ZAP. BLProM improved the scanning of web applications because it clusters web pages and prevents scanning similar web application pages.

Table 8 Comparison of the Average of the Proposed Approach and the Average of ZAP Approach in the Scanning of Selected Web Applications

Average of selected web applications

Criteria                               Proposed approach    OWASP ZAP    Percentage of improvement compared to OWASP ZAP
Graph Nodes                            42.6                 136.3        69.2
Graph Edges                            67                   353          81.1
process (P)                            20                   36.6         40.5
Average edge in each process (E)       3.6                  86.6         94.8
Average edges in all processes (PE)    91                   2420         96.4

References

[1] Common vulnerabilities and exposures. https://cve.mitre.org/cve/cve.html.

[2] 2015 ITRC. Identity Theft Resource Center Breach Report Hits Near Record High in 2015. http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html. Accessed Feb. 2017.

[3] G. Pellegrino and D. Balzarotti. Toward Black-Box Detection of Logic Flaws in Web Applications. In Network and Distributed System Security Symposium 2014 (NDSS 2014), 2014. doi:10.14722/ndss.2014.23021.

[4] Testing for business logic. OWASP. https://www.owasp.org/index.php/Testing_for_business_logic. Accessed Feb. 2017.

[5] D. Balzarotti, M. Cova, V. Felmetsger, and G. Vigna. Multi-module vulnerability analysis of web-based applications. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 25–35. ACM, 2007. doi:10.1145/1315245.1315250.

[6] A. Doupe, B. Boe, C. Kruegel, and G. Vigna. Fear the EAR: discovering and mitigating execution after redirect vulnerabilities. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 251–262. ACM, 2011. doi:10.1145/2046707.2046736.

[7] V. Felmetsger, L. Cavedon, C. Kruegel, and G. Vigna. Business Logic Attacks – Bots and BATs, OWASP 2009. In USENIX Security Symposium, 2010.

[8] E. Chai. Business Logic Attacks – Bots and BATs, OWASP 2009. https://www.owasp.org/images/a/aa/OWASP_Cincinnati_Jan_2011.pdf. Accessed Feb. 2017.

[9] X. Li and Y. Xue. BLOCK: a black-box approach for detection of state violation attacks towards web applications. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 247–256, 2011. doi:10.1145/2076732.2076767.

[10] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna. Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications. In Giovanni, pages 63–86. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-74319-4. doi:10.1007/978-3-540-74320-0_4.

[11] X. Li, W. Yan, and Y. Xue. SENTINEL: securing database from logic flaws in web applications. In Proceedings of the Second ACM Conference on Data and Application Security and Privacy, pages 25–36, 2012. doi:10.1145/2133601.2133605.

[12] M. Alidoosti and A. Nowroozi. BLDAST: business-layer Dynamic Application Security Tester of the web application in order to detect web application vulnerabilities against flooding DoS attacks. In Iran Society of Cryptology Conference, Shiraz, Iran (in Persian), 2017.

[13] M. Alidoosti, A. Nowroozi, and A. Nickabadi. Evaluating the Web-Application Resiliency to Business-Layer DoS Attacks. ETRI Journal, 2019. doi:10.4218/etrij.2019-0164.

[14] M. Alidoosti and A. Nowroozi. BLTOCTTOU: business-layer dynamic application security tester of the web application in order to detect web application vulnerabilities against Race Condition attacks. In Computer Society of Iran Conference, Tehran, Iran (in Persian), 2018.

[15] M. Alidoosti and A. Nowroozi. BLProM: Business-layer process miner of the web application. In ISCISC, pages 1–6. IEEE, 2018. ISBN 978-1-5386-7582-3. doi:10.1109/ISCISC.2018.8546899.

[16] V. Crescenzi, P. Merialdo, and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3):279–299, 2005. doi:10.1016/j.datak.2004.11.004.


Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. Currently, she is working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.

Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and held an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. Besides, he is a co-founder of four IT startups.

Ahmad Nickabadi received his B.S. degree in computer engineering in 2004 and the M.Sc. and Ph.D. degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.


user behavior. The lack of such a document makes it difficult to detect business logic vulnerabilities. For example, adding a specific item several times to a shopping cart is a common feature, but repeated usage of a discount code is a kind of business logic vulnerability. A human easily understands the difference between these two scenarios, while a scanner without an appropriate model of the web application cannot distinguish between them [4]. Studies [5–7] have been conducted to automatically detect business logic vulnerabilities, but they are applicable only to small applications. Also, the application source code is required to generate the appropriate model of the application [4]. Therefore, an automatic tool that can find business logic vulnerabilities is required.

In this paper, we propose BLProM, a black-box technique for detecting the web application business layer. By identifying business processes in the web application, BLProM provides the ability to identify business-layer vulnerabilities. In other words, for dynamic security testing of the web application in the business layer, it is first necessary to identify the business processes of the web application (detecting the business logic of the web application). Then, by analyzing the processes, the business-layer vulnerabilities are identified. The BLProM output is the web application business processes, which are used as input in dynamic security testing in the business layer. The proposed approach is independent of the technology used in web applications and automatically finds business processes. We also show that BLProM improves web application scanning because it detects similar pages in the web application and prevents scanning of similar pages. Comparing the results of web application scanning between BLProM and OWASP ZAP (an open-source web application scanner) shows that BLProM improves web application scanning by about 96%. In summary, this paper makes the following contributions:

• We present BLProM, a black-box technique for detecting the web application business layer.

• We present a new black-box approach for clustering web pages.

• We show that web application scanning is improved by about 96% by identifying web application business processes.

2 BACKGROUND and RELATED WORK

The business layer determines the business logic of the web application. It is responsible for data processing and data management, and it specifies business logic policies and rules. This layer also validates the input data. Figure 1 shows the three-layer architecture of a web application and the position of the business layer in it. The presentation layer is a user interface that displays data to the user and receives inputs from the user. In a web application, this is the part that receives the HTTP request and returns the HTML response.

The business layer handles data validation and business rules. The data access layer communicates with the database by constructing SQL queries.

After data is received from the user, it is made available to the business layer. The web application uses the data to run business processes. Every business process has several steps that should be executed in order, and processes may interact with each other.

The business layer specifies the logic of the web application. A business logic vulnerability is a defect in the business layer. A business logic attack vector is a legitimate request (usually multiple requests) with legitimate input values that abuses a module's functionality to inflict direct damage on the business.

2.1 Business Logic Attacks

There are two approaches to prevent business logic attacks: 1) identifying attacks at runtime (defense approach) and 2) identifying logic vulnerabilities in the web applications (prevention approach). In the defense approach, the behavior of the web application is monitored, and an attack is reported whenever the web application leaves its normal state. In the prevention approach, attack vectors are used to identify business logic vulnerabilities.

BLOCK [9] and Swaddler [10] use a defense approach to prevent business logic attacks. BLOCK first obtains the behavioral model of the web application by observing the interaction between the clients and the web application. It extracts a set of invariants from the request/response sequence and session variables. BLOCK identifies any request or response that violates the identified invariants as an attack.

Swaddler provides an anomaly detection method for detecting attacks. It uses anomalies in the internal state of the web application to detect attacks. In other words, the web application's internal state is monitored in the learning phase, and the normal values of the web application state are extracted; these define the web application profile. Then, in the detection phase, abnormal states are identified.

MiMoSA [5] uses a prevention approach in the form of a white-box approach to identify business logic vulnerabilities. MiMoSA first provides a web application model based on the web application's state and workflow. MiMoSA detects multi-step attacks by analyzing the relationship between the web application and the database, as well as the connections in the web application.

Figure 1 The Three-Layer Architecture of a Web Application [8]

SENTINEL [11] and Pellegrino et al. [3] use a prevention approach to identify business logic vulnerabilities in the form of a black-box approach.

SENTINEL [11] is a black-box approach for detecting logical weaknesses in database access. SENTINEL generates a state machine of the web application and extracts a set of invariants from the observed SQL queries and responses and the session variables as the application specification. Any SQL query that violates the defined invariants is identified as an attack.

Pellegrino et al. [3] propose a black-box technique to detect logic vulnerabilities in web applications. This technique extracts behavioral patterns from network traces in which the user interacts with a certain functionality of the application. First, the web application is modeled, and then attack vectors are applied to the model.

Our previous works, BLDAST [12, 13] and BLTOCTTOU [14], use a prevention approach to identify business logic vulnerabilities in the form of a black-box approach. BLProM [15] can be used as an input for BLTOCTTOU and BLDAST.

BLDAST [12, 13] is a dynamic, black-box vulnerability analysis approach that identifies business logic vulnerabilities of a web application against flooding DoS attacks. BLDAST assesses web application resiliency against flooding DoS attacks. It takes into account the business processes of a web application. BLDAST selects critical pages in business processes. A critical page has a considerable response time; a critical process can therefore impose a heavy load on the target and cause the web server to become unresponsive. The goal of BLDAST is to find these critical processes within the web applications.

BLTOCTTOU [14] is a black-box dynamic application security tester for detecting business logic vulnerabilities against race condition attacks. BLTOCTTOU identifies vulnerabilities by finding the business processes of the web application. It detects business processes that interact with each other: one process sets the value of a variable and the other reads or writes that variable. To identify a race condition, BLTOCTTOU first executes the identified processes sequentially and then executes them in reverse order. Finally, it compares the outputs of these two modes. If they are different, the web application is vulnerable to a race condition.

2.2 Clustering Web Pages

Crescenzi et al. [16] presented an approach to cluster web pages based on the page structure. The structural similarity between web pages is defined using the DOM trees of their hyperlinks. The final clusters are used to build a model that describes the structure of the site according to classes of pages and their connectivity.

3 BUSINESS-LAYER PROCESS MINER

In this paper, BLProM is proposed to identify the business processes of the web application, and we use its output as the input to web application security testing in the business layer. Then, by analyzing the interaction between business processes, business-layer vulnerabilities can be detected.

BLProM first preprocesses normal user HTTP traffic. It then extracts the web application pages from the traffic. BLProM clusters similar pages to prevent the unbounded growth of the user navigation graph. The detected clusters are the graph nodes, and the graph edges are the relations between the detected clusters. Based on the detected nodes and edges, the user navigation graph is extracted. Then, BLProM extracts business processes from the navigation graph. BLProM has two main steps:

1. Extracting the user navigation graph
2. Detecting business processes in the web application

Figure 2 shows the proposed steps to identify the business processes in the web application. In the following, we explain each of these steps in detail.

3.1 Extracting the User Navigation Graph

First, the normal user starts to crawl the web application. The traffic of the normal user is captured and stored. It should be noted that the user permission level can differ according to the role of the user, and consequently the user navigation graph varies depending on the permission level. In this paper, the normal user has user-level permission and browses the pages permitted at that level. The normal user crawls all parts of the application that the user is allowed to access.

Figure 2 Black-Box Approach to Detect Web Application Business Process

BLProM initially extracts the user navigation graph from the stored traffic. This is performed through the following steps:

1. Preprocessing of raw input data
2. Identifying existing web application pages in the stored traffic
3. Clustering the web application pages
4. Extracting the user navigation graph

1) Preprocessing of raw input data
In the preprocessing step, BLProM cleans the data and removes irrelevant data samples. In this paper, only HTTP requests and responses are used. For the responses, only those with successful status codes 200 and 209 are employed. BLProM removes responses with failure status codes as well as their corresponding requests. Additionally, only GET and POST requests are needed in this paper, and the remaining ones are discarded.
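The preprocessing filter can be sketched as below. The `Exchange` record, its field names, and the function name `preprocess` are hypothetical, and the set of accepted status codes is deliberately left as a parameter; the default of `{200}` is only a placeholder assumption and should be set to the codes the approach treats as successful.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Exchange(NamedTuple):
    """One HTTP request together with the status of its response."""
    method: str
    uri: str
    status: int


def preprocess(exchanges: Iterable[Exchange],
               ok_statuses: FrozenSet[int] = frozenset({200}),
               methods: FrozenSet[str] = frozenset({"GET", "POST"})
               ) -> List[Exchange]:
    """Keep only GET/POST requests whose response status is accepted."""
    return [e for e in exchanges
            if e.method.upper() in methods and e.status in ok_statuses]


traffic = [
    Exchange("GET", "/", 200),
    Exchange("GET", "/missing", 404),   # failure status: dropped
    Exchange("HEAD", "/", 200),         # not GET or POST: dropped
    Exchange("post", "/login", 200),
]
kept = preprocess(traffic)
```

Dropping a failed response together with its request, as the filter does, keeps the remaining traffic as consistent request/response pairs for the later steps.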

2) Identifying the web application pages in the stored traffic
Each page of the application can be represented as a pair (main request, corresponding response to the main request). To identify the web application pages, it is first necessary to detect the main requests in the traffic. After the main requests are identified, the corresponding responses must also be identified.

Identifying the main HTTP requests in the traffic. The user's stored traffic contains both main HTTP requests, which load the web application pages, and secondary HTTP requests, which load a file, image, etc. of a page. BLProM must distinguish between the two: once the main requests are identified, the remaining requests are considered secondary. The Referer header field of an HTTP request indicates the URL of the previous page the user visited. Any request whose Referer header differs from the Referer of the previous request is considered a secondary request, and the previous request is the main request. In addition, the first request in the traffic is considered a main request, because it does not include a Referer.

Identifying the corresponding responses to the main requests. To identify the responses that correspond to the main requests, it is enough to select the responses whose content-type field is text/html, because the response to a secondary request is usually a file, photo, etc., while the response to a main request is text and HTML. Such responses are the main responses in the HTTP traffic. Once the main requests and responses are identified, each pair (main request, corresponding response to the main request) indicates a page of the web application. Note that the response to a main request contains all the secondary requests that lead to loading the web page.

Identifying whether the last request in the traffic is a main request. If the last response in the traffic is a main response, the last HTTP request is a main request as well.

Figure 3. Number of Consecutive Requests

For example, in Figure 3, requests 1, 2, 6, and 8 are the main requests; they are shown in red.

The pseudocode of the algorithm that BLProM uses to identify web application pages is shown in Algorithm 1. As mentioned before, each web application page is a pair (main request, corresponding response to the main request). The pseudocode in Algorithm 1 can be divided into four main parts:
1. Extracting the main requests in the traffic (lines 13-25)
2. Extracting the corresponding responses to the main requests (lines 26-31)
3. Identifying whether the last request in the traffic is a main request (lines 32-35)
4. Extracting the web application pages (lines 36-40)

In line 9, all requests in the traffic, both main and secondary, are extracted and stored in the HttpRequest variable. In line 10, the responses in the traffic are added to the HttpResponse variable. In lines 11 and 12, the total numbers of requests and responses are calculated. In line 15, it is checked whether the current request is the first request; if so, it is considered a main request and added to the MainRequest variable. In line 21, it is checked whether the Referer field of the current request differs from the Referer field of the previous request; if so, the previous request is considered a main request.

In line 28, it is checked whether the content-type field of the current response is text/html. If so, the current response is considered a main response.

In line 33, it is checked whether the last HTTP response is a main response; if so, the last request is considered a main request as well.

In line 39, the main requests and responses are paired as (request, response); these pairs are the web application pages.
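The Referer heuristic described above can be sketched as follows. This is an illustrative reading of Algorithm 1, not the authors' implementation: it assumes the traffic is already reduced to two parallel lists (the Referer of each request and the content type of its response), and the function names are hypothetical.

```python
from typing import List, Optional


def find_main_requests(referers: List[Optional[str]],
                       content_types: List[str]) -> List[int]:
    """Return 0-based indices of the main requests.

    referers[i] is the Referer header of request i (None if absent);
    content_types[i] is the Content-Type of the response to request i.
    """
    main: List[int] = []
    if not referers:
        return main
    main.append(0)  # the first request is always a main request
    last_referer = referers[0]
    for i in range(1, len(referers)):
        if referers[i] != last_referer:
            # The Referer changed, so the previous request loaded a new page.
            if main[-1] != i - 1:
                main.append(i - 1)
            last_referer = referers[i]
    last = len(referers) - 1
    # The last request is main if its response is a main (text/html) response.
    if content_types[last].startswith("text/html") and main[-1] != last:
        main.append(last)
    return main


def extract_pages(referers: List[Optional[str]],
                  content_types: List[str]) -> List[int]:
    """Keep the main requests whose response is text/html; each index is one page."""
    return [i for i in find_main_requests(referers, content_types)
            if content_types[i].startswith("text/html")]
```

On a trace shaped like the Figure 3 example (eight requests, where requests 3 to 5 and 7 load page resources), the sketch marks requests 1, 2, 6, and 8 as main when indices are read as 1-based.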

3) Clustering the web application pages
In this step, BLProM clusters the web application pages. Clustering puts similar pages in the same cluster, which prevents the infinite growth of the user navigation graph.

At this stage, all pages in the user's stored traffic have been extracted as pairs (main request, corresponding response to the main request), each of which represents one page of the application. To extract the optimal user navigation graph, similar pages must be identified and clustered. In the user navigation graph, nodes indicate the application's unique pages and edges represent the links between pages. To identify similar pages, a similarity criterion must be chosen according to the purpose of the clustering. Here, the type of operations that the user can perform on a page is the measure used to separate pages: two pages are similar if the user can perform the same operations on them. For example, consider two pages that each contain only a button, titled "continue" on the first page and "save" on the second. These two pages are different because the user performs different operations on them. Under this criterion, the following pages are considered similar in web applications:
• In an online shop application, shopping cart pages are similar even if they contain different items.
• In an online shop application, the profile pages of the products are similar because they have the same HTML structure in terms of the important HTML elements.
• Pages that display search results for different keywords are similar.
• If the structure of a page is a subset of another page in terms of the important HTML elements, the pages are considered similar.
• Two pages that both contain user comments are considered similar even if the comments are about different products.


Algorithm 1. The Pseudocode for Extracting Web Application Pages From the Traffic

INPUT: HTTPtraffic = {HttpMessage_1, HttpMessage_2, HttpMessage_3, ..., HttpMessage_n}
OUTPUT: WebPages as a set of (Request_i, Response_i)

 1: Begin
 2: let WebPages = ∅        // set of web pages
 3: let HttpRequest = ∅     // set of HTTP requests
 4: let HttpResponse = ∅    // set of HTTP responses
 5: let MainRequest = ∅     // set of main HTTP requests
 6: let MainResponse = ∅    // set of main HTTP responses
 7: let i, k = 1            // counters for the current HttpMessage
 8: let LastReferer = empty, NewReferer = empty
 9: HttpRequest = ExtractReq(HTTPtraffic)      // extract HTTP requests from the HTTP traffic
10: HttpResponse = ExtractResp(HTTPtraffic)    // extract HTTP responses from the HTTP traffic
11: n = extractNumber(HttpRequest)             // total number of HTTP requests
12: m = extractNumber(HttpResponse)            // total number of HTTP responses
13: // extract the main HTTP requests from HttpRequest
14: for i: 1..n do
15:   if (i = 1) then
16:     add MainRequest ← HttpRequest_i        // the first request is a main request
17:     LastReferer ← Referer of HttpRequest_i
18:   else
19:     NewReferer ← Referer of HttpRequest_i
20:   end if
21:   if (NewReferer ≠ LastReferer) then
22:     add MainRequest ← HttpRequest_(i−1)
23:     LastReferer ← NewReferer
24:   end if
25: end for
26: // extract the main HTTP responses from HttpResponse
27: for k: 1..m do
28:   if (content-type of HttpResponse_k = text/html) then
29:     add MainResponse ← HttpResponse_k
30:   end if
31: end for
32: // check whether the last request is a main request
33: if (HttpResponse_m ∈ MainResponse) then
34:   add MainRequest ← HttpRequest_n
35: end if
36: // extract the set of web pages
37: size = extractNumber(MainRequest)          // total number of main HTTP requests
38: for j: 1..size do
39:   WebPages ← (MainRequest_j, MainResponse_j)
40: end for
41: return WebPages
42: end


Table 1. Attribute Vector of a Page in the osCommerce Web Application

inputs    null
buttons   html/body/button/Reviews
          html/body/div/div/div/div/form/div/div/span/span/button/Add to Cart
anchors   null
image     html/body/div/div/div/a/img

Figure 4. An Example of HTML Code

Definition of the Document Object Model (DOM) path of an HTML element. The DOM path of an element is the position of the element in the HTML code.

For example, in Figure 4, the DOM path of the button (DOM_button) is Document/Html/Body/P/Button.

Definition 1 [Similar Pages]. Similar pages are pages on which the user can perform the same operations and which are identical in terms of the position of the important HTML elements. The important HTML elements of a page are buttons, images, inputs, and anchors.

The clustering process includes three steps:
1. Extracting the attribute vectors of the pages
2. Identifying the subset pages
3. Clustering the pages

These steps are discussed in detail below.

1. Extracting the attribute vectors of the pages
BLProM represents each page of the application as a pair (main request, response). In this step, BLProM extracts the attribute vector of each page by applying a data mining operation to this pair. BLProM models each page using the following attribute vector:

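WebPages = the set of all pages in an application

\[ \forall w \in WebPages:\quad w = (DOM_{inputs},\ DOM_{buttons},\ DOM_{anchors},\ DOM_{imgs}) \]
\[ DOM_{inputs} = \prod_{i}^{n} DOM(input_i) \]
\[ DOM_{buttons} = \prod_{i}^{n} DOM(button_i) \]
\[ DOM_{anchors} = \prod_{i}^{n} DOM(anchor_i) \]
\[ DOM_{imgs} = \prod_{i}^{n} DOM(img_i) \]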

• DOM(input): the DOM path of the <input> tag in the page + the value of the type attribute of the <input> tag + the value of its name attribute (if the name attribute is absent, the value attribute is used instead).
• DOM(button): the DOM path of the button in the page + the title of the button.
• DOM(anchor): the DOM path of the <a> tag in the page.
• DOM(img): the DOM path of the image in the page.

Suppose the web application page contains several buttons. In this case, the second element of the page's attribute vector is the set of the DOM paths of the buttons on the page, joined by a separator. Figure 5 shows one of the osCommerce¹ web application pages, and Table 1 shows the attribute vector of that page. As shown, the input and anchor elements are null, which means the page does not contain those tags.
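A minimal sketch of how such an attribute vector could be computed with Python's standard `html.parser` module is shown below. It is an illustration under assumptions, not the paper's implementation: path components are joined with "/", the leading "Document" component is omitted, each vector element is stored as a Python set, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the path stack.
VOID_TAGS = {"input", "img", "br", "hr", "meta", "link", "area", "base",
             "col", "embed", "source", "track", "wbr"}


class AttributeVectorParser(HTMLParser):
    """Collect DOM paths of inputs, buttons, anchors, and images."""

    def __init__(self):
        super().__init__()
        self.stack = []            # open elements from the root to the current one
        self.vector = {"inputs": set(), "buttons": set(),
                       "anchors": set(), "imgs": set()}
        self._button_path = None   # path of the button currently being read
        self._button_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        path = "/".join(self.stack + [tag])
        if tag == "input":
            # DOM path + type + name (value is used when name is absent)
            name = attrs.get("name") or attrs.get("value") or ""
            self.vector["inputs"].add(f"{path}/{attrs.get('type', '')}/{name}")
        elif tag == "a":
            self.vector["anchors"].add(path)
        elif tag == "img":
            self.vector["imgs"].add(path)
        elif tag == "button":
            self._button_path = path
            self._button_text = []
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_data(self, data):
        if self._button_path is not None:
            self._button_text.append(data)

    def handle_endtag(self, tag):
        if tag == "button" and self._button_path is not None:
            # DOM path + button title
            title = "".join(self._button_text).strip()
            self.vector["buttons"].add(f"{self._button_path}/{title}")
            self._button_path = None
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def attribute_vector(html: str) -> dict:
    parser = AttributeVectorParser()
    parser.feed(html)
    parser.close()
    return parser.vector
```

An empty set plays the role of the null element in Table 1.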

2. Identifying similar pages
After the attribute vector of each page is extracted, similar pages must be identified. Pages whose attribute vectors are identical, or whose attribute vector is a subset of another page's attribute vector, are considered similar.

According to Definition 1, the attribute vector of each web application page has four elements. The attribute vector of page 1 is considered the same as the attribute vector of page 2 if:
• All vector elements of page 1 are equal to the corresponding elements in the vector of page 2.
• All vector elements of page 1 are subsets of the corresponding elements in the vector of page 2.
• All vector elements of page 2 are subsets of the corresponding elements in the vector of page 1.

¹ https://www.oscommerce.com


Figure 5. A Page of the osCommerce Application

Algorithm 2. The Pseudocode for Identifying the Similar Pages

INPUT: w1 = (DOM_w1(input), DOM_w1(button), DOM_w1(anchor), DOM_w1(img)),
       w2 = (DOM_w2(input), DOM_w2(button), DOM_w2(anchor), DOM_w2(img))
OUTPUT: Boolean flag; true means the two pages are the same

Begin
  Let flag, input, button, anchor, img = false
  if DOM_w1(input) ⊆ DOM_w2(input) or DOM_w2(input) ⊆ DOM_w1(input) then
    input = true
  end if
  if DOM_w1(button) ⊆ DOM_w2(button) or DOM_w2(button) ⊆ DOM_w1(button) then
    button = true
  end if
  if DOM_w1(anchor) ⊆ DOM_w2(anchor) or DOM_w2(anchor) ⊆ DOM_w1(anchor) then
    anchor = true
  end if
  if DOM_w1(img) ⊆ DOM_w2(img) or DOM_w2(img) ⊆ DOM_w1(img) then
    img = true
  end if
  if input and button and anchor and img then
    flag = true
  end if
  return flag
end

• If one or more vector elements of page 1 are subsets of their corresponding elements in the vector of page 2, the rest of the vector elements of page 1 must be the same as their corresponding elements in the vector of page 2.
• The null element is a subset of every element.

Similar pages are identified according to the conditions above. Algorithm 2 shows the pseudocode for identifying similar pages.
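The element-wise subset test of Algorithm 2 maps naturally onto Python sets. The sketch below assumes each attribute vector is a dictionary of four sets (an empty set standing for null); the key names and the function name are hypothetical. Like Algorithm 2, it checks each element independently and accepts a subset relation in either direction.

```python
ELEMENTS = ("inputs", "buttons", "anchors", "imgs")


def similar_pages(w1: dict, w2: dict) -> bool:
    """True if, for every element, one page's set is a subset of the other's.

    The empty set (null) is a subset of every set, so a missing element
    never prevents two pages from being similar.
    """
    return all(w1[k] <= w2[k] or w2[k] <= w1[k] for k in ELEMENTS)
```

For the "continue" versus "save" example given earlier, the two button sets are disjoint and non-empty, so the pages are reported as different.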

3. Clustering web application pages
After the similar pages are identified, they are put in the same cluster. The pages in a cluster are similar to each other and refer to one unique page of the application. Algorithm 3 shows the pseudocode for clustering web pages. In line 7, it is checked whether two pages w_i and w_j are the same; if so, they are put in the same cluster.
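One way to read this clustering step is as a single greedy pass: each page that has not yet been assigned seeds a new cluster, and every later unassigned page that is similar to the seed joins it. The sketch below follows that reading; it is an assumption about the intended behavior, and the similarity test is passed in as a parameter so that the block stays self-contained.

```python
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")


def cluster_pages(pages: Sequence[Page],
                  similar: Callable[[Page, Page], bool]) -> List[List[int]]:
    """Group page indices into clusters using a greedy, seed-based pass."""
    clusters: List[List[int]] = []
    assigned = [False] * len(pages)
    for i, seed in enumerate(pages):
        if assigned[i]:
            continue                      # already part of an earlier cluster
        cluster = [i]
        assigned[i] = True
        for j in range(i + 1, len(pages)):
            if not assigned[j] and similar(seed, pages[j]):
                cluster.append(j)         # w_j joins the cluster seeded by w_i
                assigned[j] = True
        clusters.append(cluster)
    return clusters
```

Because pages are compared only with the seed of a cluster, the result can depend on the order of the pages in the traffic.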

Algorithm 3. The Pseudocode for Clustering the Web Application Pages

INPUT: WebPages = {w_1, w_2, w_3, ..., w_n}
OUTPUT: the web application model M as a set of web page clusters C;
        each web page cluster C is a set of web pages

 1: Begin
 2: Let M = ∅    // set of page clusters
 3: Let k = 1    // number of web page clusters
 4: for i: 1..n do
 5:   C_k ← w_i
 6:   for j = i+1..n do
 7:     if SimilarWebPages(w_i, w_j) then
 8:       WebPages ← WebPages − w_j
 9:       n = length of WebPages
10:       C_k ← C_k ∪ w_j
11:     end if
12:   end for
13:   k++
14: end for
15: return C
16: end

4) Extracting the user navigation graph
In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster holds a set of similar pages, and each of these pages has a URI and a Referer field; thus each cluster has a set of URIs and a set of Referers. Recall that the Referer field is the URI of the previous web application page the user visited. To extract the edges of the user navigation graph, the clusters are checked to find pairs in which the URI set of one cluster intersects the Referer set of another; such clusters are connected. Suppose that the URI set of cluster C1 intersects the Referer set of cluster C2; then the path from C1 to C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges of the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained in lines 5 and 6 of Algorithm 4, respectively. In line 10, if the intersection of the URI set of a cluster with another cluster's Referer set is not empty, the edge CiCj is added to the edge set of the graph.

The user navigation graph is created according to Algorithm 5, where the nodes are the clusters and the identified edges are the paths between clusters. In the created graph, each node indicates a unique page and the edges show the paths between pages.

Definition 2 [The User Navigation Graph]. This graph is represented by the tuple <C0, C, E>, where C is the set of nodes in the graph, C0 ∈ C is the first (initial) node in the graph, and E ⊆ C × C is the set of edges in the graph.

Algorithm 5 shows the pseudocode for extracting the user navigation graph. Lines 5 and 6 identify the cluster that contains the initial page of the application; this cluster is taken as the initial node of the graph.
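The edge rule (the URI set of one cluster intersecting the Referer set of another) and the tuple of Definition 2 can be sketched together as follows. The representation is an assumption made for illustration: each cluster is a list of (URI, Referer) pairs, clusters are identified by their index, and self-loops are skipped. Unlike the loop bounds in Algorithm 4, the sketch checks every ordered pair of clusters so that edges in both directions can be found.

```python
from typing import List, Optional, Set, Tuple

Cluster = List[Tuple[str, Optional[str]]]   # (URI, Referer) of each page


def extract_edges(clusters: List[Cluster]) -> Set[Tuple[int, int]]:
    """Add edge (i, j) when a URI in cluster i is a Referer in cluster j."""
    uris = [{uri for uri, _ in c} for c in clusters]
    referers = [{ref for _, ref in c if ref} for c in clusters]
    edges = set()
    for i in range(len(clusters)):
        for j in range(len(clusters)):
            if i != j and uris[i] & referers[j]:
                edges.add((i, j))
    return edges


def navigation_graph(clusters: List[Cluster], first_uri: str):
    """Return (C0, C, E): initial node, node list, and edge set."""
    c0 = next(i for i, c in enumerate(clusters)
              if any(uri == first_uri for uri, _ in c))
    return c0, list(range(len(clusters))), extract_edges(clusters)
```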

3.2 Identifying Business Processes in the Application

To identify the web application business processes in the user navigation graph, the process and the final node must be defined first, followed by the business process.

Algorithm 4. The Pseudocode for Extracting Edges From the User Navigation Graph

INPUT: C = {C_1, C_2, C_3, ..., C_k}          // web page clusters as graph nodes
       WebPages = {w_1, w_2, w_3, ..., w_n}   // set of web pages
OUTPUT: the web application graph edges E as a set of edges

 1: Begin
 2: Let E = ∅    // set of web application graph edges
 3: for j: 1..k do
 4:   Let URI_cj, Referer_cj = ∅
 5:   URI_cj = URI_cj ∪ ExtractURI(w) for any w ∈ C_j
 6:   Referer_cj = Referer_cj ∪ ExtractReferer(w) for any w ∈ C_j
 7: end for
 8: for i: 1..k do
 9:   for j: i+1..k do
10:     if (URI_ci ∩ Referer_cj ≠ null) then
11:       E ← E + C_iC_j
12:     end if
13:   end for
14: end for
15: return E
16: end

Algorithm 5. The Pseudocode for Extracting the User Navigation Graph

INPUT: C = {C_1, C_2, C_3, ..., C_k}   // web page clusters as graph nodes
       w_1                             // first web page
OUTPUT: the web application navigation graph <C0, C, E>

 1: Begin
 2: Let C0 = ∅
 3: E = ExtractGraphEdges    // set of web application graph edges
 4: for j: 1..k do
 5:   if C_j contains w_1 then
 6:     C0 = C0 + C_j
 7:   end if
 8: end for
 9: return <C0, C, E>
10: end

Definition 3 [The Application Process (P)]. The process P in the application is a sequence of nodes and edges in the user navigation graph, such as E_1, E_2, ..., E_k, where E_i ∈ E and E_i = C_(i−1)C_i.

Algorithm 6 shows the pseudocode for extracting processes.

Definition 4 [The Final Nodes in the User Navigation Graph (F)]. A final node is a node at which a business process is completed once the application reaches it.

The final nodes can be detected by examining the HTTP responses. For example, in the process of buying a product, a phrase like "Thank you for your purchase" is displayed after the purchase is completed. By specifying a set of such phrases and searching for them in the responses, the final nodes can be identified. Some keywords used for identifying final nodes are Thank, Congratulations, Successfully, Log Off, and Search Results. In addition, some buttons on a web application page are good indicators of a final node: examples of final nodes include the page shown after clicking the "save" button, the "creation" button, or the "submit" button.
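The keyword search described above could look like the following sketch. The keyword list is the one given in the text; matching is case-insensitive substring search, which is an assumption, and the cluster representation (a list of response bodies per cluster) and function names are hypothetical.

```python
from typing import List, Set

# Keywords listed in the text, lower-cased for case-insensitive matching.
FINAL_KEYWORDS = ("thank", "congratulations", "successfully",
                  "log off", "search results")


def is_final_page(response_body: str) -> bool:
    """True if the response body contains any completion keyword."""
    text = response_body.lower()
    return any(keyword in text for keyword in FINAL_KEYWORDS)


def final_nodes(cluster_bodies: List[List[str]]) -> Set[int]:
    """Indices of clusters with at least one page that looks like a final page."""
    return {i for i, bodies in enumerate(cluster_bodies)
            if any(is_final_page(body) for body in bodies)}
```

Plain substring matching is deliberately loose; a real deployment would likely need a curated phrase list per application to avoid false matches.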

Definition 5 [The Application Business Process]. A business process BP in the application is a process that satisfies at least one of the following conditions:

1. The first node of the process is the initial node of the user navigation graph (C0), and the end node of the process is a final node of the user navigation graph (F).
2. The process passes through its first node again; that is, the first node and the end node of the process are the same, and the process length is greater than two. (If the process returns to an already visited node and the length of the resulting loop is more than two, it is a business process.)

Algorithm 6. The Pseudocode for Extracting Processes

INPUT: the web application first node C0
       the web application graph edges E
OUTPUT: the web application graph processes P as a set of web application processes

Begin
  Let P = ∅            // set of web application processes
  Let StartEdge = ∅    // set of graph edges that begin from C0
  StartEdge = ExtractGraphEdges(C0)        // extract all edges from C0
  EndPoint = ExtractEndPoint(StartEdge)    // extract the end points of the edges
  if (ExtractProcess(EndPoint, E) ≠ null) then
    return P = E + ExtractProcess(EndPoint, E) for any edge E ∈ StartEdge
  else
    return E
  end if
end

All processes in the graph from the initial node (the application's initial page) to the identified final nodes, as well as the processes whose initial node and final node are the same, are the application's business processes. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the application processes are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start with the initial node and end with a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in lines 13 and 14, those whose initial node and final node are the same and whose length is greater than two are added to the variable BP as business processes.
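The two conditions of Definition 5 can be illustrated with a small depth-first search over the navigation graph. This sketch is an interpretation, not Algorithm 7 itself: it only follows paths that start at C0, it measures process length in edges, and it never revisits an intermediate node. The function name and the `min_loop_edges` parameter are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def business_processes(c0: int,
                       edges: Iterable[Tuple[int, int]],
                       finals: Set[int],
                       min_loop_edges: int = 3) -> List[List[int]]:
    """Return node sequences that start at c0 and either end in a final
    node or return to c0 through a loop of more than two edges."""
    adjacency: Dict[int, List[int]] = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    found: List[List[int]] = []

    def dfs(path: List[int]) -> None:
        for nxt in sorted(adjacency.get(path[-1], [])):
            if nxt == c0:
                # Closing the loop adds one edge, so the loop has len(path) edges.
                if len(path) >= min_loop_edges:
                    found.append(path + [c0])
                continue
            if nxt in path:
                continue                  # do not revisit intermediate nodes
            new_path = path + [nxt]
            if nxt in finals:
                found.append(new_path)    # condition 1: C0 to a final node
            dfs(new_path)

    dfs([c0])
    return found
```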

4 EXPERIMENTAL RESULTS

The testbed used in this section is a network consisting of a web server (the test target) and two clients (the BLProM system and a legal user). The web server and the clients run on virtual machines. Their profiles are shown in Table 2.

The web applications listed in Table 3 are installed on the web server (the test target), and the goal is to identify the business layer of these web applications.

The legal user first uses the selected web applications and crawls all permitted parts of each one. The HTTP traffic of the legal user is then given to BLProM as its input.

5 EVALUATION

BLProM's goal is to identify the business layer of the web application. Identifying the business layer makes it possible to identify business logic vulnerabilities. BLProM detects the business processes of the web application, which is the main step in dynamic security testing of the web application at the business layer.

We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner that scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner that has an API for extracting the web application graph, in which the web application pages are the graph nodes and the relations among pages are the graph edges. The main difference between BLProM and ZAP lies in detecting similar pages: ZAP cannot detect similar pages in the web application, but BLProM can. ZAP's graph only shows the relations among scanned pages, whereas BLProM generates the optimal graph. In terms of the accuracy of the generated graph, BLProM and ZAP are the same.

To evaluate the proposed approach, we first measure the accuracy of the clustering using the following criteria:

• True Positive: samples that are placed in their correct clusters.
• False Positive: samples that are placed in a cluster to which they do not belong.
• False Negative: samples that are not placed in a cluster to which they belong.
• Recall: calculated by the following formula:

\[ recall = \frac{TruePositive}{TruePositive + FalseNegative} \]

• Precision: calculated by the following formula:


Table 2. Testbed Profiles

Machine                    Property     Value
Web server (test target)   CPU          Pentium dual core, 2.20 GHz
                           OS           Windows 8.1
                           VMware CPU   1 GHz
                           VMware RAM   1 GB
                           VMware OS    Windows 7
Client (BLProM machine)    CPU          Intel Core i7, 2.20 GHz
                           OS           Windows 8.1
                           VMware CPU   1 GHz
                           VMware RAM   1 GB
                           VMware OS    Windows 7
Client (legal user)        CPU          Pentium dual core i3-3210 GHz
                           OS           Windows 7
                           VMware CPU   1 GHz
                           VMware RAM   1 GB
                           VMware OS    Windows 7

Table 3. Selected Web Applications for Evaluation

Web application        Description
TomatoCart-1.1.8.6.1   e-commerce
osCommerce-2.3.4       e-commerce
WackoPicko             Web application for sharing pictures

Algorithm 7. The Pseudocode for Identifying the Application Business Processes

INPUT: the web application navigation graph <C0, C, E>
       the web application graph edges E
OUTPUT: the web application graph business processes BP as a set of web application processes

 1: Begin
 2: Let P = ∅    // set of web application processes
 3: Let F = ∅    // set of web application final nodes
 4: P = ExtractProcess       // extract the processes in the web application
 5: F = ExtractFinalNodes    // extract the final nodes in the web application
 6: for i: 1..k do
 7:   if P_i starts with C0 and ends with a node in F then
 8:     BP = BP + P_i
 9:   end if
10: end for
11: R = ExtractProcessWithRepeatedNodes(P)    // extract the processes with repeated nodes
12: for j: 1..m do
13:   if R_j has the same initial and end node and length(R_j) > 2 then
14:     BP = BP + R_j
15:   end if
16: end for
17: return BP
18: end


Table 4. Evaluation of the Clusters of the Selected Web Application Pages

Criteria         WackoPicko   TomatoCart   osCommerce
samples          89           150          210
clusters         29           66           40
true positive    65           146          205
false positive   24           4            5
false negative   23           3            4
recall           0.74         0.98         0.98
precision        0.73         0.97         0.98
f-measure        0.73         0.98         0.98

Table 5. Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko

Criteria                               Proposed approach   OWASP ZAP   Improvement compared to OWASP ZAP (%)
HTTP Message                           89                  89          –
Graph Nodes                            22                  89          75.2
Graph Edges                            48                  270         82.2
process (P)                            12                  48          75
Average edge in each process (E)       4                   46          91.3
Average edges in all processes (P*E)   48                  2208        97.8
business processes                     10                  NA          –

\[ precision = \frac{TruePositive}{TruePositive + FalsePositive} \]

• F-Measure: calculated by the following formula:

\[ F\text{-}Measure = \frac{2 \times recall \times precision}{recall + precision} \]

The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, samples is the total number of web pages in the HTTP traffic, extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, clusters is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). In the remaining rows, the criteria listed above are calculated for each web application.
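As a quick check of how the recall, precision, and F-measure rows follow from the counts, the short sketch below recomputes them from the WackoPicko column of Table 4 (65 true positives, 24 false positives, 23 false negatives). The function name is hypothetical, and the F-measure is taken as the usual harmonic mean of recall and precision.

```python
def clustering_scores(tp: int, fp: int, fn: int):
    """Return (recall, precision, f_measure) from the raw counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# WackoPicko counts from Table 4
recall, precision, f_measure = clustering_scores(tp=65, fp=24, fn=23)
print(round(recall, 2), round(precision, 2), round(f_measure, 2))
```

Rounded to two decimals, the three values agree with the WackoPicko column of Table 4.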

To evaluate the proposed approach, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. The BLProM output for WackoPicko is shown in Table 5. Graph nodes is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). Graph edges is the total number of edges extracted by the ExtractGraphEdges algorithm (Algorithm 4). Process is the total number of paths from the first node, extracted by the ExtractProcess algorithm (Algorithm 6). Average edge in each process (E) is the sum of the edges of all processes divided by the total number of processes. Average edges in all processes is the product of the total number of processes (P) and the average number of edges in each process (E). Business processes is the total number of business processes in the web application.

The values in the first column of Table 5 are the output of the proposed approach, and the values in the second column are the scanning output of ZAP. The third column shows the percentage of scanning improvement of the proposed approach compared to the ZAP scan. The results in the table indicate that the ZAP scan is a non-smart scan. As a result of this non-intelligent scan, ZAP is not able to identify business layer vulnerabilities.
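The text does not state how the improvement percentage is computed. The reported values appear consistent with a relative reduction with respect to ZAP, (ZAP − proposed) / ZAP × 100, so the sketch below uses that formula as an assumption; the function name is hypothetical.

```python
def improvement_percent(proposed: float, zap: float) -> float:
    """Relative reduction of the proposed approach with respect to ZAP, in percent."""
    return (zap - proposed) / zap * 100


# WackoPicko rows of Table 5
print(improvement_percent(22, 89))     # graph nodes
print(improvement_percent(48, 270))    # graph edges
print(improvement_percent(48, 2208))   # average edges in all processes
```

Under this assumption, the three results are close to the 75.2, 82.2, and 97.8 percent listed in Table 5.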

The BLProM output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to ZAP scanning. Web application scanning is improved by identifying the web application business layer.


Table 6. Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce

Criteria                               Proposed approach   OWASP ZAP   Improvement compared to OWASP ZAP (%)
HTTP Message                           170                 170         –
Graph Nodes                            40                  170         76.4
Graph Edges                            66                  379         82.5
process (P)                            23                  17          26
Average edge in each process (E)       3                   113         97.3
Average edges in all processes (P*E)   69                  1921        96.4
business processes                     18                  NA          –

Table 7. Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart

Criteria                               Proposed approach   OWASP ZAP   Improvement compared to OWASP ZAP (%)
HTTP Message                           150                 150         –
Graph Nodes                            66                  150         56
Graph Edges                            87                  410         78.7
process (P)                            31                  39          20.5
Average edge in each process (E)       4                   101         96
Average edges in all processes (P*E)   156                 3131        95
business processes                     30                  NA          –

The BLProM output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of the proposed approach compared to ZAP scanning.

Table 8 shows the averages for the proposed approach and for OWASP ZAP, and the average percentage of improvement in scanning the selected web applications. For example, in the "Average edges in all processes" benchmark, our approach improves on OWASP ZAP by about 96 percent.

These results show that BLProM improves web application scanning. BLProM is aware of the web application business processes, and by identifying the web application business layer, web scanners can detect business layer vulnerabilities.

6 CONCLUSION

Business logic vulnerabilities are serious vulnerabilities that compromise web application security. Web scanners cannot detect them because they are unable to understand the business logic of the web application. Detecting business logic vulnerabilities requires understanding the business logic of the web application; these vulnerabilities are therefore specific to the application and difficult to identify.

In this paper, we proposed BLProM, a black-box approach for detecting the business processes of the web application. BLProM aims to ease the detection of business logic vulnerabilities: its output is used as input to dynamic security testing of web applications at the business layer in order to detect business logic vulnerabilities. BLProM consists of two main steps:

1. Extracting the user navigation graph
2. Detecting the web application business processes

In the lab, we scanned three web applications with BLProM and with OWASP ZAP, an open-source web scanner. We showed that BLProM improved scanning by about 96% compared to OWASP ZAP. BLProM improved the scanning of web applications because it clusters web pages and prevents scanning similar web application pages.

Table 8. Comparison of the Average of the Proposed Approach and the Average of the ZAP Approach in the Scanning of the Selected Web Applications

Criteria                               Proposed approach   OWASP ZAP   Improvement compared to OWASP ZAP (%)
Graph Nodes                            42.6                136.3       69.2
Graph Edges                            67                  353         81.1
process (P)                            20                  36.6        40.5
Average edge in each process (E)       3.6                 86.6        94.8
Average edges in all processes (P*E)   91                  2420        96.4

References

[1] Common vulnerabilities and exposures. https://cve.mitre.org/cve/cve.html

[2] 2015 ITRC. Identity Theft Resource Center Breach Report Hits Near Record High in 2015. http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html. Accessed Feb. 2017.

[3] G. Pellegrino and D. Balzarotti. Toward Black-Box Detection of Logic Flaws in Web Applications. In Network and Distributed System Security Symposium 2014 (NDSS 2014), 2014. doi:10.14722/ndss.2014.23021

[4] Testing for business logic. OWASP. https://www.owasp.org/index.php/Testing_for_business_logic. Accessed Feb. 2017.

[5] D. Balzarotti, M. Cova, V. Felmetsger, and G. Vigna. Multi-module vulnerability analysis of web-based applications. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 25–35. ACM, 2007. doi:10.1145/1315245.1315250

[6] A. Doupe, B. Boe, C. Kruegel, and G. Vigna. Fear the EAR: discovering and mitigating execution after redirect vulnerabilities. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 251–262. ACM, 2011. doi:10.1145/2046707.2046736

[7] V. Felmetsger, L. Cavedon, C. Kruegel, and G. Vigna. Business Logic Attacks – Bots and BATs, OWASP 2009. In USENIX Security Symposium, 2010.

[8] E. Chai. Business Logic Attacks – Bots and BATs, OWASP 2009. https://www.owasp.org/images/a/aa/OWASP_Cincinnati_Jan_2011.pdf. Accessed Feb. 2017.

[9] X. Li and Y. Xue. BLOCK: a black-box approach for detection of state violation attacks towards web applications. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 247–256, 2011. doi:10.1145/2076732.2076767

[10] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna. Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications. In Giovanni, pages 63–86. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-74319-4. doi:10.1007/978-3-540-74320-0_4

[11] X. Li, W. Yan, and Y. Xue. SENTINEL: securing database from logic flaws in web applications. In Proceedings of the Second ACM Conference on Data and Application Security and Privacy, pages 25–36, 2012. doi:10.1145/2133601.2133605

[12] M. Alidoosti and A. Nowroozi. BLDAST: business-layer Dynamic Application Security Tester of the web application in order to detect web application vulnerabilities against flooding DoS attacks. In Iran Society of Cryptology Conference, Shiraz, Iran (in Persian), 2017.

[13] M. Alidoosti, A. Nowroozi, and A. Nickabadi. Evaluating the Web-Application Resiliency to Business-Layer DoS Attacks. ETRI Journal, 2019. doi:10.4218/etrij.2019-0164

[14] M. Alidoosti and A. Nowroozi. BLTOCTTOU: business-layer dynamic application security tester of the web application in order to detect web application vulnerabilities against Race Condition attacks. In Computer Society of Iran Conference, Tehran, Iran (in Persian), 2018.

[15] M. Alidoosti and A. Nowroozi. BL-ProM: Business-layer process miner of the web application. In ISCISC, pages 1–6. IEEE, 2018. ISBN 978-1-5386-7582-3. doi:10.1109/ISCISC.2018.8546899

[16] V. Crescenzi, P. Merialdo, and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3):279–299, 2005. doi:10.1016/j.datak.2004.11.004


Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. Currently, she is working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.

Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. Besides, he is a co-founder of four IT startups.

Ahmad Nickabadi received his B.S. degree in computer engineering in 2004 and the M.Sc. and Ph.D. degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.

  • 1 Introduction
  • 2 BACKGROUND and RELATED WORK
    • 2.1 Business Logic Attacks
    • 2.2 Clustering Web Pages
  • 3 BUSINESS-LAYER PROCESS MINER
    • 3.1 Extracting the User Navigation Graph
    • 3.2 Identifying Business Processes in the Application
  • 4 EXPERIMENTAL RESULTS
  • 5 EVALUATION
  • 6 CONCLUSION

Figure 1. The Three-Layer Architecture of a Web Application [8]

flow. MiMoSA detects multi-step attacks by analyzing the relationship between the web application and the database, as well as the connections in the web application.

SENTINEL [11] and Pellegrino et al. [3] use a prevention approach to identify business logic vulnerabilities in the form of a black-box approach.

SENTINEL [11] is a black-box approach for detecting logical weaknesses in database access. SENTINEL generates a state machine of the web application and extracts a set of invariants from observed SQL queries and responses and session variables as the application specification. Any SQL query that violates the defined invariants is identified as an attack.

Pellegrino et al. [3] propose a black-box technique to detect logic vulnerabilities in web applications. This technique extracts behavioral patterns from network traces in which the user interacts with a certain application's functionality. First, the web application is modeled, and then attack vectors are applied to the model.

Our previous works, BLDAST [12, 13] and BLTOCTTOU [14], use a prevention approach to identify business logic vulnerabilities in the form of a black-box approach. The output of BLProM [15] can be used as an input for BLTOCTTOU and BLDAST.

BLDAST [12, 13] is a dynamic, black-box vulnerability analysis approach that identifies business logic vulnerabilities of a web application against flooding DoS attacks. BLDAST assesses web application resiliency against flooding DoS attacks. It takes into account the business processes of a web application and selects the critical pages in these processes. A critical page has a considerable response time. Therefore, a critical process can impose a heavy load on the target and cause the web server to become unresponsive. The goal of BLDAST is to find these critical processes within web applications.

BLTOCTTOU [14] is a black-box dynamic application security tester for detecting business logic vulnerabilities against race condition attacks. BLTOCTTOU identifies vulnerabilities with the help of the business processes of the web application. It detects business processes that interact with each other: one process sets the value of a variable, and the other reads or writes that variable. To identify a race condition, BLTOCTTOU first executes the identified processes sequentially and then executes them in reverse order. Finally, it compares the outputs of these two modes. If they differ, the web application is vulnerable to a race condition.

2.2 Clustering Web Pages

Crescenzi et al. [16] presented an approach to cluster web pages based on the page structure. The structural similarity between web pages is defined by the DOM trees of their hyperlinks. The final clusters are used to build a model that describes the structure of the site according to classes of pages and their connectivity.

3 BUSINESS-LAYER PROCESS MINER

In this paper, BLProM is proposed to identify the business processes of the web application, and we use its outputs as the input for web application security testing in the business layer. Then, by analyzing the interaction between business processes, business-layer vulnerabilities can be detected.

BLProM first preprocesses normal user HTTP traffic and then extracts the web application pages in the traffic. BLProM clusters similar pages to prevent the infinite growth of the user navigation graph. The detected clusters are the graph nodes, and the graph edges are the relations between the detected clusters. Based on the detected nodes and edges, the user navigation graph is extracted. Then BLProM extracts business processes from the navigation graph. BLProM has two main steps:

1. Extracting the user navigation graph
2. Detecting business processes in the web application

Figure 2 shows the proposed steps to identify the business processes in the web application. In the following, we explain each of these steps in detail.

3.1 Extracting the User Navigation Graph

First, the normal user starts to crawl the web application. The traffic of the normal user is captured and stored. It should be noted that the user permission level can differ according to the role of the user, and consequently the user navigation graph varies depending on the permission level. In this paper, the normal user has user-level permission and browses the pages permitted at that level. The normal user crawls all parts of the application that the user is allowed to visit.

Figure 2. Black-Box Approach to Detect Web Application Business Process

BLProM initially extracts the user navigation graph from the stored traffic. This is performed through the following steps:

1. Preprocessing of raw input data
2. Identifying existing web application pages in the stored traffic
3. Clustering the web application pages
4. Extracting the user navigation graph

1) Preprocessing of raw input data
In the preprocessing step, BLProM cleans the data and removes irrelevant data samples. In this paper, only HTTP requests and responses are used. For the responses, only those with the successful status codes 200 and 209 are employed. BLProM removes responses with failure status codes as well as their corresponding requests. Additionally, in this paper only GET and POST requests are needed, and the remaining ones are discarded.
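As a rough illustration of this filtering step, the following Python sketch assumes the captured traffic is available as a list of request/response pairs. The `Exchange` record and its field names are hypothetical and not part of BLProM; the accepted status codes are simply the ones named in the text.

```python
from dataclasses import dataclass


# Hypothetical record for one captured request/response pair
# (an assumption for illustration, not a structure defined by BLProM).
@dataclass
class Exchange:
    method: str   # HTTP method of the request, e.g. "GET"
    url: str      # requested URL
    status: int   # status code of the corresponding response


ALLOWED_METHODS = {"GET", "POST"}
ALLOWED_STATUS = {200, 209}  # the successful codes named in the text


def preprocess(traffic):
    """Keep only GET/POST requests whose responses carry an accepted status."""
    return [
        ex for ex in traffic
        if ex.method.upper() in ALLOWED_METHODS and ex.status in ALLOWED_STATUS
    ]


traffic = [
    Exchange("GET", "/index.php", 200),
    Exchange("HEAD", "/index.php", 200),   # dropped: method is not needed
    Exchange("POST", "/login.php", 500),   # dropped: failure status code
    Exchange("POST", "/login.php", 200),
]
cleaned = preprocess(traffic)
```

Dropping a whole pair at once mirrors the statement that a failed response is removed together with its corresponding request.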

2) Identifying the web application pages in the stored traffic
Each page of the application can be represented as a pair (main request, corresponding response to the main request). To identify the web application pages, it is first necessary to detect the main requests in the traffic. After identifying the main requests, the corresponding responses must also be identified.

Identifying the main HTTP requests in the traffic. In the user's stored traffic, there are both main HTTP requests, which lead to loading the web application pages, and secondary HTTP requests, which are responsible for loading a file, image, etc. of the page. BLProM must distinguish between the main and secondary requests. In other words, once the main requests are identified, the remaining ones are considered secondary requests. Specifically, when the Referer header of a request differs from the Referer of the previous request, that request is considered a secondary request and the previous one is considered the main request. Additionally, the first request in the traffic is considered a main request because the first request does not include the Referer. It should be noted that the Referer header field in an HTTP request indicates the URL of the previous page the user visited.

Identifying the corresponding responses to the main requests. To identify the corresponding responses to the main requests, it is only necessary to select the responses whose content-type field is text/html, because the response to a secondary request is often a file, photo, etc., while the response to a main request is in the form of text and HTML. Such responses are the main responses in the HTTP traffic. After identifying the main requests and responses in the traffic, each pair (main request, corresponding response to the main request) indicates a page of the web application. It should be noted that the response to a main request contains all the secondary requests that lead to loading the web page.

Identifying whether the last request in the traffic is the main request or not. If the last response in the traffic is a main response, the last HTTP request is a main request as well.

For example, in Figure 3, requests 1, 2, 6, and 8 are the main requests, which are shown in red.

Figure 3. Number of Consecutive Requests

The pseudocode of the algorithm used by BLProM to identify web application pages is shown in Algorithm 1. As mentioned before, each web application page can be represented by a pair (main request, corresponding response to the main request). The pseudocode in Algorithm 1 can be divided into four main parts:
1. Extracting the main requests in the traffic (lines 13-25)
2. Extracting the corresponding responses to the main requests (lines 26-31)
3. Identifying whether the last request in the traffic is the main request or not (lines 32-35)
4. Extracting the web application pages (lines 36-42)

In line 9, all requests in the traffic, both main and secondary, are extracted and put in the HttpRequest variable. In line 10, the responses in the traffic are added to the HttpResponse variable. In lines 11 and 12, the total numbers of requests and responses are calculated. In line 15, it is checked whether the current request is the first request; if yes, it is considered a main request and included in the MainRequest variable. In line 21, it is checked whether the Referer field of the current request is different from the Referer field of the previous request; if yes, the previous request is considered a main request.

In line 28, it is checked whether the content-type field of the current response is text/html. If this condition is met, the current response is considered a main response.

In line 33, it is checked whether the last HTTP response is a main response; if yes, the last request is considered a main request as well.

In line 39, the main requests and responses are paired to form the web application pages, which are shown as pairs (request, response).
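The page-extraction heuristic described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: the `Request` and `Response` records, their field names, and the example URLs are hypothetical, and requests and responses are assumed to be stored in capture order.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical records for captured messages (assumptions for illustration).
@dataclass
class Request:
    url: str
    referer: Optional[str] = None


@dataclass
class Response:
    content_type: str


def extract_pages(requests, responses):
    """Pair main requests with main (text/html) responses."""
    main_req = []                       # indices of main requests, in order
    last_referer = None
    for i, req in enumerate(requests):
        if i == 0:
            main_req.append(0)          # the first request is a main request
            last_referer = req.referer
            continue
        if req.referer != last_referer:
            # The Referer changed, so the previous request loaded a new page.
            if (i - 1) not in main_req:
                main_req.append(i - 1)
            last_referer = req.referer

    # Main responses are the ones served as text/html.
    main_resp = [k for k, r in enumerate(responses)
                 if r.content_type.startswith("text/html")]

    # If the last response is a main response, the last request is main too.
    if responses and (len(responses) - 1) in main_resp:
        last = len(requests) - 1
        if last not in main_req:
            main_req.append(last)

    return [(requests[i], responses[k]) for i, k in zip(main_req, main_resp)]


requests = [
    Request("/index"),
    Request("/style.css", referer="/index"),
    Request("/logo.png", referer="/index"),
    Request("/login", referer="/index"),
    Request("/login.css", referer="/login"),
    Request("/account", referer="/login"),
]
responses = [
    Response("text/html"), Response("text/css"), Response("image/png"),
    Response("text/html; charset=utf-8"), Response("text/css"),
    Response("text/html"),
]
pages = extract_pages(requests, responses)
```

In this made-up trace, `/index`, `/login`, and `/account` end up as pages, while the stylesheet and image requests are treated as secondary.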

3) Clustering the web application pages
In this step, BLProM clusters the web application pages. Clustering aims to put similar pages in the same cluster. This helps prevent the infinite growth of the user navigation graph. The pages in the same cluster are similar to each other.

At this stage, all pages in the user's stored traffic have been extracted as pairs (main request, corresponding response to the main request). Each pair represents one page of the application. To extract the optimal user navigation graph, the similar extracted pages must be identified and clustered. In the user navigation graph, nodes indicate the application's unique pages and edges represent the links between the pages. To identify similar pages, a criterion must be chosen according to the purpose of the clustering. The type of operations that the user can perform on a page is used as the measure for separating pages. In other words, two pages are similar if the user can perform the same operations on them. For example, consider two pages that each contain only a button, but the titles of the buttons differ: on the first page the title is "continue" and on the second it is "save". These two pages are different because the user performs different operations on them. Thus, according to the criteria specified for similar pages, the following pages are considered similar in web applications:
• In an online shop application, the pages related to the shopping cart are considered similar even if they contain different items.
• In an online shop application, the profile pages of the products are similar because they have the same HTML structure in terms of the important HTML elements.
• Pages that display search results for different keywords are similar.
• If the structure of a page is a subset of another page in terms of the important HTML elements, these pages are considered similar.
• Two pages that both contain user comments are considered similar even if the contents are about different products.


Algorithm 1. The Pseudocode for Extracting Web Application Pages From the Traffic

INPUT: HTTPtraffic = {HttpMessage_1, HttpMessage_2, HttpMessage_3, ..., HttpMessage_n}
OUTPUT: WebPages as a set of (Request_i, Response_i)

1: Begin
2: let WebPages = ∅ // set of web pages
3: let HttpRequest = ∅ // set of HTTP Requests
4: let HttpResponse = ∅ // set of HTTP Responses
5: let MainRequest = ∅ // set of main HTTP Requests
6: let MainResponse = ∅ // set of main HTTP Responses
7: let i, k = 1 // counter for current HttpMessage
8: let LastReferer = ∅, NewReferer = ∅
9: HttpRequest = ExtractReq(HTTPtraffic) // extract HTTP Requests from HTTP traffic
10: HttpResponse = ExtractResp(HTTPtraffic) // extract HTTP Responses from HTTP traffic
11: n = extractNumber(HttpRequest) // extract total number of HTTP Requests
12: m = extractNumber(HttpResponse) // extract total number of HTTP Responses
13: // extract Main HTTP Requests from HttpRequest
14: for i = 1 to n do
15:   if (i = 1) then
16:     add MainRequest ← HttpRequest_i // First Request is a Main Request
17:     LastReferer ← Referer of HttpRequest_i
18:   else
19:     NewReferer ← Referer of HttpRequest_i
20:   end if
21:   if (NewReferer ≠ LastReferer) then
22:     add MainRequest ← HttpRequest_{i−1}
23:     LastReferer ← NewReferer
24:   end if
25: end for
26: // extract Main HTTP Responses from HttpResponse
27: for k = 1 to m do
28:   if (content-type of HttpResponse_k = text/html) then
29:     add MainResponse ← HttpResponse_k
30:   end if
31: end for
32: // checking last request is main request or not
33: if (HttpResponse_m ∈ MainResponse) then
34:   add MainRequest ← HttpRequest_n
35: end if
36: // extract set of web pages
37: size = extractNumber(MainRequest) // extract total number of Main HTTP Requests
38: for j = 1 to size do
39:   WebPages ← (MainRequest_j, MainResponse_j)
40: end for
41: return WebPages
42: end


Table 1. Attribute Vector of a Page in the osCommerce Web Application

inputs    null
buttons   html/body/button/Reviews
          html/body/div/div/div/div/form/div/div/span/span/button/Add to Cart
anchors   null
image     html/body/div/div/div/a/img

Figure 4. An Example of HTML Code

Definition of the Document Object Model (DOM) path for an HTML element. The DOM path of an element is the position of the element in the HTML code.

For example, in Figure 4, the DOM path of the button (DOMbutton) is Document/Html/Body/P/Button.

Definition 1 [Similar Pages]. Similar pages are those on which the user can perform the same operations and which are identical in terms of the position of the important HTML elements in the page. The important HTML elements in the page include buttons, images, inputs, and anchors.

The clustering process includes three steps:
1. Extracting the attribute vectors of the page
2. Identifying the subset pages
3. Clustering pages

In the following, these steps are discussed in detail.

1. Extracting the attribute vectors of the page
BLProM represents each page of the application as a pair (main request, response). In this step, BLProM extracts the corresponding attribute vector of each page by applying a data mining operation on the above pair. BLProM models each page using the following attribute vector:

WebPages = the total pages in an application

$$\forall w \in \text{WebPages}: \quad w = (DOM_{inputs},\ DOM_{buttons},\ DOM_{anchors},\ DOM_{imgs})$$

$$DOM_{inputs} = \prod_{i}^{n} DOM(input_i), \qquad DOM_{buttons} = \prod_{i}^{n} DOM(button_i)$$

$$DOM_{anchors} = \prod_{i}^{n} DOM(anchor_i), \qquad DOM_{imgs} = \prod_{i}^{n} DOM(img_i)$$

• DOM(input): the DOM path of the <input> tag in the page + the value of the type attribute of the <input> tag + the value of the name attribute of the <input> tag (in the absence of the name attribute, the value attribute is considered)
• DOM(button): the DOM path of the button in the page + the title of the button
• DOM(anchor): the DOM path of the <a> tag in the page
• DOM(img): the DOM path of the image in the page

Suppose the web application page contains several buttons. In this case, the second element of the page attribute vector is the set of the DOM paths of the buttons in the page, separated by a delimiter. Figure 5 shows one of the osCommerce¹ web application pages. Table 1 shows the attribute vector of the page in Figure 5. As shown, the input element and the anchor elements are null, which means the page does not contain these tags.
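To make the attribute vector concrete, the following Python sketch derives the four components from an HTML response using only the standard library. It is a sketch under stated assumptions, not BLProM's extractor: the `/` path separator, the use of Python sets for each component, the class and function names, and the example page are all choices made here for illustration.

```python
from html.parser import HTMLParser

# HTML elements that never have a closing tag, so they are not pushed
# onto the stack of open elements.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class AttributeVectorParser(HTMLParser):
    """Collects DOM paths of inputs, buttons, anchors and images."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open elements
        self.vector = {"inputs": set(), "buttons": set(),
                       "anchors": set(), "imgs": set()}
        self._button_path = None
        self._button_text = []

    def handle_starttag(self, tag, attrs):
        path = "/".join(self.stack + [tag])
        attrs = dict(attrs)
        if tag == "input":
            # name attribute, falling back to value when name is absent
            name = attrs.get("name") or attrs.get("value") or ""
            kind = attrs.get("type") or ""
            self.vector["inputs"].add(f"{path}/{kind}/{name}")
        elif tag == "a":
            self.vector["anchors"].add(path)
        elif tag == "img":
            self.vector["imgs"].add(path)
        elif tag == "button":
            self._button_path = path
            self._button_text = []
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_data(self, data):
        if self._button_path is not None:
            self._button_text.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "button" and self._button_path is not None:
            title = " ".join(t for t in self._button_text if t)
            self.vector["buttons"].add(f"{self._button_path}/{title}")
            self._button_path = None
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def attribute_vector(html):
    parser = AttributeVectorParser()
    parser.feed(html)
    parser.close()
    return parser.vector


page = ("<html><body><form><input type='text' name='q'></form>"
        "<button>Reviews</button>"
        "<div><a href='#'><img src='x.png'></a></div></body></html>")
vector = attribute_vector(page)
```

Representing each component as a set makes the later subset comparison between two pages a direct set operation, with an empty set playing the role of the null element.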

2. Identifying similar pages
After extracting the attribute vector of each page, it is necessary to identify similar pages. Pages whose attribute vectors are a subset of another page's vector, or that have fully identical attribute vectors, are considered similar pages.

According to Definition 1, the attribute vector of each web application page has four elements. The attribute vector of page 1 is considered the same as the attribute vector of page 2 if:
• All vector elements of page 1 are equal to the corresponding elements in the vector of page 2.
• All vector elements of page 1 are a subset of the corresponding elements in the vector of page 2.
• All vector elements of page 2 are a subset of the corresponding elements in the vector of page 1.

1 https://www.oscommerce.com

Figure 5. A Page of osCommerce Application

Algorithm 2. The Pseudocode for Identifying the Similar Pages

INPUT: w1 = (DOMw1(input), DOMw1(button), DOMw1(anchor), DOMw1(img))
       w2 = (DOMw2(input), DOMw2(button), DOMw2(anchor), DOMw2(img))
OUTPUT: Boolean flag // true means the two pages are the same

1: Begin
2: Let flag, input, button, anchor, img = false
3: if DOMw1(input) ⊆ DOMw2(input) or DOMw2(input) ⊆ DOMw1(input) then
4:   input = true
5: end if
6: if DOMw1(button) ⊆ DOMw2(button) or DOMw2(button) ⊆ DOMw1(button) then
7:   button = true
8: end if
9: if DOMw1(anchor) ⊆ DOMw2(anchor) or DOMw2(anchor) ⊆ DOMw1(anchor) then
10:   anchor = true
11: end if
12: if DOMw1(img) ⊆ DOMw2(img) or DOMw2(img) ⊆ DOMw1(img) then
13:   img = true
14: end if
15: if input and button and anchor and img then
16:   flag = true
17: end if
18: return flag
19: end

• If one or more vector elements of page 1 are a subset of their corresponding elements in the vector of page 2, the rest of the vector elements of page 1 must be the same as their corresponding elements in the vector of page 2.
• The null element is a subset of every element.

Similar pages are identified according to the above-mentioned conditions. Algorithm 2 illustrates the pseudocode for identifying similar pages.
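The component-wise subset test of Algorithm 2 is short enough to express directly with Python sets. The sketch below assumes each attribute vector is a dictionary of four sets (a representation chosen here, not prescribed by the paper), with the empty set standing in for the null element; the example pages are made up.

```python
FIELDS = ("inputs", "buttons", "anchors", "imgs")


def similar_pages(w1, w2):
    """Component-wise test: for every component, one page's set must be
    contained in the other's (an empty set is contained in any set)."""
    return all(w1[f] <= w2[f] or w2[f] <= w1[f] for f in FIELDS)


# Two product pages: one shows an extra "Reviews" button.
product_a = {"inputs": set(),
             "buttons": {"html/body/button/Reviews",
                         "html/body/form/button/Add to Cart"},
             "anchors": set(),
             "imgs": {"html/body/div/a/img"}}
product_b = {"inputs": set(),
             "buttons": {"html/body/form/button/Add to Cart"},
             "anchors": set(),
             "imgs": {"html/body/div/a/img"}}

# Two pages with a single button each, but with different titles.
continue_page = {"inputs": set(), "buttons": {"html/body/button/continue"},
                 "anchors": set(), "imgs": set()}
save_page = {"inputs": set(), "buttons": {"html/body/button/save"},
             "anchors": set(), "imgs": set()}

same_product = similar_pages(product_a, product_b)
same_action = similar_pages(continue_page, save_page)
```

The second pair reproduces the "continue" versus "save" example: the button sets are not subsets of each other, so the pages are kept apart.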

3. Clustering web application pages
After identifying the similar pages, they are put in the same cluster. The pages in a cluster are similar to each other and refer to a unique page of the application. Algorithm 3 shows the pseudocode for clustering web pages. In line 7, it is checked whether two pages wi and wj are the same; if yes, they are put in the same cluster.
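A greedy version of this clustering step might look like the following Python sketch. It is an illustration under assumptions: pages are reduced to hypothetical (name, button set) pairs, and the similarity function is passed in as a parameter, so any pairwise test can be plugged in.

```python
def cluster_pages(pages, similar):
    """Greedy clustering: each unassigned page seeds a cluster and absorbs
    every remaining page that is similar to the seed."""
    clusters = []
    remaining = list(pages)
    while remaining:
        seed = remaining.pop(0)
        cluster, rest = [seed], []
        for page in remaining:
            (cluster if similar(seed, page) else rest).append(page)
        remaining = rest          # absorbed pages never seed a new cluster
        clusters.append(cluster)
    return clusters


def subset_similar(p, q):
    # Hypothetical stand-in for the full attribute-vector comparison:
    # compares only the button sets of two pages.
    return p[1] <= q[1] or q[1] <= p[1]


pages = [
    ("product?id=1", {"Add to Cart"}),
    ("product?id=2", {"Add to Cart", "Reviews"}),
    ("profile/edit", {"save"}),
    ("address/edit", {"save"}),
]
clusters = cluster_pages(pages, subset_similar)
```

Because every page is compared only with the seed of the current cluster, the result can depend on the order in which pages were captured.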

Algorithm 3. The Pseudocode for Clustering the Web Application Pages

INPUT: WebPages = {w1, w2, w3, ..., wn}
OUTPUT: the web application model M as a set of web page clusters C; each web page cluster C is a set of web pages

1: Begin
2: Let M = ∅ // set of page clusters
3: Let k = 1 // number of web page clusters
4: for i = 1 to n do
5:   C_k ← {w_i}
6:   for j = i + 1 to n do
7:     if SimilarWebPages(w_i, w_j) then
8:       WebPages ← WebPages − {w_j}
9:       n = length of WebPages
10:       C_k ← C_k ∪ {w_j}
11:     end if
12:   end for
13:   M ← M ∪ {C_k}; k++
14: end for
15: return M
16: end

4) Extracting user navigation graph
In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster has a set of similar pages, and each of these pages has a URI and a Referer field. Thus, each cluster contains a set of URIs and a set of Referers for the pages in that cluster. It should be noted that the Referer field is the URI of the previous web application page that the user visited. To extract the edges of the user navigation graph, the produced clusters are checked to find which clusters' Referer sets intersect with the URI set of a given cluster. When such a cluster is found, the two clusters are connected. Suppose that the URI set of cluster C1 has an intersection with the Referer set of cluster C2; then the path from cluster C1 to cluster C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges in the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained in lines 5 and 6 of Algorithm 4, respectively. In line 10, if the intersection of the URI set of a cluster with another cluster's Referer set is not empty, the edge CiCj is added to the edge set of the graph. The user navigation graph is created according to Algorithm 5, where the nodes are the set of clusters and the identified edges are the paths between clusters. In the created graph, each node indicates a unique page and the edges show the paths between pages.

Definition 2 [The User Navigation Graph]. This graph is represented by the tuple <C0, C, E>, where C is the set of nodes in the graph, C0 ∈ C is the first (initial) node in the graph, and E ⊆ C × C is the set of edges in the graph.

Algorithm 5 shows the pseudocode for extracting the user navigation graph. Line 5 identifies the cluster that contains the initial page of the application; this cluster is considered the initial node of the graph.
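The edge rule (the URI set of one cluster intersecting the Referer set of another) and the graph tuple of Definition 2 can be sketched in Python as below. The cluster names, the (uri, referer) page representation, and the decision to test every ordered pair of distinct clusters are assumptions of this sketch, not details confirmed by the paper.

```python
def extract_edges(clusters):
    """clusters maps a cluster name to a list of (uri, referer) pages.
    An edge Ci -> Cj exists when a URI in Ci is a Referer of a page in Cj."""
    uris = {c: {u for u, _ in pages} for c, pages in clusters.items()}
    referers = {c: {r for _, r in pages if r} for c, pages in clusters.items()}
    edges = set()
    for ci in clusters:
        for cj in clusters:
            # Assumption: all ordered pairs are checked, self-loops skipped.
            if ci != cj and uris[ci] & referers[cj]:
                edges.add((ci, cj))
    return edges


def navigation_graph(clusters, first_uri):
    """Return the tuple (C0, C, E) for the user navigation graph."""
    initial = next(c for c, pages in clusters.items()
                   if any(u == first_uri for u, _ in pages))
    return initial, set(clusters), extract_edges(clusters)


clusters = {
    "home": [("/index", None)],
    "product": [("/product?id=1", "/index"), ("/product?id=2", "/index")],
    "cart": [("/cart", "/product?id=1")],
}
graph = navigation_graph(clusters, first_uri="/index")
```

Checking both directions for every pair lets the sketch record links that lead back to earlier clusters, which matters later when loops are considered as business processes.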

Algorithm 4. The Pseudocode for Extracting Edges From the User Navigation Graph

INPUT: C = {C1, C2, C3, ..., Ck} // web page clusters as graph nodes
       WebPages = {w1, w2, w3, ..., wn} // set of web pages
OUTPUT: the web application graph edges E as a set of edges

1: Begin
2: Let E = ∅ // set of web application graph edges
3: for j = 1 to k do
4:   Let URI_cj, Referer_cj = ∅
5:   URI_cj = URI_cj ∪ ExtractURI(w) for any w ∈ C_j
6:   Referer_cj = Referer_cj ∪ ExtractReferer(w) for any w ∈ C_j
7: end for
8: for i = 1 to k do
9:   for j = i + 1 to k do
10:     if (URI_ci ∩ Referer_cj ≠ ∅) then
11:       E ← E + C_iC_j
12:     end if
13:   end for
14: end for
15: return E
16: end

Algorithm 5. The Pseudocode for Extracting the User Navigation Graph

INPUT: C = {C1, C2, C3, ..., Ck} // web page clusters as graph nodes
       w1 // first web page
OUTPUT: the web application navigation graph <C0, C, E>

1: Begin
2: Let C0 = ∅
3: E = ExtractGraphEdges // set of web application graph edges
4: for j = 1 to k do
5:   if C_j contains w1 then
6:     C0 = C0 + C_j
7:   end if
8: end for
9: return <C0, C, E>
10: end

3.2 Identifying Business Processes in the Application

To identify the web application business processes in the user navigation graph, it is necessary to define the process and the final node first, and then define the business process.

Definition 3 [The Application Process (P)]. The process P in the application is a sequence of nodes and edges in the user navigation graph, such as E_1, E_2, ..., E_k, where E_i ∈ E and E_i = C_{i−1}C_i.

Algorithm 6 shows the pseudocode for extracting processes.

Definition 4 [The Final Nodes in the User Navigation Graph (F)]. The final nodes mark the completion of a business process, which occurs when the application reaches them.

The final nodes can be detected by examining the HTTP responses. For example, in the process of buying a product, a phrase like "Thank you for your purchase" is displayed after completion of the purchase. By specifying a set of such phrases and searching for them in the responses, the final nodes can be identified. Some keywords used for identifying the final nodes are Thank, Congratulations, Successfully, Log Off, and Search Results. Additionally, some buttons in a web application page are good signs that help identify the final nodes in the graph. Examples of final nodes include the page after clicking the "save" button, the page after clicking the "creation" button, and the page after clicking the "submit" button.
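The keyword-based detection of final nodes can be sketched as a simple text search over the responses of each cluster. In the Python sketch below, the keyword list comes from the text, while the case-insensitive substring matching, the function names, and the example responses are assumptions made for illustration.

```python
# Keywords listed in the text; matching is case-insensitive here (assumption).
FINAL_KEYWORDS = ("thank", "congratulations", "successfully",
                  "log off", "search results")


def is_final_response(body):
    """True when the response body contains one of the completion phrases."""
    text = body.lower()
    return any(keyword in text for keyword in FINAL_KEYWORDS)


def final_nodes(cluster_bodies):
    """cluster_bodies maps a cluster name to the response bodies of its pages;
    a cluster is final if any of its pages looks like a completion page."""
    return {cluster for cluster, bodies in cluster_bodies.items()
            if any(is_final_response(body) for body in bodies)}


cluster_bodies = {
    "checkout_done": ["<h1>Thank you for your purchase</h1>"],
    "product": ["<h1>Blue shirt</h1><button>Add to Cart</button>"],
    "search": ["<h2>Search Results for 'shirt'</h2>"],
}
finals = final_nodes(cluster_bodies)
```

A plain substring search can produce false matches (for example, "thank" appearing in unrelated text), so in practice the phrase list would likely need tuning per application.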

Definition 5 [The Application Business Process]. The business process BP in the application is a process that satisfies at least one of the following conditions:

1. The first node of the process is the initial node in the user navigation graph (C0), and the end node of the process is a final node in the user navigation graph (F).
2. The process passes its first node again, which means that the first node and the end node of the process are the same, and the process length is greater than two. (If the process returns to an already visited node and the length of the created loop is more than two, this is a business process.)

Algorithm 6. The Pseudocode for Extracting Processes

INPUT: the web application first node C0
       the web application graph edges E
OUTPUT: the web application graph processes P as a set of web application processes

1: Begin
2: Let P = ∅ // set of web application processes
3: Let StartEdge = ∅ // set of graph edges that begin from C0
4: StartEdge = ExtractGraphEdges(C0) // extract all edges from C0
5: EndPoint = ExtractEndPoint(StartEdge) // extract the end point of edges
6: if (ExtractProcess(EndPoint, E) ≠ null) then
7:   return P = E + ExtractProcess(EndPoint, E) for any edge E ∈ StartEdge
8: else
9:   return E
10: end if
11: end

All processes in the graph from the initial node (the application's initial page) to the identified final nodes, as well as the processes whose initial node and final node are the same, are the application business processes. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the application processes are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start with the initial node and end with a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, those detected processes whose initial node and final node are the same and whose length is greater than two are added to the variable BP as business processes.
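One way to picture the two conditions of Definition 5 is a depth-first walk over the navigation graph, sketched below in Python. This is not the procedure of Algorithms 6 and 7 but a simplified illustration under assumptions: processes are assumed to start at C0, only simple paths are explored, and the graph, node names, and final-node set are invented for the example.

```python
def business_processes(c0, edges, finals):
    """Enumerate node sequences that (1) start at c0 and end in a final node,
    or (2) return to c0 through a loop of more than two edges."""
    successors = {}
    for a, b in edges:
        successors.setdefault(a, []).append(b)
    found = []

    def walk(path):
        node = path[-1]
        if node in finals and len(path) > 1:
            found.append(list(path))            # condition 1
        for nxt in sorted(successors.get(node, [])):
            if nxt == c0:
                # Closing the loop adds one edge, giving len(path) edges.
                if len(path) > 2:
                    found.append(path + [c0])   # condition 2
            elif nxt not in path:
                walk(path + [nxt])

    walk([c0])
    return found


edges = {
    ("home", "product"), ("product", "cart"), ("cart", "checkout"),
    ("checkout", "thanks"), ("cart", "home"), ("product", "home"),
}
processes = business_processes("home", edges, finals={"thanks"})
```

In this example the short loop home → product → home is rejected because it has only two edges, while home → product → cart → home is kept.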

4 EXPERIMENTAL RESULTS

The testbed used in this section is a network consisting of a web server (test target) and two clients (the BLProM system and a legal user). The web server and the clients run on virtual machines. The web server and client profiles are shown in Table 2.

The web applications listed in Table 3 are installed on the web server (test target), and we then identify the business layer of these web applications.

The legal user first starts using the selected web applications. The user crawls all permitted parts of the web application. The HTTP traffic of the legal user is given to BLProM as its input.

5 EVALUATION

BLProM's goal is to identify the business layer of the web application. We can identify business logic vulnerabilities by identifying the business layer of the web application. BLProM detects the business processes of the web application. Identifying business processes is the main step in dynamic security testing of the web application in the business layer.

We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner. It scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner that has an API for extracting the web application graph. The web application pages are the graph nodes, and the relations among pages are shown as graph edges. The main difference between BLProM and ZAP is in detecting similar pages: ZAP cannot detect similar pages in the web application, but BLProM can. ZAP's graph only shows the relations among scanned pages, whereas BLProM generates the optimal graph. In terms of the accuracy of the generated graph, BLProM and ZAP are the same.

To evaluate the proposed approach, we first show the accuracy of clustering using the following criteria:

• True Positive: Samples that fit well into their correct clusters.

• False Positive: Samples that are placed in a cluster to which they do not belong.

• False Negative: Samples that are not placed in a cluster to which they belong.

• Recall: It is calculated by the following formula:

$$\text{recall} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}}$$

• Precision: It is calculated by the following formula:


Table 2. Testbed Profiles

| Property | Web server (test target) | Client (BLProM machine) | Client (legal user) |
|---|---|---|---|
| CPU | Pentium dual core, 2.20 GHz | Intel Core i7, 2.20 GHz | Pentium dual core i3-3210 GHz |
| OS | Windows 8.1 | Windows 8.1 | Windows 7 |
| VMware CPU | 1 GHz | 1 GHz | 1 GHz |
| VMware RAM | 1 GB | 1 GB | 1 GB |
| VMware OS | Windows 7 | Windows 7 | Windows 7 |

Table 3. Selected Web Applications for Evaluation

| Web application | Description |
|---|---|
| TomatoCart 1.1.8.6.1 | E-commerce |
| osCommerce 2.3.4 | E-commerce |
| WackoPicko | Web application for sharing pictures |

Algorithm 7. The Pseudocode for Identifying the Application Business Processes

INPUT: the web application navigation graph <C0, C, E>; the web application graph edges E
OUTPUT: the web application business processes BP, as a set of web application processes

1: Begin
2: Let P = empty // set of web application processes
3: Let F = empty // set of web application final nodes
4: P = ExtractProcess // extract the processes in the web application
5: F = ExtractFinalNodes // extract the final nodes in the web application
6: for i: 1..k do
7:     if P_i starts at C0 and ends at a node in F then
8:         BP = BP + P_i
9:     end if
10: end for
11: R = ExtractProcessWithRepeatedNodes(P) // extract the processes with repeated nodes
12: for j: 1..m do
13:     if R_j has the same initial and end node and length(R_j) > 2 then
14:         BP = BP + R_j
15:     end if
16: end for
17: return BP
18: end


Table 4. Evaluation of the Clusters of the Selected Web Application Pages

| Criteria | WackoPicko | TomatoCart | osCommerce |
|---|---|---|---|
| Samples | 89 | 150 | 210 |
| Clusters | 29 | 66 | 40 |
| True positive | 65 | 146 | 205 |
| False positive | 24 | 4 | 5 |
| False negative | 23 | 3 | 4 |
| Recall | 0.74 | 0.98 | 0.98 |
| Precision | 0.73 | 0.97 | 0.98 |
| F-measure | 0.73 | 0.98 | 0.98 |

Table 5. Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| HTTP messages | 89 | 89 | - |
| Graph nodes | 22 | 89 | 75.2 |
| Graph edges | 48 | 270 | 82.2 |
| Process (P) | 12 | 48 | 75 |
| Average edge in each process (E) | 4 | 46 | 91.3 |
| Average edges in all processes (PE) | 48 | 2208 | 97.8 |
| Business processes | 10 | N/A | - |

$\mathrm{precision} = \frac{\mathit{TruePositive}}{\mathit{TruePositive} + \mathit{FalsePositive}}$

• F-Measure: calculated by the following formula:

$F\text{-}\mathrm{Measure} = \frac{2 \cdot \mathrm{recall} \cdot \mathrm{precision}}{\mathrm{recall} + \mathrm{precision}}$

The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, "samples" shows the total number of web pages in the HTTP traffic, extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, "clusters" shows the total number of clusters extracted by the ClusteringWebPages algorithm (Algorithm 3). In the following rows, the criteria listed above are calculated for each web application.
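As a quick check of the criteria, the following minimal sketch recomputes recall, precision, and F-measure from the WackoPicko counts reported in Table 4. It assumes the standard definition of the F-measure as the harmonic mean of recall and precision; the function name `clustering_scores` is illustrative and not part of BLProM.

```python
def clustering_scores(tp: int, fp: int, fn: int):
    """Return (recall, precision, f_measure) for one clustering result."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    # Harmonic mean of recall and precision (standard F-measure, assumed here).
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# WackoPicko counts from Table 4: TP = 65, FP = 24, FN = 23.
r, p, f = clustering_scores(tp=65, fp=24, fn=23)
print(f"{r:.2f} {p:.2f} {f:.2f}")  # prints "0.74 0.73 0.73"
```

The printed values agree with the WackoPicko column of Table 4.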

To evaluate the proposed approach, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. The BLProM output for WackoPicko is shown in Table 5. "Graph nodes" is the total number of clusters extracted by the ClusteringWebPages algorithm (Algorithm 3). "Graph edges" is the total number of edges extracted by the ExtractGraphEdges algorithm (Algorithm 4). "Process (P)" is the total number of paths from the first node, extracted by the ExtractProcess algorithm (Algorithm 6). "Average edge in each process (E)" is the sum of the edges of all processes divided by the total number of processes. "Average edges in all processes (PE)" is the product of the total number of processes (P) and the average number of edges in each process (E). "Business processes" is the total number of business processes in the web application.

The values in the first column of Table 5 are the output of the proposed approach, and the values in the second column are the scanning output of ZAP. The third column shows the percentage of scanning improvement of the proposed approach compared to the ZAP scan. The results in the table indicate that the ZAP scan is not a smart scan; as a result, ZAP is not able to identify business-layer vulnerabilities.
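The paper does not state how the improvement percentage is computed. The sketch below assumes it is the relative reduction with respect to the ZAP value, which reproduces the WackoPicko figures in Table 5 to within rounding. The function name `improvement` is hypothetical.

```python
def improvement(proposed: float, zap: float) -> float:
    """Relative reduction (in percent) of a count versus the ZAP baseline.

    Assumed formula, inferred from the reported tables:
    100 * (zap - proposed) / zap.
    """
    return 100.0 * (zap - proposed) / zap


# PE is defined as P * E; for WackoPicko the proposed approach has P = 12, E = 4.
pe_proposed = 12 * 4      # 48
pe_zap = 48 * 46          # 2208
print(round(improvement(pe_proposed, pe_zap), 1))  # 97.8
```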

The BLProM output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to ZAP scanning. It can be observed that web application scanning is improved by identifying the web application business layer.


Table 6. Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| HTTP messages | 170 | 170 | - |
| Graph nodes | 40 | 170 | 76.4 |
| Graph edges | 66 | 379 | 82.5 |
| Process (P) | 23 | 17 | 26 |
| Average edge in each process (E) | 3 | 113 | 97.3 |
| Average edges in all processes (PE) | 69 | 1921 | 96.4 |
| Business processes | 18 | N/A | - |

Table 7. Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| HTTP messages | 150 | 150 | - |
| Graph nodes | 66 | 150 | 56 |
| Graph edges | 87 | 410 | 78.7 |
| Process (P) | 31 | 39 | 20.5 |
| Average edge in each process (E) | 4 | 101 | 96 |
| Average edges in all processes (PE) | 156 | 3131 | 95 |
| Business processes | 30 | N/A | - |

The BLProM output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of our proposed approach compared to ZAP scanning.

Table 8 shows the average of the proposed approach, the average of OWASP ZAP, and the average percentage of improvement in the scanning of the selected web applications. For example, on the "Average edges in all processes" criterion, our approach improves on OWASP ZAP by about 96 percent.

According to the results presented in this table, it can be observed that BLProM has improved web application scanning. BLProM is aware of the web application business processes. By identifying the web application business layer, web scanners can detect business-layer vulnerabilities.

6 CONCLUSION

Business logic vulnerabilities are serious vulnerabilities that compromise web application security. Web scanners cannot detect business logic vulnerabilities because they are unable to understand the business logic of the web application. To detect business logic vulnerabilities, the business logic of the web application needs to be understood. These vulnerabilities are therefore specific to the application and difficult to identify.

In this paper, we proposed BLProM, a black-box approach for detecting the business processes of the web application. BLProM aims to ease the detection of business logic vulnerabilities. Its output is used as input to dynamic security testing of web applications in the business layer in order to detect business logic vulnerabilities. BLProM consists of two main steps:

1. Extracting the user navigation graph
2. Detecting the web application business processes

In the lab, we scanned three web applications with BLProM and OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared to OWASP ZAP. BLProM improved the scanning of web applications because it clusters web pages and prevents scanning similar web application pages.

Table 8. Comparison of the Average of the Proposed Approach and the Average of the ZAP Approach in the Scanning of the Selected Web Applications

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| Graph nodes | 42.6 | 136.3 | 69.2 |
| Graph edges | 67 | 353 | 81.1 |
| Process (P) | 20 | 36.6 | 40.5 |
| Average edge in each process (E) | 3.6 | 86.6 | 94.8 |
| Average edges in all processes (PE) | 91 | 2420 | 96.4 |


Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. Currently, she is working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.

Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. Besides, he is a co-founder of four IT startups.

Ahmad Nickabadi received his B.S. degree in computer engineering in 2004 and the M.Sc. and Ph.D. degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.


Figure 2 Black-Box Approach to Detect Web Application Business Process

level can be different according to the role of the user, and consequently the user navigation graph varies depending on the permission level. In this paper, the normal user has user-level permission and browses the related permissible pages. The normal user crawls all the parts of the application that they are allowed to visit.

BLProM initially extracts the user navigation graph from the stored traffic. This is performed through the following steps:

1. Preprocessing of raw input data
2. Identifying existing web application pages in the stored traffic
3. Clustering the web application pages
4. Extracting the user navigation graph

1) Preprocessing of raw input data
In the preprocessing step, BLProM cleans the data and removes irrelevant data samples. In this paper, only HTTP requests and responses are used. For the responses, only those with the successful status codes 200 and 209 are employed. BLProM removes responses with failure status codes as well as their corresponding requests. Additionally, only GET and POST requests are needed; the remaining ones are discarded.
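The preprocessing rule can be sketched as a simple filter. This is a minimal illustration, not BLProM's implementation: the `Exchange` record (one request paired with its response) and the `preprocess` function are hypothetical names, and the accepted status codes are taken as stated in the text above and exposed as a parameter.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Exchange:
    """One HTTP request together with its response (hypothetical record)."""
    method: str
    url: str
    status: int


def preprocess(
    traffic: Iterable[Exchange],
    methods: Tuple[str, ...] = ("GET", "POST"),
    ok_status: Tuple[int, ...] = (200, 209),  # as stated in the text
) -> List[Exchange]:
    """Keep only GET/POST exchanges whose response has an accepted status."""
    return [
        x for x in traffic
        if x.method.upper() in methods and x.status in ok_status
    ]


traffic = [
    Exchange("GET", "/index.php", 200),
    Exchange("HEAD", "/index.php", 200),   # dropped: not GET/POST
    Exchange("POST", "/login.php", 500),   # dropped: failure status
]
print([x.url for x in preprocess(traffic)])  # ['/index.php']
```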

2) Identifying the web application pages in the stored traffic
Each page of the application can be represented as a pair (main request, corresponding response to the main request). To identify the web application pages, it is first necessary to detect the main requests in the traffic. After identifying the main requests, the corresponding responses must also be identified.

Identifying the main HTTP requests in the traffic. In the user's stored traffic, there are both main HTTP requests, which lead to loading the web application pages, and secondary HTTP requests, which are responsible for loading a file, image, etc. of the page. BLProM must distinguish between the main and secondary requests; once the main requests are identified, the remaining ones are considered secondary requests. Any request whose Referer header differs from the Referer of the previous request is considered a secondary request, and the previous one is a main request. Additionally, the first request in the traffic is considered a main request because it does not include the Referer header. Note that the Referer header field in an HTTP request indicates the URL of the previous page the user visited.

Identifying the corresponding responses to the main requests. To identify the corresponding responses to the main requests, it is only necessary to select the responses whose content-type field is text/html, because the response to a secondary request is often a file, photo, etc., while the response to a main request is in the form of text and HTML. Such responses are the main responses in the HTTP traffic. After identifying the main requests and responses in the traffic, each pair (main request, corresponding response to the main request) indicates a page of the web application. Note that the response to a main request contains all the secondary requests that lead to loading the web page.

Identifying whether the last request in the traffic is the main request or not. If the last response in the traffic is a main response, the last HTTP request is a main request as well.

Figure 3 Number of Consecutive Requests

For example, in Figure 3, requests 1, 2, 6, and 8 are the main requests, which are shown in red.

The pseudocode of the algorithm used by BLProM to identify web application pages is shown in Algorithm 1. As mentioned before, each web application page can be indicated by a pair (main request, corresponding response to the main request). The pseudocode in Algorithm 1 can be divided into four main parts:

1. Extracting main requests in the traffic (lines 13-25)
2. Extracting corresponding responses to the main requests (lines 26-31)
3. Identifying whether the last request in the traffic is the main request or not (lines 32-35)
4. Extracting web application pages (lines 36-40)

In line 9, all requests in the traffic, whether main or secondary, are first extracted and put in the HttpRequest variable. In line 10, the responses in the traffic are added to the HttpResponse variable. In lines 11 and 12, the total numbers of requests and responses are calculated. In line 15, it is checked whether the current request is the first request; if so, it is considered a main request and included in the MainRequest variable. In line 21, it is checked whether the Referer field of the current request differs from the Referer field of the previous request; if so, the previous request is considered a main request.

In line 28, it is checked whether the content-type field of the current response is text/html. If the condition is met, the current response is considered a main response.

In line 33, it is checked whether the last HTTP response is a main response; if so, the last request is considered a main request as well.

In line 39, the main requests and responses are paired to form the web application pages, which are shown as pairs (request, response).
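The Referer heuristic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the `Exchange` record and `main_request_indices` function are hypothetical, and, unlike Algorithm 1, each request is kept together with its own response instead of pairing two separately extracted lists by position.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exchange:
    """One request with the content type of its response (hypothetical)."""
    url: str
    referer: Optional[str]
    content_type: str


def main_request_indices(traffic: List[Exchange]) -> List[int]:
    """Return the positions of the main requests in the traffic."""
    if not traffic:
        return []
    main = [0]  # the first request is always a main request
    last_referer = traffic[0].referer
    for i in range(1, len(traffic)):
        new_referer = traffic[i].referer
        if new_referer != last_referer:
            # A change of Referer marks the previous request as a main request.
            if i - 1 not in main:
                main.append(i - 1)
            last_referer = new_referer
    # The last request is a main request if its response is an HTML page.
    last = len(traffic) - 1
    if traffic[last].content_type.startswith("text/html") and last not in main:
        main.append(last)
    return main


traffic = [
    Exchange("/", None, "text/html"),
    Exchange("/style.css", "/", "text/css"),
    Exchange("/logo.png", "/", "image/png"),
    Exchange("/login", "/", "text/html"),
    Exchange("/login.css", "/login", "text/css"),
    Exchange("/account", "/login", "text/html"),
]
print(main_request_indices(traffic))  # [0, 3, 5]
```

In this toy trace the home page, the login page, and the final account page are reported as main requests, while the style sheets and the image are treated as secondary.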

3) Clustering the web application pages
In this step, BLProM clusters the web application pages. Clustering aims to put similar pages in the same cluster, which helps to prevent the unbounded growth of the user navigation graph. The pages in the same cluster are similar to each other.

At this stage, all pages in the user's stored traffic have been extracted as pairs (main request, corresponding response to the main request), where each pair represents one page of the application. To extract the optimal user navigation graph, the similar extracted pages must be identified and clustered. In the user navigation graph, nodes indicate the application's unique pages and edges represent the links between the pages. To identify similar pages, a criterion suited to the purpose of the clustering must be chosen. The type of operations that the user can perform on a page is used as the measure for separating pages. In other words, two pages are similar if the user can perform the same operations on them. For example, consider two pages that each contain only a button, where the title of the button is "continue" on the first page and "save" on the second. These two pages are different because the user performs different operations on them. Thus, according to the criterion specified for similar pages, the following pages are considered similar in web applications:

• In an online shop application, the shopping cart pages are considered similar even if they contain different items.
• In an online shop application, the profile pages of the products are similar because they have the same HTML structure in terms of the important HTML elements.
• Pages that display search results for different keywords are similar.
• If the structure of a page is a subset of another page in terms of the important HTML elements, these pages are considered similar.
• Two pages that both contain user comments are considered similar even if the comments are about different products.


Algorithm 1. The Pseudocode for Extracting Web Application Pages From the Traffic

INPUT: HTTPtraffic = HttpMessage1, HttpMessage2, HttpMessage3, ..., HttpMessagen
OUTPUT: WebPages, as a set of (Request_i, Response_i)

1: Begin
2: let WebPages = set of web pages
3: let HttpRequest = set of HTTP requests
4: let HttpResponse = set of HTTP responses
5: let MainRequest = set of main HTTP requests
6: let MainResponse = set of main HTTP responses
7: let i, k = 1 // counter for the current HttpMessage
8: let LastReferer = empty, NewReferer = empty
9: HttpRequest = ExtractReq(HTTPtraffic) // extract HTTP requests from HTTP traffic
10: HttpResponse = ExtractResp(HTTPtraffic) // extract HTTP responses from HTTP traffic
11: n = extractNumber(HttpRequest) // extract total number of HTTP requests
12: m = extractNumber(HttpResponse) // extract total number of HTTP responses
13: // extract main HTTP requests from HttpRequest
14: for i: 1..n do
15:     if (i = 1) then
16:         add MainRequest ← HttpRequest_i // the first request is a main request
17:         LastReferer ← Referer of HttpRequest_i
18:     else
19:         NewReferer ← Referer of HttpRequest_i
20:     end if
21:     if (NewReferer ≠ LastReferer) then
22:         add MainRequest ← HttpRequest_(i−1)
23:         LastReferer ← NewReferer
24:     end if
25: end for
26: // extract main HTTP responses from HttpResponse
27: for k: 1..m do
28:     if (content-type of HttpResponse_k = text/html) then
29:         add MainResponse ← HttpResponse_k
30:     end if
31: end for
32: // check whether the last request is a main request or not
33: if (HttpResponse_m ∈ MainResponse) then
34:     add MainRequest ← HttpRequest_m
35: end if
36: // extract the set of web pages
37: size = extractNumber(MainRequest) // extract total number of main HTTP requests
38: for j: 1..size do
39:     WebPages ← (MainRequest_j, MainResponse_j)
40: end for
41: return WebPages
42: end


Table 1. Attribute Vector of a Page in the osCommerce Web Application

| Attribute | Value |
|---|---|
| Inputs | null |
| Buttons | html/body/button/Reviews; html/body/div/div/div/div/form/div/div/span/span/button/Add to Cart |
| Anchors | null |
| Image | html/body/div/div/div/a/img |

Figure 4 An Example of HTML Code

Definition of the Document Object Model (DOM) path of an HTML element. The DOM path of an element is the position of the element in the HTML code.

For example, in Figure 4, the DOM path of the button (DOM_button) is Document/Html/Body/P/Button.

Definition 1 [Similar Pages]. Similar pages are those on which the user can perform the same operations and which are identical in terms of the position of the important HTML elements in the page. The important HTML elements in a page are buttons, images, inputs, and anchors.

The clustering process includes three steps:

1. Extracting the attribute vectors of the pages
2. Identifying the subset pages
3. Clustering pages

In the following, these steps are discussed in detail.

1. Extracting the attribute vectors of the pages
BLProM represents each page of the application as a pair (main request, response). In this step, BLProM extracts the attribute vector of each page by applying a data mining operation to this pair. BLProM models each page using the following attribute vector:

$WebPages$ = the set of all pages in an application

$\forall w \in WebPages:\quad w = (DOM_{inputs},\ DOM_{buttons},\ DOM_{anchors},\ DOM_{imgs})$

$DOM_{inputs} = \prod_{i}^{n} DOM(input_i)$

$DOM_{buttons} = \prod_{i}^{n} DOM(button_i)$

$DOM_{anchors} = \prod_{i}^{n} DOM(anchor_i)$

$DOM_{imgs} = \prod_{i}^{n} DOM(img_i)$

• DOM(input): the DOM path of the <input> tag in the page + the value of the type attribute of the <input> tag + the value of its name attribute (in the absence of the name attribute, the value attribute is used).
• DOM(button): the DOM path of the button in the page + the title of the button.
• DOM(anchor): the DOM path of the <a> tag in the page.
• DOM(img): the DOM path of the image in the page.

Suppose the web application page contains several buttons. In this case, the second element of the page attribute vector is the set of the DOM paths of the buttons in the page, joined by a separator. Figure 5 shows one of the osCommerce (see footnote 1) web application pages, and Table 1 shows the attribute vector of the page in Figure 5. As shown, the input and anchor elements are null, which means that the page does not contain these tags.
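A minimal sketch of attribute-vector extraction, using only Python's standard `html.parser`, is given below. It is not the authors' data-mining step: the class and function names are hypothetical, the "/" path separator and the default input type "text" are assumptions, and each vector element is held as a Python set rather than a joined string.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag and so must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class AttributeVectorParser(HTMLParser):
    """Collect DOM paths of inputs, buttons, anchors and images."""

    def __init__(self):
        super().__init__()
        self.stack = []            # open elements from the root to here
        self.vector = {"inputs": set(), "buttons": set(),
                       "anchors": set(), "imgs": set()}
        self._button_path = None   # path of the button being read, if any
        self._button_text = []

    def handle_starttag(self, tag, attrs):
        path = "/".join(self.stack + [tag])
        a = dict(attrs)
        if tag == "input":
            # Path + type + name (fall back to value when name is absent).
            name = a.get("name") or a.get("value") or ""
            self.vector["inputs"].add(f"{path}/{a.get('type') or 'text'}/{name}")
        elif tag == "a":
            self.vector["anchors"].add(path)
        elif tag == "img":
            self.vector["imgs"].add(path)
        elif tag == "button":
            self._button_path, self._button_text = path, []
        if tag not in VOID:
            self.stack.append(tag)

    def handle_data(self, data):
        if self._button_path is not None:
            self._button_text.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "button" and self._button_path is not None:
            title = " ".join(t for t in self._button_text if t)
            self.vector["buttons"].add(f"{self._button_path}/{title}")
            self._button_path = None
        if tag in self.stack:
            # Pop up to and including the matching tag (tolerates unclosed tags).
            while self.stack.pop() != tag:
                pass


def attribute_vector(html: str) -> dict:
    parser = AttributeVectorParser()
    parser.feed(html)
    parser.close()
    return parser.vector


page = "<html><body><p><button>Save</button></p></body></html>"
print(attribute_vector(page)["buttons"])  # {'html/body/p/button/Save'}
```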

2. Identifying similar pages
After extracting the attribute vector of each page, it is necessary to identify similar pages. Pages whose attribute vectors are a subset of another page's vector, or whose attribute vectors are fully identical, are considered similar pages.

According to Definition 1, the attribute vector of each web application page has four elements. The attribute vector of page 1 is considered the same as the attribute vector of page 2 if:

• All vector elements of page 1 are equal to the corresponding elements in the vector of page 2.
• All vector elements of page 1 are subsets of the corresponding elements in the vector of page 2.
• All vector elements of page 2 are subsets of the corresponding elements in the vector of page 1.

1 https://www.oscommerce.com


Figure 5 A Page of osCommerce Application

Algorithm 2. The Pseudocode for Identifying the Similar Pages

INPUT: w1 = (DOM_w1(input), DOM_w1(button), DOM_w1(anchor), DOM_w1(img)); w2 = (DOM_w2(input), DOM_w2(button), DOM_w2(anchor), DOM_w2(img))
OUTPUT: Boolean flag; true means the two pages are the same

1: Begin
2: Let flag, input, button, anchor, img = false
3: if DOM_w1(input) ⊆ DOM_w2(input) or DOM_w2(input) ⊆ DOM_w1(input) then
4:     input = true
5: end if
6: if DOM_w1(button) ⊆ DOM_w2(button) or DOM_w2(button) ⊆ DOM_w1(button) then
7:     button = true
8: end if
9: if DOM_w1(anchor) ⊆ DOM_w2(anchor) or DOM_w2(anchor) ⊆ DOM_w1(anchor) then
10:     anchor = true
11: end if
12: if DOM_w1(img) ⊆ DOM_w2(img) or DOM_w2(img) ⊆ DOM_w1(img) then
13:     img = true
14: end if
15: if input and button and anchor and img then
16:     flag = true
17: end if
18: return flag
19: end

• If one or more vector elements of page 1 are subsets of their corresponding elements in the vector of page 2, the rest of the vector elements of page 1 must be the same as their corresponding elements in the vector of page 2.
• The null element is a subset of every element.

Similar pages are identified according to the above-mentioned conditions. Algorithm 2 illustrates the pseudocode for identifying similar pages.
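The check in Algorithm 2 maps directly onto Python set operations. The sketch below assumes each attribute vector is a dictionary of four sets (a hypothetical representation, with the empty set standing for null); it follows Algorithm 2 literally, so the subset direction may differ from one element to the next.

```python
KEYS = ("inputs", "buttons", "anchors", "imgs")


def similar_pages(w1: dict, w2: dict) -> bool:
    """True if, for every element, one page's set is a subset of the other's."""
    return all(w1[k] <= w2[k] or w2[k] <= w1[k] for k in KEYS)


product_a = {"inputs": set(), "anchors": set(),
             "buttons": {"html/body/button/Reviews"},
             "imgs": {"html/body/div/a/img"}}
product_b = {"inputs": set(), "anchors": set(),
             "buttons": {"html/body/button/Reviews",
                         "html/body/form/button/Add to Cart"},
             "imgs": {"html/body/div/a/img"}}
checkout = {"inputs": set(), "anchors": set(),
            "buttons": {"html/body/form/button/Confirm Order"},
            "imgs": set()}

print(similar_pages(product_a, product_b))  # True
print(similar_pages(product_a, checkout))   # False
```

Because the empty set is a subset of every set, a null element never prevents two pages from being similar, which matches the last rule above.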

3. Clustering web application pages
After identifying the similar pages, they are put in the same cluster. The pages in a cluster are similar to each other and refer to a unique page of the application. Algorithm 3 shows the pseudocode for clustering web pages. In line 7, it is checked whether two pages wi and wj are the same; if so, they are put in the same cluster.
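The clustering loop can be sketched as a greedy pass in which each remaining page seeds a cluster and absorbs every later page similar to it. This is an illustration of the idea, not the authors' code; the `cluster_pages` name is hypothetical and the similarity predicate is passed in, so the toy example below uses plain sets in place of full attribute vectors.

```python
from typing import Callable, List, TypeVar

Page = TypeVar("Page")


def cluster_pages(pages: List[Page],
                  similar: Callable[[Page, Page], bool]) -> List[List[Page]]:
    """Greedy clustering: the first unclustered page seeds each cluster."""
    remaining = list(pages)
    clusters = []
    while remaining:
        seed = remaining.pop(0)
        cluster, rest = [seed], []
        for page in remaining:
            # Pages similar to the seed join its cluster; others wait.
            (cluster if similar(seed, page) else rest).append(page)
        remaining = rest
        clusters.append(cluster)
    return clusters


def subset_either_way(a: frozenset, b: frozenset) -> bool:
    return a <= b or b <= a


pages = [frozenset({1}), frozenset({1, 2}), frozenset({3}), frozenset({2})]
print(len(cluster_pages(pages, subset_either_way)))  # 3
```

Note that subset-based similarity is not transitive, so the result depends on the order in which pages appear in the traffic: here {2} is not merged with {1, 2} because that page was already absorbed by the cluster seeded by {1}.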

Algorithm 3. The Pseudocode for Clustering the Web Application Pages

INPUT: WebPages = w1, w2, w3, ..., wn
OUTPUT: the web application model M, as a set of web page clusters C; each web page cluster C is a set of web pages

1: Begin
2: Let M = empty // set of page clusters
3: Let k = 1 // number of web page clusters
4: for i: 1..n do
5:     C_k ← w_i
6:     for j = i+1..n do
7:         if SimilarWebPages(w_i, w_j) then
8:             WebPages ← WebPages − w_j
9:             n = length of WebPages
10:             C_k ← w_j
11:         end if
12:     end for
13:     M ← M + C_k; k++
14: end for
15: return M
16: end

4) Extracting user navigation graph
In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster has a set of similar pages, and each of these pages has a URI and a Referer field. Thus, each cluster contains a set of URIs and a set of Referers for the pages in that cluster. Note that the Referer field is the URI of the previous web application page that the user visited. To extract the edges of the user navigation graph, the produced clusters are checked to find the clusters whose Referer set intersects the URI set of a given cluster. When such a cluster is found, the two clusters are connected. Suppose that the URI set of cluster C1 intersects the Referer set of cluster C2; then the path from cluster C1 to cluster C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges in the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained in lines 5 and 6 of Algorithm 4, respectively. In line 10, if the intersection of the URI set of a cluster with the Referer set of another cluster is not null, the edge CiCj is added to the edge set of the graph.

The user navigation graph is created according to Algorithm 5, where the nodes are the clusters and the identified edges are the paths between clusters. In the created graph, each node indicates a unique page and the edges show the paths between pages.
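The edge rule (connect Ci to Cj when a URI of Ci appears among the Referers of Cj) can be sketched as follows. The cluster representation, a dictionary with "uris" and "referers" sets, and the function name are assumptions made for illustration. Unlike Algorithm 4 as printed, which only examines pairs with j > i, this sketch checks every ordered pair so that edges leading back to earlier clusters are also found.

```python
from typing import Dict, Set, Tuple


def extract_graph_edges(
    clusters: Dict[str, Dict[str, Set[str]]]
) -> Set[Tuple[str, str]]:
    """Return directed edges (Ci, Cj) where URIs(Ci) meets Referers(Cj)."""
    edges = set()
    for a, ca in clusters.items():
        for b, cb in clusters.items():
            # A page in cluster b was reached from a page in cluster a.
            if a != b and ca["uris"] & cb["referers"]:
                edges.add((a, b))
    return edges


clusters = {
    "home": {"uris": {"/"}, "referers": {"/cart"}},
    "cart": {"uris": {"/cart"}, "referers": {"/"}},
    "checkout": {"uris": {"/checkout"}, "referers": {"/cart"}},
}
print(sorted(extract_graph_edges(clusters)))
# [('cart', 'checkout'), ('cart', 'home'), ('home', 'cart')]
```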

Definition 2 [The User Navigation Graph]. This graph is represented by the tuple <C0, C, E>, where C is the set of nodes in the graph, C0 ∈ C is the first (initial) node in the graph, and E ⊆ C × C is the set of edges in the graph.

Algorithm 5 shows the pseudocode for extracting the user navigation graph. Lines 5 and 6 identify the cluster that contains the initial page of the application; this cluster is considered the initial node of the graph.

3.2 Identifying Business Processes in the Application

To identify the web application business processes in the user navigation graph, it is necessary to first define the process and the final node, and then define the business process.

Definition 3 [The Application Process (P)]. A process P in the application is a sequence of nodes and edges in the user navigation graph, such as E1, E2, ..., Ek, where Ei ∈ E and Ei = C(i−1)Ci.

Algorithm 4. The Pseudocode for Extracting Edges From the User Navigation Graph

INPUT: C = C1, C2, C3, ..., Ck, web page clusters as graph nodes; WebPages = w1, w2, w3, ..., wn, set of web pages
OUTPUT: the web application graph edges E, as a set of edges

1: Begin
2: Let E = empty // set of web application graph edges
3: for j: 1..k do
4:     Let URI_cj, Referer_cj = empty
5:     URI_cj = URI_cj ∪ ExtractURI(w) for any w ∈ Cj
6:     Referer_cj = Referer_cj ∪ ExtractReferer(w) for any w ∈ Cj
7: end for
8: for i: 1..k do
9:     for j: i+1..k do
10:         if (URI_ci ∩ Referer_cj ≠ null) then
11:             E ← E + CiCj
12:         end if
13:     end for
14: end for
15: return E
16: end

Algorithm 5. The Pseudocode for Extracting the User Navigation Graph

INPUT: C = C1, C2, C3, ..., Ck, web page clusters as graph nodes; w1, the first web page
OUTPUT: the web application navigation graph <C0, C, E>

1: Begin
2: Let C0 = empty
3: E = ExtractGraphEdges // set of web application graph edges
4: for j: 1..k do
5:     if Cj contains w1 then
6:         C0 = C0 + Cj
7:     end if
8: end for
9: return C0, C, E
10: end

Algorithm 6 shows the pseudocode for extracting processes.

Definition 4 [The final nodes in the user navigation graph (F)] A final node is a node that marks the completion of a business process: the business process is complete when the application reaches that node.

The final nodes can be detected by examining the HTTP responses. For example, in the process of buying a product, a phrase like "Thank you for your purchase" is displayed after completion of the purchase. By specifying a set of these phrases and searching for them in the responses, the final nodes can be identified. Some keywords used for identifying the final node are Thank, Congratulations, Successfully, Log Off, and Search Results. Additionally, some buttons in the web application page are good signs that help to identify the final node in the graph. Examples of final nodes include the page after clicking the "save" button, the page after clicking the "creation" button, and the page after clicking the "submit" button.
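The keyword search described above can be sketched in Python as follows. The keyword list is taken from the text; the case-insensitive substring match, the function names, and the mapping from node ids to response bodies are assumptions made for illustration only.

```python
# Keywords listed in the text as indicators of a completed business process.
FINAL_KEYWORDS = ("thank", "congratulations", "successfully", "log off", "search results")


def is_final_page(response_body, keywords=FINAL_KEYWORDS):
    """Return True if the response body contains any completion keyword."""
    text = response_body.lower()
    return any(keyword in text for keyword in keywords)


def final_nodes(node_bodies):
    """node_bodies maps a node id to the response bodies of the pages in that cluster."""
    return {node for node, bodies in node_bodies.items()
            if any(is_final_page(body) for body in bodies)}


bodies = {
    0: ["<h1>Welcome</h1>"],
    1: ["<p>Your cart</p>"],
    2: ["<p>Thank you for your purchase</p>"],
}
print(final_nodes(bodies))
```

A plain substring match is easy to audit but can produce false positives (for example, a product description that contains the word "thank"), so a real deployment would probably need a curated phrase list per application.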

Definition 5 [The application business process] The business process BP in the application is a process that satisfies at least one of the following conditions:

1. The first node of the process is the initial node of the user navigation graph (C0) and the end node of the process is a final node of the user navigation graph (F).
2. The process passes its first node again, that is, the first node and the end node of the process are the same and the process length is greater than two. (If the process returns to a node it has already passed and the length of the resulting loop is more than two, it is a business process.)

Algorithm 6 The Pseudocode for Extracting Processes

INPUT: the web application first node C0
       the web application graph edges E
OUTPUT: the web application graph process P as a set of web application processes

 1: Begin
 2:   Let P0 = ∅                     // set of web application processes
 3:   Let StartEdge = ∅              // set of graph edges that begin from C0
 4:   StartEdge = ExtractGraphEdges(C0)          // extract all edges from C0
 5:   EndPoint = ExtractEndPoint(StartEdge)      // extract the end point of edges
 6:   if (ExtractProcess(EndPoint, E) ≠ null) then
 7:     return P = E + ExtractProcess(EndPoint, E) for any edge E ∈ StartEdge
 8:   else
 9:     return E
10:   end if
11: end

All processes in the graph that run from the initial node (the application's initial page) to an identified final node, as well as all processes whose initial node and final node are the same, are business processes of the application. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the application processes are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start with the initial node and end with a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, those whose initial node and final node are the same and whose length is greater than two are added to the variable BP as business processes.
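The two conditions of Definition 5 can be sketched in Python as below. This is a simplified illustration, not the authors' implementation: processes are enumerated as walks from the initial node that stop at the first repeated node, every prefix of a walk counts as a process, and the process length is measured in edges. All function names are hypothetical.

```python
def extract_processes(c0, edges):
    """Enumerate walks from c0; a walk stops growing once it revisits a node."""
    successors = {}
    for source, target in sorted(edges):
        successors.setdefault(source, []).append(target)
    processes = []

    def walk(path):
        for nxt in successors.get(path[-1], []):
            candidate = path + [nxt]
            processes.append(candidate)
            if nxt not in path:          # do not extend past a repeated node
                walk(candidate)

    walk([c0])
    return processes


def business_processes(c0, edges, finals):
    """Keep processes that end in a final node or loop back to their first node."""
    result = []
    for process in extract_processes(c0, edges):
        length = len(process) - 1                               # number of edges
        ends_in_final = process[-1] in finals                   # condition 1
        closes_loop = process[0] == process[-1] and length > 2  # condition 2
        if ends_in_final or closes_loop:
            result.append(process)
    return result


edges = {(0, 1), (1, 2), (2, 3), (3, 0)}
print(business_processes(0, edges, finals={2}))
```

Because every walk starts at the initial node, condition 1 only needs to test the last node. Stopping at the first repeated node keeps the enumeration finite on graphs with cycles.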

4 EXPERIMENTAL RESULTS

The testbed used in this section is a network consisting of a web server (test target) and two clients (the BLProM system and the legal user). The web server and the clients run on virtual machines. The profiles of the web server and the clients are shown in Table 2.

The web applications listed in Table 3 are installed on the web server (test target), and we then identify the business layer of these web applications.

The legal user first starts using the selected web applications and crawls all permitted parts of each web application. The HTTP traffic of the legal user is given to BLProM as its input.

5 EVALUATION

BLProM's goal is to identify the business layer of the web application. Business logic vulnerabilities can be identified once the business layer of the web application is known. BLProM detects the business processes of the web application; identifying business processes is the main step in dynamic security testing of the web application in the business layer.

We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner. It scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner that has an API for extracting the web application graph. The web application pages are the graph nodes and the relations among pages are shown as graph edges. The main difference between BLProM and ZAP is in detecting similar pages: ZAP cannot detect similar pages in the web application, but BLProM can. ZAP's graph only shows the relations among scanned pages, whereas BLProM generates the optimal graph, in which similar pages are merged into one node. In terms of the accuracy of the generated graph, BLProM and ZAP are the same.

To evaluate the proposed approach, we first show the accuracy of clustering using the following criteria:

• True Positive: Samples that are placed in their correct clusters.

• False Positive: Samples that are placed in a cluster to which they do not belong.

• False Negative: Samples that are not placed in a cluster to which they belong.

• Recall: It is calculated by the following formula:

$$\text{recall} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}}$$

• Precision: It is calculated by the following formula:


Table 2 Testbed Profiles

| Machine | CPU | OS | VMware CPU | VMware RAM | VMware OS |
|---|---|---|---|---|---|
| Web server (test target) | Pentium dual-core, 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7 |
| Client (BLProM machine) | Intel Core i7, 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7 |
| Client (legal user) | Pentium dual core i3-3210 GHz | Windows 7 | 1 GHz | 1 GB | Windows 7 |

Table 3 Selected Web Applications for Evaluation

| Web application | Description |
|---|---|
| TomatoCart 1.1.8.6.1 | e-commerce |
| osCommerce 2.3.4 | e-commerce |
| WackoPicko | Web application for sharing pictures |

Algorithm 7 The Pseudocode for Identifying the Application Business Processes

INPUT: the web application navigation graph <C0, C, E>
       the web application graph edges E
OUTPUT: the web application business processes BP as a set of web application processes

 1: Begin
 2:   Let P = ∅                      // set of web application processes
 3:   Let F = ∅                      // set of web application final nodes
 4:   P = ExtractProcess             // extract processes in the web application
 5:   F = ExtractFinalNodes          // extract final nodes in the web application
 6:   for i = 1 to k do
 7:     if Pi starts with C0 and ends with a node in F then
 8:       BP = BP + Pi
 9:     end if
10:   end for
11:   R = ExtractProcessWithRepeatedNodes(P)     // extract processes with repeated nodes
12:   for j = 1 to m do
13:     if Rj has the same initial and end node and length(Rj) > 2 then
14:       BP = BP + Rj
15:     end if
16:   end for
17:   return BP
18: end


Table 4 The Clusters of Selected Web Application Pages Evaluation

| Criteria | WackoPicko | TomatoCart | osCommerce |
|---|---|---|---|
| samples | 89 | 150 | 210 |
| clusters | 29 | 66 | 40 |
| true positive | 65 | 146 | 205 |
| false positive | 24 | 4 | 5 |
| false negative | 23 | 3 | 4 |
| recall | 0.74 | 0.98 | 0.98 |
| precision | 0.73 | 0.97 | 0.98 |
| f-measure | 0.73 | 0.98 | 0.98 |

Table 5 Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| HTTP Message | 89 | 89 | - |
| Graph Nodes | 22 | 89 | 75.2 |
| Graph Edges | 48 | 270 | 82.2 |
| process (P) | 12 | 48 | 75 |
| Average edge in each process (E) | 4 | 46 | 91.3 |
| Average edges in all processes (PE) | 48 | 2208 | 97.8 |
| business processes | 10 | NA | - |

$$\text{precision} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalsePositive}}$$

• F-Measure: It is calculated by the following formula:

$$\text{F-Measure} = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}}$$

The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, samples is the total number of web pages in the HTTP traffic extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, clusters is the total number of clusters extracted by the ClusteringWebPages algorithm (Algorithm 3). In the following rows, the criteria listed above are calculated for each web application.
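As a check on the arithmetic, the following Python sketch recomputes the three criteria from the true positive, false positive, and false negative counts in Table 4. It assumes the usual harmonic-mean definition of the F-measure; the function name is hypothetical.

```python
def clustering_scores(tp, fp, fn):
    """Return (recall, precision, f_measure) from raw clustering counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# (true positive, false positive, false negative) counts from Table 4.
counts = {
    "WackoPicko": (65, 24, 23),
    "TomatoCart": (146, 4, 3),
    "osCommerce": (205, 5, 4),
}

for app, (tp, fp, fn) in counts.items():
    recall, precision, f_measure = clustering_scores(tp, fp, fn)
    print(f"{app}: recall={recall:.2f} precision={precision:.2f} f-measure={f_measure:.2f}")
```

Rounded to two decimals, the results agree with the recall, precision, and f-measure rows of Table 4.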

To evaluate the proposed approach, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. The BLProM output for WackoPicko is shown in Table 5. Graph nodes is the total number of clusters extracted by the ClusteringWebPages algorithm (Algorithm 3). Graph edges is the total number of edges extracted by the ExtractGraphEdges algorithm (Algorithm 4). Process (P) is the total number of paths from the first node, extracted by the ExtractProcess algorithm (Algorithm 6). Average edge in each process (E) is the sum of the edges of all processes divided by the total number of processes. Average edges in all processes (PE) is the product of the total number of processes (P) and the average number of edges in each process (E). Business processes is the total number of business processes in the web application.

The values in the first column of Table 5 are the output of the proposed approach. The values in the second column are the scanning output of ZAP. The third column shows the percentage of scanning improvement of the proposed approach compared to the ZAP scan. The results in the table indicate that the ZAP scan is not intelligent: it does not merge similar pages. As a result of this non-intelligent scan, ZAP is not able to identify business layer vulnerabilities.
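The paper does not state the formula behind the improvement column. The Python sketch below assumes the relative reduction (ZAP − proposed) / ZAP, which is consistent with the tabulated WackoPicko values up to rounding; the function name is hypothetical.

```python
def improvement(proposed, zap):
    """Relative reduction of a metric compared to OWASP ZAP, in percent (assumed formula)."""
    return (zap - proposed) / zap * 100


# (proposed approach, OWASP ZAP) values from Table 5.
wackopicko = {
    "Graph Nodes": (22, 89),
    "Graph Edges": (48, 270),
    "process (P)": (12, 48),
    "Average edge in each process (E)": (4, 46),
}

for criterion, (proposed, zap) in wackopicko.items():
    print(f"{criterion}: {improvement(proposed, zap):.1f}%")

# PE is the product of P and E, for both approaches.
print(12 * 4, 48 * 46)
```

The same function applied to the PE row (48 versus 2208) gives about 97.8 percent, which matches the last improvement value in Table 5.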

The BLProM output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to ZAP scanning. It is observed that web application scanning is improved by identifying the web application business layer.


Table 6 Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| HTTP Message | 170 | 170 | - |
| Graph Nodes | 40 | 170 | 76.4 |
| Graph Edges | 66 | 379 | 82.5 |
| process (P) | 23 | 17 | 26 |
| Average edge in each process (E) | 3 | 113 | 97.3 |
| Average edges in all processes (PE) | 69 | 1921 | 96.4 |
| business processes | 18 | NA | - |

Table 7 Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| HTTP Message | 150 | 150 | - |
| Graph Nodes | 66 | 150 | 56 |
| Graph Edges | 87 | 410 | 78.7 |
| process (P) | 31 | 39 | 20.5 |
| Average edge in each process (E) | 4 | 101 | 96 |
| Average edges in all processes (PE) | 156 | 3131 | 95 |
| business processes | 30 | NA | - |

The BLProM output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of the proposed approach compared to ZAP scanning.

Table 8 shows the average of the proposed approach, the average of OWASP ZAP, and the average percentage of improvement in the scanning of the selected web applications. For example, in the "Average edges in all processes" criterion, our approach improves on OWASP ZAP by about 96 percent.

According to the results presented in this table, it can be observed that BLProM has improved web application scanning. BLProM is aware of web application business processes. By identifying the web application business layer, web scanners can detect business layer vulnerabilities.

6 CONCLUSION

Business logic vulnerabilities are serious vulnerabilities that compromise web application security. Web scanners cannot detect business logic vulnerabilities because they are unable to understand the business logic of the web application. To detect business logic vulnerabilities, the business logic of the web application needs to be understood. Therefore, these vulnerabilities are specific to the application and difficult to identify.

In this paper, we proposed BLProM, a black-box approach for detecting business processes of the web application. BLProM aims to ease the detection of business logic vulnerabilities. The output of BLProM is used as input in dynamic security testing of web applications in the business layer in order to detect business logic vulnerabilities. BLProM consists of two main steps:

1. Extracting the user navigation graph
2. Detecting web application business processes

In the lab, we scanned three web applications with BLProM and OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared to OWASP ZAP. BLProM improved the scanning of web applications because it clusters web pages and prevents scanning similar web application pages.

Table 8 Comparison of the Average of the Proposed Approach and the Average of ZAP Approach in the Scanning of Selected Web Applications

| Criteria (average over the selected web applications) | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| Graph Nodes | 42.6 | 136.3 | 69.2 |
| Graph Edges | 67 | 353 | 81.1 |
| process (P) | 20 | 36.6 | 40.5 |
| Average edge in each process (E) | 3.6 | 86.6 | 94.8 |
| Average edges in all processes (PE) | 91 | 2420 | 96.4 |



Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. Currently, she is working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.

Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and in an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. Besides, he is a co-founder of four IT startups.

Ahmad Nickabadi received his B.S. degree in computer engineering in 2004 and the M.Sc. and Ph.D. degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.



Figure 3 Number of Consecutive Requests

should be noted that the responses to the main requests contain all the secondary requests that are needed to load the web page.

Identifying whether the last request in the traffic is the main request or not: If the last response in the traffic is a main response, the last HTTP request is a main request as well.

For example, in Figure 3, requests 1, 2, 6, and 8 are the main requests, which are shown in red.

The pseudocode of the algorithm used by BLProM to identify web application pages is shown in Algorithm 1. As mentioned before, each web application page can be represented by a pair (main request, corresponding response to the main request). The pseudocode in Algorithm 1 can be divided into four main parts:

1. Extracting main requests in the traffic (lines 13-25)
2. Extracting corresponding responses to the main requests (lines 26-31)
3. Identifying whether the last request in the traffic is the main request or not (lines 32-35)
4. Extracting web application pages (lines 36-41)

In line 9, all requests in the traffic, both main and secondary, are extracted and put in the HttpRequest variable. In line 10, the responses in the traffic are added to the HttpResponse variable. In lines 11 and 12, the total numbers of requests and responses are calculated. In line 15, it is checked whether the current request is the first request; if so, it is considered a main request and is added to the MainRequest variable. In line 21, it is checked whether the Referer field of the current request is different from the Referer field of the previous request; if so, the previous request is considered a main request.

In line 28, it is checked whether the content-type field of the current response is text/html. If this condition is met, the current response is considered a main response.

In line 33, it is checked whether the last HTTP response is a main response; if so, the last request is considered a main request as well.

In line 39, the main requests and main responses are paired; these pairs (request, response) are the web application pages.
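The four parts described above can be sketched in Python as follows. This is a simplified illustration rather than a transcription of Algorithm 1: the traffic is assumed to be a list of (request, response) dictionaries in capture order, and the keys `uri`, `referer`, and `content_type` as well as the function name are hypothetical.

```python
def extract_web_pages(messages):
    """Pair main requests with main (text/html) responses.

    messages: list of (request, response) dicts in capture order.
    """
    if not messages:
        return []
    main_idx = [0]                                  # the first request is a main request
    last_referer = messages[0][0].get("referer")
    for i in range(1, len(messages)):
        referer = messages[i][0].get("referer")
        if referer != last_referer:                 # Referer changed, so the previous
            if i - 1 not in main_idx:               # request opened a new page
                main_idx.append(i - 1)
            last_referer = referer
    html_idx = [i for i, (_, resp) in enumerate(messages)
                if resp.get("content_type", "").startswith("text/html")]
    last = len(messages) - 1
    if last in html_idx and last not in main_idx:   # last response is a main response
        main_idx.append(last)
    main_requests = [messages[i][0] for i in main_idx]
    main_responses = [messages[i][1] for i in html_idx]
    return list(zip(main_requests, main_responses))


traffic = [
    ({"uri": "/", "referer": None}, {"content_type": "text/html"}),
    ({"uri": "/style.css", "referer": "/"}, {"content_type": "text/css"}),
    ({"uri": "/cart", "referer": "/"}, {"content_type": "text/html; charset=utf-8"}),
    ({"uri": "/cart.js", "referer": "/cart"}, {"content_type": "application/javascript"}),
    ({"uri": "/checkout", "referer": "/cart"}, {"content_type": "text/html"}),
]
pages = extract_web_pages(traffic)
print([request["uri"] for request, _ in pages])
```

In this example the request for /cart is recognized as a main request only when the following request arrives with a new Referer, which mirrors the check in line 21 of the pseudocode.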

3) Clustering the web application pages
In this step, BLProM clusters the web application pages. Clustering aims to put similar pages in the same cluster, which prevents the unbounded growth of the user navigation graph.

At this stage, all pages in the user's stored traffic have been extracted as pairs (main request, corresponding response to the main request); each pair represents one page of the application. To extract the optimal user navigation graph, the similar extracted pages must be identified and clustered. In the user navigation graph, nodes indicate the application's unique pages and edges represent the links between the pages. To identify similar pages, a criterion suited to the purpose of the clustering must be chosen. The type of operations that the user can perform on the page is used as the measure for separating pages. In other words, two pages are similar if the user can perform the same operations on them. For example, consider two pages that each contain only a button, where the title of the button is "continue" on the first page and "save" on the second. These two pages are different because the user performs different operations on them. Thus, according to this criterion, the following pages are considered similar in web applications:

• In an online shop application, the shopping cart pages are considered similar even if they contain different items.
• In an online shop application, the profile pages of the products are similar because they have the same HTML structure in terms of the important HTML elements.
• Pages that display search results for different keywords are similar.
• If the structure of a page is a subset of the structure of another page in terms of the important HTML elements, these pages are considered similar.
• Two pages that both contain user comments are considered similar even if the comments are about different products.


Algorithm 1 The Pseudocode for Extracting Web Application Pages From the Traffic

INPUT: HTTPtraffic = {HttpMessage1, HttpMessage2, HttpMessage3, ..., HttpMessagen}
OUTPUT: WebPages as a set of (Requesti, Responsei)

 1: Begin
 2:   let WebPages = ∅               // set of web pages
 3:   let HttpRequest = ∅            // set of HTTP requests
 4:   let HttpResponse = ∅           // set of HTTP responses
 5:   let MainRequest = ∅            // set of main HTTP requests
 6:   let MainResponse = ∅           // set of main HTTP responses
 7:   let i, k = 1                   // counter for current HttpMessage
 8:   let LastReferer = ∅, NewReferer = ∅
 9:   HttpRequest = ExtractReq(HTTPtraffic)      // extract HTTP requests from HTTP traffic
10:   HttpResponse = ExtractResp(HTTPtraffic)    // extract HTTP responses from HTTP traffic
11:   n = extractNumber(HttpRequest)             // total number of HTTP requests
12:   m = extractNumber(HttpResponse)            // total number of HTTP responses
13:   // extract main HTTP requests from HttpRequest
14:   for i = 1 to n do
15:     if (i = 1) then
16:       add MainRequest ← HttpRequesti         // first request is a main request
17:       LastReferer ← Referer of HttpRequesti
18:     else
19:       NewReferer ← Referer of HttpRequesti
20:     end if
21:     if (NewReferer ≠ LastReferer) then
22:       add MainRequest ← HttpRequesti−1
23:       LastReferer ← NewReferer
24:     end if
25:   end for
26:   // extract main HTTP responses from HttpResponse
27:   for k = 1 to m do
28:     if (content-type of HttpResponsek = text/html) then
29:       add MainResponse ← HttpResponsek
30:     end if
31:   end for
32:   // check whether the last request is a main request or not
33:   if (HttpResponsem ∈ MainResponse) then
34:     add MainRequest ← HttpRequestm
35:   end if
36:   // extract the set of web pages
37:   size = extractNumber(MainRequest)          // total number of main HTTP requests
38:   for j = 1 to size do
39:     WebPages ← (MainRequestj, MainResponsej)
40:   end for
41:   return WebPages
42: end


Table 1 Attribute Vector of a Page in osCommerce Web Application

| Element | Value |
|---|---|
| inputs | null |
| buttons | html/body/button + Reviews; html/body/div/div/div/div/form/div/div/span/span/button + Add to Cart |
| anchors | null |
| image | html/body/div/div/div/a/img |

Figure 4 An Example of HTML Code

Definition of Document Object Model (DOM) path for an HTML element: The DOM path of an element is the position of the element in the HTML code.

For example, in Figure 4, the DOM path of the button (DOMbutton) is Document/Html/Body/P/Button.

Definition 1 [Similar Pages] Similar pages are pages on which the user can perform the same operations and that are identical in terms of the position of the important HTML elements in the page. The important HTML elements in the page include buttons, images, inputs, and anchors.

The clustering process includes three steps:
1. Extracting attribute vectors of the page
2. Identifying the subset pages
3. Clustering pages

In the following, these steps are discussed in detail.

1 Extracting attribute vectors of the page
BLProM represents each page of the application as a pair (main request, response). In this step, BLProM extracts the attribute vector of each page by applying a data mining operation to this pair. BLProM models each page using the following attribute vector:

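WebPages = the set of all pages in an application

$$\forall w \in \mathit{WebPages}: \quad w = (DOM_{inputs},\ DOM_{buttons},\ DOM_{anchors},\ DOM_{imgs})$$

$$DOM_{inputs} = \prod_{i}^{n} DOM(input_i)$$

$$DOM_{buttons} = \prod_{i}^{n} DOM(button_i)$$

$$DOM_{anchors} = \prod_{i}^{n} DOM(anchor_i)$$

$$DOM_{imgs} = \prod_{i}^{n} DOM(img_i)$$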

• DOM(input): DOM path of the <input> tag in the page + the value of the type attribute of the <input> tag + the value of the name attribute of the <input> tag (in the absence of the name attribute, the value attribute is considered).
• DOM(button): DOM path of the button in the page + the title of the button.
• DOM(anchor): DOM path of the <a> tag in the page.
• DOM(img): DOM path of the image in the page.

Suppose the web application page contains several buttons; in this case, the second element of the page attribute vector is the set of the DOM paths of the buttons in the page, separated by a delimiter. Figure 5 shows one of the osCommerce¹ web application pages, and Table 1 shows the attribute vector of that page. As shown, the input element and the anchor element are null, which means that the page does not contain these tags.
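A minimal Python sketch of the attribute vector extraction, using only the standard library `html.parser` module, is given below. The class name, the dictionary layout, and the use of "/" as the DOM path separator are assumptions made for illustration; they are not specified in the paper.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they are never pushed on the path stack.
VOID = {"img", "input", "br", "hr", "meta", "link", "area", "base",
        "col", "embed", "source", "track", "wbr"}


class AttributeVector(HTMLParser):
    """Collect DOM paths of inputs, buttons, anchors and images."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.vector = {"inputs": set(), "buttons": set(),
                       "anchors": set(), "imgs": set()}
        self._button_path = None
        self._button_text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        path = "/".join(self.stack + [tag])
        if tag == "input":
            # Fall back to the value attribute when name is absent.
            name = attrs.get("name") or attrs.get("value") or ""
            self.vector["inputs"].add((path, attrs.get("type") or "", name))
        elif tag == "a":
            self.vector["anchors"].add(path)
        elif tag == "img":
            self.vector["imgs"].add(path)
        elif tag == "button":
            self._button_path, self._button_text = path, []
        if tag not in VOID:
            self.stack.append(tag)

    def handle_data(self, data):
        if self._button_path is not None:
            self._button_text.append(data)

    def handle_endtag(self, tag):
        if tag == "button" and self._button_path is not None:
            title = "".join(self._button_text).strip()
            self.vector["buttons"].add((self._button_path, title))
            self._button_path = None
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass


html = ("<html><body><p><button>Reviews</button></p>"
        "<div><a href='/x'><img src='a.png'></a></div></body></html>")
parser = AttributeVector()
parser.feed(html)
print(parser.vector)
```

Storing each element of the vector as a set makes the subset comparison used for identifying similar pages a single operator in Python.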

2 Identifying similar pages
After extracting the attribute vector of each page, it is necessary to identify similar pages. Pages whose attribute vectors are a subset of the attribute vector of another page, or whose attribute vectors are identical, are considered similar pages.

According to Definition 1, the attribute vector of each web application page has four elements. The attribute vector of page 1 is considered the same as the attribute vector of page 2 if:

• All vector elements of page 1 are equal to the corresponding elements in the vector of page 2.
• All vector elements of page 1 are subsets of the corresponding elements in the vector of page 2.
• All vector elements of page 2 are subsets of the corresponding elements in the vector of page 1.

¹ https://www.oscommerce.com


Figure 5 A Page of osCommerce Application

Algorithm 2 The Pseudocode for Identifying the Similar Pages

INPUT: w1 = (DOMw1(input), DOMw1(button), DOMw1(anchor), DOMw1(img))
       w2 = (DOMw2(input), DOMw2(button), DOMw2(anchor), DOMw2(img))
OUTPUT: Boolean flag; true means the two pages are the same

 1: Begin
 2:   Let flag, input, button, anchor, img = false
 3:   if DOMw1(input) ⊆ DOMw2(input) or DOMw2(input) ⊆ DOMw1(input) then
 4:     input = true
 5:   end if
 6:   if DOMw1(button) ⊆ DOMw2(button) or DOMw2(button) ⊆ DOMw1(button) then
 7:     button = true
 8:   end if
 9:   if DOMw1(anchor) ⊆ DOMw2(anchor) or DOMw2(anchor) ⊆ DOMw1(anchor) then
10:     anchor = true
11:   end if
12:   if DOMw1(img) ⊆ DOMw2(img) or DOMw2(img) ⊆ DOMw1(img) then
13:     img = true
14:   end if
15:   if input and button and anchor and img then
16:     flag = true
17:   end if
18:   return flag
19: end

• If one or more vector elements of page 1 are subsets of their corresponding elements in the vector of page 2, the rest of the vector elements of page 1 must be the same as their corresponding elements in the vector of page 2.
• The null element is a subset of every element.

Similar pages are identified according to the above-mentioned conditions. Algorithm 2 illustrates the pseudocode for identifying similar pages.
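The element-wise subset test of Algorithm 2 can be written compactly in Python when each element of the attribute vector is stored as a set. The dictionary keys and the function name below are hypothetical; an empty set plays the role of the null element, since it is a subset of every set.

```python
KEYS = ("inputs", "buttons", "anchors", "imgs")


def similar_pages(w1, w2):
    """Return True if, for every element, one page's set is a subset of the other's."""
    return all(w1[key] <= w2[key] or w2[key] <= w1[key] for key in KEYS)


product_a = {
    "inputs": set(),
    "buttons": {("html/body/button", "Add to Cart")},
    "anchors": set(),
    "imgs": {"html/body/img"},
}
product_b = {
    "inputs": set(),
    "buttons": {("html/body/button", "Add to Cart"), ("html/body/button", "Reviews")},
    "anchors": set(),
    "imgs": {"html/body/img"},
}
settings = {
    "inputs": set(),
    "buttons": {("html/body/button", "Save")},
    "anchors": set(),
    "imgs": set(),
}

print(similar_pages(product_a, product_b), similar_pages(product_a, settings))
```

The two product pages are similar because the buttons of the first are a subset of the buttons of the second, while the settings page differs because its "Save" button offers a different operation.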

3 Clustering web application pages
After the similar pages are identified, they are put in the same cluster. The pages in a cluster are similar to each other and refer to a unique page of the application. Algorithm 3 shows the pseudocode for clustering web pages. In line 7, it is checked whether two pages wi and wj are similar; if so, they are put in the same cluster.
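The clustering loop can be sketched in Python as follows. The sketch assumes that each page is compared only with the first page (the seed) of the current cluster, as in line 7 of the pseudocode, and that the similarity test is passed in as a function. The names are hypothetical, and the toy pages are plain sets rather than full attribute vectors.

```python
def cluster_pages(pages, similar):
    """Group pages so that each cluster holds the pages similar to its seed page."""
    clusters = []
    remaining = list(pages)
    while remaining:
        seed = remaining.pop(0)          # the seed starts a new cluster
        cluster, rest = [seed], []
        for page in remaining:
            if similar(seed, page):
                cluster.append(page)     # similar pages join the seed's cluster
            else:
                rest.append(page)        # the others wait for a later seed
        remaining = rest
        clusters.append(cluster)
    return clusters


def subset_similar(a, b):
    """Toy similarity test: one set of operations is a subset of the other."""
    return a <= b or b <= a


pages = [{"cart", "buy"}, {"buy"}, {"save"}, {"cart", "buy"}]
clusters = cluster_pages(pages, subset_similar)
print([len(cluster) for cluster in clusters])
```

Because the subset relation is not transitive, the result can depend on the order in which pages are visited; comparing against the seed keeps the behavior predictable for a given traffic order.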

Algorithm 3 The Pseudocode for Clustering the Web Application Pages

INPUT: WebPages = {w1, w2, w3, ..., wn}
OUTPUT: the web application model M as a set of web page clusters C
        the web page clusters C as a set of web pages

 1: Begin
 2:   Let M = ∅                      // set of page clusters
 3:   Let k = 1                      // number of web page clusters
 4:   for i = 1 to n do
 5:     Ck ← wi
 6:     for j = i + 1 to n do
 7:       if SimilarWebPages(wi, wj) then
 8:         WebPages ← WebPages − wj
 9:         n = length of WebPages
10:         Ck ← wj
11:       end if
12:     end for
13:     k++
14:   end for
15:   return C
16: end

4) Extracting user navigation graph
In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster has a set of similar pages, and each of these pages has a URI and a Referer field. Thus, each cluster contains a set of URIs and a set of Referers for the pages in that cluster. It should be noted that the Referer field is the URI of the previous web application page that the user visited. To extract the edges of the user navigation graph, the clusters are checked to find which cluster's Referer set intersects the URI set of a given cluster; when such a cluster is found, the two clusters are connected. Suppose that the URI set of cluster C1 has an intersection with the Referer set of cluster C2; then the path from cluster C1 to cluster C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges in the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained in lines 5 and 6 of Algorithm 4, respectively. In line 10, if the intersection of the URI set of a cluster with the Referer set of another cluster is not empty, the edge CiCj is added to the edge set of the graph. The user navigation graph is then created according to Algorithm 5, where the nodes are the clusters and the identified edges are the paths between clusters. In the created graph, each node represents a unique page and the edges show the paths between pages.

Definition 2 [The User Navigation Graph]: This graph is represented by the tuple ⟨C0, C, E⟩, where C is the set of nodes in the graph, C0 ∈ C is the first (initial) node in the graph, and E ⊆ C × C is the set of edges in the graph.

Algorithm 5 shows the pseudocode for extracting the user navigation graph. Line 5 identifies the cluster that contains the initial page of the application; this cluster is taken as the initial node of the graph.

Algorithm 4: The Pseudocode for Extracting Edges From the User Navigation Graph

INPUT: C = {C1, C2, C3, ..., Ck}, web page clusters as graph nodes; WebPages = {w1, w2, w3, ..., wn}, set of web pages
OUTPUT: the web application graph edges E, as a set of edges

1 Begin
2   Let E = ∅ // set of web application graph edges
3   for j = 1, ..., k do
4     Let URIcj, Referercj = ∅
5     URIcj = URIcj ∪ ExtractURI(w) for any w ∈ Cj
6     Referercj = Referercj ∪ ExtractReferer(w) for any w ∈ Cj
7   end for
8   for i = 1, ..., k do
9     for j = i + 1, ..., k do
10      if (URIci ∩ Referercj ≠ ∅) then
11        E ← E + CiCj
12      end if
13    end for
14  end for
15  return E
16 end

Algorithm 5: The Pseudocode for Extracting the User Navigation Graph

INPUT: C = {C1, C2, C3, ..., Ck}, web page clusters as graph nodes; w1, the first web page
OUTPUT: the web application navigation graph ⟨C0, C, E⟩

1 Begin
2   Let C0 = ∅
3   E = ExtractGraphEdges // set of web application graph edges
4   for j = 1, ..., k do
5     if Cj contains w1 then
6       C0 = C0 + Cj
7     end if
8   end for
9   return ⟨C0, C, E⟩
10 end

3.2 Identifying Business Processes in the Application

To identify the web application business processes in the user navigation graph, it is necessary to first define the process and the final node, and then define the business process.

Definition 3 [The Application Process (P)]: The process P in the application is a sequence of nodes and edges in the user navigation graph, such as E1, E2, ..., Ek, where Ei ∈ E and Ei = Ci−1Ci.

Algorithm 6 shows the pseudocode for extracting processes.

Definition 4 [The Final Nodes in the User Navigation Graph (F)]: The final nodes are the nodes at which a business process is completed; the process ends when the application reaches one of them.

The final nodes can be detected by examining the HTTP responses. For example, in the process of buying a product, a phrase like "Thank you for your purchase" is displayed after completion of the purchase. By specifying a set of such phrases and searching for them in the responses, the final nodes can be identified. Some keywords used for identifying the final nodes are Thank, Congratulations, Successfully, Log Off, and Search Results. Additionally, some buttons in a web application page are good indicators that help identify a final node in the graph. Examples of final nodes include the page shown after clicking the "save" button, the page shown after clicking the "creation" button, and the page shown after clicking the "submit" button.
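The keyword search described above can be sketched as follows. This is a minimal illustration: the keyword list is the one given in the text, while the function names, the case-insensitive matching, and the mapping from cluster id to response bodies are assumptions of the example.

```python
import re
from typing import Dict, List, Set

# Keywords listed in the text; a real deployment would extend this per application.
FINAL_KEYWORDS = ["Thank", "Congratulations", "Successfully", "Log Off", "Search Results"]

# Case-insensitive matching is an assumption of this sketch.
_FINAL_PATTERN = re.compile("|".join(re.escape(k) for k in FINAL_KEYWORDS), re.IGNORECASE)


def is_final_response(body: str) -> bool:
    """True when the response body contains one of the completion phrases."""
    return _FINAL_PATTERN.search(body) is not None


def final_nodes(cluster_bodies: Dict[int, List[str]]) -> Set[int]:
    """Clusters in which at least one response body looks like a completion page."""
    return {
        cluster_id
        for cluster_id, bodies in cluster_bodies.items()
        if any(is_final_response(body) for body in bodies)
    }
```

Plain substring matching is deliberately simple; it will also flag pages that merely mention one of the keywords, so the result is best treated as a candidate set.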

Definition 5 [The Application Business Process]: The business process BP in the application is a process that satisfies at least one of the following conditions:

1. The first node of the process is the initial node of the user navigation graph (C0), and the end node of the process is a final node of the user navigation graph (F).

2. The process passes through its first node again; that is, the first node and the end node of the process are the same, and the process length is greater than two. (If the process returns to a node it has already passed and the length of the resulting loop is more than two, it is a business process.)

Algorithm 6: The Pseudocode for Extracting Processes

INPUT: the web application first node C0; the web application graph edges E
OUTPUT: the web application graph processes P, as a set of web application processes

1 Begin
2   Let P = ∅ // set of web application processes
3   Let StartEdge = ∅ // set of graph edges that begin from C0
4   StartEdge = ExtractGraphEdges(C0) // extract all edges from C0
5   EndPoint = ExtractEndPoint(StartEdge) // extract the end points of the edges
6   if (ExtractProcess(EndPoint, E) ≠ null) then
7     return P = E + ExtractProcess(EndPoint, E) for any edge E ∈ StartEdge
8   else
9     return E
10  end if
11 end

All processes in the graph that run from the initial node (the application's initial page) to an identified final node, as well as all processes whose initial node and end node are the same, are the application's business processes. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the application processes are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start with the initial node and end with a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, those whose initial node and end node are the same and whose length is greater than two are added to the variable BP as business processes.
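The two conditions of Definition 5 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: nodes are integer cluster ids, a process is represented as the list of nodes it visits, its length is taken to be its number of edges, and a path is recorded when it reaches a node without outgoing edges or returns to a node it has already visited.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def extract_processes(start: int, edges: Set[Edge]) -> List[List[int]]:
    """Enumerate node paths that begin at `start` (a depth-first walk)."""
    successors: Dict[int, List[int]] = {}
    for source, target in sorted(edges):
        successors.setdefault(source, []).append(target)

    processes: List[List[int]] = []

    def walk(path: List[int]) -> None:
        next_nodes = successors.get(path[-1], [])
        if not next_nodes and len(path) > 1:
            processes.append(path)  # dead end: the path is complete
        for node in next_nodes:
            if node in path:
                processes.append(path + [node])  # the path closed a loop
            else:
                walk(path + [node])

    walk([start])
    return processes


def business_processes(processes: List[List[int]], c0: int, finals: Set[int]) -> List[List[int]]:
    """Keep the processes that satisfy condition 1 or condition 2."""
    selected = []
    for path in processes:
        reaches_final = path[0] == c0 and path[-1] in finals
        loops_back = path[0] == path[-1] and len(path) - 1 > 2
        if reaches_final or loops_back:
            selected.append(path)
    return selected
```

For the edge set {(0, 1), (1, 2), (2, 0), (0, 3)} with initial node 0 and final node 3, the walk yields the loop 0, 1, 2, 0 and the path 0, 3, and both qualify as business processes.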

4 EXPERIMENTAL RESULTS

The testbed used in this section is a network consisting of a web server (the test target) and two clients (the BLProM system and a legal user). The web server and the clients run on virtual machines. The profiles of the web server and the clients are shown in Table 2.

The web applications listed in Table 3 are installed on the web server (the test target), and we then identify the business layer of these web applications.

The legal user first starts using the selected web applications and crawls all permitted parts of each web application. The HTTP traffic of the legal user is given to BLProM as its input.

5 EVALUATION

BLProM's goal is to identify the business layer of the web application. Identifying the business layer makes it possible to identify business logic vulnerabilities. BLProM detects the business processes of the web application; identifying business processes is the main step in dynamic security testing of the web application in the business layer.

We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner that scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner that has an API for extracting the web application graph. The web application pages are the graph nodes, and the relations among pages are the graph edges. The main difference between BLProM and ZAP is in detecting similar pages: ZAP cannot detect similar pages in the web application, but BLProM can. ZAP's graph only shows the relations among the scanned pages, whereas BLProM generates a more compact graph in which similar pages are merged into one node. In terms of the accuracy of the generated graph, BLProM and ZAP are the same.

To evaluate the proposed approach, we first measure the accuracy of clustering using the following criteria:

• True Positive: samples that are placed in their correct clusters.

• False Positive: samples that are placed in a cluster to which they do not belong.

• False Negative: samples that are not placed in a cluster to which they belong.

• Recall: it is calculated by the following formula:

\[ \text{recall} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}} \]

• Precision: it is calculated by the following formula:


Table 2: Testbed Profiles

| Machine | CPU | OS | VMware CPU | VMware RAM | VMware OS |
| --- | --- | --- | --- | --- | --- |
| Web server (test target) | Pentium dual core, 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7 |
| Client (BLProM machine) | Intel Core i7, 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7 |
| Client (legal user) | Pentium dual core i3-3210 GHz | Windows 7 | 1 GHz | 1 GB | Windows 7 |

Table 3: Selected Web Applications for Evaluation

| Web application | Description |
| --- | --- |
| TomatoCart-1.1.8.6.1 | e-commerce |
| osCommerce-2.3.4 | e-commerce |
| WackoPicko | Web application for sharing pictures |

Algorithm 7: The Pseudocode for Identifying the Application Business Processes

INPUT: the web application navigation graph ⟨C0, C, E⟩; the web application graph edges E
OUTPUT: the web application business processes BP, as a set of web application processes

1 Begin
2   Let P = ∅ // set of web application processes
3   Let F = ∅ // set of web application final nodes
4   P = ExtractProcess // extract the processes in the web application
5   F = ExtractFinalNodes // extract the final nodes in the web application
6   for i = 1, ..., k do
7     if Pi starts with C0 and ends with a node in F then
8       BP = BP + Pi
9     end if
10  end for
11  R = ExtractProcessWithRepeatedNodes(P) // extract the processes with repeated nodes
12  for j = 1, ..., m do
13    if Rj has the same initial and end node and length(Rj) > 2 then
14      BP = BP + Rj
15    end if
16  end for
17  return BP
18 end


Table 4: Evaluation of the Clusters of the Selected Web Application Pages

| Criteria | WackoPicko | TomatoCart | osCommerce |
| --- | --- | --- | --- |
| samples | 89 | 150 | 210 |
| clusters | 29 | 66 | 40 |
| true positive | 65 | 146 | 205 |
| false positive | 24 | 4 | 5 |
| false negative | 23 | 3 | 4 |
| recall | 0.74 | 0.98 | 0.98 |
| precision | 0.73 | 0.97 | 0.98 |
| f-measure | 0.73 | 0.98 | 0.98 |

Table 5: Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
| --- | --- | --- | --- |
| HTTP Message | 89 | 89 | – |
| Graph Nodes | 22 | 89 | 75.2 |
| Graph Edges | 48 | 270 | 82.2 |
| process (P) | 12 | 48 | 75 |
| Average edge in each process (E) | 4 | 46 | 91.3 |
| Average edges in all processes (PE) | 48 | 2208 | 97.8 |
| business processes | 10 | N/A | – |

\[ \text{precision} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalsePositive}} \]

• F-Measure: it is calculated by the following formula:

\[ \text{F-Measure} = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}} \]

The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, samples is the total number of web pages in the HTTP traffic, extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, clusters is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). In the remaining rows, the criteria listed above are calculated for each web application.
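As a worked check of these criteria, the following sketch recomputes the three scores from the WackoPicko counts in Table 4 (65 true positives, 24 false positives, 23 false negatives). The function name is made up for the example, and the F-measure is computed as the usual harmonic mean of recall and precision.

```python
from typing import Tuple


def clustering_scores(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (recall, precision, f_measure) for one web application."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# WackoPicko counts taken from Table 4.
recall, precision, f_measure = clustering_scores(tp=65, fp=24, fn=23)
print(round(recall, 2), round(precision, 2), round(f_measure, 2))  # 0.74 0.73 0.73
```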

To evaluate the proposed approach, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. The BLProM output for WackoPicko is shown in Table 5. Graph nodes is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). Graph edges is the total number of edges extracted by the ExtractGraphEdges algorithm (Algorithm 4). Process is the total number of paths from the first node, extracted by the ExtractProcess algorithm (Algorithm 6). Average edge in each process (E) is the sum of the edges of all processes divided by the total number of processes. Average edges in all processes (PE) is the product of the total number of processes (P) and the average number of edges in each process (E). Business processes is the total number of business processes in the web application.

The values in the first column of Table 5 are the output of the proposed approach, and the values in the second column are the scanning output of ZAP. The third column shows the percentage of scanning improvement of the proposed approach compared to the ZAP scan. The results in the table indicate that the ZAP scan is a non-intelligent scan: it does not recognize similar pages and treats every scanned page as a separate node. As a result, ZAP is not able to identify business-layer vulnerabilities.
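The improvement column is not defined by a formula in the text. The sketch below assumes it is the relative reduction with respect to the ZAP value, which reproduces the WackoPicko figures of Table 5 to within rounding; the function name is hypothetical.

```python
def improvement(proposed: float, zap: float) -> float:
    """Relative reduction of a count, as a percentage of the ZAP value.

    Assumed formula: (1 - proposed / zap) * 100.
    """
    return (1 - proposed / zap) * 100


# WackoPicko counts from Table 5: (proposed approach, OWASP ZAP).
rows = {"Graph Nodes": (22, 89), "Graph Edges": (48, 270), "process (P)": (12, 48)}
for name, (ours, zap) in rows.items():
    print(f"{name}: {improvement(ours, zap):.1f}%")

# PE is the product of the number of processes and the average edges per process.
processes, average_edges = 12, 4
print("PE:", processes * average_edges)
```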

The BLProM output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to ZAP scanning. It can be observed that web application scanning is improved by identifying the web application business layer.


Table 6: Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
| --- | --- | --- | --- |
| HTTP Message | 170 | 170 | – |
| Graph Nodes | 40 | 170 | 76.4 |
| Graph Edges | 66 | 379 | 82.5 |
| process (P) | 23 | 17 | 26 |
| Average edge in each process (E) | 3 | 113 | 97.3 |
| Average edges in all processes (PE) | 69 | 1921 | 96.4 |
| business processes | 18 | N/A | – |

Table 7: Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
| --- | --- | --- | --- |
| HTTP Message | 150 | 150 | – |
| Graph Nodes | 66 | 150 | 56 |
| Graph Edges | 87 | 410 | 78.7 |
| process (P) | 31 | 39 | 20.5 |
| Average edge in each process (E) | 4 | 101 | 96 |
| Average edges in all processes (PE) | 156 | 3131 | 95 |
| business processes | 30 | N/A | – |

The BLProM output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of the proposed approach compared to ZAP scanning.

Table 8 shows the average of the proposed approach, the average of OWASP ZAP, and the average percentage of improvement in the scanning of the selected web applications. For example, in the "Average edges in all processes" criterion, our approach improves on OWASP ZAP by about 96 percent.

According to the results presented in this table, it can be observed that BLProM improves web application scanning. BLProM is aware of the web application business processes. By identifying the web application business layer, web scanners can detect business-layer vulnerabilities.

6 CONCLUSION

Business logic vulnerabilities are serious vulnerabilities that compromise web application security. Web scanners cannot detect business logic vulnerabilities because they are unable to understand the business logic of the web application. To detect business logic vulnerabilities, the business logic of the web application needs to be understood. These vulnerabilities are therefore specific to the application and difficult to identify.

In this paper, we proposed BLProM, a black-box approach for detecting the business processes of a web application. BLProM aims to ease the detection of business logic vulnerabilities. The output of BLProM is used as input to dynamic security testing of web applications in the business layer in order to detect business logic vulnerabilities. BLProM consists of two main steps:

1- Extracting the user navigation graph
2- Detecting the web application business processes

In the lab, we scanned three web applications with BLProM and OWASP ZAP, an open-source web application scanner. We showed that BLProM improves scanning by about 96% compared to OWASP ZAP. BLProM improves the scanning of web applications because it clusters web pages and prevents scanning similar web application pages.

Table 8: Comparison of the Average of the Proposed Approach and the Average of the ZAP Approach in the Scanning of the Selected Web Applications

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
| --- | --- | --- | --- |
| Graph Nodes | 42.6 | 136.3 | 69.2 |
| Graph Edges | 67 | 353 | 81.1 |
| process (P) | 20 | 36.6 | 40.5 |
| Average edge in each process (E) | 3.6 | 86.6 | 94.8 |
| Average edges in all processes (PE) | 91 | 2420 | 96.4 |


Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. Currently, she is working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.

Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and in an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. Besides, he is a co-founder of four IT startups.

Ahmad Nickabadi received his B.S. degree in computer engineering in 2004 and the M.Sc. and Ph.D. degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.


Algorithm 1: The Pseudocode for Extracting Web Application Pages From the Traffic

INPUT: HTTPtraffic = {HttpMessage1, HttpMessage2, HttpMessage3, ..., HttpMessagen}
OUTPUT: WebPages, as a set of (Requesti, Responsei)

1 Begin
2   let WebPages = ∅ // set of web pages
3   let HttpRequest = ∅ // set of HTTP requests
4   let HttpResponse = ∅ // set of HTTP responses
5   let MainRequest = ∅ // set of main HTTP requests
6   let MainResponse = ∅ // set of main HTTP responses
7   let i, k = 1 // counters for the current HttpMessage
8   let LastReferer = ∅, NewReferer = ∅
9   HttpRequest = ExtractReq(HTTPtraffic) // extract the HTTP requests from the HTTP traffic
10  HttpResponse = ExtractResp(HTTPtraffic) // extract the HTTP responses from the HTTP traffic
11  n = extractNumber(HttpRequest) // total number of HTTP requests
12  m = extractNumber(HttpResponse) // total number of HTTP responses
13  // extract the main HTTP requests from HttpRequest
14  for i = 1, ..., n do
15    if (i = 1) then
16      add MainRequest ← HttpRequesti // the first request is a main request
17      LastReferer ← Referer of HttpRequesti
18    else
19      NewReferer ← Referer of HttpRequesti
20    end if
21    if (NewReferer ≠ LastReferer) then
22      add MainRequest ← HttpRequesti−1
23      LastReferer ← NewReferer
24    end if
25  end for
26  // extract the main HTTP responses from HttpResponse
27  for k = 1, ..., m do
28    if (content-type of HttpResponsek = text/html) then
29      add MainResponse ← HttpResponsek
30    end if
31  end for
32  // check whether the last request is a main request
33  if (HttpResponsem ∈ MainResponse) then
34    add MainRequest ← HttpRequestm
35  end if
36  // extract the set of web pages
37  size = extractNumber(MainRequest) // total number of main HTTP requests
38  for j = 1, ..., size do
39    WebPages ← (MainRequestj, MainResponsej)
40  end for
41  return WebPages
42 end
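The response-side filter of Algorithm 1 (lines 26 to 31) can be sketched as follows. This is a simplified illustration, not a reimplementation of the whole algorithm: it assumes the traffic is already available as paired (request, response) dicts with lowercase header names, and it does not reproduce the Referer bookkeeping of lines 14 to 25. All names are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed shape: one (request, response) pair per HTTP exchange,
# each a dict of lowercase header names (plus any other fields).
Exchange = Tuple[Dict[str, str], Dict[str, str]]


def extract_web_pages(traffic: List[Exchange]) -> List[Exchange]:
    """Keep only the exchanges whose response is an HTML document."""
    pages = []
    for request, response in traffic:
        content_type = response.get("content-type", "")
        # "text/html; charset=utf-8" counts as HTML as well.
        if content_type.split(";")[0].strip().lower() == "text/html":
            pages.append((request, response))
    return pages
```

Style sheets, scripts, and images fetched while rendering a page are dropped, so each remaining pair stands for one page seen by the user.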

Table 1: Attribute Vector of a Page in the osCommerce Web Application

| Attribute | Value |
| --- | --- |
| inputs | null |
| buttons | html/body/button/Reviews; html/body/div/div/div/div/form/div/div/span/span/button/Add to Cart |
| anchors | null |
| image | html/body/div/div/div/a/img |

Figure 4: An Example of HTML Code

Definition of the Document Object Model (DOM) path for an HTML element: The DOM path of an element is the position of the element in the HTML code.

For example, in Figure 4, the DOM path of the button (DOMbutton) is Document/Html/Body/P/Button.

Definition 1 [Similar Pages]: Similar pages are pages on which the user can perform the same operations and which are identical in terms of the position of their important HTML elements. The important HTML elements in a page are buttons, images, inputs, and anchors.

The clustering process includes three steps:
1. Extracting the attribute vector of each page
2. Identifying the similar pages
3. Clustering the pages

In the following, these steps are discussed in detail.

1. Extracting the attribute vector of each page
BLProM represents each page of the application as a pair (main request, response). In this step, BLProM extracts the attribute vector of each page by applying a data mining operation to this pair. BLProM models each page using the following attribute vector, where WebPages is the set of all pages in the application:

\[
\forall w \in \text{WebPages}: \quad w = (DOM_{inputs},\ DOM_{buttons},\ DOM_{anchors},\ DOM_{imgs})
\]
\[
DOM_{inputs} = \prod_{i}^{n} DOM(input_i), \qquad
DOM_{buttons} = \prod_{i}^{n} DOM(button_i)
\]
\[
DOM_{anchors} = \prod_{i}^{n} DOM(anchor_i), \qquad
DOM_{imgs} = \prod_{i}^{n} DOM(img_i)
\]

• DOM(input): the DOM path of the <input> tag in the page + the value of the type attribute of the <input> tag + the value of the name attribute of the <input> tag (in the absence of the name attribute, the value attribute is used).
• DOM(button): the DOM path of the button in the page + the title of the button.
• DOM(anchor): the DOM path of the <a> tag in the page.
• DOM(img): the DOM path of the image in the page.

Suppose the web application page contains several buttons; in this case, the second element of the page attribute vector is the set of the DOM paths of the buttons in the page, separated by a delimiter. Figure 5 shows one of the osCommerce¹ web application pages, and Table 1 shows the attribute vector of the page in Figure 5. As shown, the input element and the anchor element are null, which means the page does not contain these tags.
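A minimal sketch of this extraction step, using only Python's standard `html.parser`, is shown below. It is an illustration under stated assumptions rather than BLProM's implementation: DOM paths are written with "/" as separator, the vector is a dict of four sets, and the class and function names are invented for the example.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not stay on the path stack.
VOID_TAGS = {"img", "input", "br", "hr", "meta", "link"}


class AttributeVectorParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []            # currently open tags, outermost first
        self.pending_button = None  # path of a button still waiting for its title
        self.vector = {"inputs": set(), "buttons": set(), "anchors": set(), "imgs": set()}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        path = "/".join(self.stack + [tag])
        if tag == "input":
            # Use the name attribute, falling back to value when name is absent.
            name = attrs.get("name") or attrs.get("value") or ""
            self.vector["inputs"].add(f"{path}/{attrs.get('type', '')}/{name}")
        elif tag == "a":
            self.vector["anchors"].add(path)
        elif tag == "img":
            self.vector["imgs"].add(path)
        elif tag == "button":
            self.pending_button = path
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_data(self, data):
        if self.pending_button and data.strip():
            self.vector["buttons"].add(f"{self.pending_button}/{data.strip()}")
            self.pending_button = None

    def handle_endtag(self, tag):
        if tag == "button" and self.pending_button:
            self.vector["buttons"].add(self.pending_button)  # button without a title
            self.pending_button = None
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def attribute_vector(html: str) -> dict:
    parser = AttributeVectorParser()
    parser.feed(html)
    return parser.vector
```

An empty set in the result plays the role of the null entries shown in Table 1.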

2. Identifying similar pages
After extracting the attribute vector of each page, it is necessary to identify similar pages. Pages whose attribute vectors are a subset of the attribute vector of another page, or whose attribute vectors are fully identical, are considered similar pages.

According to Definition 1, the attribute vector of each web application page has four elements. The attribute vector of page 1 is considered the same as the attribute vector of page 2 if one of the following holds:
• All vector elements of page 1 are equal to the corresponding elements in the vector of page 2.
• All vector elements of page 1 are subsets of the corresponding elements in the vector of page 2.
• All vector elements of page 2 are subsets of the corresponding elements in the vector of page 1.
• One or more vector elements of page 1 are subsets of their corresponding elements in the vector of page 2, and the rest of the vector elements of page 1 are the same as their corresponding elements in the vector of page 2.

In these comparisons, the null element is a subset of every element. Similar pages are identified according to the above-mentioned conditions. Algorithm 2 illustrates the pseudocode for identifying similar pages.

Figure 5: A Page of the osCommerce Application

Algorithm 2: The Pseudocode for Identifying the Similar Pages

INPUT: w1 = (DOMw1(input), DOMw1(button), DOMw1(anchor), DOMw1(img)); w2 = (DOMw2(input), DOMw2(button), DOMw2(anchor), DOMw2(img))
OUTPUT: Boolean flag; true means the two pages are the same

1 Begin
2   Let flag, input, button, anchor, img = false
3   if DOMw1(input) ⊆ DOMw2(input) or DOMw2(input) ⊆ DOMw1(input) then
4     input = true
5   end if
6   if DOMw1(button) ⊆ DOMw2(button) or DOMw2(button) ⊆ DOMw1(button) then
7     button = true
8   end if
9   if DOMw1(anchor) ⊆ DOMw2(anchor) or DOMw2(anchor) ⊆ DOMw1(anchor) then
10    anchor = true
11  end if
12  if DOMw1(img) ⊆ DOMw2(img) or DOMw2(img) ⊆ DOMw1(img) then
13    img = true
14  end if
15  if input and button and anchor and img then
16    flag = true
17  end if
18  return flag
19 end

¹ https://www.oscommerce.com
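The subset test of Algorithm 2 maps directly onto Python set comparison. The sketch below assumes each attribute vector is a dict of four sets keyed by hypothetical field names; an empty set stands for the null element.

```python
from typing import Dict, Set

Vector = Dict[str, Set[str]]
FIELDS = ("inputs", "buttons", "anchors", "imgs")


def similar_pages(v1: Vector, v2: Vector) -> bool:
    """True when, for every element, one page's set is a subset of the other's."""
    # An empty set is a subset of any set, matching the rule for null elements.
    return all(v1[f] <= v2[f] or v2[f] <= v1[f] for f in FIELDS)
```

As in the pseudocode, the direction of the subset relation is checked independently for each of the four elements.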

3. Clustering web application pages
After the similar pages are identified, they are put in the same cluster. The pages in a cluster are similar to each other and refer to a unique page of the application. Algorithm 3 shows the pseudocode for clustering web pages. In line 7, it is checked whether two pages wi and wj are the same; if so, they are put in the same cluster.
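The clustering step can be sketched as a greedy grouping around a seed page. This is an illustration with assumed names: `similar` is any pairwise predicate (for instance a subset test on attribute vectors), and each page is compared only with the seed of the current cluster, so the result depends on the order of the pages because the similarity relation is not transitive.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def cluster_pages(pages: List[T], similar: Callable[[T, T], bool]) -> List[List[T]]:
    """Group pages so that every member of a cluster is similar to its seed."""
    clusters: List[List[T]] = []
    remaining = list(pages)
    while remaining:
        seed = remaining.pop(0)
        cluster = [seed]
        rest: List[T] = []
        for page in remaining:
            # Pages similar to the seed join its cluster and are not seeded again.
            (cluster if similar(seed, page) else rest).append(page)
        clusters.append(cluster)
        remaining = rest
    return clusters
```

Each resulting cluster then stands for one unique page of the application.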

Algorithm 3: The Pseudocode for Clustering the Web Application Pages

INPUT: WebPages = {w1, w2, w3, ..., wn}
OUTPUT: the web application model M, as a set of web page clusters C; each web page cluster C, as a set of web pages

1 Begin
2   Let M = ∅ // set of page clusters
3   Let k = 1 // number of web page clusters
4   for i = 1, ..., n do
5     Ck ← wi
6     for j = i + 1, ..., n do
7       if SimilarWebPages(wi, wj) then
8         WebPages ← WebPages − wi
9         n = length of WebPages
10        Ck ← wj
11      end if
12    end for
13    k++
14  end for
15  return C
16 end

4) Extracting user navigation graph
In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster has a set of similar pages, and each of these pages has a URI and a Referer field. Thus, each cluster contains a set of URIs and a set of Referers for the pages in that cluster. It should be noted that the Referer field is the URI

of the previous web application page that the uservisited For extracting the edges of user navigationgraph the produced clusters are checked to findwhich Referer set of clusters has the intersectionwith the URI set of the cluster When the cluster isfound these two clusters are connected Supposethat the URI set of cluster C1 has an intersectionwith the Referer set of cluster C2 then the pathfrom cluster C1 to cluster C2 (C1rarr C2) is createdIn other words the edge C1C2 is one of the edges inthe user navigation graph Algorithm 4 shows thepseudocode for extracting the edges from the usernavigation graph The URI set and the Referer setof each cluster are respectively obtained accordingto lines 5 and 6 of the pseudocode in Algorithm 3In line 10 if the intersection of the URI set of eachcluster with other clustersrsquo Referer set is not nullthe CiCj edge is added to the edge set of the graph

In this step BLProM connects the obtainedclusters that each one represents a unique web ap-plication page Each cluster actually has a set ofsimilar pages each of these pages has URI andReferer field Thus each cluster contains a set ofURIs and a set of Referers for pages in the clusterIt should be noted that the Referer field is actu-ally the URI of the previous web application pagethat the user visited For extracting the edges ofuser navigation graph the produced clusters arechecked to find which Referer sets of clusters hasthe intersection with the URI set of the clusterwhen the cluster is found these two clusters areconnected Suppose that the URI set of cluster C1has intersection with the Referer set of cluster C2then the path from cluster C1 to the cluster C2 (C1

rarr C2) is created In other words the edge C1C2is one of the edges in the user navigation graphAlgorithm 4 shows the pseudocode for extractingthe edges from the user navigation graph The URIset and the Referer set of each cluster are respec-tively obtained according to the lines 5 and 6 ofthe pseudocode in Algorithm 3 In line 10 if theintersection of URI set of each cluster with otherclustersrsquo Referer set is not null CiCj edge is addedto the edge set of the graph The user navigationgraph is created according to the algorithm in Al-gorithm 5 where nodes are the clusters set and theidentified edges are the path between clusters Inthe created graph each node indicates the uniquepage and the edges show the path between pages

Definition 2 [The User Navigation Graph] Thisgraph is shown with tupleltC0 C Egt where C isthe set of nodes in the graph C0 isin C is the first(initial) node in the graph and E sube C times C is theset of edges in the graph

Algorithm 5 shows the pseudocode for extractingthe user navigation graph Line 7 indicates a clusterthat contains the initial page of the application Itis considered as the initial node of the graph

32 Identifying Business Processes in the Ap-plication

To identify the web application business processes inthe user navigation graph it is necessary to define theprocess and final node and then define the businessprocess

Definition 3 [The Application Process (P)] The

74 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al

Algorithm 4 The Pseudocode for Extracting Edges From the User Navigation Graph

INPUT C = C1 C2 C3 Ck web pages clusters as graph nodesWebPages= w1 w2 w3 wn set of web pagesOUTPUT the web application graph edges E as a set of edges

1 Begin2 Let E = empty set of web application graph Edges3 for j 1 k do4 Let URIcj Referercj = empty5 URIcj = URIcjcup ExtractURI (w) for any w isin Cj

6 Refererck = Refererck cup ExtractReferer(w) for any w isin Cj

7 end for8 for i k do9 for j i+ 1 k do

10 if (URIci capReferercj 6= null) then11 E larr E + CiCj

12 end if13 end for14 end for15 return E16 end

Algorithm 5 The Pseudocode for Extracting the User Navigation Graph

INPUT C= C1 C2 C3 Ck web pages clusters as graph nodesw1 First web pageOUTPUT the web application navigation graph lt C0 CE gt

1 Begin2 Let C0 = empty3 E=ExtractGraphEdges set of web application graph Edges4 for j 1 k do5 if Ck contains w1 then6 C0 = Co + Ck

7 end if8 end for9 return C0 CE

10 end

process P in the application is a sequence ofnodes and edges in the user navigation graph likeE1 E2 Ek whereEi isin E andEi = Ciminus1Ci

Algorithm 6 shows the pseudocode for extractingprocesses

Definition 4 [the final nodes in the user navigationgraph (F)] The final nodes refer to the completion ofa business process that occurs when the applicationreaches there

The final nodes can be detected by examining theHTTP responses For example in the process of buy-ing a product a phrase like rdquoThank you for your pur-chaserdquo is displayed after completion of the purchaseBy specifying a set of these phrases and searchingthem in the responses the final nodes can be identi-

fied Some keywords used for identifying the final nodeare Thank Congratulations Successfully Log Off andSearch Results Additionally some buttons in the webapplication page are good signs that help the identifi-cation of the final node in the graph The examples offinal nodes include the page after clicking the ldquosaverdquobutton the page after clicking the ldquocreationrdquo buttonand the page after clicking the ldquosubmitrdquo button

Definition 5 [The application business process]: The business process BP in the application is a process that satisfies at least one of the following conditions:

1. The first node of the process is the initial node in the user navigation graph (C0), and the end node of the process is a final node in the user navigation graph (F).

2. The process passes its first node again; that is, the first node and the end node of the process are the same, and the process length is greater than two. (If the process returns to a node it has already passed and the length of the resulting loop is more than two, it is a business process.)

Algorithm 6. The Pseudocode for Extracting Processes

INPUT: the web application first node C0
       the web application graph edges E
OUTPUT: the web application graph processes P as a set of web application processes

1: Begin
2: Let P = ∅ // set of web application processes
3: Let StartEdge = ∅ // set of graph edges that begin from C0
4: StartEdge = ExtractGraphEdges(C0) // extract all edges from C0
5: EndPoint = ExtractEndPoint(StartEdge) // extract the end points of the edges
6: if (ExtractProcess(EndPoint, E) ≠ null) then
7:     return P = E + ExtractProcess(EndPoint, E) for any edge E ∈ StartEdge
8: else
9:     return E
10: end if
11: end
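One way to read the recursive process extraction is as a depth-first enumeration of paths that start at the initial node. The sketch below is an interpretation under stated assumptions, not the authors' implementation: a process is represented as a list of node ids, and a path is assumed to end either at a node without outgoing edges or as soon as it revisits a node it has already passed.

```python
from collections import defaultdict


def extract_processes(c0, edges):
    """Enumerate paths from c0 over directed edges (a, b).

    A path ends at a dead end or when it closes a loop.
    """
    succ = defaultdict(list)
    for a, b in sorted(edges):
        succ[a].append(b)
    processes = []

    def walk(path):
        node = path[-1]
        if not succ[node]:
            processes.append(path)  # dead end: the process is complete
            return
        for nxt in succ[node]:
            if nxt in path:
                processes.append(path + [nxt])  # loop closed; stop here
            else:
                walk(path + [nxt])

    walk([c0])
    return processes


edges = {(0, 1), (1, 2), (2, 0), (1, 3)}
for p in extract_processes(0, edges):
    print(p)
# [0, 1, 2, 0]
# [0, 1, 3]
```

Stopping at the first repeated node keeps the enumeration finite on graphs with cycles.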

All processes in the graph that run from the initial node (the application's initial page) to an identified final node, as well as the processes whose initial node and final node are the same, are the application business processes. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the application processes are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start with the initial node and end with a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, those detected processes whose initial node and final node are the same and whose length is greater than two are added to the variable BP as business processes.
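The two conditions of Definition 5 can be expressed as a filter over already extracted processes. The following is a sketch only: the function name is hypothetical, processes are assumed to be lists of node ids, and process length is assumed to be counted in edges.

```python
def identify_business_processes(processes, c0, final_nodes):
    """Keep the processes that satisfy Definition 5.

    Condition 1: the process starts at c0 and ends at a final node.
    Condition 2: the process starts and ends at the same node and
                 has more than two edges.
    """
    business = []
    for p in processes:
        n_edges = len(p) - 1
        ends_final = p[0] == c0 and p[-1] in final_nodes
        closed_loop = p[0] == p[-1] and n_edges > 2
        if ends_final or closed_loop:
            business.append(p)
    return business


processes = [[0, 1, 3], [0, 1, 2, 0], [0, 1, 0], [0, 4]]
print(identify_business_processes(processes, c0=0, final_nodes={3}))
# [[0, 1, 3], [0, 1, 2, 0]]
```

In this example the loop `[0, 1, 0]` is rejected because it has only two edges, and `[0, 4]` is rejected because node 4 is not a final node.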

4 EXPERIMENTAL RESULTS

The testbed used in this section is a network consisting of a web server (test target) and two clients (the BLProM system and a legal user). The web server and the clients run on virtual machines. The profiles of the web server and the clients are shown in Table 2.

The web applications listed in Table 3 are installed on the web server (test target), and we then identify the business layer of these web applications.

The legal user first starts using the selected web applications. The user crawls all permitted parts of the web application. The HTTP traffic of the legal user is given to BLProM as its input.

5 EVALUATION

BLProM's goal is to identify the business layer of the web application. We can identify business logic vulnerabilities by identifying the business layer of the web application. BLProM detects the business processes of the web application. Identifying business processes is the main step in dynamic security testing of the web application in the business layer.

We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner. It scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner that has an API for extracting the web application graph. The web application pages are the graph nodes, and the relations among pages are shown as graph edges. The main difference between BLProM and ZAP is in detecting similar pages: ZAP cannot detect similar pages in the web application, but BLProM can. ZAP's graph only shows the relations among scanned pages, whereas BLProM generates the optimal graph, in which similar pages are merged into one node. Regarding the accuracy of the generated graph, BLProM and ZAP are the same.

To evaluate the proposed approach, we first show the accuracy of clustering using the following criteria:

• True Positive: samples that are placed in their correct clusters.

• False Positive: samples that are placed in a cluster to which they do not belong.

• False Negative: samples that are not placed in a cluster to which they belong.

• Recall: It is calculated by the following formula:

$$\text{recall} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}}$$

• Precision: It is calculated by the following formula:

$$\text{precision} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalsePositive}}$$

• F-Measure: It is calculated by the following formula:

$$\text{F-Measure} = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}}$$

Table 2. Testbed Profiles

Machine                    CPU                             OS            VMware CPU   VMware RAM   VMware OS
Web server (test target)   Pentium dual core, 2.20 GHz     Windows 8.1   1 GHz        1 GB         Windows 7
Client (BLProM machine)    Intel Core i7, 2.20 GHz         Windows 8.1   1 GHz        1 GB         Windows 7
Client (legal user)        Pentium dual core i3-3210 GHz   Windows 7     1 GHz        1 GB         Windows 7

Table 3. Selected Web Applications for Evaluation

Web application        Description
TomatoCart-1.1.8.6.1   e-commerce
osCommerce-2.3.4       e-commerce
WackoPicko             Web application for sharing pictures

Algorithm 7. The Pseudocode for Identifying the Application Business Processes

INPUT: the web application navigation graph <C0, C, E>
       the web application graph edges E
OUTPUT: the web application business processes BP as a set of web application processes

1: Begin
2: Let P = ∅ // set of web application processes
3: Let F = ∅ // set of web application final nodes
4: P = ExtractProcess // extract processes in the web application
5: F = ExtractFinalNodes // extract final nodes in the web application
6: for i = 1 to k do
7:     if Pi starts with C0 and ends with a node in F then
8:         BP = BP + Pi
9:     end if
10: end for
11: R = ExtractProcessWithRepeatedNodes(P) // extract processes with repeated nodes
12: for j = 1 to m do
13:     if Rj has the same initial and end node and length(Rj) > 2 then
14:         BP = BP + Rj
15:     end if
16: end for
17: return BP
18: end

Table 4. Evaluation of the Clusters of the Selected Web Application Pages

Criteria         WackoPicko   Tomatocart   osCommerce
samples          89           150          210
clusters         29           66           40
true positive    65           146          205
false positive   24           4            5
false negative   23           3            4
recall           0.74         0.98         0.98
precision        0.73         0.97         0.98
f-measure        0.73         0.98         0.98

Table 5. Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko

Criteria                                Proposed approach   OWASP ZAP   Percentage of improvement compared to OWASP ZAP
HTTP Message                            89                  89          –
Graph Nodes                             22                  89          75.2
Graph Edges                             48                  270         82.2
process (P)                             12                  48          75
Average edge in each process (E)        4                   46          91.3
Average edges in all processes (P·E)    48                  2208        97.8
business processes                      10                  NA          –

The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, samples is the total number of web pages in the HTTP traffic, extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, clusters is the total number of clusters extracted by the ClusteringWebPages algorithm (Algorithm 3). In the following rows, the criteria listed above are calculated for each web application.
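As a worked check of the criteria, the short Python sketch below recomputes recall, precision, and F-measure from the true positive, false positive, and false negative counts reported in Table 4. The function name is hypothetical, and the F-measure is taken here as the harmonic mean of recall and precision, which is the standard definition and reproduces the reported values.

```python
def clustering_scores(tp, fp, fn):
    """Return (recall, precision, f_measure) from raw counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# (true positive, false positive, false negative) as listed in Table 4
table4 = {
    "WackoPicko": (65, 24, 23),
    "Tomatocart": (146, 4, 3),
    "osCommerce": (205, 5, 4),
}
for app, counts in table4.items():
    r, p, f = clustering_scores(*counts)
    print(f"{app}: recall={r:.2f} precision={p:.2f} f-measure={f:.2f}")
# WackoPicko: recall=0.74 precision=0.73 f-measure=0.73
# Tomatocart: recall=0.98 precision=0.97 f-measure=0.98
# osCommerce: recall=0.98 precision=0.98 f-measure=0.98
```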

To evaluate the proposed approach, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. The BLProM output for WackoPicko is shown in Table 5. Graph nodes is the total number of clusters extracted by the ClusteringWebPages algorithm (Algorithm 3). Graph edges is the total number of edges extracted by the ExtractGraphEdges algorithm (Algorithm 4). Process is the total number of paths from the first node, extracted by the ExtractProcess algorithm (Algorithm 6). Average edge in each process (E) is the sum of the edges of all processes divided by the total number of processes. Average edges in all processes is the product of the total number of processes (P) and the average number of edges in each process (E). Business processes is the total number of business processes in the web application.

The values in the first column of Table 5 are the output obtained from the proposed approach. The values in the second column are the scanning output of ZAP. The third column shows the percentage of scanning improvement of our proposed approach compared to the ZAP scan. The results in the table indicate that the ZAP scan is not a smart scan: it does not merge similar pages and therefore traverses far more nodes and edges. As a result of this non-intelligent scan, ZAP is not able to identify business-layer vulnerabilities.
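The text does not state how the improvement percentage is computed. The values in Table 5 appear consistent with a relative reduction with respect to ZAP, (ZAP − BLProM) / ZAP; the sketch below checks that reading. The formula is therefore an assumption inferred from the reported numbers, and the function name is hypothetical.

```python
def improvement(blprom, zap):
    """Relative reduction of a metric versus ZAP, in percent (assumed formula)."""
    return 100.0 * (zap - blprom) / zap


# (proposed approach, OWASP ZAP) as listed in Table 5 for WackoPicko
table5 = {
    "Graph Nodes": (22, 89),
    "Graph Edges": (48, 270),
    "process (P)": (12, 48),
    "Average edge in each process (E)": (4, 46),
    "Average edges in all processes": (48, 2208),
}
for name, (ours, zap) in table5.items():
    print(f"{name}: {improvement(ours, zap):.1f}%")
```

Under this assumption the computed values agree with the third column of Table 5 to within one decimal digit of rounding.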

The BLProM output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to ZAP scanning. It is observed that web application scanning is improved by identifying the web application business layer.


Table 6. Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce

Criteria                                Proposed approach   OWASP ZAP   Percentage of improvement compared to OWASP ZAP
HTTP Message                            170                 170         –
Graph Nodes                             40                  170         76.4
Graph Edges                             66                  379         82.5
process (P)                             23                  17          26
Average edge in each process (E)        3                   113         97.3
Average edges in all processes (P·E)    69                  1921        96.4
business processes                      18                  NA          –

Table 7. Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart

Criteria                                Proposed approach   OWASP ZAP   Percentage of improvement compared to OWASP ZAP
HTTP Message                            150                 150         –
Graph Nodes                             66                  150         56
Graph Edges                             87                  410         78.7
process (P)                             31                  39          20.5
Average edge in each process (E)        4                   101         96
Average edges in all processes (P·E)    156                 3131        95
business processes                      30                  NA          –

The BLProM output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of our proposed approach compared to ZAP scanning.

Table 8 shows the average of the proposed approach, the average of OWASP ZAP, and the average percentage of improvement in the scanning of the selected web applications. For example, in the "Average edges in all processes" benchmark, our approach improves on OWASP ZAP by about 96 percent.

According to the results presented in this table, it can be observed that BLProM has improved web application scanning. BLProM is aware of web application business processes. By identifying the web application business layer, web scanners can detect business-layer vulnerabilities.
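The averages in Table 8 can be illustrated with the Graph Nodes row. The sketch below assumes that each average is the arithmetic mean over the three applications and that the average improvement is the mean of the per-application percentages (read with one decimal place), not the improvement of the averaged counts. Both points are inferences from the reported numbers rather than statements in the text.

```python
from statistics import mean

# (BLProM, ZAP, reported improvement %) for Graph Nodes in
# WackoPicko, osCommerce, and TomatoCart, from Tables 5 to 7
graph_nodes = [(22, 89, 75.2), (40, 170, 76.4), (66, 150, 56.0)]

avg_blprom = mean(row[0] for row in graph_nodes)
avg_zap = mean(row[1] for row in graph_nodes)
avg_improvement = mean(row[2] for row in graph_nodes)

print(f"{avg_blprom:.1f} {avg_zap:.1f} {avg_improvement:.1f}")
```

The result is close to the Graph Nodes row of Table 8; small differences in the last digit would be explained by truncation instead of rounding.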

6 CONCLUSION

Business logic vulnerabilities are serious vulnerabilities that compromise web application security. Web scanners cannot detect business logic vulnerabilities because they are unable to understand the business logic of the web application. To detect business logic vulnerabilities, the business logic of the web application needs to be understood. Therefore, these vulnerabilities are specific to the application and difficult to identify.

In this paper, we proposed BLProM, a black-box approach for detecting the business processes of the web application. BLProM aims to ease the detection of business logic vulnerabilities. The BLProM output is used as input to dynamic security testing of web applications in the business layer, in order to detect business logic vulnerabilities. BLProM consists of two main steps:

1. Extracting the user navigation graph
2. Detecting web application business processes

In the lab, we scanned three web applications with BLProM and OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared to OWASP ZAP. BLProM improved the scanning of web applications because it clusters web pages and prevents scanning similar web application pages.

Table 8. Comparison of the Average of the Proposed Approach and the Average of ZAP Approach in the Scanning of Selected Web Applications

Average of selected web applications

Criteria                                Proposed approach   OWASP ZAP   Percentage of improvement compared to OWASP ZAP
Graph Nodes                             42.6                136.3       69.2
Graph Edges                             67                  353         81.1
process (P)                             20                  36.6        40.5
Average edge in each process (E)        3.6                 86.6        94.8
Average edges in all processes (P·E)    91                  2420        96.4


Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. Currently, she is working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.

Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. Besides, he is a co-founder of four IT startups.

Ahmad Nickabadi received his B.S. degree in computer engineering in 2004 and the M.Sc. and Ph.D. degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.

Page 7: BLProM:ABlack-BoxApproachforDetectingBusiness-Layer ...jcomsec.ui.ac.ir/article_24589_64a7b183942b390c9d48f11e67352c9… · Comparing the results of the web application scanning,

July 2019 Volume 6 Number 2 (pp 65ndash80) 71

Table 1 Attribute Vector of a Page in osComerce Web Application

inputs null

Buttons

htmlbodybuttonReviews

htmlbodydivdivdivdivformdivdivspans

panbuttonAdd to Cart

anchors null

image htmlbodydivdivdivaimg

Figure 4 An Example of HTML Code

Definition of Document Object Model(DOM) path for an HTML element DOMpath of an element is the position of the elementin the HTML code

For example in Figure 4 the DOM path of thebutton (DOMbutton) is DocumentHtmlBodyPButton

Definition 1 [Similar Pages] the similar pagesare those the user can perform the same operationson them and are identical in terms of the positionof the important HTML elements in the page Theimportant HTML elements in the page includebuttons images inputs and anchors

The clustering process includes three steps1 Extracting attributes vectors of the page2 Identifying the subset pages3 Clustering pages

In the following these steps are discussed indetail1 Extracting attributes vectors of the page

The BLProM shows each page of the applicationas a pair (main request response) In this stepBLProM extracts the corresponding attributesvectors of each page by applying a data min-ing operation on the above pair The BLProMmodels each page using the following attributevector

WebPages = the total pages in an applicationforall w ε WebPagesw= (DOMinputs DOMbuttons DOManchorsDOMimgs)DOMinputs =

prodni DOM(inputi)

DOMbuttons=prodn

i DOM(buttonsi)DOManchors=

prodni DOM(anchori)

DOMimgs=prodn

i DOM(imgi)

bull DOM(input) DOM path of lt input gt tagin the page + the value of type attributein the lt input gt tag + the value of nameattribute in the lt input gt tag (in the ab-sence of name attribute the value attributeis considered)bull DOM(button) DOM path of the button

in the page + the title of buttonbull DOM(anchor) DOM path of lt a gt tag in

the pagebull DOM(img) DOM path of the existing im-

age in the pageSuppose the web application page contains

several buttons in this case the second elementof the page attribute vector is a set of the DOMpaths of buttons in the page that are separatedby ldquordquo Figure 5 shows one of the osCommerce1 web application pages Table 1 shows the at-tribute vector of the page in Figure 5 As shownthe input element and the anchor elements arenull it means the page does not contain theabove tags

2 Identifying similar pagesAfter extracting the attribute vector of eachpage it is necessary to identify similar pagesThose pages that their attribute vectors are asubset of another page or have fully similar at-tribute vectors are considered as similar pages

According to Definition 1 the attribute vec-tor of each web application page has four ele-ments The attribute vector of page 1 is consid-ered the same as the attribute vector of page 2ifbull All vector elements of page 1 equal to cor-

responding elements in the vector of page2bull All vector elements of page 1 are a subset

of corresponding elements in the vector ofpage 2bull All vector elements of page 2 are a subset

of corresponding elements in the vector ofpage 1

1 httpswwwoscommercecom

72 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al

Figure 5 A Page of osCommerce Application

Algorithm 2 The Pseudocode for Identifying the Similar Pages

INPUT w1 = (DoMw1(input) DoMw1(button) DoMw1(anchor) DoMw1(img))w2 = (DoMw2(input) DoMw2(button) DoMw2(anchor) DoMw2(img)OUTPUT Boolean flag true means two pages are the same

1 Begin2 Let flag input button anchor img=false3 if DoMw1(input) sube DoMw2 (input) or DoMw2(input) sube DoMw1(input) then4 input=true5 end if6 if DoMw1(button) sube DoMw2 (button) or DoMw2(button) sube DoMw1(button) then7 button=true8 end if9 if DoMw1(anchor) sube DoMw2 (anchor) or DoMw2(anchor) sube DoMw1(anchor) then

10 anchor=true11 end if12 if DoMw1(img) sube DoMw2(img) or DoMw2(img) sube DoMw1(img) then13 img=true14 end if15 if input and button and anchor and img then16 flag=true17 end if18 return flag19 end

bull If one or more vector elements of page 1 area subset of their corresponding elements inthe vector of page 2 the rest of the vectorelements of page 1 must be the same withtheir corresponding elements in the vectorof page 2bull The null element is a subset of every ele-

mentSimilar pages are identified according to the

above-mentioned attributes Algorithm 2 illus-trates the pseudocode for identifying similarpages

3 Clustering web application pagesAfter identifying the similar pages they are

put in the same cluster The pages in a clusterare similar to each other and refer to a uniquepage of the application Algorithm 3 shows thepseudocode for clustering web pages In line 7 itis checked whether two pages wi and wj are thesame if yes they are put in the same cluster

4) Extracting user navigation graphIn this step BLProM connects the obtained clus-ters that each one represents a unique web ap-plication page Each cluster has a set of similarpages each of these pages has URI and Refererfield Thus each cluster contains a set of URIs anda set of Referers for the pages in that cluster Itshould be noted that the Referer field is the URI

July 2019 Volume 6 Number 2 (pp 65ndash80) 73

Algorithm 3 The Pseudocode for Clustering the Web Application Pages

INPUT WebPages = w1 w2 w3 wnOUTPUT the web application model M as a set of web page clusters Cthe web page clusters C as a set of web pages

1 Begin2 Let M = empty set of page clusters3 Let k=1 number of web page clusters4 for i 1 n do5 Ck larr wi

6 for j = i+ 1 n do7 if SimilarWebPages(wi wj) then8 WebPageslarrWebPagesminus wi9 n= length of WebPages

10 Ck larr wj

11 end if12 end for13 k + +14 end for15 return C16 end

of the previous web application page that the uservisited For extracting the edges of user navigationgraph the produced clusters are checked to findwhich Referer set of clusters has the intersectionwith the URI set of the cluster When the cluster isfound these two clusters are connected Supposethat the URI set of cluster C1 has an intersectionwith the Referer set of cluster C2 then the pathfrom cluster C1 to cluster C2 (C1rarr C2) is createdIn other words the edge C1C2 is one of the edges inthe user navigation graph Algorithm 4 shows thepseudocode for extracting the edges from the usernavigation graph The URI set and the Referer setof each cluster are respectively obtained accordingto lines 5 and 6 of the pseudocode in Algorithm 3In line 10 if the intersection of the URI set of eachcluster with other clustersrsquo Referer set is not nullthe CiCj edge is added to the edge set of the graph

In this step BLProM connects the obtainedclusters that each one represents a unique web ap-plication page Each cluster actually has a set ofsimilar pages each of these pages has URI andReferer field Thus each cluster contains a set ofURIs and a set of Referers for pages in the clusterIt should be noted that the Referer field is actu-ally the URI of the previous web application pagethat the user visited For extracting the edges ofuser navigation graph the produced clusters arechecked to find which Referer sets of clusters hasthe intersection with the URI set of the clusterwhen the cluster is found these two clusters areconnected Suppose that the URI set of cluster C1has intersection with the Referer set of cluster C2then the path from cluster C1 to the cluster C2 (C1

rarr C2) is created In other words the edge C1C2is one of the edges in the user navigation graphAlgorithm 4 shows the pseudocode for extractingthe edges from the user navigation graph The URIset and the Referer set of each cluster are respec-tively obtained according to the lines 5 and 6 ofthe pseudocode in Algorithm 3 In line 10 if theintersection of URI set of each cluster with otherclustersrsquo Referer set is not null CiCj edge is addedto the edge set of the graph The user navigationgraph is created according to the algorithm in Al-gorithm 5 where nodes are the clusters set and theidentified edges are the path between clusters Inthe created graph each node indicates the uniquepage and the edges show the path between pages

Definition 2 [The User Navigation Graph] Thisgraph is shown with tupleltC0 C Egt where C isthe set of nodes in the graph C0 isin C is the first(initial) node in the graph and E sube C times C is theset of edges in the graph

Algorithm 5 shows the pseudocode for extractingthe user navigation graph Line 7 indicates a clusterthat contains the initial page of the application Itis considered as the initial node of the graph

32 Identifying Business Processes in the Ap-plication

To identify the web application business processes inthe user navigation graph it is necessary to define theprocess and final node and then define the businessprocess

Definition 3 [The Application Process (P)] The

74 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al

Algorithm 4 The Pseudocode for Extracting Edges From the User Navigation Graph

INPUT C = C1 C2 C3 Ck web pages clusters as graph nodesWebPages= w1 w2 w3 wn set of web pagesOUTPUT the web application graph edges E as a set of edges

1 Begin2 Let E = empty set of web application graph Edges3 for j 1 k do4 Let URIcj Referercj = empty5 URIcj = URIcjcup ExtractURI (w) for any w isin Cj

6 Refererck = Refererck cup ExtractReferer(w) for any w isin Cj

7 end for8 for i k do9 for j i+ 1 k do

10 if (URIci capReferercj 6= null) then11 E larr E + CiCj

12 end if13 end for14 end for15 return E16 end

Algorithm 5 The Pseudocode for Extracting the User Navigation Graph

INPUT C= C1 C2 C3 Ck web pages clusters as graph nodesw1 First web pageOUTPUT the web application navigation graph lt C0 CE gt

1 Begin2 Let C0 = empty3 E=ExtractGraphEdges set of web application graph Edges4 for j 1 k do5 if Ck contains w1 then6 C0 = Co + Ck

7 end if8 end for9 return C0 CE

10 end

process P in the application is a sequence ofnodes and edges in the user navigation graph likeE1 E2 Ek whereEi isin E andEi = Ciminus1Ci

Algorithm 6 shows the pseudocode for extractingprocesses

Definition 4 [the final nodes in the user navigationgraph (F)] The final nodes refer to the completion ofa business process that occurs when the applicationreaches there

The final nodes can be detected by examining theHTTP responses For example in the process of buy-ing a product a phrase like rdquoThank you for your pur-chaserdquo is displayed after completion of the purchaseBy specifying a set of these phrases and searchingthem in the responses the final nodes can be identi-

fied Some keywords used for identifying the final nodeare Thank Congratulations Successfully Log Off andSearch Results Additionally some buttons in the webapplication page are good signs that help the identifi-cation of the final node in the graph The examples offinal nodes include the page after clicking the ldquosaverdquobutton the page after clicking the ldquocreationrdquo buttonand the page after clicking the ldquosubmitrdquo button

Definition 5 [the application business process] Thebusiness process BP in the application is a processthat has at least one of the following conditions

1 The first node of the process is the initial nodein the user navigation graph (C0) and the processend node is the final node in the user navigationgraph (F)

2 If the process passes its first node again it means

July 2019 Volume 6 Number 2 (pp 65ndash80) 75

Algorithm 6 The Pseudocode for Extracting Processes

INPUT the web application first node C0

the web application Graph edges EOUTPUT the web application graph process P as a set of web application process

1 Begin2 Let P0 = empty set of web application processes3 Let StartEdge= empty set of graph edge that begin from C0

4 StartEdge= ExtractGraphEdges(C0) Extract all edges from C0

5 EndPoint = ExtractEndPoint(StartEdge) Extract the end point of edges6 if (ExtractProcess(EndPoint E) 6= null) then7 return P=E+ExteractProcess(Endpoint E) for any edge E isin StartEdge8 else9 return E

10 end if11 end

the first node and the end node of the process arethe same and the process length is greater thantwo (If there is a return to the passed node in theprocess and the created loop length is more thantwo this is a business process)

All processes in the graph from the initial node(the application initial page) to the identified finalnodes as well as the processes of their initial nodeand the final node are the same and all of themare the application business processes Algorithm 7shows the pseudocode for identifying the applicationbusiness process In line 4 the application businessprocesses are extracted In line 5 the final nodes ofthe graph are extracted In line 7 the processes thatstart with the initial node and end with the final nodeare identified as the business process are stored in thevariable BP In line 9 the processes with repeatednodes are detected and in line 11 among the detectedprocesses if their initial node and their final node arethe same and their length is greater than two theyare added to the variable BP as the business process

4 EXPERIMENTAL RESULTS

The testbed used in this section is a network consistingof a web server (test target) and two clients (BLProMsystem and legal user) The web server and the clientsare loaded on a virtual machine The web server andthe clientsrsquo profiles are shown in Table 2

The web applications listed in Table 3 are installedon the web server (test target) and then we plan toidentify the business layer of the web applications

The legal user first starts using the selected webapplications The user crawls all permitted parts ofthe web application HTTP traffic of the legal user isgiven to BLProM as its input

5 EVALUATION

BLProMrsquos goal is to identify the business layer of theweb application We can identify business logic vul-nerability by identifying the business layer of the webapplication BLProM detects the business processesof the web application Identifying business processesis the main step in dynamic security testing of theweb application in the business layer

We compare BLProM with OWASP ZAP TheOWASP Zed Attack Proxy (ZAP) is a free web scan-ner It scans web applications and automatically findssome security vulnerabilities ZAP is the only free webscanner that has API for extracting web applicationgraph The web application pages are graph nodes andthe relations among pages are shown as graph edgesThe main difference between BLPRoM and ZAP isin detecting similar pages ZAP cannot detect similarpages in the web application but BLPRoM can ZAPrsquosgraph only shows the relation among scanned pagesbut BLProM generates the optimal graph About theaccuracy of the generated graph both BLProM andZAP are the same

To evaluate the proposed approach, we first show the accuracy of clustering using the following criteria:

• True Positive: Samples that fit well into their correct clusters.

• False Positive: Samples that are placed in a cluster to which they do not belong.

• False Negative: Samples that are not placed in a cluster although they belong to it.

• Recall: It is calculated by the following formula:

\[ \text{recall} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}} \]

• Precision: It is calculated by the following formula:


Table 2: Testbed Profiles

| Machine | CPU | OS | VMware CPU | VMware RAM | VMware OS |
|---|---|---|---|---|---|
| Web server (test target) | Pentium dual core, 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7 |
| Client (BLProM machine) | Intel Core i7, 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7 |
| Client (legal user) | Pentium dual core i3-3210 GHz | Windows 7 | 1 GHz | 1 GB | Windows 7 |

Table 3: Selected Web Applications for Evaluation

| Web application | Description |
|---|---|
| TomatoCart 1.1.8.6.1 | E-commerce |
| osCommerce 2.3.4 | E-commerce |
| WackoPicko | Web application for sharing pictures |

Algorithm 7: The Pseudocode for Identifying the Application Business Processes

INPUT: the web application navigation graph <C0, C, E>; the web application graph edges E
OUTPUT: the web application business processes BP, as a set of web application processes

1: Begin
2: Let P = ∅, BP = ∅ // P: set of web application processes
3: Let F = ∅ // set of web application final nodes
4: P = ExtractProcess // extract processes in the web application
5: F = ExtractFinalNodes // extract final nodes in the web application
6: for i = 1..k do
7:   if Pi starts at C0 and ends at a node in F then
8:     BP = BP + Pi
9:   end if
10: end for
11: R = ExtractProcessWithRepeatedNodes(P) // extract processes with repeated nodes
12: for j = 1..m do
13:   if Rj has the same initial and end node and length(Rj) > 2 then
14:     BP = BP + Rj
15:   end if
16: end for
17: return BP
18: end


Table 4: The Clusters of Selected Web Application Pages Evaluation

| Criteria | WackoPicko | TomatoCart | osCommerce |
|---|---|---|---|
| Samples | 89 | 150 | 210 |
| Clusters | 29 | 66 | 40 |
| True positive | 65 | 146 | 205 |
| False positive | 24 | 4 | 5 |
| False negative | 23 | 3 | 4 |
| Recall | 0.74 | 0.98 | 0.98 |
| Precision | 0.73 | 0.97 | 0.98 |
| F-measure | 0.73 | 0.98 | 0.98 |

Table 5: Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| HTTP Message | 89 | 89 | – |
| Graph Nodes | 22 | 89 | 75.2 |
| Graph Edges | 48 | 270 | 82.2 |
| Process (P) | 12 | 48 | 75 |
| Average edge in each process (E) | 4 | 46 | 91.3 |
| Average edges in all processes (PE) | 48 | 2208 | 97.8 |
| Business processes | 10 | N/A | – |

\[ \text{precision} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalsePositive}} \]

• F-Measure: It is calculated by the following formula:

\[ \text{F-Measure} = \frac{2 \cdot \text{recall} \cdot \text{precision}}{\text{recall} + \text{precision}} \]

The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, "samples" is the total number of web pages in the HTTP traffic, extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, "clusters" is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). In the following rows, the criteria listed above are calculated for each web application.
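As a check on the clustering criteria, the following minimal sketch recomputes recall, precision, and F-measure from the true-positive, false-positive, and false-negative counts reported in Table 4. The class and function names are illustrative only and are not part of BLProM; the F-measure is taken as the harmonic mean of recall and precision, which is the form that reproduces the reported values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusteringCounts:
    """Counts reported in Table 4 for one web application."""
    true_positive: int
    false_positive: int
    false_negative: int


def recall(c: ClusteringCounts) -> float:
    return c.true_positive / (c.true_positive + c.false_negative)


def precision(c: ClusteringCounts) -> float:
    return c.true_positive / (c.true_positive + c.false_positive)


def f_measure(c: ClusteringCounts) -> float:
    # Harmonic mean of recall and precision.
    r, p = recall(c), precision(c)
    return 2 * r * p / (r + p)


# Counts copied from Table 4.
TABLE_4 = {
    "WackoPicko": ClusteringCounts(65, 24, 23),
    "TomatoCart": ClusteringCounts(146, 4, 3),
    "osCommerce": ClusteringCounts(205, 5, 4),
}

if __name__ == "__main__":
    for name, counts in TABLE_4.items():
        print(name,
              round(recall(counts), 2),
              round(precision(counts), 2),
              round(f_measure(counts), 2))
```

Rounded to two decimals, the computed values agree with the recall, precision, and F-measure rows of Table 4.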

To evaluate the proposed approach, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. The BLProM output for WackoPicko is shown in Table 5. "Graph nodes" is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). "Graph edges" is the total number of edges extracted by the ExtractGraphEdges algorithm (Algorithm 4). "Process (P)" is the total number of paths from the first node, extracted by the ExtractProcess algorithm (Algorithm 6). "Average edge in each process (E)" is the sum of the edges of all processes divided by the total number of processes. "Average edges in all processes (PE)" is the product of the total number of processes (P) and the average number of edges in each process (E). "Business processes" is the total number of business processes in the web application.

The values in the first column of Table 5 are the output of the proposed approach, and the values in the second column are the scanning output of ZAP. The third column shows the percentage of scanning improvement of the proposed approach compared to the ZAP scan. The results indicate that ZAP scans every page without recognizing similar pages. Because its scan is not aware of the business layer, ZAP is not able to identify business-layer vulnerabilities.
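The text does not state how the improvement percentage in the third column is computed. The sketch below assumes it is the relative reduction (1 − proposed / ZAP) × 100, an assumption that matches the WackoPicko values in Table 5 to within rounding. The function names are illustrative.

```python
def improvement_percent(proposed: float, zap: float) -> float:
    """Relative reduction of a criterion versus ZAP (assumed formula)."""
    return (1 - proposed / zap) * 100


def edges_in_all_processes(processes: float, avg_edges: float) -> float:
    """PE: number of processes (P) times average edges per process (E)."""
    return processes * avg_edges


# (proposed approach, OWASP ZAP) pairs copied from Table 5 (WackoPicko).
WACKOPICKO = {
    "Graph Nodes": (22, 89),
    "Graph Edges": (48, 270),
    "Process (P)": (12, 48),
    "Average edge in each process (E)": (4, 46),
}

if __name__ == "__main__":
    for criterion, (proposed, zap) in WACKOPICKO.items():
        print(f"{criterion}: {improvement_percent(proposed, zap):.1f}")
    pe_proposed = edges_in_all_processes(12, 4)
    pe_zap = edges_in_all_processes(48, 46)
    print(f"PE: {pe_proposed} vs {pe_zap}, "
          f"{improvement_percent(pe_proposed, pe_zap):.1f}")
```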

The BLProM output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to ZAP scanning. It is observed that identifying the web application business layer improves web application scanning.


Table 6: Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| HTTP Message | 170 | 170 | – |
| Graph Nodes | 40 | 170 | 76.4 |
| Graph Edges | 66 | 379 | 82.5 |
| Process (P) | 23 | 17 | 26 |
| Average edge in each process (E) | 3 | 113 | 97.3 |
| Average edges in all processes (PE) | 69 | 1921 | 96.4 |
| Business processes | 18 | N/A | – |

Table 7: Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| HTTP Message | 150 | 150 | – |
| Graph Nodes | 66 | 150 | 56 |
| Graph Edges | 87 | 410 | 78.7 |
| Process (P) | 31 | 39 | 20.5 |
| Average edge in each process (E) | 4 | 101 | 96 |
| Average edges in all processes (PE) | 156 | 3131 | 95 |
| Business processes | 30 | N/A | – |

The BLProM output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of the proposed approach compared to ZAP scanning.

Table 8 shows the average of the proposed approach, the average of OWASP ZAP, and the average percentage of improvement in the scanning of the selected web applications. For example, in the "Average edges in all processes" criterion, our approach improves on OWASP ZAP by about 96 percent.

According to the results presented in this table, it can be observed that BLProM improves web application scanning. BLProM is aware of the web application business processes. By identifying the web application business layer, web scanners can detect business-layer vulnerabilities.

6 CONCLUSION

Business logic vulnerabilities are serious vulnerabilities that compromise web application security. Web scanners cannot detect business logic vulnerabilities because they are unable to understand the business logic of the web application. Detecting business logic vulnerabilities requires understanding the business logic of the web application; these vulnerabilities are therefore specific to each application and difficult to identify.

In this paper, we proposed BLProM, a black-box approach for detecting the business processes of a web application. BLProM aims to ease the detection of business logic vulnerabilities. The output of BLProM is used as input to dynamic security testing of web applications in the business layer, in order to detect business logic vulnerabilities. BLProM consists of two main steps:

1- Extracting the user navigation graph
2- Detecting the web application business processes

In the lab, we scanned three web applications with BLProM and with OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared to OWASP ZAP. BLProM improved the scanning of web applications because it clusters web pages and prevents scanning similar web application pages.

Table 8: Comparison of the Average of the Proposed Approach and the Average of the ZAP Approach in the Scanning of Selected Web Applications

| Criteria (average of selected web applications) | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| Graph Nodes | 42.6 | 136.3 | 69.2 |
| Graph Edges | 67 | 353 | 81.1 |
| Process (P) | 20 | 36.6 | 40.5 |
| Average edge in each process (E) | 3.6 | 86.6 | 94.8 |
| Average edges in all processes (PE) | 91 | 2420 | 96.4 |



Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. Currently, she is working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.

Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and in an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. He is also a co-founder of four IT startups.

Ahmad Nickabadi received his B.S. degree in computer engineering in 2004 and the M.Sc. and Ph.D. degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.


Figure 5 A Page of osCommerce Application

Algorithm 2: The Pseudocode for Identifying the Similar Pages

INPUT: w1 = (DoMw1(input), DoMw1(button), DoMw1(anchor), DoMw1(img)); w2 = (DoMw2(input), DoMw2(button), DoMw2(anchor), DoMw2(img))
OUTPUT: Boolean flag; true means the two pages are the same

1: Begin
2: Let flag, input, button, anchor, img = false
3: if DoMw1(input) ⊆ DoMw2(input) or DoMw2(input) ⊆ DoMw1(input) then
4:   input = true
5: end if
6: if DoMw1(button) ⊆ DoMw2(button) or DoMw2(button) ⊆ DoMw1(button) then
7:   button = true
8: end if
9: if DoMw1(anchor) ⊆ DoMw2(anchor) or DoMw2(anchor) ⊆ DoMw1(anchor) then
10:   anchor = true
11: end if
12: if DoMw1(img) ⊆ DoMw2(img) or DoMw2(img) ⊆ DoMw1(img) then
13:   img = true
14: end if
15: if input and button and anchor and img then
16:   flag = true
17: end if
18: return flag
19: end

• If one or more vector elements of page 1 are a subset of their corresponding elements in the vector of page 2, the rest of the vector elements of page 1 must be the same as their corresponding elements in the vector of page 2.
• The null element is a subset of every element.

Similar pages are identified according to the above-mentioned attributes. Algorithm 2 illustrates the pseudocode for identifying similar pages.
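The subset test of Algorithm 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: each DOM component is assumed to be a set of strings identifying the elements of that type, and the `PageVector` type and the sample pages are hypothetical.

```python
from typing import FrozenSet, NamedTuple


class PageVector(NamedTuple):
    """Assumed page representation: one set of element identifiers per type."""
    input: FrozenSet[str]
    button: FrozenSet[str]
    anchor: FrozenSet[str]
    img: FrozenSet[str]


def similar_web_pages(w1: PageVector, w2: PageVector) -> bool:
    """Follow Algorithm 2: for every component, one page's set must be a
    subset of the other's. An empty set is a subset of any set, which covers
    the rule that the null element is a subset of every element."""
    return all(a <= b or b <= a for a, b in zip(w1, w2))


# Hypothetical pages: two product pages that differ by one extra input field,
# and a login page with unrelated inputs.
product_a = PageVector(frozenset({"qty", "search"}), frozenset({"add_to_cart"}),
                       frozenset({"/home", "/cart"}), frozenset({"logo.png"}))
product_b = PageVector(frozenset({"qty", "search", "review"}), frozenset({"add_to_cart"}),
                       frozenset({"/home", "/cart"}), frozenset({"logo.png"}))
login = PageVector(frozenset({"username", "password"}), frozenset({"login"}),
                   frozenset({"/home"}), frozenset())

if __name__ == "__main__":
    print(similar_web_pages(product_a, product_b))  # True
    print(similar_web_pages(product_a, login))      # False
```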

3) Clustering web application pages
After identifying the similar pages, they are put in the same cluster. The pages in a cluster are similar to each other and refer to a unique page of the application. Algorithm 3 shows the pseudocode for clustering web pages. In line 7, it is checked whether two pages wi and wj are the same; if so, they are put in the same cluster.
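The clustering step can be sketched as a greedy pass in which each remaining page seeds a cluster and pulls in every later page that is similar to it. This is a minimal sketch of that idea; the similarity test is passed in as a parameter, and `same_path` is a hypothetical stand-in for SimilarWebPages used only to make the example runnable.

```python
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")


def cluster_web_pages(pages: Sequence[Page],
                      similar: Callable[[Page, Page], bool]) -> List[List[Page]]:
    """Greedy clustering: the first unclustered page seeds a cluster and all
    remaining pages similar to the seed are moved into it."""
    clusters: List[List[Page]] = []
    remaining = list(pages)
    while remaining:
        seed = remaining.pop(0)
        cluster = [seed]
        rest = []
        for page in remaining:
            if similar(seed, page):
                cluster.append(page)   # page joins the seed's cluster
            else:
                rest.append(page)      # page stays available as a later seed
        remaining = rest
        clusters.append(cluster)
    return clusters


def same_path(a: str, b: str) -> bool:
    """Hypothetical similarity test: same URI path, ignoring the query string."""
    return a.split("?")[0] == b.split("?")[0]


if __name__ == "__main__":
    uris = ["/product?id=1", "/cart", "/product?id=2", "/login", "/cart?step=2"]
    print(cluster_web_pages(uris, same_path))
```

Each resulting cluster then stands for one unique page of the application, which is what becomes a node of the navigation graph.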

Algorithm 3: The Pseudocode for Clustering the Web Application Pages

INPUT: WebPages = {w1, w2, w3, ..., wn}
OUTPUT: the web application model M as a set of web page clusters C; each web page cluster C as a set of web pages

1: Begin
2: Let M = ∅ // set of page clusters
3: Let k = 1 // number of web page clusters
4: for i = 1..n do
5:   Ck ← wi
6:   for j = i+1..n do
7:     if SimilarWebPages(wi, wj) then
8:       WebPages ← WebPages − wj
9:       n = length of WebPages
10:       Ck ← wj
11:     end if
12:   end for
13:   k++
14: end for
15: return C
16: end

4) Extracting user navigation graph
In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster has a set of similar pages, and each of these pages has a URI and a Referer field. Thus, each cluster contains a set of URIs and a set of Referers for the pages in that cluster. It should be noted that the Referer field is the URI of the previous web application page that the user visited. To extract the edges of the user navigation graph, the produced clusters are checked to find which clusters have a Referer set that intersects the URI set of a given cluster. When such a cluster is found, the two clusters are connected. Suppose that the URI set of cluster C1 has an intersection with the Referer set of cluster C2; then the path from cluster C1 to cluster C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges in the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained in lines 5 and 6 of Algorithm 4, respectively. In line 10, if the intersection of the URI set of a cluster with the Referer set of another cluster is not empty, the edge CiCj is added to the edge set of the graph. The user navigation graph is created according to Algorithm 5, where the nodes are the set of clusters and the identified edges are the paths between clusters. In the created graph, each node indicates a unique page and the edges show the paths between pages.
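The URI/Referer intersection rule can be sketched as follows. This is a minimal sketch under stated assumptions: the `Page` type and the cluster names are hypothetical, and the sketch checks every ordered pair of distinct clusters so that edges in both directions are found, whereas the pseudocode in Algorithm 4 iterates only over pairs with j > i.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Page(NamedTuple):
    uri: str
    referer: str  # empty string when the request carried no Referer header


def extract_graph_edges(clusters: Dict[str, List[Page]]) -> Set[Tuple[str, str]]:
    """Add the edge (Ci, Cj) when some URI of Ci appears as a Referer in Cj."""
    uris = {name: {p.uri for p in pages} for name, pages in clusters.items()}
    referers = {name: {p.referer for p in pages if p.referer}
                for name, pages in clusters.items()}
    edges: Set[Tuple[str, str]] = set()
    for src in clusters:
        for dst in clusters:
            # Non-empty intersection: the user moved from a page of src
            # to a page of dst.
            if src != dst and uris[src] & referers[dst]:
                edges.add((src, dst))
    return edges


# Hypothetical clusters built from a short browsing session.
EXAMPLE_CLUSTERS = {
    "C0": [Page("/index", "")],
    "C1": [Page("/product?id=1", "/index"), Page("/product?id=2", "/index")],
    "C2": [Page("/cart", "/product?id=2")],
}

if __name__ == "__main__":
    print(sorted(extract_graph_edges(EXAMPLE_CLUSTERS)))
```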

Definition 2 [The User Navigation Graph]: This graph is represented by the tuple $\langle C_0, C, E \rangle$, where $C$ is the set of nodes in the graph, $C_0 \in C$ is the first (initial) node in the graph, and $E \subseteq C \times C$ is the set of edges in the graph.

Algorithm 5 shows the pseudocode for extracting the user navigation graph. Lines 5 and 6 identify the cluster that contains the initial page of the application; this cluster is taken as the initial node of the graph.

3.2 Identifying Business Processes in the Application

To identify the web application business processes in the user navigation graph, it is necessary to define the process and the final node first, and then define the business process.

Algorithm 4: The Pseudocode for Extracting Edges From the User Navigation Graph

INPUT: C = {C1, C2, C3, ..., Ck}, web page clusters as graph nodes; WebPages = {w1, w2, w3, ..., wn}, set of web pages
OUTPUT: the web application graph edges E as a set of edges

1: Begin
2: Let E = ∅ // set of web application graph edges
3: for j = 1..k do
4:   Let URIcj, Referercj = ∅
5:   URIcj = URIcj ∪ ExtractURI(w) for any w ∈ Cj
6:   Referercj = Referercj ∪ ExtractReferer(w) for any w ∈ Cj
7: end for
8: for i = 1..k do
9:   for j = i+1..k do
10:     if (URIci ∩ Referercj ≠ ∅) then
11:       E ← E + CiCj
12:     end if
13:   end for
14: end for
15: return E
16: end

Algorithm 5: The Pseudocode for Extracting the User Navigation Graph

INPUT: C = {C1, C2, C3, ..., Ck}, web page clusters as graph nodes; w1, the first web page
OUTPUT: the web application navigation graph <C0, C, E>

1: Begin
2: Let C0 = ∅
3: E = ExtractGraphEdges // set of web application graph edges
4: for j = 1..k do
5:   if Cj contains w1 then
6:     C0 = C0 + Cj
7:   end if
8: end for
9: return <C0, C, E>
10: end

Definition 3 [The Application Process (P)]: A process P in the application is a sequence of nodes and edges in the user navigation graph, such as $E_1, E_2, \ldots, E_k$, where $E_i \in E$ and $E_i = C_{i-1}C_i$.

Algorithm 6 shows the pseudocode for extracting processes.

Definition 4 [The Final Nodes in the User Navigation Graph (F)]: A final node marks the completion of a business process; the business process is complete when the application reaches that node.

The final nodes can be detected by examining the HTTP responses. For example, in the process of buying a product, a phrase like "Thank you for your purchase" is displayed after completion of the purchase. By specifying a set of such phrases and searching for them in the responses, the final nodes can be identified. Some keywords used for identifying the final nodes are: Thank, Congratulations, Successfully, Log Off, and Search Results. Additionally, some buttons in a web application page are good signs that help identify the final nodes in the graph. Examples of final nodes include the page after clicking the "save" button, the page after clicking the "creation" button, and the page after clicking the "submit" button.
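The keyword search over HTTP responses can be sketched as follows. The keyword list is taken from the text above; the function name, the mapping from node names to response bodies, and the choice of case-insensitive substring matching are assumptions made for illustration. Button-based detection is not covered by this sketch.

```python
import re
from typing import Dict, Iterable, Sequence, Set

# Keywords listed in the text for identifying final nodes.
FINAL_NODE_KEYWORDS = ("Thank", "Congratulations", "Successfully",
                       "Log Off", "Search Results")


def extract_final_nodes(responses: Dict[str, Iterable[str]],
                        keywords: Sequence[str] = FINAL_NODE_KEYWORDS) -> Set[str]:
    """Return the nodes whose response bodies contain any of the keywords.

    `responses` maps a graph node (cluster name) to the HTTP response bodies
    of the pages in that cluster. Matching is case-insensitive (assumption).
    """
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    return {node for node, bodies in responses.items()
            if any(pattern.search(body) for body in bodies)}


# Hypothetical response bodies for three clusters.
EXAMPLE_RESPONSES = {
    "C1": ["<h1>Products</h1>"],
    "C2": ["<p>Thank you for your purchase</p>"],
    "C3": ["<p>You have successfully logged in</p>"],
}

if __name__ == "__main__":
    print(sorted(extract_final_nodes(EXAMPLE_RESPONSES)))
```

Plain substring matching is deliberately permissive; a stricter variant could restrict the search to visible text or to whole phrases to reduce false matches.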

Definition 5 [The Application Business Process]: A business process BP in the application is a process that satisfies at least one of the following conditions:

1. The first node of the process is the initial node of the user navigation graph (C0), and the end node of the process is a final node of the user navigation graph (F).

2. The process passes through its first node again; that is, the first node and the end node of the process are the same and the process length is greater than two. (If the process returns to a node it has already passed and the length of the created loop is more than two, it is a business process.)

Algorithm 6: The Pseudocode for Extracting Processes

INPUT: the web application first node C0; the web application graph edges E
OUTPUT: the web application graph processes P, as a set of web application processes

1: Begin
2: Let P0 = ∅ // set of web application processes
3: Let StartEdge = ∅ // set of graph edges that begin from C0
4: StartEdge = ExtractGraphEdges(C0) // extract all edges from C0
5: EndPoint = ExtractEndPoint(StartEdge) // extract the end points of the edges
6: if (ExtractProcess(EndPoint, E) ≠ null) then
7:   return P = E + ExtractProcess(EndPoint, E) for any edge E ∈ StartEdge
8: else
9:   return E
10: end if
11: end
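The recursive process extraction of Algorithm 6 can be sketched as a depth-first enumeration of paths from the initial node. The pseudocode does not say when a path stops, so this sketch assumes a path ends either at a node with no outgoing edges or as soon as it revisits a node already on the path (closing a loop). The function name and the example graph are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def extract_processes(initial: str,
                      edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Enumerate paths from `initial`, each given as a list of nodes."""
    successors: Dict[str, List[str]] = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    processes: List[List[str]] = []

    def walk(path: List[str]) -> None:
        next_nodes = successors.get(path[-1], [])
        if not next_nodes:
            if len(path) > 1:           # dead end: the path is complete
                processes.append(path)
            return
        for node in next_nodes:
            if node in path:            # loop closed: record it and stop
                processes.append(path + [node])
            else:
                walk(path + [node])

    walk([initial])
    return processes


EXAMPLE_EDGES = [("C0", "C1"), ("C1", "C2"), ("C1", "C0"), ("C0", "C3")]

if __name__ == "__main__":
    for process in extract_processes("C0", EXAMPLE_EDGES):
        print(" -> ".join(process))
```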

All processes in the graph that run from the initial node (the initial page of the application) to one of the identified final nodes, as well as the processes whose initial node and end node are the same, are application business processes. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the application processes are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start with the initial node and end with a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, those detected processes whose initial node and end node are the same and whose length is greater than two are added to the variable BP as business processes.
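The two conditions of Definition 5 can be sketched as a filter over already extracted processes. This is a minimal sketch, assuming each process is a list of node names and that the length of a process is its number of edges; the text does not say whether length counts nodes or edges, so that choice is an assumption, as are the names used below.

```python
from typing import List, Sequence, Set


def identify_business_processes(processes: Sequence[Sequence[str]],
                                initial: str,
                                final_nodes: Set[str]) -> List[List[str]]:
    """Keep the processes that satisfy Definition 5."""
    business: List[List[str]] = []
    for process in processes:
        if not process:
            continue
        # Condition 1: starts at the initial node and ends at a final node.
        reaches_final = process[0] == initial and process[-1] in final_nodes
        # Condition 2: returns to its first node through a loop longer than
        # two (length measured in edges, an assumption).
        long_loop = process[0] == process[-1] and len(process) - 1 > 2
        if reaches_final or long_loop:
            business.append(list(process))
    return business


EXAMPLE_PROCESSES = [
    ["C0", "C1", "C2"],        # ends at final node C2
    ["C0", "C1", "C0"],        # loop of length two: rejected
    ["C0", "C3"],              # ends at a non-final node: rejected
    ["C0", "C1", "C4", "C0"],  # loop of length three: accepted
]

if __name__ == "__main__":
    for bp in identify_business_processes(EXAMPLE_PROCESSES, "C0", {"C2"}):
        print(" -> ".join(bp))
```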


[10] M Cova D Balzarotti V Felmetsger and G Vi-gna Swaddler An Approach for the Anomaly-Based Detection of State Violations in Web Ap-plications In Giovanni pages 63ndash86 SpringerBerlin Heidelberg 2007 ISBN 978-3-540-74319-4 doi101007978-3-540-74320-0 4

[11] X Li W Yan and Y Xue SENTINEL securingdatabase from logic flaws in web applicationsIn Proceedings of the second ACM conference onData and Application Security and Privacy page25ndash36 2012 doi10114521336012133605

[12] M Alidoosti and A Nowroozi BLDASTbusiness-layer Dynamic Application SecurityTester of the web application in order to detectweb application vulnerabilities against floodingDoS attacks In Iran Society of Cryptology Con-ference shiraz Iran in Persian 2017

[13] M Alidoosti A Nowroozi and A NickabadiEvaluating the Web-Application Resiliency toBusiness-Layer DoS Attacks ETRI Journal 2019doi104218etrij2019-0164

[14] M Alidoosti and A Nowroozi BLTOCTTOUbusiness-layer dynamic application security testerof the web application in order to detect web ap-plication vulnerabilities against Race Conditionattacks In Computer Society of Iran ConferenceTehran Iran in Persian 2018

[15] M Alidoosti and A Nowroozi BL-ProM Business-layer process miner ofthe web application In ISCISC pages 1ndash6 IEEE 2018 ISBN 978-1-5386-7582-3doi101109ISCISC20188546899

[16] V Crescenzi P Merialdo and P Missier Clus-tering Web pages based on their structure Dataamp Knowledge Engineering 54(3)279ndash299 2005doi101016jdatak200411004

80 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al

Mitra Alidoosti received her BS and MSc

degrees in computer engineering from the De-partment of Computer Engineering Iran Uni-

versity of Science and Technology Tehran

Iran in 2009 and 2012 respectively Currentlyshe is working toward the PhD degree in

computer engineering at Malek-e-Ashtar Uni-versity of Technology Tehran Iran Her re-

search interests are computer network security VoIP and SIP

security and web-application security

Alireza Nowroozi is a freelance consultantwho advises government and private-sector-

related industries on information technologyHe has four-year experience as an academicstaff member and an IT post-doctoral position

with Sharif University of Technology TehranIran He is a specialist in artificial intelligencecognitive science software engineering and

IT security Besides he is a co-founder of four IT startups

Ahmad Nickabadi received his BS degreein computer engineering in 2004 and the

MSc and PhD degrees in artificial intelli-

gence in 2006 and 2011 respectively fromAmirkabir University of Technology TehranIran He is currently an Assistant Professor

in the Department of Computer EngineeringAmirkabir University of Technology His re-

search interests include statistical machine learning and softcomputing

  • 1 Introduction
  • 2 BACKGROUND and RELATED WORK
    • 21 Business Logic Attacks
    • 22 Clustering Web Pages
      • 3 BUSINESS-LAYER PROCESS MINER
        • 31 Extracting the User Navigation Graph
        • 32 Identifying Business Processes in the Application
          • 4 EXPERIMENTAL RESULTS
          • 5 EVALUATION
          • 6 CONCLUSION
Page 9: BLProM:ABlack-BoxApproachforDetectingBusiness-Layer ...jcomsec.ui.ac.ir/article_24589_64a7b183942b390c9d48f11e67352c9… · Comparing the results of the web application scanning,

July 2019, Volume 6, Number 2 (pp. 65–80)

Algorithm 3 The Pseudocode for Clustering the Web Application Pages

INPUT: WebPages = {w1, w2, w3, ..., wn}
OUTPUT: the web application model M as a set of web page clusters C; the web page clusters C as a set of web pages

1: Begin
2:   Let M = ∅ // set of page clusters
3:   Let k = 1 // number of web page clusters
4:   for i = 1..n do
5:     Ck ← wi
6:     for j = i+1..n do
7:       if SimilarWebPages(wi, wj) then
8:         WebPages ← WebPages − wi
9:         n = length of WebPages
10:        Ck ← wj
11:      end if
12:    end for
13:    k++
14:  end for
15:  return C
16: end
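The clustering pass of Algorithm 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `similar` argument stands in for SimilarWebPages, and the toy predicate `same_path` (two URLs are similar when their paths match) is a hypothetical placeholder chosen only to make the example runnable.

```python
from typing import Callable, List, TypeVar
from urllib.parse import urlsplit

T = TypeVar("T")


def cluster_pages(pages: List[T], similar: Callable[[T, T], bool]) -> List[List[T]]:
    """Greedy clustering: each unassigned page seeds a cluster and absorbs
    every later unassigned page that is similar to the seed."""
    clusters: List[List[T]] = []
    assigned = [False] * len(pages)
    for i, seed in enumerate(pages):
        if assigned[i]:
            continue  # already absorbed by an earlier cluster
        cluster = [seed]
        assigned[i] = True
        for j in range(i + 1, len(pages)):
            if not assigned[j] and similar(seed, pages[j]):
                cluster.append(pages[j])
                assigned[j] = True
        clusters.append(cluster)
    return clusters


def same_path(a: str, b: str) -> bool:
    # Hypothetical stand-in for SimilarWebPages: same URL path, query ignored.
    return urlsplit(a).path == urlsplit(b).path


urls = ["/item.php?id=1", "/cart.php", "/item.php?id=2"]
clusters = cluster_pages(urls, same_path)
```

Marking pages as assigned, rather than shrinking the input list while iterating over it, keeps the loop bounds stable; the resulting clusters are the same as long as similarity is judged against the cluster seed.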

of the previous web application page that the user visited. To extract the edges of the user navigation graph, the produced clusters are checked to find which clusters' Referer sets intersect the URI set of a given cluster. When such a cluster is found, the two clusters are connected. Suppose that the URI set of cluster C1 has an intersection with the Referer set of cluster C2; then the path from cluster C1 to cluster C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges in the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained in lines 5 and 6 of the pseudocode in Algorithm 4, respectively. In line 10, if the intersection of the URI set of a cluster with another cluster's Referer set is not empty, the edge CiCj is added to the edge set of the graph.

In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster contains a set of similar pages, and each of these pages has a URI and a Referer field. Thus, each cluster carries a set of URIs and a set of Referers for its pages. Note that the Referer field is the URI of the previous web application page that the user visited. To extract the edges of the user navigation graph, the produced clusters are checked to find which clusters' Referer sets intersect the URI set of a given cluster; when such a cluster is found, the two clusters are connected. Suppose that the URI set of cluster C1 has an intersection with the Referer set of cluster C2; then the path from cluster C1 to cluster C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges in the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained in lines 5 and 6 of the pseudocode in Algorithm 4, respectively. In line 10, if the intersection of the URI set of a cluster with another cluster's Referer set is not empty, the edge CiCj is added to the edge set of the graph. The user navigation graph is created according to Algorithm 5, where the nodes are the set of clusters and the identified edges are the paths between clusters. In the created graph, each node indicates a unique page and the edges show the paths between pages.
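The edge rule described above (connect Ci to Cj when the URI set of Ci intersects the Referer set of Cj) can be sketched as below. The page record layout with `uri` and `referer` keys is a hypothetical choice for illustration. The sketch also checks every ordered pair of distinct clusters, which is an assumption that differs from the pseudocode's j = i+1 loop, so that edges in either direction are found.

```python
from typing import Dict, List, Set, Tuple

Page = Dict[str, str]  # hypothetical record: {"uri": ..., "referer": ...}


def extract_edges(clusters: List[List[Page]]) -> Set[Tuple[int, int]]:
    """Return edges (i, j) meaning 'cluster i links to cluster j'."""
    # URI set and Referer set of each cluster (empty Referers are skipped).
    uri_sets = [{p["uri"] for p in c} for c in clusters]
    ref_sets = [{p["referer"] for p in c if p.get("referer")} for c in clusters]
    edges: Set[Tuple[int, int]] = set()
    for i, uris in enumerate(uri_sets):
        for j, refs in enumerate(ref_sets):
            # A page in cluster j was reached from a page in cluster i.
            if i != j and uris & refs:
                edges.add((i, j))
    return edges


clusters = [
    [{"uri": "/home", "referer": ""}],
    [{"uri": "/item?id=1", "referer": "/home"},
     {"uri": "/item?id=2", "referer": "/home"}],
    [{"uri": "/cart", "referer": "/item?id=2"}],
]
edges = extract_edges(clusters)
```

Here the cluster indices play the role of graph nodes, so the result reads as home → item → cart.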

Definition 2 [The User Navigation Graph]. This graph is represented by the tuple <C0, C, E>, where C is the set of nodes in the graph, C0 ∈ C is the first (initial) node in the graph, and E ⊆ C × C is the set of edges in the graph.

Algorithm 5 shows the pseudocode for extracting the user navigation graph. Lines 5 and 6 select the cluster that contains the initial page of the application; it is taken as the initial node of the graph.

3.2 Identifying Business Processes in the Application

To identify the web application business processes in the user navigation graph, it is necessary to first define the process and the final node, and then define the business process.

BLProM: A black-box approach for detecting business-layer ... (M. Alidoosti, A. Nowroozi, et al.)

Algorithm 4 The Pseudocode for Extracting Edges From the User Navigation Graph

INPUT: C = {C1, C2, C3, ..., Ck} // web page clusters as graph nodes
       WebPages = {w1, w2, w3, ..., wn} // set of web pages
OUTPUT: the web application graph edges E as a set of edges

1: Begin
2:   Let E = ∅ // set of web application graph edges
3:   for j = 1..k do
4:     Let URIcj, Referercj = ∅
5:     URIcj = URIcj ∪ ExtractURI(w) for any w ∈ Cj
6:     Referercj = Referercj ∪ ExtractReferer(w) for any w ∈ Cj
7:   end for
8:   for i = 1..k do
9:     for j = i+1..k do
10:      if (URIci ∩ Referercj ≠ ∅) then
11:        E ← E + CiCj
12:      end if
13:    end for
14:  end for
15:  return E
16: end

Algorithm 5 The Pseudocode for Extracting the User Navigation Graph

INPUT: C = {C1, C2, C3, ..., Ck} // web page clusters as graph nodes
       w1 // first web page
OUTPUT: the web application navigation graph <C0, C, E>

1: Begin
2:   Let C0 = ∅
3:   E = ExtractGraphEdges // set of web application graph edges
4:   for j = 1..k do
5:     if Cj contains w1 then
6:       C0 = C0 + Cj
7:     end if
8:   end for
9:   return <C0, C, E>
10: end

Definition 3 [The Application Process (P)]. The process P in the application is a sequence of nodes and edges in the user navigation graph, such as E1, E2, ..., Ek, where Ei ∈ E and Ei = Ci−1Ci.

Algorithm 6 shows the pseudocode for extracting processes.

Definition 4 [The Final Nodes in the User Navigation Graph (F)]. The final nodes mark the completion of a business process: a business process is complete when the application reaches one of them.

The final nodes can be detected by examining the HTTP responses. For example, in the process of buying a product, a phrase like "Thank you for your purchase" is displayed after the purchase is completed. By specifying a set of such phrases and searching for them in the responses, the final nodes can be identified. Some keywords used for identifying the final nodes are Thank, Congratulations, Successfully, Log Off, and Search Results. Additionally, some buttons in a web application page are good indicators of a final node in the graph. Examples of final nodes include the page after clicking the "save" button, the page after clicking the "creation" button, and the page after clicking the "submit" button.
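The keyword search described above can be illustrated with a short sketch. The mapping from node names to response bodies and the function name `find_final_nodes` are hypothetical; only the keyword list comes from the text, and matching is done case-insensitively as an assumption.

```python
from typing import Dict, Iterable, Set, Tuple

# Keywords listed in the text; a real deployment would likely extend this list.
FINAL_KEYWORDS: Tuple[str, ...] = (
    "thank", "congratulations", "successfully", "log off", "search results",
)


def find_final_nodes(
    responses: Dict[str, Iterable[str]],
    keywords: Tuple[str, ...] = FINAL_KEYWORDS,
) -> Set[str]:
    """Return the nodes whose HTTP response bodies contain a final-node keyword."""
    finals: Set[str] = set()
    for node, bodies in responses.items():
        if any(k in body.lower() for body in bodies for k in keywords):
            finals.add(node)
    return finals


responses = {
    "C1": ["<h1>Welcome</h1>"],
    "C2": ["<p>Thank you for your purchase</p>"],
    "C3": ["<p>Your cart is empty</p>"],
}
finals = find_final_nodes(responses)
```

Plain substring matching is deliberately simple here; it would also flag pages that merely mention a keyword, so the button-based hints mentioned in the text remain a useful complement.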

Definition 5 [The Application Business Process]. The business process BP in the application is a process that satisfies at least one of the following conditions:

1. The first node of the process is the initial node in the user navigation graph (C0), and the end node of the process is a final node in the user navigation graph (F).

2. The process passes through its first node again; that is, the first node and the end node of the process are the same and the process length is greater than two. (If the process returns to a node it has already passed and the length of the resulting loop is more than two, it is a business process.)

Algorithm 6 The Pseudocode for Extracting Processes

INPUT: the web application first node C0
       the web application graph edges E
OUTPUT: the web application graph processes P as a set of web application processes

1: Begin
2:   Let P0 = ∅ // set of web application processes
3:   Let StartEdge = ∅ // set of graph edges that begin from C0
4:   StartEdge = ExtractGraphEdges(C0) // extract all edges from C0
5:   EndPoint = ExtractEndPoint(StartEdge) // extract the end point of edges
6:   if (ExtractProcess(EndPoint, E) ≠ null) then
7:     return P = E + ExtractProcess(EndPoint, E) for any edge E ∈ StartEdge
8:   else
9:     return E
10:  end if
11: end

All processes in the graph from the initial node (the application's initial page) to the identified final nodes, as well as the processes whose initial node and final node are the same, are the application business processes. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the application processes are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start with the initial node and end with a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, those among them whose initial node and final node are the same and whose length is greater than two are added to the variable BP as business processes.
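The two conditions of Definition 5 can be sketched as a depth-first walk over the navigation graph. This is an illustrative sketch only: the adjacency-list representation, the function name, and the choice to measure process length in edges are assumptions, not details confirmed by the paper.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # hypothetical adjacency list: node -> successors


def business_processes(graph: Graph, start: str, finals: Set[str]) -> List[List[str]]:
    """Collect paths from `start` that end in a final node (condition 1)
    or loop back to `start` with more than two edges (condition 2)."""
    found: List[List[str]] = []

    def walk(path: List[str]) -> None:
        for nxt in graph.get(path[-1], []):
            if nxt == start:
                # path + [nxt] has len(path) edges; keep loops longer than two.
                if len(path) > 2:
                    found.append(path + [nxt])
            elif nxt not in path:  # avoid revisiting intermediate nodes
                if nxt in finals:
                    found.append(path + [nxt])
                walk(path + [nxt])

    walk([start])
    return found


graph = {
    "home": ["item", "search"],
    "item": ["cart"],
    "cart": ["thanks", "home"],
    "search": ["home"],
    "thanks": [],
}
processes = business_processes(graph, "home", {"thanks"})
```

In this toy graph the purchase path home → item → cart → thanks satisfies condition 1, the loop home → item → cart → home satisfies condition 2, and the short loop home → search → home is rejected because it has only two edges.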

4 EXPERIMENTAL RESULTS

The testbed used in this section is a network consisting of a web server (test target) and two clients (the BLProM system and a legal user). The web server and the clients are loaded on a virtual machine. The profiles of the web server and the clients are shown in Table 2.

The web applications listed in Table 3 are installed on the web server (test target); the goal is then to identify the business layer of these web applications.

The legal user first uses the selected web applications, crawling all permitted parts of each web application. The HTTP traffic of the legal user is given to BLProM as its input.

5 EVALUATION

BLProM's goal is to identify the business layer of the web application. Identifying the business layer makes it possible to identify business logic vulnerabilities. BLProM detects the business processes of the web application; identifying business processes is the main step in dynamic security testing of the web application in the business layer.

We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner. It scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner that has an API for extracting the web application graph. The web application pages are the graph nodes, and the relations among pages are shown as graph edges. The main difference between BLProM and ZAP is in detecting similar pages: ZAP cannot detect similar pages in the web application, but BLProM can. ZAP's graph only shows the relations among scanned pages, whereas BLProM generates a more compact graph by merging similar pages. Regarding the accuracy of the generated graph, BLProM and ZAP are the same.

To evaluate the proposed approach, we first show the accuracy of the clustering using the following criteria:

• True Positive: samples that are placed in their correct clusters.

• False Positive: samples that are placed in a cluster to which they do not belong.

• False Negative: samples that are not placed in a cluster although they belong to it.

• Recall: calculated by the following formula:

$\mathrm{recall} = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalseNegative}}$

• Precision: calculated by the following formula:


Table 2 Testbed Profiles

Web server (test target)
  CPU          Pentium dual core, 2.20 GHz
  OS           Windows 8.1
  VMware CPU   1 GHz
  VMware RAM   1 GB
  VMware OS    Windows 7

Client (BLProM machine)
  CPU          Intel Core i7, 2.20 GHz
  OS           Windows 8.1
  VMware CPU   1 GHz
  VMware RAM   1 GB
  VMware OS    Windows 7

Client (legal user)
  CPU          Pentium dual core i3-3210 GHz
  OS           Windows 7
  VMware CPU   1 GHz
  VMware RAM   1 GB
  VMware OS    Windows 7

Table 3 Selected Web Applications for Evaluation

Web application        Description
TomatoCart-1.1.8.6.1   e-commerce
osCommerce-2.3.4       e-commerce
WackoPicko             Web application for sharing pictures

Algorithm 7 The Pseudocode for Identifying the Application Business Processes

INPUT: the web application navigation graph <C0, C, E>
       the web application graph edges E
OUTPUT: the web application graph business processes BP as a set of web application processes

1: Begin
2:   Let P = ∅ // set of web application processes
3:   Let F = ∅ // set of web application final nodes
4:   P = ExtractProcess // extract processes in the web application
5:   F = ExtractFinalNodes // extract final nodes in the web application
6:   for i = 1..k do
7:     if Pi starts at C0 and ends at a node in F then
8:       BP = BP + Pi
9:     end if
10:  end for
11:  R = ExtractProcessWithRepeatedNodes(P) // extract processes with repeated nodes
12:  for j = 1..m do
13:    if Rj has the same initial and end node and length(Rj) > 2 then
14:      BP = BP + Rj
15:    end if
16:  end for
17:  return BP
18: end


Table 4 Evaluation of the Clusters of the Selected Web Application Pages

Criteria         WackoPicko   TomatoCart   osCommerce
samples          89           150          210
clusters         29           66           40
true positive    65           146          205
false positive   24           4            5
false negative   23           3            4
recall           0.74         0.98         0.98
precision        0.73         0.97         0.98
f-measure        0.73         0.98         0.98

Table 5 Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko

Criteria                              Proposed approach   OWASP ZAP   Improvement compared to OWASP ZAP (%)
HTTP Message                          89                  89          –
Graph Nodes                           22                  89          75.2
Graph Edges                           48                  270         82.2
process (P)                           12                  48          75
Average edge in each process (E)      4                   46          91.3
Average edges in all processes (PE)   48                  2208        97.8
business processes                    10                  NA          –

$\mathrm{precision} = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalsePositive}}$

• F-Measure: calculated by the following formula:

$\mathrm{F\text{-}Measure} = \frac{2 \cdot \mathrm{recall} \cdot \mathrm{precision}}{\mathrm{recall} + \mathrm{precision}}$

The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, samples is the total number of web pages in the HTTP traffic, extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, clusters is the total number of clusters extracted by the ClusteringWebPages algorithm (Algorithm 3). In the following rows, the criteria listed above are calculated for each web application.
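As a quick check of how the clustering criteria fit together, the sketch below recomputes recall, precision, and F-measure from the WackoPicko counts in Table 4. It assumes the standard F-measure, the harmonic mean of recall and precision; the function name is illustrative.

```python
from typing import Tuple


def clustering_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (recall, precision, f_measure) from clustering counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    # Standard F-measure: harmonic mean of recall and precision.
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# WackoPicko counts from Table 4: 65 true positives, 24 false positives,
# 23 false negatives.
recall, precision, f_measure = clustering_metrics(tp=65, fp=24, fn=23)
```

Rounded to two decimals, these values agree with the WackoPicko column of Table 4 (0.74, 0.73, 0.73).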

To evaluate the proposed approach, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. The BLProM output for WackoPicko is shown in Table 5. Graph nodes is the total number of clusters extracted by the ClusteringWebPages algorithm (Algorithm 3). Graph edges is the total number of edges extracted by the ExtractGraphEdges algorithm (Algorithm 4). Process is the total number of paths from the first node, extracted by the ExtractProcess algorithm (Algorithm 6). Average edge in each process (E) is the sum of the edges of all processes divided by the total number of processes. Average edges in all processes (PE) is the product of the total number of processes (P) and the average number of edges in each process (E). Business processes is the total number of business processes in the web application.

The values in the first column of Table 5 are the output of the proposed approach, and the values in the second column are the scanning output of ZAP. The third column shows the percentage of scanning improvement of the proposed approach compared to the ZAP scan. The results in the table indicate that the ZAP scan does not account for similar pages or business processes; as a result, ZAP is not able to identify business layer vulnerabilities.
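The PE criterion and the improvement column can be reproduced with a few lines of arithmetic. The improvement formula is not spelled out in the text; the sketch assumes it is the relative reduction with respect to ZAP, (ZAP − proposed) / ZAP × 100, which is consistent with the WackoPicko values in Table 5. Function names are illustrative.

```python
def total_edges(processes: int, avg_edges: float) -> float:
    """PE: number of processes (P) times average edges per process (E)."""
    return processes * avg_edges


def improvement_percent(proposed: float, zap: float) -> float:
    # Assumed formula: relative reduction with respect to the ZAP value.
    return (zap - proposed) / zap * 100


# WackoPicko values from Table 5.
pe_proposed = total_edges(12, 4)    # proposed approach: P = 12, E = 4
pe_zap = total_edges(48, 46)        # OWASP ZAP: P = 48, E = 46
gain = improvement_percent(pe_proposed, pe_zap)
```

With these inputs the sketch yields PE values of 48 and 2208 and an improvement of roughly 97.8 percent, matching the last numeric row of Table 5.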

The BLProM output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to ZAP scanning. It is observed that web application scanning is improved by identifying the web application business layer.


Table 6 Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce

Criteria                              Proposed approach   OWASP ZAP   Improvement compared to OWASP ZAP (%)
HTTP Message                          170                 170         –
Graph Nodes                           40                  170         76.4
Graph Edges                           66                  379         82.5
process (P)                           23                  17          26
Average edge in each process (E)      3                   113         97.3
Average edges in all processes (PE)   69                  1921        96.4
business processes                    18                  NA          –

Table 7 Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart

Criteria                              Proposed approach   OWASP ZAP   Improvement compared to OWASP ZAP (%)
HTTP Message                          150                 150         –
Graph Nodes                           66                  150         56
Graph Edges                           87                  410         78.7
process (P)                           31                  39          20.5
Average edge in each process (E)      4                   101         96
Average edges in all processes (PE)   156                 3131        95
business processes                    30                  NA          –

The BLProM output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of our proposed approach compared to ZAP scanning.

Table 8 shows the average of the proposed approach, the average of OWASP ZAP, and the average percentage of improvement in the scanning of the selected web applications. For example, in the "Average edges in all processes" benchmark, our approach improves on OWASP ZAP by about 96 percent.

According to the results presented in this table, BLProM improves web application scanning. BLProM is aware of web application business processes. By identifying the web application business layer, web scanners can detect business layer vulnerabilities.

6 CONCLUSION

Business logic vulnerabilities are serious vulnerabilities that compromise web application security. Web scanners cannot detect business logic vulnerabilities because they are unable to understand the business logic of the web application. Detecting business logic vulnerabilities requires understanding the business logic of the web application; these vulnerabilities are therefore specific to each application and difficult to identify.

In this paper, we proposed BLProM, a black-box approach for detecting the business processes of the web application. BLProM aims to ease the detection of business logic vulnerabilities. The BLProM output is used as input to dynamic security testing of web applications in the business layer in order to detect business logic vulnerabilities. BLProM consists of two main steps:

1- Extracting the user navigation graph
2- Detecting the web application business processes

In the lab, we scanned three web applications with BLProM and OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared to OWASP ZAP. BLProM improved the scanning of web applications because it clusters web pages and prevents scanning similar web application pages.

Table 8 Comparison of the Average of the Proposed Approach and the Average of the ZAP Approach in the Scanning of Selected Web Applications

Average of selected web applications

Criteria                              Proposed approach   OWASP ZAP   Improvement compared to OWASP ZAP (%)
Graph Nodes                           42.6                136.3       69.2
Graph Edges                           67                  353         81.1
process (P)                           20                  36.6        40.5
Average edge in each process (E)      3.6                 86.6        94.8
Average edges in all processes (PE)   91                  2420        96.4


Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. Currently, she is working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.

Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and in an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. Besides, he is a co-founder of four IT startups.

Ahmad Nickabadi received his B.S. degree in computer engineering in 2004 and the M.Sc. and Ph.D. degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.

  • 1 Introduction
  • 2 BACKGROUND and RELATED WORK
    • 21 Business Logic Attacks
    • 22 Clustering Web Pages
      • 3 BUSINESS-LAYER PROCESS MINER
        • 31 Extracting the User Navigation Graph
        • 32 Identifying Business Processes in the Application
          • 4 EXPERIMENTAL RESULTS
          • 5 EVALUATION
          • 6 CONCLUSION
Page 10: BLProM:ABlack-BoxApproachforDetectingBusiness-Layer ...jcomsec.ui.ac.ir/article_24589_64a7b183942b390c9d48f11e67352c9… · Comparing the results of the web application scanning,

74 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al

Algorithm 4 The Pseudocode for Extracting Edges From the User Navigation Graph

INPUT C = C1 C2 C3 Ck web pages clusters as graph nodesWebPages= w1 w2 w3 wn set of web pagesOUTPUT the web application graph edges E as a set of edges

1 Begin2 Let E = empty set of web application graph Edges3 for j 1 k do4 Let URIcj Referercj = empty5 URIcj = URIcjcup ExtractURI (w) for any w isin Cj

6 Refererck = Refererck cup ExtractReferer(w) for any w isin Cj

7 end for8 for i k do9 for j i+ 1 k do

10 if (URIci capReferercj 6= null) then11 E larr E + CiCj

12 end if13 end for14 end for15 return E16 end

Algorithm 5 The Pseudocode for Extracting the User Navigation Graph

INPUT C= C1 C2 C3 Ck web pages clusters as graph nodesw1 First web pageOUTPUT the web application navigation graph lt C0 CE gt

1 Begin2 Let C0 = empty3 E=ExtractGraphEdges set of web application graph Edges4 for j 1 k do5 if Ck contains w1 then6 C0 = Co + Ck

7 end if8 end for9 return C0 CE

10 end

process P in the application is a sequence ofnodes and edges in the user navigation graph likeE1 E2 Ek whereEi isin E andEi = Ciminus1Ci

Algorithm 6 shows the pseudocode for extractingprocesses

Definition 4 [the final nodes in the user navigationgraph (F)] The final nodes refer to the completion ofa business process that occurs when the applicationreaches there

The final nodes can be detected by examining theHTTP responses For example in the process of buy-ing a product a phrase like rdquoThank you for your pur-chaserdquo is displayed after completion of the purchaseBy specifying a set of these phrases and searchingthem in the responses the final nodes can be identi-

fied Some keywords used for identifying the final nodeare Thank Congratulations Successfully Log Off andSearch Results Additionally some buttons in the webapplication page are good signs that help the identifi-cation of the final node in the graph The examples offinal nodes include the page after clicking the ldquosaverdquobutton the page after clicking the ldquocreationrdquo buttonand the page after clicking the ldquosubmitrdquo button

Definition 5 [the application business process] Thebusiness process BP in the application is a processthat has at least one of the following conditions

1 The first node of the process is the initial nodein the user navigation graph (C0) and the processend node is the final node in the user navigationgraph (F)

2 If the process passes its first node again it means

July 2019 Volume 6 Number 2 (pp 65ndash80) 75

Algorithm 6 The Pseudocode for Extracting Processes

INPUT the web application first node C0

the web application Graph edges EOUTPUT the web application graph process P as a set of web application process

1 Begin2 Let P0 = empty set of web application processes3 Let StartEdge= empty set of graph edge that begin from C0

4 StartEdge= ExtractGraphEdges(C0) Extract all edges from C0

5 EndPoint = ExtractEndPoint(StartEdge) Extract the end point of edges6 if (ExtractProcess(EndPoint E) 6= null) then7 return P=E+ExteractProcess(Endpoint E) for any edge E isin StartEdge8 else9 return E

10 end if11 end

the first node and the end node of the process arethe same and the process length is greater thantwo (If there is a return to the passed node in theprocess and the created loop length is more thantwo this is a business process)

All processes in the graph from the initial node(the application initial page) to the identified finalnodes as well as the processes of their initial nodeand the final node are the same and all of themare the application business processes Algorithm 7shows the pseudocode for identifying the applicationbusiness process In line 4 the application businessprocesses are extracted In line 5 the final nodes ofthe graph are extracted In line 7 the processes thatstart with the initial node and end with the final nodeare identified as the business process are stored in thevariable BP In line 9 the processes with repeatednodes are detected and in line 11 among the detectedprocesses if their initial node and their final node arethe same and their length is greater than two theyare added to the variable BP as the business process

4 EXPERIMENTAL RESULTS

The testbed used in this section is a network consisting of a web server (the test target) and two clients (the BLProM system and a legal user). The web server and the clients run on virtual machines. The profiles of the web server and the clients are shown in Table 2.

The web applications listed in Table 3 are installed on the web server (the test target), and we then identify the business layer of these web applications.

The legal user first starts using the selected web applications and crawls all permitted parts of each web application. The HTTP traffic of the legal user is given to BLProM as its input.

5 EVALUATION

BLProM's goal is to identify the business layer of the web application. Identifying the business layer makes it possible to identify business logic vulnerabilities. BLProM detects the business processes of the web application; identifying these processes is the main step in dynamic security testing of the web application in the business layer.

We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner that scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner that has an API for extracting the web application graph, in which the web application pages are the graph nodes and the relations among pages are the graph edges. The main difference between BLProM and ZAP is in detecting similar pages: ZAP cannot detect similar pages in the web application, but BLProM can. ZAP's graph only shows the relations among the scanned pages, whereas BLProM generates the optimal graph. In terms of the accuracy of the generated graph, BLProM and ZAP are the same.
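To illustrate why detecting similar pages shrinks the graph, the following sketch collapses a page-level graph into a cluster-level graph. It is an assumption-laden illustration: the function `collapse_graph`, the URLs, and the `cluster_of` mapping are hypothetical, and the clustering itself (Algorithm 3) is taken as given.

```python
def collapse_graph(page_edges, cluster_of):
    """Map page-level edges to cluster-level edges, dropping duplicates."""
    nodes = set(cluster_of.values())
    edges = {(cluster_of[src], cluster_of[dst]) for src, dst in page_edges}
    return nodes, edges


# Hypothetical pages: two item pages that differ only in a parameter value.
cluster_of = {
    "/home": "home",
    "/item?id=1": "item",
    "/item?id=2": "item",
    "/cart": "cart",
}
page_edges = [
    ("/home", "/item?id=1"),
    ("/home", "/item?id=2"),
    ("/item?id=1", "/cart"),
    ("/item?id=2", "/cart"),
]
nodes, edges = collapse_graph(page_edges, cluster_of)
print(len(cluster_of), "pages ->", len(nodes), "nodes")
print(len(page_edges), "page edges ->", len(edges), "edges")
```

In this toy case four pages and four edges reduce to three nodes and two edges, which is the kind of reduction reported in the Graph Nodes and Graph Edges rows of the comparison tables.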

To evaluate the proposed approach, we first show the accuracy of clustering using the following criteria:

• True Positive: Samples that fit well into their correct clusters.

• False Positive: Samples assigned to a cluster to which they do not belong.

• False Negative: Samples not assigned to a cluster to which they belong.

• Recall: It is calculated by the following formula:

\[ \mathit{recall} = \frac{\mathit{TruePositive}}{\mathit{TruePositive} + \mathit{FalseNegative}} \]

• Precision: It is calculated by the following formula:

\[ \mathit{precision} = \frac{\mathit{TruePositive}}{\mathit{TruePositive} + \mathit{FalsePositive}} \]

• F-Measure: It is calculated by the following formula:

\[ \mathit{F\text{-}Measure} = \frac{2 \times \mathit{recall} \times \mathit{precision}}{\mathit{recall} + \mathit{precision}} \]

Table 2 Testbed Profiles

| Property | Web server (test target) | Client (BLProM machine) | Client (legal user) |
| CPU | Pentium dual core, 2.20 GHz | Intel Core i7, 2.20 GHz | Pentium dual core i3-3210 GHz |
| OS | Windows 8.1 | Windows 8.1 | Windows 7 |
| VMware CPU | 1 GHz | 1 GHz | 1 GHz |
| VMware RAM | 1 GB | 1 GB | 1 GB |
| VMware OS | Windows 7 | Windows 7 | Windows 7 |

Table 3 Selected Web Applications for Evaluation

| Web application | Description |
| TomatoCart-1.1.8.6.1 | e-commerce |
| osCommerce-2.3.4 | e-commerce |
| WackoPicko | Web application for sharing pictures |

Algorithm 7 The Pseudocode for Identifying the Application Business Processes

INPUT: the web application navigation graph <C0, CE>; the web application graph edges E
OUTPUT: the web application business processes BP, as a set of web application processes

1: Begin
2:   Let P = empty set of web application processes
3:   Let F = empty set of web application final nodes
4:   P = ExtractProcess                        // Extract processes in the web application
5:   F = ExtractFinalNodes                     // Extract final nodes in the web application
6:   for i = 1, ..., k do
7:     if Pi starts with C0 and ends with a node in F then
8:       BP = BP + Pi
9:     end if
10:  end for
11:  R = ExtractProcessWithRepeatedNodes(P)    // Extract processes with repeated nodes
12:  for j = 1, ..., m do
13:    if Rj has the same initial and end node and length(Rj) > 2 then
14:      BP = BP + Rj
15:    end if
16:  end for
17:  return BP
18: end

Table 4 The Clusters of Selected Web Application Pages Evaluation

| Criteria | WackoPicko | TomatoCart | osCommerce |
| samples | 89 | 150 | 210 |
| clusters | 29 | 66 | 40 |
| true positive | 65 | 146 | 205 |
| false positive | 24 | 4 | 5 |
| false negative | 23 | 3 | 4 |
| recall | 0.74 | 0.98 | 0.98 |
| precision | 0.73 | 0.97 | 0.98 |
| f-measure | 0.73 | 0.98 | 0.98 |

Table 5 Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
| HTTP Message | 89 | 89 | – |
| Graph Nodes | 22 | 89 | 75.2 |
| Graph Edges | 48 | 270 | 82.2 |
| process (P) | 12 | 48 | 75 |
| Average edge in each process (E) | 4 | 46 | 91.3 |
| Average edges in all processes (PE) | 48 | 2208 | 97.8 |
| business processes | 10 | NA | – |

The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, samples is the total number of web pages in the HTTP traffic, extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, clusters is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). In the remaining rows, the criteria listed above are calculated for each web application.
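As a quick check of the clustering criteria, the sketch below recomputes recall, precision, and F-measure from the true positive, false positive, and false negative counts reported in Table 4. The function name `clustering_scores` is hypothetical, and the F-measure is assumed to be the usual harmonic mean of recall and precision.

```python
def clustering_scores(tp, fp, fn):
    """Return (recall, precision, f_measure) from raw counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# (true positive, false positive, false negative) per application, from Table 4.
table4_counts = {
    "WackoPicko": (65, 24, 23),
    "TomatoCart": (146, 4, 3),
    "osCommerce": (205, 5, 4),
}

for app, counts in table4_counts.items():
    r, p, f = clustering_scores(*counts)
    print(f"{app}: recall={r:.2f} precision={p:.2f} f-measure={f:.2f}")
```

Rounded to two decimals, these values agree with the recall, precision, and f-measure rows of Table 4.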

To evaluate the proposed approach, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. The BLProM output for WackoPicko is shown in Table 5. Graph nodes is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). Graph edges is the total number of edges extracted by the ExtractGraphEdges algorithm (Algorithm 4). Process (P) is the total number of paths from the first node, extracted by the ExtractProcess algorithm (Algorithm 6). Average edge in each process (E) is the sum of the edges of all processes divided by the total number of processes. Average edges in all processes (PE) is the product of the total number of processes (P) and the average number of edges in each process (E). Business processes is the total number of business processes in the web application.
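The derived rows of Table 5 can be recomputed with a few lines of Python. The improvement formula used here, (ZAP value − proposed value) / ZAP value, is an assumption: the paper does not state it, but it reproduces the reported percentages to within rounding. The function name `improvement` is hypothetical.

```python
def improvement(proposed, zap):
    """Percentage reduction of a metric relative to the ZAP value (assumed formula)."""
    return (zap - proposed) / zap * 100


# (proposed approach, OWASP ZAP) for WackoPicko, values from Table 5.
wackopicko = {
    "Graph Nodes": (22, 89),
    "Graph Edges": (48, 270),
    "process (P)": (12, 48),
    "Average edge in each process (E)": (4, 46),
}

# PE is the product of the number of processes (P) and the average edges per process (E).
p_ours, p_zap = wackopicko["process (P)"]
e_ours, e_zap = wackopicko["Average edge in each process (E)"]
wackopicko["Average edges in all processes (PE)"] = (p_ours * e_ours, p_zap * e_zap)

for name, (ours, zap) in wackopicko.items():
    print(f"{name}: {ours} vs {zap}, improvement {improvement(ours, zap):.1f}%")
```

The PE row gives a rough measure of the total number of edges a scanner has to traverse, which is why it shows the largest reduction.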

The values in the first column of Table 5 are the output of the proposed approach, and the values in the second column are the scanning output of ZAP. The third column shows the percentage of scanning improvement of the proposed approach compared to the ZAP scan. The results in the table indicate that ZAP does not scan intelligently; as a result, ZAP is not able to identify business-layer vulnerabilities.

The BLProM output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to ZAP scanning. The results show that identifying the web application's business layer improves web application scanning.

Table 6 Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
| HTTP Message | 170 | 170 | – |
| Graph Nodes | 40 | 170 | 76.4 |
| Graph Edges | 66 | 379 | 82.5 |
| process (P) | 23 | 17 | 26 |
| Average edge in each process (E) | 3 | 113 | 97.3 |
| Average edges in all processes (PE) | 69 | 1921 | 96.4 |
| business processes | 18 | NA | – |

Table 7 Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
| HTTP Message | 150 | 150 | – |
| Graph Nodes | 66 | 150 | 56 |
| Graph Edges | 87 | 410 | 78.7 |
| process (P) | 31 | 39 | 20.5 |
| Average edge in each process (E) | 4 | 101 | 96 |
| Average edges in all processes (PE) | 156 | 3131 | 95 |
| business processes | 30 | NA | – |

The BLProM output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of our proposed approach compared to ZAP scanning.

Table 8 shows the average of the proposed approach, the average of OWASP ZAP, and the average percentage of improvement in the scanning of the selected web applications. For example, in the "Average edges in all processes" criterion, our approach improves on OWASP ZAP by about 96 percent.
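The averages for the "Average edges in all processes" (PE) criterion can be reproduced from the per-application values in Tables 5 to 7. This is a small sketch with hypothetical variable names; it assumes that the average improvement is the mean of the three per-application percentages.

```python
from statistics import mean

# app: (proposed approach, OWASP ZAP, improvement in percent), PE rows of Tables 5 to 7.
pe_rows = {
    "WackoPicko": (48, 2208, 97.8),
    "osCommerce": (69, 1921, 96.4),
    "TomatoCart": (156, 3131, 95.0),
}

avg_proposed = mean(row[0] for row in pe_rows.values())
avg_zap = mean(row[1] for row in pe_rows.values())
avg_improvement = mean(row[2] for row in pe_rows.values())

print(f"proposed={avg_proposed} zap={avg_zap} improvement={avg_improvement:.1f}%")
```

Note that averaging the percentages is not the same as computing the improvement between the two averaged values, although both come out at roughly 96 percent here.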

The results presented in this table show that BLProM improves web application scanning. BLProM is aware of the web application's business processes. By identifying the web application's business layer, web scanners can detect business-layer vulnerabilities.

6 CONCLUSION

Business logic vulnerabilities are serious vulnerabilities that compromise web application security. Web scanners cannot detect business logic vulnerabilities because they are unable to understand the business logic of the web application. Detecting these vulnerabilities requires understanding the business logic of the web application; they are therefore specific to the application and difficult to identify.

In this paper, we proposed BLProM, a black-box approach for detecting the business processes of the web application. BLProM aims to ease the detection of business logic vulnerabilities. The BLProM output is used as input to dynamic security testing of web applications in the business layer in order to detect business logic vulnerabilities. BLProM consists of two main steps:

1- Extracting the user navigation graph
2- Detecting web application business processes

In the lab, we scanned three web applications with BLProM and with OWASP ZAP, an open-source web scanner. We showed that BLProM improved scanning by about 96% compared to OWASP ZAP. BLProM improved the scanning of web applications because it clusters web pages and prevents scanning similar web application pages.

Table 8 Comparison of the Average of the Proposed Approach and the Average of the ZAP Approach in the Scanning of Selected Web Applications

Average of selected web applications

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
| Graph Nodes | 42.6 | 136.3 | 69.2 |
| Graph Edges | 67 | 353 | 81.1 |
| process (P) | 20 | 36.6 | 40.5 |
| Average edge in each process (E) | 3.6 | 86.6 | 94.8 |
| Average edges in all processes (PE) | 91 | 2420 | 96.4 |

References

[1] Common vulnerabilities and exposures. https://cve.mitre.org/cve/cve.html.

[2] 2015 ITRC. Identity Theft Resource Center Breach Report Hits Near Record High in 2015. http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html. Accessed Feb. 2017.

[3] G. Pellegrino and D. Balzarotti. Toward Black-Box Detection of Logic Flaws in Web Applications. In Network and Distributed System Security Symposium 2014 (NDSS 2014), 2014. doi:10.14722/ndss.2014.23021.

[4] Testing for business logic, OWASP. https://www.owasp.org/index.php/Testing_for_business_logic. Accessed Feb. 2017.

[5] D. Balzarotti, M. Cova, V. Felmetsger, and G. Vigna. Multi-module vulnerability analysis of web-based applications. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 25–35. ACM, 2007. doi:10.1145/1315245.1315250.

[6] A. Doupe, B. Boe, C. Kruegel, and G. Vigna. Fear the EAR: discovering and mitigating execution after redirect vulnerabilities. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 251–262. ACM, 2011. doi:10.1145/2046707.2046736.

[7] V. Felmetsger, L. Cavedon, C. Kruegel, and G. Vigna. Toward Automated Detection of Logic Vulnerabilities in Web Applications. In USENIX Security Symposium, 2010.

[8] E. Chai. Business Logic Attacks – Bots and BATs. OWASP, 2009. https://www.owasp.org/images/a/aa/OWASP_Cincinnati_Jan_2011.pdf. Accessed Feb. 2017.

[9] X. Li and Y. Xue. BLOCK: a black-box approach for detection of state violation attacks towards web applications. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 247–256, 2011. doi:10.1145/2076732.2076767.

[10] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna. Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications. In Recent Advances in Intrusion Detection (RAID 2007), pages 63–86. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-74319-4. doi:10.1007/978-3-540-74320-0_4.

[11] X. Li, W. Yan, and Y. Xue. SENTINEL: securing database from logic flaws in web applications. In Proceedings of the Second ACM Conference on Data and Application Security and Privacy, pages 25–36, 2012. doi:10.1145/2133601.2133605.

[12] M. Alidoosti and A. Nowroozi. BLDAST: business-layer Dynamic Application Security Tester of the web application in order to detect web application vulnerabilities against flooding DoS attacks. In Iran Society of Cryptology Conference, Shiraz, Iran, in Persian, 2017.

[13] M. Alidoosti, A. Nowroozi, and A. Nickabadi. Evaluating the Web-Application Resiliency to Business-Layer DoS Attacks. ETRI Journal, 2019. doi:10.4218/etrij.2019-0164.

[14] M. Alidoosti and A. Nowroozi. BLTOCTTOU: business-layer dynamic application security tester of the web application in order to detect web application vulnerabilities against Race Condition attacks. In Computer Society of Iran Conference, Tehran, Iran, in Persian, 2018.

[15] M. Alidoosti and A. Nowroozi. BLProM: Business-layer process miner of the web application. In ISCISC, pages 1–6. IEEE, 2018. ISBN 978-1-5386-7582-3. doi:10.1109/ISCISC.2018.8546899.

[16] V. Crescenzi, P. Merialdo, and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3):279–299, 2005. doi:10.1016/j.datak.2004.11.004.

Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. Currently, she is working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.

Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and in an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. Besides, he is a co-founder of four IT startups.

Ahmad Nickabadi received his B.S. degree in computer engineering in 2004 and the M.Sc. and Ph.D. degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.



76 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al

Table 2 Testbed Profiles

Web server (test target)

CPU Pentium dual core-220 GHZ

OS windows 81

VMware cpu 1GHZ

VMware RAM 1G

VMware OS windows 7

Client (BLProM machine)

CPU Intel corei7 220 GHZ

OS windows 81

VMware cpu 1GHZ

VMware RAM 1G

VMware OS windows 7

Client (legal user)

CPU Pentium dual core i3-3210 GHZ

OS windows 7

VMware cpu 1GHZ

VMware RAM 1G

VMware OS windows 7

Table 3 Selected Web Applications for Evaluation

Web application Description

TomatoCart-11861 e-commerce

osCommerce-234 e-commerce

WackoPicko Web application for Sharing picture

Algorithm 7. The Pseudocode for Identifying the Application Business Processes

INPUT: the web application navigation graph <C0, CE>; the web application graph edges E
OUTPUT: the web application business processes BP, as a set of web application processes

    Begin
        Let P = empty set of web application processes
        Let F = empty set of web application final nodes
        Let BP = empty set of web application business processes
        P = ExtractProcess                        // extract processes in the web application
        F = ExtractFinalNodes                     // extract final nodes in the web application
        for i = 1 to k do
            if P_i starts at C0 and ends at a node in F then
                BP = BP + P_i
            end if
        end for
        R = ExtractProcessWithRepeatedNodes(P)    // extract processes with repeated nodes
        for j = 1 to m do
            if R_j has the same initial and end node and length(R_j) = 2 then
                BP = BP + R_j
            end if
        end for
        return BP
    end
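The two selection loops of Algorithm 7 can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the function name `identify_business_processes`, the representation of a process as a list of node labels, and the example node names are all assumptions. The outputs of ExtractProcess, ExtractFinalNodes, and ExtractProcessWithRepeatedNodes are taken as given inputs, and `length(R_j) = 2` is assumed here to count nodes, so a qualifying path is a self-loop such as `[A, A]`.

```python
from typing import Hashable, List, Sequence, Set

Path = Sequence[Hashable]


def identify_business_processes(
    processes: List[Path],
    repeated: List[Path],
    start: Hashable,
    final_nodes: Set[Hashable],
) -> List[Path]:
    """Select business processes following the two loops of Algorithm 7.

    processes   -- assumed output of ExtractProcess (paths as node lists)
    repeated    -- assumed output of ExtractProcessWithRepeatedNodes
    start       -- the initial node C0
    final_nodes -- assumed output of ExtractFinalNodes
    """
    business: List[Path] = []
    # First loop: keep paths that start at C0 and end at a final node.
    for p in processes:
        if p and p[0] == start and p[-1] in final_nodes:
            business.append(p)
    # Second loop: keep repeated-node paths with identical first and last
    # node and length 2 (assumption: length is counted in nodes).
    for r in repeated:
        if len(r) == 2 and r[0] == r[-1]:
            business.append(r)
    return business


# Hypothetical navigation data, for illustration only.
example_processes = [
    ["home", "login", "account"],
    ["home", "search"],
    ["about", "contact"],
]
example_repeated = [["login", "login"], ["home", "cart", "home"]]
example_result = identify_business_processes(
    example_processes, example_repeated, start="home", final_nodes={"account"}
)
print(example_result)
```

If length is meant to count edges instead, the second condition would become `len(r) - 1 == 2`; the source does not say which reading is intended.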


Table 4. The Clusters of Selected Web Application Pages Evaluation

| Criteria | WackoPicko | TomatoCart | osCommerce |
|---|---|---|---|
| Samples | 89 | 150 | 210 |
| Clusters | 29 | 66 | 40 |
| True positive | 65 | 146 | 205 |
| False positive | 24 | 4 | 5 |
| False negative | 23 | 3 | 4 |
| Recall | 0.74 | 0.98 | 0.98 |
| Precision | 0.73 | 0.97 | 0.98 |
| F-measure | 0.73 | 0.98 | 0.98 |

Table 5. Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| HTTP messages | 89 | 89 | – |
| Graph nodes | 22 | 89 | 75.2 |
| Graph edges | 48 | 270 | 82.2 |
| Process (P) | 12 | 48 | 75 |
| Average edge in each process (E) | 4 | 46 | 91.3 |
| Average edges in all processes (PE) | 48 | 2208 | 97.8 |
| Business processes | 10 | N/A | – |

$$\text{precision} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalsePositive}}$$

• F-Measure: It is calculated by the following formula:

$$\text{F-Measure} = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}}$$

The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, "samples" is the total number of web pages in the HTTP traffic, extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, "clusters" is the total number of clusters extracted by the ClusteringWebPages algorithm (Algorithm 3). The remaining rows give the criteria listed above for each web application.
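As a check on Table 4, the criteria can be recomputed from the reported true-positive, false-positive, and false-negative counts. The sketch below assumes the standard definition recall = TruePositive / (TruePositive + FalseNegative); the function name `clustering_metrics` and the dictionary layout are illustrative choices, not part of BLProM.

```python
def clustering_metrics(tp: int, fp: int, fn: int) -> tuple:
    """Return (recall, precision, f_measure) from raw counts."""
    recall = tp / (tp + fn)        # assumed standard definition
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# (true positive, false positive, false negative) as reported in Table 4.
table4_counts = {
    "WackoPicko": (65, 24, 23),
    "TomatoCart": (146, 4, 3),
    "osCommerce": (205, 5, 4),
}

table4_metrics = {
    app: tuple(round(v, 2) for v in clustering_metrics(*counts))
    for app, counts in table4_counts.items()
}

for app, (recall, precision, f_measure) in table4_metrics.items():
    print(f"{app}: recall={recall} precision={precision} f-measure={f_measure}")
```

Rounded to two decimals, the recomputed values agree with the recall, precision, and F-measure rows of Table 4.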

To evaluate the proposed approach, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. The BLProM output for WackoPicko is shown in Table 5. "Graph nodes" is the total number of clusters extracted by the ClusteringWebPages algorithm (Algorithm 3). "Graph edges" is the total number of edges extracted by the ExtractGraphEdges algorithm (Algorithm 4). "Process (P)" is the total number of paths from the first node, extracted by the ExtractProcess algorithm (Algorithm 6). "Average edge in each process (E)" is the sum of the edges of all processes divided by the total number of processes. "Average edges in all processes (PE)" is the product of the total number of processes (P) and the average number of edges in each process (E). "Business processes" is the total number of business processes in the web application.

The values in the first column of Table 5 are the output of the proposed approach, and the values in the second column are the scanning output of ZAP. The third column shows the percentage of scanning improvement of our proposed approach compared to the ZAP scan. The results indicate that the ZAP scan is not a smart scan. As a result of this non-intelligent scan, ZAP is not able to identify business-layer vulnerabilities.
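The source does not state how the improvement column is computed. The reported values appear consistent with the relative reduction (1 − proposed / ZAP) × 100, so the sketch below uses that formula as an assumption. The helper names `improvement_percent` and `edges_in_all_processes` are hypothetical, and the inputs are the WackoPicko values from Table 5.

```python
def improvement_percent(proposed: float, baseline: float) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent.

    Assumed formula: (1 - proposed / baseline) * 100.
    """
    return (1 - proposed / baseline) * 100


def edges_in_all_processes(p: float, e: float) -> float:
    """PE: number of processes (P) times average edges per process (E)."""
    return p * e


# WackoPicko values from Table 5.
pe_blprom = edges_in_all_processes(12, 4)   # proposed approach
pe_zap = edges_in_all_processes(48, 46)     # OWASP ZAP

print(round(improvement_percent(48, 270), 1))            # graph edges
print(round(improvement_percent(pe_blprom, pe_zap), 1))  # PE
```

Under this assumed formula, the graph-edge and PE rows reproduce the reported 82.2 and 97.8. A few other rows differ in the last digit, which may reflect truncation rather than rounding in the source.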

The BLProM output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to ZAP scanning. It is observed that web application scanning is improved by identifying the web application business layer.


Table 6. Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| HTTP messages | 170 | 170 | – |
| Graph nodes | 40 | 170 | 76.4 |
| Graph edges | 66 | 379 | 82.5 |
| Process (P) | 23 | 17 | 26 |
| Average edge in each process (E) | 3 | 113 | 97.3 |
| Average edges in all processes (PE) | 69 | 1921 | 96.4 |
| Business processes | 18 | N/A | – |

Table 7. Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| HTTP messages | 150 | 150 | – |
| Graph nodes | 66 | 150 | 56 |
| Graph edges | 87 | 410 | 78.7 |
| Process (P) | 31 | 39 | 20.5 |
| Average edge in each process (E) | 4 | 101 | 96 |
| Average edges in all processes (PE) | 156 | 3131 | 95 |
| Business processes | 30 | N/A | – |

The BLProM output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of our proposed approach compared to ZAP scanning.

Table 8 shows the average of the proposed approach, the average of OWASP ZAP, and the average percentage of improvement in the scanning of the selected web applications. For example, on the "Average edges in all processes" criterion, our approach improves on OWASP ZAP by about 96 percent.
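The averages in Table 8 can be reproduced, up to rounding, by taking the arithmetic mean of the per-application rows in Tables 5 to 7. The sketch below does this for two criteria. That the authors used a plain arithmetic mean is an assumption, though the recomputed values agree with the table.

```python
from statistics import mean

# Per-application values in the order WackoPicko, osCommerce, TomatoCart,
# copied from Tables 5, 6 and 7.
graph_edges_blprom = [48, 66, 87]
graph_edges_zap = [270, 379, 410]
pe_improvement = [97.8, 96.4, 95]

avg_edges_blprom = mean(graph_edges_blprom)
avg_edges_zap = mean(graph_edges_zap)
avg_pe_improvement = round(mean(pe_improvement), 1)

print(avg_edges_blprom, avg_edges_zap, avg_pe_improvement)
```

The last value is the roughly 96 percent improvement quoted for the "Average edges in all processes" criterion.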

According to the results presented in this table, BLProM has improved web application scanning. BLProM is aware of the web application business processes. By identifying the web application business layer, web scanners can detect business-layer vulnerabilities.

6 CONCLUSION

Business logic vulnerabilities are serious vulnerabilities that compromise web application security. Web scanners cannot detect business logic vulnerabilities because they are unable to understand the business logic of the web application. To detect business logic vulnerabilities, the business logic of the web application needs to be understood. These vulnerabilities are therefore specific to the application and difficult to identify.

In this paper, we proposed BLProM, a black-box approach for detecting the business processes of a web application. BLProM aims to ease the detection of business logic vulnerabilities. The BLProM output is used as input to dynamic security testing of web applications in the business layer in order to detect business logic vulnerabilities. BLProM consists of two main steps:

1- Extracting the user navigation graph
2- Detecting the web application business processes

In the lab, we scanned three web applications with BLProM and with OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared to OWASP ZAP. BLProM improved the scanning of web applications because it clusters web pages and prevents scanning similar web application pages.

Table 8. Comparison of the Average of the Proposed Approach and the Average of the ZAP Approach in the Scanning of Selected Web Applications

| Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP |
|---|---|---|---|
| Graph nodes | 42.6 | 136.3 | 69.2 |
| Graph edges | 67 | 353 | 81.1 |
| Process (P) | 20 | 36.6 | 40.5 |
| Average edge in each process (E) | 3.6 | 86.6 | 94.8 |
| Average edges in all processes (PE) | 91 | 2420 | 96.4 |

References

[1] Common vulnerabilities and exposures. https://cve.mitre.org/cve/cve.html.

[2] 2015 ITRC. Identity Theft Resource Center Breach Report Hits Near Record High in 2015. http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html. Accessed Feb. 2017.

[3] G. Pellegrino and D. Balzarotti. Toward Black-Box Detection of Logic Flaws in Web Applications. In Network and Distributed System Security Symposium 2014 (NDSS 2014), 2014. doi:10.14722/ndss.2014.23021.

[4] Testing for business logic. OWASP. https://www.owasp.org/index.php/Testing_for_business_logic. Accessed Feb. 2017.

[5] D. Balzarotti, M. Cova, V. Felmetsger, and G. Vigna. Multi-module vulnerability analysis of web-based applications. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 25–35. ACM, 2007. doi:10.1145/1315245.1315250.

[6] A. Doupe, B. Boe, C. Kruegel, and G. Vigna. Fear the EAR: discovering and mitigating execution after redirect vulnerabilities. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 251–262. ACM, 2011. doi:10.1145/2046707.2046736.

[7] V. Felmetsger, L. Cavedon, C. Kruegel, and G. Vigna. Business Logic Attacks – Bots and BATs. OWASP, 2009. In USENIX Security Symposium, 2010.

[8] E. Chai. Business Logic Attacks – Bots and BATs. OWASP, 2009. https://www.owasp.org/images/a/aa/OWASP_Cincinnati_Jan_2011.pdf. Accessed Feb. 2017.

[9] X. Li and Y. Xue. BLOCK: a black-box approach for detection of state violation attacks towards web applications. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 247–256, 2011. doi:10.1145/2076732.2076767.

[10] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna. Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications. In Giovanni, pages 63–86. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-74319-4. doi:10.1007/978-3-540-74320-0_4.

[11] X. Li, W. Yan, and Y. Xue. SENTINEL: securing database from logic flaws in web applications. In Proceedings of the Second ACM Conference on Data and Application Security and Privacy, pages 25–36, 2012. doi:10.1145/2133601.2133605.

[12] M. Alidoosti and A. Nowroozi. BLDAST: business-layer Dynamic Application Security Tester of the web application in order to detect web application vulnerabilities against flooding DoS attacks. In Iran Society of Cryptology Conference, Shiraz, Iran, in Persian, 2017.

[13] M. Alidoosti, A. Nowroozi, and A. Nickabadi. Evaluating the Web-Application Resiliency to Business-Layer DoS Attacks. ETRI Journal, 2019. doi:10.4218/etrij.2019-0164.

[14] M. Alidoosti and A. Nowroozi. BLTOCTTOU: business-layer dynamic application security tester of the web application in order to detect web application vulnerabilities against Race Condition attacks. In Computer Society of Iran Conference, Tehran, Iran, in Persian, 2018.

[15] M. Alidoosti and A. Nowroozi. BL-ProM: Business-layer process miner of the web application. In ISCISC, pages 1–6. IEEE, 2018. ISBN 978-1-5386-7582-3. doi:10.1109/ISCISC.2018.8546899.

[16] V. Crescenzi, P. Merialdo, and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3):279–299, 2005. doi:10.1016/j.datak.2004.11.004.


Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. Currently, she is working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.




  • 1 Introduction
  • 2 BACKGROUND and RELATED WORK
    • 21 Business Logic Attacks
    • 22 Clustering Web Pages
      • 3 BUSINESS-LAYER PROCESS MINER
        • 31 Extracting the User Navigation Graph
        • 32 Identifying Business Processes in the Application
          • 4 EXPERIMENTAL RESULTS
          • 5 EVALUATION
          • 6 CONCLUSION
Page 15: BLProM:ABlack-BoxApproachforDetectingBusiness-Layer ...jcomsec.ui.ac.ir/article_24589_64a7b183942b390c9d48f11e67352c9… · Comparing the results of the web application scanning,

July 2019 Volume 6 Number 2 (pp 65ndash80) 79

Table 8 Comparison of the Average of the Proposed Approach and the Average of ZAP Approach in the Scanning of Selected

Web Applications

Average of selected web application

Criteria

ApproachesProposed approach OWASP ZAP Percentage of improvement

compared to OWASP ZAP

Graph Nodes 426 1363 692

Graph Edges 67 353 811

process (P) 20 366 405

Average edge in each process (E) 36 866 948

Average edges in all processes (PE) 91 2420 964

application pages



Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. Currently, she is working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.

Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. Besides, he is a co-founder of four IT startups.

Ahmad Nickabadi received his B.S. degree in computer engineering in 2004 and the M.Sc. and Ph.D. degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.
