Journal of Computing and Security
July 2019, Volume 6, Number 2 (pp. 65-80)
http://www.jcomsec.org
BLProM: A Black-Box Approach for Detecting Business-Layer Processes in the Web Applications
Mitra Alidoosti a, Alireza Nowroozi b,*, Ahmad Nickabadi c
a Malek-Ashtar University of Technology, Tehran, Iran
b Malek-Ashtar University of Technology, Tehran, Iran
c Amirkabir University of Tehran, Tehran, Iran
ARTICLE INFO
Article history
Received 24 May 2019
Revised 22 December 2019
Accepted 9 February 2020
Published Online 3 May 2020
Keywords: Business Layer, Business Process, Navigation Graph
ABSTRACT
Web application vulnerability scanners cannot detect business logic vulnerabilities (vulnerabilities related to the application's logic) because they are unable to understand the business logic of the web application. To identify the business logic of a web application, this paper presents BLProM (Business-Layer Process Miner), a black-box approach that identifies the business processes of the web application. The detected business processes can be used in dynamic security testing to identify business logic vulnerabilities in web applications. BLProM first extracts the navigation graph of the web application and then identifies business processes from that graph. An evaluation on three well-known open-source web applications shows that BLProM can detect business logic processes. Experimental results show that BLProM improves web application scanning because it clusters web application pages and thereby avoids scanning similar pages. The proposed approach is compared to OWASP ZAP, an open-source web scanner, and we show that BLProM improves web application scanning by about 96%.
© 2019 JComSec. All rights reserved.
1 Introduction
Most of the vulnerabilities reported in the Common Vulnerabilities and Exposures database [1] are web application vulnerabilities. The number of security breaches increased by 355 in 2015 compared to the previous year [2]. Business logic vulnerabilities are among the most potent vulnerabilities affecting web application security.
So far, subtle vulnerabilities related to the web application logic are still discovered manually. Automated scanners cannot detect business logic flaws in applications because they are unable to understand the context [3]. Such vulnerabilities can only be detected through manual testing, which relies on the tester's creativity and skills [4].

* Corresponding author.
Email addresses: Alidoosti@mut.ac.ir (M. Alidoosti), Nowroozi@mut.ac.ir (A. Nowroozi), Nickabadi@aut.ac.ir (A. Nickabadi)
https://dx.doi.org/10.22108/jcs.2020.117223.1028
ISSN 2322-4460 © 2019 JComSec. All rights reserved.
There is no formal definition of business logic vulnerabilities [3]. They are very difficult to detect, and this type of vulnerability causes serious damage if it is exploited [3]. Because understanding context is difficult for automated tools, penetration testers are responsible for detecting business logic vulnerabilities. Since business logic vulnerabilities are application-specific, they are hard to detect.
Web applications do not have any formal documentation describing their internal states and expected user behavior, and the lack of such documentation makes it difficult to detect business logic vulnerabilities. For example, adding a specific item to a shopping cart several times is a common feature, but repeated use of a discount code is a business logic vulnerability. A human easily understands the difference between these two scenarios, while a scanner without an appropriate model of the web application cannot distinguish between them [4]. Prior studies [5-7] have attempted to detect business logic vulnerabilities automatically, but they have been applied only to small applications. Moreover, they require the application source code to generate an appropriate model of the application [4]. Therefore, an automatic tool that can find business logic vulnerabilities is needed.
In this paper, we propose BLProM, a black-box technique for detecting the business layer of web applications. By identifying the business processes in a web application, BLProM makes it possible to identify business-layer vulnerabilities. In other words, to perform dynamic security testing of a web application at the business layer, it is first necessary to identify the application's business processes (that is, to detect its business logic); the business-layer vulnerabilities are then identified by analyzing these processes. The output of BLProM is the set of business processes of the web application, which serves as input to dynamic security testing at the business layer. The proposed approach is independent of the technology used in the web application and finds business processes automatically. We also show that BLProM improves web application scanning because it detects similar pages in the web application and avoids scanning them. Comparing the web application scanning results of BLProM and OWASP ZAP (an open-source web application scanner) shows that BLProM improves web application scanning by about 96%. In summary, this paper makes the following contributions:
• We present BLProM, a black-box technique for detecting the web application business layer.
• We present a new black-box approach for clustering web pages.
• We show that web application scanning improves by about 96% when the web application business processes are identified.
2 Background and Related Work
The business layer determines the business logic of a web application. It is responsible for data processing and data management and specifies the business logic policies and rules. In addition, this layer validates the input data. Figure 1 shows the three-layer architecture of a web application and the position of the business layer within it. The presentation layer is a user interface that displays data to the user and receives inputs from the user. In a web application, this is the part that receives the HTTP request and returns the HTML response.
The business layer handles data validation and business rules. The data access layer communicates with the database by constructing SQL queries.
After data is received from the user, it is available to the business layer, and the web application uses it to run business processes. Every business process has several steps that must be executed in order, and processes may interact with each other.
The business layer specifies the logic of the web application, and a business logic vulnerability is a defect in this layer. A business logic attack vector is a legitimate request (usually a sequence of requests) with legitimate input values that abuses a module's functionality to inflict direct damage on the business.
2.1 Business Logic Attacks
There are two approaches to preventing business logic attacks: 1) identifying attacks at runtime (the defense approach) and 2) identifying logic vulnerabilities in web applications (the prevention approach). In the defense approach, the behavior of the web application is monitored, and an attack is reported whenever the web application leaves its normal state. In the prevention approach, attack vectors are used to identify business logic vulnerabilities.
BLOCK [9] and Swaddler [10] use the defense approach to prevent business logic attacks. BLOCK first obtains a behavioral model of the web application by observing the interaction between clients and the web application. It extracts a set of invariants from the request/response sequence and the session variables, and it identifies any request or response that violates these invariants as an attack.
Swaddler provides an anomaly detection method for detecting attacks. It uses anomalies in the internal state of the web application to detect attacks. In other words, the web application's internal state is monitored in the learning phase, and the normal values of that state are extracted to define the web application profile; abnormal states are then identified in the detection phase.
MiMoSA [5] uses the prevention approach, in the form of a white-box approach, to identify business logic vulnerabilities. MiMoSA first builds a web application model based on the web application's state and workflow. It then detects multi-step attacks by analyzing the relationship between the web application and the database as well as the connections within the web application.

Figure 1 The Three-Layer Architecture of a Web Application [8]
SENTINEL [11] and Pellegrino et al. [3] use the prevention approach, in the form of a black-box approach, to identify business logic vulnerabilities.
SENTINEL [11] is a black-box approach for detecting logic flaws in database access. SENTINEL builds a state machine of the web application and extracts, as the application specification, a set of invariants from the observed SQL queries, responses, and session variables. Any SQL query that violates the defined invariants is identified as an attack.
Pellegrino et al. [3] propose a black-box technique to detect logic vulnerabilities in web applications. This technique extracts behavioral patterns from network traces in which the user interacts with a certain application's functionality. The web application is first modeled, and attack vectors are then applied to the model.
Our previous works, BLDAST [12, 13] and BLTOCTTOU [14], use the prevention approach, in the form of a black-box approach, to identify business logic vulnerabilities. The output of BLProM [15] can be used as input for BLTOCTTOU and BLDAST.
BLDAST [12, 13] is a dynamic, black-box vulnerability analysis approach that identifies business logic vulnerabilities of a web application against flooding DoS attacks. BLDAST assesses web application resiliency against flooding DoS attacks and takes into account the business processes of the web application. It selects critical pages in business processes; a critical page has a considerable response time. Therefore, a critical process can impose a heavy load on the target and make the web server unresponsive. The goal of BLDAST is to find these critical processes within web applications.
BLTOCTTOU [14] is a black-box dynamic application security tester for detecting business logic vulnerabilities against race condition attacks. BLTOCTTOU identifies vulnerabilities by finding the business processes of the web application. It detects business processes that interact with each other: one process sets the value of a variable, and the other reads or writes that variable. To identify a race condition, BLTOCTTOU first executes the identified processes sequentially and then executes them in reverse order. Finally, it compares the outputs of these two modes; if they differ, the web application is vulnerable to a race condition.
2.2 Clustering Web Pages
Crescenzi [16] presented an approach that clusters web pages based on page structure. The structural similarity between web pages is defined by the DOM trees of their hyperlinks. The final clusters are used to build a model that describes the structure of the site in terms of classes of pages and their connectivity.
3 Business-Layer Process Miner
In this paper, BLProM is proposed to identify the business processes of a web application, and its outputs are used as input to web application security testing at the business layer. Business-layer vulnerabilities can then be detected by analyzing the interaction between business processes.
BLProM first preprocesses the HTTP traffic of a normal user and then extracts the web application pages in the traffic. BLProM clusters similar pages to prevent unbounded growth of the user navigation graph. The detected clusters are the graph nodes, and the graph edges are the relations between the detected clusters; the user navigation graph is built from these nodes and edges. BLProM then extracts business processes from the navigation graph. BLProM has two main steps:
1. Extracting the user navigation graph
2. Detecting business processes in the web application
Figure 2 shows the proposed steps for identifying the business processes of a web application. In the following, we explain each of these steps in detail.
3.1 Extracting the User Navigation Graph
First, a normal user crawls the web application, and the user's traffic is captured and stored. Note that the user's permission level can differ according to the user's role, so the user navigation graph varies with the permission level. In this paper, the normal user has user-level permission and browses the pages that this permission level allows; the normal user crawls all parts of the application that the user is allowed to access.

Figure 2 Black-Box Approach to Detect Web Application Business Process
BLProM first extracts the user navigation graph from the stored traffic through the following steps:
1. Preprocessing of raw input data
2. Identifying existing web application pages in the stored traffic
3. Clustering the web application pages
4. Extracting the user navigation graph
1) Preprocessing of raw input data
In the preprocessing step, BLProM cleans the data and removes irrelevant data samples. In this paper, only HTTP requests and responses are used. For the responses, only those with the successful status codes 200 and 209 are employed; BLProM removes responses with failure status codes as well as their corresponding requests. Additionally, only GET and POST requests are needed, and the remaining ones are discarded.
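A minimal Python sketch of this filtering step follows. It assumes that each captured exchange is stored as one record pairing a request with its response (the `Exchange` class and its fields are hypothetical, not part of BLProM), and it treats any 2xx status as successful, which generalizes the codes listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exchange:
    """Hypothetical record pairing one captured HTTP request with its response."""
    method: str              # request method, e.g. "GET"
    uri: str                 # request URI
    referer: Optional[str]   # Referer header, None when absent
    status: int              # response status code
    content_type: str        # response Content-Type header


def preprocess(traffic: list[Exchange]) -> list[Exchange]:
    """Keep only successful GET/POST exchanges, preserving capture order."""
    kept = []
    for ex in traffic:
        if ex.method.upper() not in ("GET", "POST"):
            continue  # all other methods are discarded
        # Assumption: any 2xx code counts as success; a failed response is
        # dropped together with its request because they share one record.
        if not 200 <= ex.status < 300:
            continue
        kept.append(ex)
    return kept
```

Keeping the request and response in one record makes "remove the failed response and its request" a single deletion.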
2) Identifying the web application pages in the stored traffic
Each page of the application can be represented as a pair (main request, response to the main request). To identify the web application pages, it is first necessary to detect the main requests in the traffic; after the main requests are identified, their corresponding responses must also be identified.
Identifying the main HTTP requests in the traffic. The user's stored traffic contains both main HTTP requests, which load the web application pages, and secondary HTTP requests, which load the files, images, etc. of a page. BLProM must distinguish between the two: once the main requests are identified, the remaining ones are considered secondary. Whenever a request's Referer header differs from the Referer of the previous request, the current request is considered secondary and the previous one is taken as a main request. Additionally, the first request in the traffic is considered a main request because it does not include a Referer. Note that the Referer header field of an HTTP request indicates the URL of the page the user visited previously.
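The Referer-change rule can be sketched in Python as follows. This is an illustration under assumptions rather than BLProM's code: the input is only the list of Referer headers in capture order (None when absent), and indices are 0-based, whereas the text numbers requests from 1.

```python
from typing import Optional


def find_main_requests(referers: list[Optional[str]]) -> list[int]:
    """Return 0-based indices of main requests using the Referer-change rule.

    The first request is always a main request. Whenever a request's Referer
    differs from the previous request's Referer, the previous request is
    taken as a main request.
    """
    main: list[int] = []
    for i, ref in enumerate(referers):
        if i == 0:
            main.append(0)                      # first request has no Referer
        elif ref != referers[i - 1] and main[-1] != i - 1:
            main.append(i - 1)                  # Referer changed: previous is main
    return main


# /a is loaded, its three resources carry Referer /a, then /b (requested from /a)
# loads and its resource carries Referer /b.
print(find_main_requests([None, "/a", "/a", "/a", "/b"]))
```

A final main request that is not followed by any secondary request is not caught by this rule; the last-request check described below covers that case.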
Identifying the corresponding responses to the main requests. To identify the responses corresponding to the main requests, it is only necessary to select the responses whose Content-Type field is text/html, because the response to a secondary request is usually a file, image, etc., while the response to a main request is HTML text. Such responses are the main responses in the HTTP traffic. After the main requests and responses are identified, each pair (main request, response to the main request) indicates a page of the web application. Note that the response to a main request references all the resources that the secondary requests load to render the page.

Identifying whether the last request in the traffic is a main request. If the last response in the traffic is a main response, the last HTTP request is also a main request.

Figure 3 Number of Consecutive Requests
For example, in Figure 3, requests 1, 2, 6, and 8 are the main requests (shown in red).
The pseudocode of the algorithm BLProM uses to identify web application pages is shown in Algorithm 1. As mentioned before, each web application page is indicated by a pair (main request, response to the main request). The pseudocode in Algorithm 1 can be divided into four main parts:
1. Extracting the main requests in the traffic (lines 13-25)
2. Extracting the responses corresponding to the main requests (lines 26-31)
3. Identifying whether the last request in the traffic is a main request (lines 32-35)
4. Extracting the web application pages (lines 36-41)
In line 9, all requests in the traffic, both main and secondary, are extracted and stored in the HttpRequest variable. In line 10, the responses in the traffic are added to the HttpResponse variable. In lines 11 and 12, the total numbers of requests and responses are calculated. In line 15, it is checked whether the current request is the first request; if so, it is considered a main request and added to the MainRequest variable. In line 21, it is checked whether the Referer field of the current request differs from that of the previous request; if so, the previous request is considered a main request.
In line 28, it is checked whether the Content-Type field of the current response is text/html. If this condition is met, the current response is considered a main response.
In line 33, it is checked whether the last HTTP response is a main response; if so, the last request is also considered a main request.
In lines 38-41, the main requests and main responses are paired, and these (request, response) pairs are returned as the web application pages.
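The response-selection, last-request, and pairing parts of Algorithm 1 can be sketched in Python as below. The function name and list-based inputs are assumptions: the sketch receives the main-request indices from a Referer rule like the one in lines 13-25, and, like line 39, it pairs the j-th main request with the j-th main response, so `zip` silently drops any unmatched tail if the two counts differ.

```python
def extract_pages(request_uris: list[str],
                  content_types: list[str],
                  main_requests: list[int]) -> list[tuple[str, int]]:
    """Pair main requests with main (text/html) responses.

    request_uris[i] is the i-th request and content_types[i] the Content-Type
    of its response. Returns (request URI, response index) pairs.
    """
    # Main responses: Content-Type text/html, ignoring parameters like charset.
    main_responses = [k for k, ctype in enumerate(content_types)
                      if ctype.split(";")[0].strip().lower() == "text/html"]
    main = sorted(set(main_requests))
    last = len(request_uris) - 1
    # If the last response is a main response, the last request is main too.
    if (main_responses and main_responses[-1] == len(content_types) - 1
            and last not in main):
        main.append(last)
    # The j-th main request is paired with the j-th main response.
    return [(request_uris[i], k) for i, k in zip(main, main_responses)]
```

In the usage below, `/c` has no secondary requests after it, so only the last-request check promotes it to a main request.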
3) Clustering the web application pages
In this step, BLProM clusters the web application pages so that similar pages are placed in the same cluster. This prevents unbounded growth of the user navigation graph. The pages in the same cluster are similar to each other.
At this stage, all pages in the user's stored traffic have been extracted as pairs (main request, response to the main request), each representing one page of the application. To extract the optimal user navigation graph, similar pages must be identified and clustered. In the user navigation graph, nodes indicate the application's unique pages, and edges represent the links between pages. Identifying similar pages requires a criterion chosen according to the purpose of clustering. Here, the type of operations the user can perform on a page is used to separate pages: two pages are similar if the user can perform the same operations on them. For example, consider two pages that each contain only a button, where the title of the button is "continue" on the first page and "save" on the second. These two pages are different because the user performs different operations on them. According to this criterion, the following pages are considered similar in web applications:
• In an online shop application, the shopping cart pages are considered similar even if they contain different items.
• In an online shop application, the profile pages of products are similar because they have the same HTML structure in terms of the important HTML elements.
• Pages that display search results for different keywords are similar.
• If the structure of one page is a subset of another page's structure in terms of the important HTML elements, the two pages are considered similar.
• Two pages that both contain user comments are considered similar, even if the comments are about different products.
Algorithm 1 The Pseudocode for Extracting Web Application Pages From the Traffic
INPUT: HTTPtraffic = {HttpMessage_1, HttpMessage_2, HttpMessage_3, ..., HttpMessage_n}
OUTPUT: WebPages as a set of (Request_i, Response_i)
1  Begin
2    let WebPages = {}          // set of web pages
3    let HttpRequest = {}       // set of HTTP requests
4    let HttpResponse = {}      // set of HTTP responses
5    let MainRequest = {}       // set of main HTTP requests
6    let MainResponse = {}      // set of main HTTP responses
7    let i, k = 1               // counters for the current HttpMessage
8    let LastReferer = empty, NewReferer = empty
9    HttpRequest = ExtractReq(HTTPtraffic)    // extract HTTP requests from HTTP traffic
10   HttpResponse = ExtractResp(HTTPtraffic)  // extract HTTP responses from HTTP traffic
11   n = extractNumber(HttpRequest)           // total number of HTTP requests
12   m = extractNumber(HttpResponse)          // total number of HTTP responses
13   // extract main HTTP requests from HttpRequest
14   for i ← 1, n do
15     if (i = 1) then
16       add MainRequest ← HttpRequest_i      // the first request is a main request
17       LastReferer ← Referer of HttpRequest_i
18     else
19       NewReferer ← Referer of HttpRequest_i
20     end if
21     if (NewReferer ≠ LastReferer) then
22       add MainRequest ← HttpRequest_(i-1)
23       LastReferer ← NewReferer
24     end if
25   end for
26   // extract main HTTP responses from HttpResponse
27   for k ← 1, m do
28     if (content-type of HttpResponse_k = text/html) then
29       add MainResponse ← HttpResponse_k
30     end if
31   end for
32   // check whether the last request is a main request
33   if (HttpResponse_m ∈ MainResponse) then
34     add MainRequest ← HttpRequest_n
35   end if
36   // extract the set of web pages
37   size = extractNumber(MainRequest)        // total number of main HTTP requests
38   for j ← 1, size do
39     add WebPages ← (MainRequest_j, MainResponse_j)
40   end for
41   return WebPages
42 end
Table 1 Attribute Vector of a Page in osCommerce Web Application
Vector element | Value
inputs  | null
buttons | html/body/button/Reviews
        | html/body/div/div/div/div/form/div/div/span/span/button/Add to Cart
anchors | null
image   | html/body/div/div/div/a/img
Figure 4 An Example of HTML Code
Definition of the Document Object Model (DOM) path of an HTML element: the DOM path of an element is the position of the element in the HTML code, written as the sequence of tags from the document root down to the element.
For example, in Figure 4, the DOM path of the button (DOM_button) is Document/Html/Body/P/Button.
Definition 1 [Similar Pages]: Similar pages are pages on which the user can perform the same operations and that are identical in terms of the positions of the important HTML elements in the page. The important HTML elements in a page are buttons, images, inputs, and anchors.
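DOM paths of the important elements can be collected with Python's standard `html.parser`, as in the sketch below. The class name is hypothetical, the "/" separator is an assumption, and the sketch omits Figure 4's Document root and uses lowercase tag names (the parser lowercases them). It does not model implicit tag closing, such as an unclosed `<p>`.

```python
from html.parser import HTMLParser

# Important elements from Definition 1, plus void elements that never close.
IMPORTANT = {"input", "button", "a", "img"}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class DomPathCollector(HTMLParser):
    """Collect the DOM path (root-to-element tag chain) of important elements."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []   # currently open tags
        self.paths: list[str] = []   # DOM paths of important elements

    def handle_starttag(self, tag, attrs):
        if tag in IMPORTANT:
            self.paths.append("/".join(self.stack + [tag]))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to the matching open tag; stray end tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


collector = DomPathCollector()
collector.feed("<html><body><p><button>Save</button></p></body></html>")
collector.close()
print(collector.paths)
```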
The clustering process includes three steps:
1. Extracting the attribute vectors of the pages
2. Identifying the subset pages
3. Clustering pages
In the following, these steps are discussed in detail.
1. Extracting the attribute vector of a page
BLProM represents each page of the application as a pair (main request, response). In this step, BLProM extracts the attribute vector of each page by applying a data mining operation to this pair. BLProM models each page using the following attribute vector:
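$$\mathit{WebPages} = \text{the set of all pages in an application}$$
$$\forall w \in \mathit{WebPages}:\quad w = (DOM_{inputs},\ DOM_{buttons},\ DOM_{anchors},\ DOM_{imgs})$$
$$DOM_{inputs} = \prod_{i}^{n} DOM(input_i), \qquad DOM_{buttons} = \prod_{i}^{n} DOM(button_i)$$
$$DOM_{anchors} = \prod_{i}^{n} DOM(anchor_i), \qquad DOM_{imgs} = \prod_{i}^{n} DOM(img_i)$$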
• DOM(input): the DOM path of the <input> tag in the page + the value of the type attribute of the <input> tag + the value of the name attribute of the <input> tag (if the name attribute is absent, the value attribute is used)
• DOM(button): the DOM path of the button in the page + the title of the button
• DOM(anchor): the DOM path of the <a> tag in the page
• DOM(img): the DOM path of the image in the page
If a web application page contains several buttons, the second element of the page's attribute vector is the set of DOM paths of the buttons in the page, separated by a delimiter. Figure 5 shows one of the pages of the osCommerce¹ web application, and Table 1 shows the attribute vector of that page. As shown, the input and anchor elements are null, which means the page does not contain these tags.
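The four DOM(x) rules above can be turned into an attribute vector as in the following Python sketch. It assumes elements have already been parsed into hypothetical `(tag, DOM path, attributes, text)` records, uses "|" as a stand-in for the "+" concatenation, and represents null as an empty set.

```python
from typing import Optional

# Hypothetical element record: (tag, DOM path, attributes, text content).
Element = tuple[str, str, dict[str, Optional[str]], str]
KIND = {"input": "inputs", "button": "buttons", "a": "anchors", "img": "imgs"}


def dom_signature(tag: str, path: str,
                  attrs: dict[str, Optional[str]], text: str) -> str:
    """Build DOM(x) for one element following the bullet rules above."""
    if tag == "input":
        # type attribute + name attribute (value attribute when name is absent)
        name = attrs.get("name") or attrs.get("value") or ""
        return f"{path}|{attrs.get('type') or ''}|{name}"
    if tag == "button":
        return f"{path}|{text.strip()}"   # DOM path + button title
    return path                           # anchors and images: DOM path only


def attribute_vector(elements: list[Element]) -> dict[str, frozenset[str]]:
    """Return the four-element vector; an empty set plays the role of null."""
    vector: dict[str, set[str]] = {k: set() for k in KIND.values()}
    for tag, path, attrs, text in elements:
        if tag in KIND:
            vector[KIND[tag]].add(dom_signature(tag, path, attrs, text))
    return {k: frozenset(v) for k, v in vector.items()}
```

Using sets makes the later subset tests direct, and repeated identical elements collapse into one entry.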
2. Identifying similar pages
After the attribute vector of each page is extracted, similar pages must be identified. Pages whose attribute vectors are a subset of another page's attribute vector, or whose attribute vectors are identical, are considered similar.
According to Definition 1, the attribute vector of each web application page has four elements. The attribute vector of page 1 is considered the same as the attribute vector of page 2 if:
• All vector elements of page 1 are equal to the corresponding elements in the vector of page 2.
• All vector elements of page 1 are subsets of the corresponding elements in the vector of page 2.
• All vector elements of page 2 are subsets of the corresponding elements in the vector of page 1.
• One or more vector elements of page 1 are subsets of their corresponding elements in the vector of page 2, and the remaining vector elements of page 1 are the same as their corresponding elements in the vector of page 2.
Here, the null element is a subset of every element. Similar pages are identified according to the above conditions, and Algorithm 2 illustrates the pseudocode for identifying similar pages.

¹ https://www.oscommerce.com

Figure 5 A Page of osCommerce Application

Algorithm 2 The Pseudocode for Identifying the Similar Pages
INPUT: w1 = (DOM_w1(input), DOM_w1(button), DOM_w1(anchor), DOM_w1(img)),
       w2 = (DOM_w2(input), DOM_w2(button), DOM_w2(anchor), DOM_w2(img))
OUTPUT: Boolean flag; true means the two pages are the same
Begin
  let flag, input, button, anchor, img = false
  if DOM_w1(input) ⊆ DOM_w2(input) or DOM_w2(input) ⊆ DOM_w1(input) then
    input = true
  end if
  if DOM_w1(button) ⊆ DOM_w2(button) or DOM_w2(button) ⊆ DOM_w1(button) then
    button = true
  end if
  if DOM_w1(anchor) ⊆ DOM_w2(anchor) or DOM_w2(anchor) ⊆ DOM_w1(anchor) then
    anchor = true
  end if
  if DOM_w1(img) ⊆ DOM_w2(img) or DOM_w2(img) ⊆ DOM_w1(img) then
    img = true
  end if
  if input and button and anchor and img then
    flag = true
  end if
  return flag
End
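The similarity conditions listed above can be sketched in Python as follows. Note the design choice: this sketch follows the bulleted conditions, in which the subset relation runs in the same direction for all four vector elements; Algorithm 2 checks each element independently in either direction, so it would also accept mixed cases such as `d` and `e` in the usage below. The vector layout and the `vec` helper are assumptions for illustration.

```python
KEYS = ("inputs", "buttons", "anchors", "imgs")
Vector = dict[str, frozenset[str]]


def similar_pages(v1: Vector, v2: Vector) -> bool:
    """True if every element of one vector is a subset of the matching element
    of the other; equality and empty (null) elements are special cases."""
    one_in_two = all(v1[k] <= v2[k] for k in KEYS)
    two_in_one = all(v2[k] <= v1[k] for k in KEYS)
    return one_in_two or two_in_one


def vec(inputs=(), buttons=(), anchors=(), imgs=()) -> Vector:
    """Hypothetical helper for building attribute vectors in examples."""
    return {"inputs": frozenset(inputs), "buttons": frozenset(buttons),
            "anchors": frozenset(anchors), "imgs": frozenset(imgs)}


a = vec(buttons={"b1"}, imgs={"i1"})
b = vec(buttons={"b1", "b2"}, anchors={"l1"}, imgs={"i1"})
d = vec(inputs={"x"}, buttons={"b1"}, imgs={"i1"})
e = vec(buttons={"b1", "b2"}, imgs={"i1"})
print(similar_pages(a, b), similar_pages(d, e))
```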
3. Clustering web application pages
After similar pages are identified, they are put in the same cluster. The pages in a cluster are similar to each other and refer to a unique page of the application. Algorithm 3 shows the pseudocode for clustering web pages. In line 7, it is checked whether two pages w_i and w_j are the same; if so, they are put in the same cluster.
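The clustering step can be sketched in Python as a greedy single pass, in the spirit of Algorithm 3: each page that is not yet clustered seeds a new cluster and absorbs every later unclustered page that is similar to it. The function name and the pluggable `similar` predicate are assumptions; in practice `similar` would be a subset-based test on attribute vectors.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def cluster_pages(pages: list[T], similar: Callable[[T, T], bool]) -> list[list[T]]:
    """Greedy clustering: every page is compared only with its cluster's seed."""
    clustered = [False] * len(pages)
    clusters: list[list[T]] = []
    for i, wi in enumerate(pages):
        if clustered[i]:
            continue
        cluster = [wi]                    # w_i seeds a new cluster C_k
        clustered[i] = True
        for j in range(i + 1, len(pages)):
            if not clustered[j] and similar(wi, pages[j]):
                cluster.append(pages[j])  # w_j joins C_k and is not reused
                clustered[j] = True
        clusters.append(cluster)
    return clusters


print(cluster_pages([1, 2, 11, 12, 3], lambda x, y: x // 10 == y // 10))
```

Because the subset-based similarity is not transitive, the result can depend on the order in which pages appear in the traffic.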
Algorithm 3 The Pseudocode for Clustering the Web Application Pages
INPUT: WebPages = {w1, w2, w3, ..., wn}
OUTPUT: the web application model M as a set of web page clusters C, where each cluster is a set of web pages
1  Begin
2    let M = {}                 // set of page clusters
3    let k = 1                  // number of web page clusters
4    for i ← 1, n do
5      C_k ← {w_i}
6      for j ← i + 1, n do
7        if SimilarWebPages(w_i, w_j) then
8          WebPages ← WebPages − {w_j}
9          n = length of WebPages
10         C_k ← C_k ∪ {w_j}
11       end if
12     end for
13     M ← M ∪ {C_k}; k++
14   end for
15   return M
16 end

4) Extracting the user navigation graph
In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster has a set of similar pages, and each of these pages has a URI and a Referer field. Thus, each cluster contains a set of URIs and a set of Referers for the pages in that cluster. Note that the Referer field is the URI of the previous web application page the user visited. To extract the edges of the user navigation graph, the produced clusters are checked to find which clusters' Referer sets intersect the URI set of a given cluster; when such a cluster is found, the two clusters are connected. Suppose that the URI set of cluster C1 intersects the Referer set of cluster C2; then the path from cluster C1 to cluster C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges of the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained in lines 5 and 6 of Algorithm 4, respectively. In line 10, if the intersection of a cluster's URI set with another cluster's Referer set is not empty, the edge CiCj is added to the edge set of the graph.
The user navigation graph is created according to Algorithm 5, where the nodes are the set of clusters and the identified edges are the paths between clusters. In the created graph, each node indicates a unique page, and the edges show the paths between pages.
Definition 2 [The User Navigation Graph]: This graph is represented by the tuple ⟨C0, C, E⟩, where C is the set of nodes in the graph, C0 ∈ C is the first (initial) node of the graph, and E ⊆ C × C is the set of edges in the graph.
Algorithm 5 shows the pseudocode for extracting the user navigation graph. Lines 5 and 6 find the cluster that contains the initial page of the application, which is taken as the initial node of the graph.
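Edge extraction and the choice of the initial node can be sketched together in Python, following the URI/Referer intersection rule described above. The sketch assumes that each page is reduced to a hypothetical `(URI, Referer)` pair and that Referer values have already been normalized to the same form as the URIs (real Referer headers carry absolute URLs).

```python
from typing import Optional

Page = tuple[str, Optional[str]]   # hypothetical (URI, Referer) pair


def build_navigation_graph(
    clusters: list[list[Page]], first_page: Page
) -> tuple[Optional[int], set[tuple[int, int]]]:
    """Return (index of the initial cluster C0, set of directed edges (i, j))."""
    uris = [{uri for uri, _ in c} for c in clusters]
    referers = [{ref for _, ref in c if ref is not None} for c in clusters]
    edges: set[tuple[int, int]] = set()
    for i in range(len(clusters)):
        for j in range(len(clusters)):
            # C_i -> C_j when some page in C_j was reached from a URI in C_i
            if i != j and uris[i] & referers[j]:
                edges.add((i, j))
    # C0 is the cluster containing the first captured page.
    initial = next((i for i, c in enumerate(clusters) if first_page in c), None)
    return initial, edges
```

Both loop indices run over all clusters, so edges are found in either direction; self-loops are skipped because the text speaks of "other" clusters.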
3.2 Identifying Business Processes in the Application
To identify the web application business processes in the user navigation graph, it is first necessary to define a process and a final node, and then to define a business process.
Definition 3 [The Application Process (P)]: A process P in the application is a sequence of nodes and edges in the user navigation graph, written as the edge sequence $E_1, E_2, \ldots, E_k$, where $E_i \in E$ and $E_i = C_{i-1}C_i$.

Algorithm 4 The Pseudocode for Extracting Edges From the User Navigation Graph
INPUT: C = {C1, C2, C3, ..., Ck}        // web page clusters as graph nodes
       WebPages = {w1, w2, w3, ..., wn} // set of web pages
OUTPUT: the web application graph edges E as a set of edges
1  Begin
2    let E = {}                         // set of web application graph edges
3    for j ← 1, k do
4      let URI_Cj, Referer_Cj = {}
5      URI_Cj = URI_Cj ∪ ExtractURI(w) for any w ∈ Cj
6      Referer_Cj = Referer_Cj ∪ ExtractReferer(w) for any w ∈ Cj
7    end for
8    for i ← 1, k do
9      for j ← 1, k do
10       if (i ≠ j and URI_Ci ∩ Referer_Cj ≠ ∅) then
11         E ← E + CiCj
12       end if
13     end for
14   end for
15   return E
16 end

Algorithm 5 The Pseudocode for Extracting the User Navigation Graph
INPUT: C = {C1, C2, C3, ..., Ck}  // web page clusters as graph nodes
       w1                         // first web page
OUTPUT: the web application navigation graph ⟨C0, C, E⟩
1  Begin
2    let C0 = {}
3    E = ExtractGraphEdges(C)     // set of web application graph edges
4    for j ← 1, k do
5      if Cj contains w1 then
6        C0 = Cj
7      end if
8    end for
9    return ⟨C0, C, E⟩
10 end
Algorithm 6 shows the pseudocode for extractingprocesses
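Algorithm 6 leaves the termination of its recursion on cycles implicit. The Python sketch below is one possible reading, not the authors' implementation: a depth-first walk from C0 that records every edge sequence it builds and stops extending a sequence as soon as it returns to a node already on the path, so loops remain visible for Definition 5. Nodes are plain integers here, an assumption for illustration.

```python
from collections import defaultdict


def enumerate_processes(start: int, edges: set[tuple[int, int]]
                        ) -> list[list[tuple[int, int]]]:
    """List edge sequences starting at `start` that never revisit a node,
    plus sequences whose last edge closes a loop back onto the path."""
    succ: dict[int, list[int]] = defaultdict(list)
    for u, v in sorted(edges):
        succ[u].append(v)
    processes: list[list[tuple[int, int]]] = []

    def walk(node: int, on_path: list[int], path: list[tuple[int, int]]) -> None:
        for nxt in succ[node]:
            extended = path + [(node, nxt)]
            processes.append(extended)
            if nxt not in on_path:          # stop once a node repeats
                walk(nxt, on_path + [nxt], extended)

    walk(start, [start], [])
    return processes


print(enumerate_processes(0, {(0, 1), (1, 2), (2, 0)}))
```

The number of simple paths can grow exponentially with graph size, so the clustering step that keeps the graph small matters for this enumeration.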
Definition 4 [The Final Nodes in the User Navigation Graph (F)]: The final nodes mark the completion of a business process, which occurs when the application reaches them.
Final nodes can be detected by examining the HTTP responses. For example, in the process of buying a product, a phrase like "Thank you for your purchase" is displayed after the purchase is completed. By specifying a set of such phrases and searching for them in the responses, the final nodes can be identified. Some keywords used for identifying final nodes are Thank, Congratulations, Successfully, Log Off, and Search Results. In addition, some buttons on a web application page are good indicators of a final node in the graph: examples of final nodes include the page reached after clicking the "save" button, the page after clicking the "creation" button, and the page after clicking the "submit" button.
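A keyword test of the kind described above might look like the following Python sketch. The keywords come from the text, and the button labels are our reading of the three buttons it mentions. The function name `is_final_page`, its `clicked_button` parameter (the label of the button that produced the page, assumed to be recoverable from the request), and the plain case-insensitive substring matching are illustrative assumptions; such matching can produce false positives, for example on a page that merely mentions "search results".

```python
from typing import Optional

# Phrases the text lists as signs that a process has completed.
FINAL_KEYWORDS = ("thank", "congratulations", "successfully", "log off", "search results")
# Button labels associated with final pages (our reading of the text).
FINAL_BUTTONS = ("save", "create", "submit")


def is_final_page(html: str, clicked_button: Optional[str] = None) -> bool:
    """Heuristically decide whether a response marks the end of a business process."""
    text = html.lower()
    if any(keyword in text for keyword in FINAL_KEYWORDS):
        return True
    # Hypothetical extra signal: the label of the button that led to this page.
    return clicked_button is not None and clicked_button.strip().lower() in FINAL_BUTTONS


print(is_final_page("<h1>Thank you for your purchase</h1>"))            # True
print(is_final_page("<h1>Your shopping cart</h1>"))                     # False
print(is_final_page("<p>Profile updated</p>", clicked_button="Save"))   # True
```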
Definition 5 [The Application Business Process]. A business process BP in the application is a process that satisfies at least one of the following conditions:
1. The first node of the process is the initial node of the user navigation graph (C0), and the end node of the process is a final node of the user navigation graph (F).
2. If the process passes through its first node again, it means
Algorithm 6 The Pseudocode for Extracting Processes
INPUT: C0, the web application first node; E, the web application graph edges
OUTPUT: P, the set of web application processes
1 Begin
2 Let P = ∅ // set of web application processes
3 Let StartEdge = ∅ // set of graph edges that begin at C0
4 StartEdge = ExtractGraphEdges(C0) // extract all edges that leave C0
5 EndPoint = ExtractEndPoint(StartEdge) // extract the end points of those edges
6 if (ExtractProcess(EndPoint, E) ≠ ∅) then
7   return P = e + ExtractProcess(EndPoint, E) for any edge e ∈ StartEdge
8 else
9   return StartEdge
10 end if
11 end
that the first node and the end node of the process are the same; such a process is a business process if its length is greater than two. (That is, if the process returns to a node it has already passed and the resulting loop is longer than two, it is a business process.)
All processes in the graph that run from the initial node (the application's initial page) to an identified final node, as well as all processes whose initial and final nodes are the same (and whose length is greater than two), are application business processes. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the application processes are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start at the initial node and end at a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, each detected process whose initial node and final node are the same and whose length is greater than two is added to the variable BP as a business process.
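The two conditions of Definition 5, as applied in Algorithm 7, can be sketched in Python as follows. This is an illustration under our own reading, not the authors' code: a process is a list of (source, target) edges such as those produced by a path enumeration, process length is counted in edges, and the loop condition is applied to the cycle that a process closes when it returns to an earlier node (for a process that returns to C0, that cycle is the whole process).

```python
from typing import List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def nodes_of(process: Sequence[Edge]) -> List[str]:
    """Node sequence visited by a non-empty process given as consecutive edges."""
    return [process[0][0]] + [dst for _, dst in process]


def business_processes(processes: Sequence[Sequence[Edge]], c0: str,
                       final_nodes: Set[str]) -> List[List[Edge]]:
    found: List[List[Edge]] = []
    for process in processes:
        if not process:
            continue
        nodes = nodes_of(process)
        # Condition 1: starts at the initial node C0 and ends at a final node.
        if nodes[0] == c0 and nodes[-1] in final_nodes:
            found.append(list(process))
            continue
        # Condition 2: the process returns to a node it has already passed,
        # and the loop it closes is longer than two edges.
        last = nodes[-1]
        if last in nodes[:-1]:
            loop = list(process[nodes.index(last):])
            if len(loop) > 2:
                found.append(loop)
    return found


processes = [
    [("C0", "C1"), ("C1", "C2"), ("C2", "C0")],   # returns to C0 after 3 edges
    [("C0", "C1"), ("C1", "C3")],                 # ends at final node C3
    [("C0", "C4"), ("C4", "C0")],                 # loop of only 2 edges
]
for bp in business_processes(processes, "C0", {"C3"}):
    print(bp)
# [('C0', 'C1'), ('C1', 'C2'), ('C2', 'C0')]
# [('C0', 'C1'), ('C1', 'C3')]
```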
4 EXPERIMENTAL RESULTS
The testbed used in this section is a network consisting of a web server (the test target) and two clients (the BLProM system and a legal user). The web server and the clients each run in a virtual machine. The profiles of the web server and the clients are shown in Table 2.
The web applications listed in Table 3 are installed on the web server (test target), and BLProM is then used to identify their business layers.
The legal user first starts using the selected web applications and crawls all permitted parts of each application. The HTTP traffic of the legal user is given to BLProM as its input.
5 EVALUATION
The goal of BLProM is to identify the business layer of the web application; identifying the business layer makes it possible to identify business logic vulnerabilities. BLProM detects the business processes of the web application, and identifying business processes is the main step in dynamic security testing of a web application in the business layer.
We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner that scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner that has an API for extracting the web application graph, in which the web application pages are graph nodes and the relations among pages are graph edges. The main difference between BLProM and ZAP lies in detecting similar pages: ZAP cannot detect similar pages in the web application, but BLProM can. ZAP's graph only shows the relations among scanned pages, whereas BLProM generates an optimal graph in which similar pages are merged into a single node. In terms of the accuracy of the generated graph, BLProM and ZAP perform the same.
To evaluate the proposed approach, we first measure the accuracy of clustering using the following criteria:
• True Positive: samples that are placed in their correct clusters.
• False Positive: samples that are placed in a cluster to which they do not belong.
• False Negative: samples that are not placed in a cluster to which they belong.
• Recall: It is calculated by the following formula:
$$\mathit{recall} = \frac{\mathit{TruePositive}}{\mathit{TruePositive} + \mathit{FalseNegative}}$$
• Precision: It is calculated by the following formula:
Table 2 Testbed Profiles
Machine | CPU | OS | VMware CPU | VMware RAM | VMware OS
Web server (test target) | Pentium dual core, 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7
Client (BLProM machine) | Intel Core i7, 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7
Client (legal user) | Pentium dual core i3-3210 GHz | Windows 7 | 1 GHz | 1 GB | Windows 7
Table 3 Selected Web Applications for Evaluation
Web application | Description
TomatoCart-1.1.8.6.1 | e-commerce
osCommerce-2.3.4 | e-commerce
WackoPicko | web application for sharing pictures
Algorithm 7 The Pseudocode for Identifying the Application Business Processes
INPUT: the web application navigation graph ⟨C0, C, E⟩; E, the web application graph edges
OUTPUT: BP, the set of web application business processes
1 Begin
2 Let P = ∅, BP = ∅ // sets of web application processes and business processes
3 Let F = ∅ // set of web application final nodes
4 P = ExtractProcess() // extract processes in the web application
5 F = ExtractFinalNodes() // extract final nodes in the web application
6 for i ← 1 to k do
7   if P_i starts at C0 and ends at a node in F then
8     BP = BP + P_i
9   end if
10 end for
11 R = ExtractProcessWithRepeatedNodes(P) // extract processes with repeated nodes
12 for j ← 1 to m do
13   if R_j has the same initial and end node and length(R_j) > 2 then
14     BP = BP + R_j
15   end if
16 end for
17 return BP
18 end
Table 4 Evaluation of the Clustering of Selected Web Application Pages
Criteria | WackoPicko | TomatoCart | osCommerce
samples | 89 | 150 | 210
clusters | 29 | 66 | 40
true positive | 65 | 146 | 205
false positive | 24 | 4 | 5
false negative | 23 | 3 | 4
recall | 0.74 | 0.98 | 0.98
precision | 0.73 | 0.97 | 0.98
f-measure | 0.73 | 0.98 | 0.98
Table 5 Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko
Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP
HTTP Messages | 89 | 89 | –
Graph Nodes | 22 | 89 | 75.2
Graph Edges | 48 | 270 | 82.2
Processes (P) | 12 | 48 | 75
Average Edges in Each Process (E) | 4 | 46 | 91.3
Average Edges in All Processes (PE) | 48 | 2208 | 97.8
Business Processes | 10 | NA | –
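$$\mathit{precision} = \frac{\mathit{TruePositive}}{\mathit{TruePositive} + \mathit{FalsePositive}}$$
• F-Measure: It is calculated by the following formula:
$$F\text{-}\mathit{Measure} = \frac{2 \cdot \mathit{recall} \cdot \mathit{precision}}{\mathit{recall} + \mathit{precision}}$$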
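As a quick check on these formulas, the following Python sketch recomputes recall, precision, and F-measure from the true positive, false positive, and false negative counts reported in Table 4; rounded to two decimals, the results agree with the values in that table. The function name `clustering_scores` is ours.

```python
from typing import Tuple


def clustering_scores(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (recall, precision, F-measure) for one clustering result."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# (true positive, false positive, false negative) counts from Table 4
counts = {"WackoPicko": (65, 24, 23), "TomatoCart": (146, 4, 3), "osCommerce": (205, 5, 4)}
for app, (tp, fp, fn) in counts.items():
    r, p, f = clustering_scores(tp, fp, fn)
    print(f"{app}: recall={r:.2f} precision={p:.2f} F-measure={f:.2f}")
# WackoPicko: recall=0.74 precision=0.73 F-measure=0.73
# TomatoCart: recall=0.98 precision=0.97 F-measure=0.98
# osCommerce: recall=0.98 precision=0.98 F-measure=0.98
```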
The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, samples is the total number of web pages in the HTTP traffic, as extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, clusters is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). The remaining rows give the criteria listed above for each web application.
To evaluate the proposed approach, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. The BLProM output for WackoPicko is shown in Table 5. Graph nodes is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). Graph edges is the total number of edges extracted by the ExtractGraphEdges algorithm (Algorithm 4). Processes is the total number of paths from the first node extracted by the ExtractProcess algorithm (Algorithm 6). Average edges in each process (E) is the total number of edges over all processes divided by the total number of processes. Average edges in all processes (PE) is the product of the total number of processes (P) and the average number of edges in each process (E). Business processes is the total number of business processes in the web application.
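The derived rows can be reproduced with a few lines of arithmetic. In the Python sketch below, PE is computed as P × E, as defined above, and the improvement percentage is computed as (ZAP − BLProM) / ZAP × 100; this improvement formula is our inference from the numbers in Table 5, not a definition given in the text. The inputs are the WackoPicko values from Table 5.

```python
def improvement(blprom: float, zap: float) -> float:
    """Percentage reduction of a BLProM count relative to the ZAP count."""
    return (zap - blprom) / zap * 100


# WackoPicko values from Table 5
p_blprom, e_blprom = 12, 4     # processes (P) and average edges per process (E)
p_zap, e_zap = 48, 46
pe_blprom = p_blprom * e_blprom   # average edges in all processes (PE)
pe_zap = p_zap * e_zap
print(pe_blprom, pe_zap)                           # 48 2208
print(round(improvement(pe_blprom, pe_zap), 1))    # 97.8
print(round(improvement(48, 270), 1))              # 82.2 (graph edges)
```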
The values in the first data column of Table 5 are the output of the proposed approach, and the values in the second column are the scanning output of ZAP. The third column shows the percentage of scanning improvement of our proposed approach compared to the ZAP scan. The results indicate that the ZAP scan is not aware of the application's structure: because ZAP neither groups similar pages nor models business processes, it is not able to identify business layer vulnerabilities.
The BLProM output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to ZAP scanning. It can be observed that identifying the web application business layer improves web application scanning.
Table 6 Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce
Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP
HTTP Messages | 170 | 170 | –
Graph Nodes | 40 | 170 | 76.4
Graph Edges | 66 | 379 | 82.5
Processes (P) | 23 | 17 | 26
Average Edges in Each Process (E) | 3 | 113 | 97.3
Average Edges in All Processes (PE) | 69 | 1921 | 96.4
Business Processes | 18 | NA | –
Table 7 Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart
Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP
HTTP Messages | 150 | 150 | –
Graph Nodes | 66 | 150 | 56
Graph Edges | 87 | 410 | 78.7
Processes (P) | 31 | 39 | 20.5
Average Edges in Each Process (E) | 4 | 101 | 96
Average Edges in All Processes (PE) | 156 | 3131 | 95
Business Processes | 30 | NA | –
The BLProM output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of our proposed approach compared to ZAP scanning.
Table 8 shows the average values for the proposed approach, the average values for OWASP ZAP, and the average percentage of improvement in scanning the selected web applications. For example, on the "Average edges in all processes" criterion, our approach improves on OWASP ZAP by about 96 percent.
The results presented in this table show that BLProM improves web application scanning. BLProM is aware of the web application business processes, and by identifying the web application business layer, web scanners can detect business layer vulnerabilities.
6 CONCLUSION
Business logic vulnerabilities are strong vulnerabilities that compromise web application security. Web scanners cannot detect business logic vulnerabilities because they are unable to understand the business logic of the web application. Detecting business logic vulnerabilities requires understanding the business logic of the web application; because this logic is specific to each application, these vulnerabilities are difficult to identify.
In this paper, we proposed BLProM, a black-box approach for detecting the business processes of a web application. BLProM aims to ease the detection of business logic vulnerabilities: its output is used as input to dynamic security testing of web applications in the business layer in order to detect business logic vulnerabilities. BLProM consists of two main steps:
1. Extracting the user navigation graph
2. Detecting web application business processes
In the lab, we scanned three web applications with BLProM and with OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared to OWASP ZAP. BLProM improved the scanning of web applications because it clusters web pages and prevents scanning of similar web
Table 8 Comparison of the Average of the Proposed Approach and the Average of the ZAP Approach in the Scanning of Selected Web Applications
Criteria (average over the selected web applications) | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP
Graph Nodes | 42.6 | 136.3 | 69.2
Graph Edges | 67 | 353 | 81.1
Processes (P) | 20 | 36.6 | 40.5
Average Edges in Each Process (E) | 3.6 | 86.6 | 94.8
Average Edges in All Processes (PE) | 91 | 2420 | 96.4
application pages.
References
[1] Common Vulnerabilities and Exposures. https://cve.mitre.org/cve/cve.html
[2] 2015 ITRC (Identity Theft Resource Center). Breach Report Hits Near Record High in 2015. http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html. Accessed Feb. 2017.
[3] G. Pellegrino and D. Balzarotti. Toward Black-Box Detection of Logic Flaws in Web Applications. In Network and Distributed System Security Symposium (NDSS 2014), 2014. doi:10.14722/ndss.2014.23021
[4] Testing for business logic. OWASP. https://www.owasp.org/index.php/Testing_for_business_logic. Accessed Feb. 2017.
[5] D. Balzarotti, M. Cova, V. Felmetsger, and G. Vigna. Multi-module vulnerability analysis of web-based applications. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 25–35. ACM, 2007. doi:10.1145/1315245.1315250
[6] A. Doupé, B. Boe, C. Kruegel, and G. Vigna. Fear the EAR: discovering and mitigating execution after redirect vulnerabilities. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 251–262. ACM, 2011. doi:10.1145/2046707.2046736
[7] V. Felmetsger, L. Cavedon, C. Kruegel, and G. Vigna. Toward Automated Detection of Logic Vulnerabilities in Web Applications. In USENIX Security Symposium, 2010.
[8] E. Chai. Business Logic Attacks – Bots and BATs. OWASP, 2009. https://www.owasp.org/images/a/aa/OWASP_Cincinnati_Jan_2011.pdf. Accessed Feb. 2017.
[9] X. Li and Y. Xue. BLOCK: a black-box approach for detection of state violation attacks towards web applications. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 247–256, 2011. doi:10.1145/2076732.2076767
[10] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna. Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications. In Recent Advances in Intrusion Detection (RAID 2007), pages 63–86. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-74319-4. doi:10.1007/978-3-540-74320-0_4
[11] X. Li, W. Yan, and Y. Xue. SENTINEL: securing database from logic flaws in web applications. In Proceedings of the Second ACM Conference on Data and Application Security and Privacy, pages 25–36, 2012. doi:10.1145/2133601.2133605
[12] M. Alidoosti and A. Nowroozi. BLDAST: business-layer Dynamic Application Security Tester of the web application in order to detect web application vulnerabilities against flooding DoS attacks. In Iran Society of Cryptology Conference, Shiraz, Iran, 2017. (In Persian)
[13] M. Alidoosti, A. Nowroozi, and A. Nickabadi. Evaluating the Web-Application Resiliency to Business-Layer DoS Attacks. ETRI Journal, 2019. doi:10.4218/etrij.2019-0164
[14] M. Alidoosti and A. Nowroozi. BLTOCTTOU: business-layer dynamic application security tester of the web application in order to detect web application vulnerabilities against Race Condition attacks. In Computer Society of Iran Conference, Tehran, Iran, 2018. (In Persian)
[15] M. Alidoosti and A. Nowroozi. BLProM: Business-layer process miner of the web application. In ISCISC, pages 1–6. IEEE, 2018. ISBN 978-1-5386-7582-3. doi:10.1109/ISCISC.2018.8546899
[16] V. Crescenzi, P. Merialdo, and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3):279–299, 2005. doi:10.1016/j.datak.2004.11.004
Mitra Alidoosti received her BS and MSc degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. Currently, she is working toward the PhD degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.
Alireza Nowroozi is a freelance consultant who advises government and private-sector industries on information technology. He has four years of experience as an academic staff member and in an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. He is also a co-founder of four IT startups.
Ahmad Nickabadi received his BS degree in computer engineering in 2004 and the MSc and PhD degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.
user behavior. The lack of such a document makes it difficult to detect business logic vulnerabilities. For example, adding a specific item several times to a shopping cart is a common feature, but repeated use of a discount code is a business logic vulnerability. A human easily understands the difference between these two scenarios, while a scanner without an appropriate model of the web application cannot distinguish between them [4]. Prior research [5–7] has attempted to detect business logic vulnerabilities automatically, but these approaches have been applied only to small applications, and they require the application source code to generate an appropriate model of the application [4]. Therefore, an automatic tool that can find business logic vulnerabilities is needed.
In this paper, we propose BLProM, a black-box technique for detecting the web application business layer. By identifying the business processes in the web application, BLProM makes it possible to identify business layer vulnerabilities. In other words, dynamic security testing of a web application in the business layer first requires identifying the business processes of the web application (detecting its business logic); the business layer vulnerabilities are then identified by analyzing those processes. The output of BLProM is the set of web application business processes, which is used as input to dynamic security testing in the business layer. The proposed approach is independent of the technology used in web applications and finds business processes automatically. We also show that BLProM improves web application scanning because it detects similar pages in the web application and prevents scanning of similar pages. Comparing the web application scanning results of BLProM and OWASP ZAP (an open-source web application scanner) shows that BLProM improves web application scanning by about 96%. In summary, this paper makes the following contributions:
• We present BLProM, a black-box technique for detecting the web application business layer.
• We present a new black-box approach for clustering web pages.
• We show that identifying web application business processes improves web application scanning by about 96%.
2 BACKGROUND AND RELATED WORK
The business layer determines the business logic of a web application. It is responsible for data processing and data management and specifies business logic policies and rules; it also validates the input data. Figure 1 shows the three-layer architecture of a web application and the position of the business layer in it. The presentation layer is a user interface that displays data to the user and receives input from the user; in a web application, this is the part that receives the HTTP request and returns the HTML response.
The business layer handles data validation and business rules. The data access layer communicates with the database by constructing SQL queries.
After the data is received from the user, it is available to the business layer, and the web application uses it to run business processes. Every business process has several steps that should be executed in order, and processes may interact with each other.
The business layer specifies the logic of the web application, and a business logic vulnerability is a defect in the business layer. A business logic attack vector is a legitimate request (usually multiple requests) with legitimate input values that abuses a module's functionality to inflict damage directly on the business.
2.1 Business Logic Attacks
There are two approaches to preventing business logic attacks: 1) identifying attacks at runtime (the defense approach) and 2) identifying logic vulnerabilities in the web application (the prevention approach). In the defense approach, the behavior of the web application is monitored, and an attack is reported whenever the web application leaves its normal state. In the prevention approach, attack vectors are used to identify business logic vulnerabilities.
BLOCK [9] and Swaddler [10] use a defense approach to prevent business logic attacks. BLOCK first obtains a behavioral model of the web application by observing the interaction between the clients and the web application. It extracts a set of invariants from the request/response sequence and session variables, and it identifies any request or response that violates these invariants as an attack.
Swaddler provides an anomaly detection method for detecting attacks. It uses anomalies in the internal state of the web application to detect attacks. In other words, in the learning phase, the web application's internal state is monitored and the normal values of the state are extracted; these values define the web application profile. Then, in the detection phase, abnormal states are identified.
MiMoSA [5] uses a prevention approach, in the form of a white-box approach, to identify business logic vulnerabilities. MiMoSA first builds a web application model based on the web application's state and workflow. It detects multi-step attacks by analyzing the relationship between the web application and the database, as well as the connections within the web application.
Figure 1 The Three-Layer Architecture of a Web Application [8]
SENTINEL [11] and Pellegrino et al. [3] use a prevention approach, in the form of a black-box approach, to identify business logic vulnerabilities.
SENTINEL [11] is a black-box approach for detecting logical weaknesses in database access. SENTINEL generates a state machine of the web application and extracts a set of invariants from the observed SQL queries, responses, and session variables as the application specification. Any SQL query that violates the defined invariants is identified as an attack.
Pellegrino et al. [3] propose a black-box technique to detect logic vulnerabilities in web applications. This technique extracts behavioral patterns from network traces in which the user interacts with a certain application's functionality. First, the web application is modeled, and then attack vectors are applied to the model.
Our previous works, BLDAST [12, 13] and BLTOCTTOU [14], use a prevention approach, in the form of a black-box approach, to identify business logic vulnerabilities. The output of BLProM [15] can be used as input for BLTOCTTOU and BLDAST.
BLDAST [12, 13] is a dynamic, black-box vulnerability analysis approach that identifies business logic vulnerabilities of a web application against flooding DoS attacks. BLDAST assesses web application resiliency against flooding DoS attacks and takes into account the business processes of the web application. It selects critical pages in business processes, where a critical page is one with a considerable response time. A critical process can therefore impose a heavy load on the target and make the web server unresponsive. The goal of BLDAST is to find these critical processes within web applications.
BLTOCTTOU [14] is a black-box dynamic application security tester for detecting business logic vulnerabilities against race condition attacks. BLTOCTTOU identifies vulnerabilities by finding the business processes of the web application. It detects business processes that interact with each other: one process sets the value of a variable, and the other reads or writes that variable. To identify a race condition, BLTOCTTOU first executes the identified processes sequentially and then executes them in reverse order. Finally, it compares the outputs of these two modes; if they differ, the web application is vulnerable to a race condition.
2.2 Clustering Web Pages
Crescenzi et al. [16] presented an approach to cluster web pages based on their structure. The structural similarity between web pages is defined in terms of the positions of their hyperlinks in the DOM tree. The final clusters are used to build a model that describes the structure of the site in terms of classes of pages and their connectivity.
3 BUSINESS-LAYER PROCESS MINER
In this paper, BLProM is proposed to identify the business processes of a web application, and its output is used as the input to security testing of the web application in the business layer. Business layer vulnerabilities can then be detected by analyzing the interactions between business processes.
BLProM first preprocesses the HTTP traffic of a normal user and then extracts the web application pages in that traffic. BLProM clusters similar pages to prevent unbounded growth of the user navigation graph. The detected clusters are the graph nodes, and the graph edges are the relations between detected clusters; from these nodes and edges, the user navigation graph is extracted. BLProM then extracts business processes from the navigation graph. BLProM has two main steps:
1. Extracting the user navigation graph
2. Detecting business processes in the web application
Figure 2 shows the proposed steps for identifying the business processes in a web application. In the following, we explain each of these steps in detail.
3.1 Extracting the User Navigation Graph
First, the normal user starts to crawl the web application, and the traffic of this user is captured and stored. It should be noted that the user permission
Figure 2 Black-Box Approach to Detect Web Application Business Process
level can differ according to the role of the user, and consequently the user navigation graph varies depending on the permission level. In this paper, the normal user has user-level permission and browses the pages permitted at that level, crawling all parts of the application that the user is allowed to access.
BLProM first extracts the user navigation graph from the stored traffic, using the following steps:
1. Preprocessing of raw input data
2. Identifying the existing web application pages in the stored traffic
3. Clustering the web application pages
4. Extracting the user navigation graph
1) Preprocessing of raw input data
In the preprocessing step, BLProM cleans the data and removes irrelevant data samples. In this paper, only HTTP requests and responses are used. For the responses, only those with the successful status codes 200 and 209 are employed; BLProM removes responses with failure status codes, as well as their corresponding requests. Additionally, only GET and POST requests are needed, and the remaining ones are discarded.
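The preprocessing filter can be sketched in Python as below. The `Exchange` record type and its field names are hypothetical stand-ins for whatever the capture tool produces. The accepted methods and status codes mirror the text; the status set is kept as a parameter so that it can be widened, for example to all 2xx codes.

```python
from dataclasses import dataclass
from typing import Collection, List, Optional


@dataclass
class Exchange:
    """One captured request/response pair (hypothetical record type)."""
    method: str
    url: str
    status: int
    content_type: str = ""
    referer: Optional[str] = None


def preprocess(traffic: List[Exchange],
               methods: Collection[str] = ("GET", "POST"),
               ok_status: Collection[int] = (200, 209)) -> List[Exchange]:
    """Keep only GET/POST exchanges whose response has an accepted status code."""
    return [x for x in traffic
            if x.method.upper() in methods and x.status in ok_status]


traffic = [
    Exchange("GET", "/index.php", 200, "text/html"),
    Exchange("POST", "/login.php", 500, "text/html"),     # failure status: dropped
    Exchange("OPTIONS", "/api", 200),                     # other method: dropped
    Exchange("post", "/cart.php", 200, "text/html", "/index.php"),
]
print([x.url for x in preprocess(traffic)])  # ['/index.php', '/cart.php']
```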
2) Identifying the web application pages in the stored traffic
Each page of the application can be represented as a pair (main request, corresponding response to the main request). To identify the web application pages, it is first necessary to detect the main requests in the traffic. After the main requests are identified, the corresponding responses must also be identified.
Identifying the main HTTP requests in the traffic. The user's stored traffic contains both main HTTP requests, which load the web application pages, and secondary HTTP requests, which load the files, images, etc. of a page. BLProM must distinguish between the main and secondary requests; once the main requests are identified, the remaining ones are considered secondary requests. To do so, any request whose Referer header differs from the Referer of the previous request is considered a secondary request, and the previous request is considered a main request. Additionally, the first request in the traffic is considered a main request, because the first request does not include a Referer. Note that the Referer header field of an HTTP request indicates the URL of the previous page the user visited.
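The Referer-change rule can be illustrated with the short Python sketch below; the function name and the sample traffic are hypothetical. It takes the Referer value of each request in capture order and returns the indices of the main requests. Comparing each Referer with that of the immediately preceding request has the same effect as tracking a "last Referer" variable. A main request is detected only when the request after it has a different Referer, so the last request in the traffic needs the separate check described below.

```python
from typing import List, Optional


def main_request_indices(referers: List[Optional[str]]) -> List[int]:
    """Indices of main requests, given each request's Referer in capture order."""
    if not referers:
        return []
    mains = [0]                      # the first request is always a main request
    for i in range(1, len(referers)):
        # A change of Referer marks the previous request as a main request.
        if referers[i] != referers[i - 1] and (i - 1) not in mains:
            mains.append(i - 1)
    return mains


referers = [
    None,                 # 0: GET /index.php (first request, no Referer)
    "/index.php",         # 1: GET /style.css
    "/index.php",         # 2: GET /logo.png
    "/index.php",         # 3: GET /product.php?id=1 (link clicked on the home page)
    "/product.php?id=1",  # 4: GET /photo.jpg
]
print(main_request_indices(referers))  # [0, 3]
```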
Identifying the corresponding responses to the main requests. To identify the responses corresponding to the main requests, it is only necessary to select the responses whose Content-Type field is text/html, because the response to a secondary request is usually a file, image, etc., while the response to a main request is text in HTML form. Such responses are the main responses in the HTTP traffic. After the main requests and responses in the traffic are identified, each pair (main request, corresponding response to the main request) represents a page of the web application. It should be noted that the responses to the main requests reference all of the secondary requests that load the web page.
Identifying whether the last request in the traffic is a main request. If the last response in the traffic is a main response, the last HTTP request is also a main request.
Figure 3 Number of Consecutive Requests
For example, in Figure 3, requests 1, 2, 6, and 8 are the main requests; they are shown in red.
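Combining the response test with the last-request rule, a Python sketch of page extraction might look as follows. It assumes that each captured request is paired with its own response, so that `exchanges[i]` holds the request URL and the response Content-Type of the i-th exchange, and that the main-request indices come from the Referer rule; both the data layout and the function names are illustrative.

```python
from typing import Iterable, List, Tuple

# exchanges[i] = (request URL, response Content-Type) for the i-th captured exchange
Pair = Tuple[str, str]


def is_html(content_type: str) -> bool:
    """True for text/html, ignoring parameters such as '; charset=utf-8'."""
    return content_type.split(";")[0].strip().lower() == "text/html"


def extract_pages(exchanges: List[Pair], main_idx: Iterable[int]) -> List[Pair]:
    mains = set(main_idx)
    # Last-request rule: if the last response is a main (HTML) response,
    # the last request is a main request as well.
    if exchanges and is_html(exchanges[-1][1]):
        mains.add(len(exchanges) - 1)
    # A page is a main request whose response is HTML.
    return [exchanges[i] for i in sorted(mains) if is_html(exchanges[i][1])]


exchanges = [
    ("/index.php", "text/html"),
    ("/style.css", "text/css"),
    ("/logo.png", "image/png"),
    ("/product.php?id=1", "text/html; charset=utf-8"),
    ("/photo.jpg", "image/jpeg"),
    ("/cart.php", "text/html"),
]
for url, _ in extract_pages(exchanges, [0, 3]):
    print(url)
# /index.php
# /product.php?id=1
# /cart.php
```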
The pseudocode of the algorithm used by BLProM to identify web application pages is shown in Algorithm 1. As mentioned before, each web application page can be represented by a pair (main request, corresponding response to the main request). The pseudocode in Algorithm 1 can be divided into four main parts:
1. Extracting the main requests in the traffic (lines 13–25)
2. Extracting the responses corresponding to the main requests (lines 26–31)
3. Identifying whether the last request in the traffic is a main request (lines 32–35)
4. Extracting the web application pages (lines 36–41)
In line 9, all requests in the traffic, whether main or secondary, are extracted and put in the HttpRequest variable. In line 10, the responses in the traffic are added to the HttpResponse variable. In lines 11 and 12, the total numbers of requests and responses are calculated. In line 15, it is checked whether the current request is the first request; if so, it is considered a main request and added to the MainRequest variable. In line 21, it is checked whether the Referer field of the current request differs from the Referer field of the previous request; if so, the previous request is considered a main request.
In line 28, it is checked whether the Content-Type field of the current response is text/html. If this condition is met, the current response is considered a main response.
In line 33, it is checked whether the last HTTP response is a main response; if so, the last request is also considered a main request.
In line 39, the main requests and their main responses are combined into the web application pages, represented as pairs (request, response).
3) Clustering the web application pages
In this step, BLProM clusters the web application pages. Clustering aims to put similar pages in the same cluster, which prevents unbounded growth of the user navigation graph. The pages in the same cluster are similar to each other.
At this stage, all pages in the user's stored traffic have been extracted as pairs (main request, corresponding response to the main request), and each pair represents one page of the application. To extract the optimal user navigation graph, similar extracted pages must be identified and clustered. In the user navigation graph, nodes represent the application's unique pages, and edges represent the links between pages. To identify similar pages, a criterion must be chosen according to the purpose of the clustering. Here, the type of operations that the user can perform on a page is used as the measure for separating pages; in other words, two pages are similar if the user can perform the same operations on them. For example, consider two pages that each contain only a button, but with different titles: on the first page the title is "continue", and on the second it is "save". These two pages are different, because the user performs different operations on them. Thus, according to this criterion, the following pages are considered similar in web applications:
• In an online shop application, the pages related to the shopping cart are considered similar, even if they contain different items.
• In an online shop application, the profile pages of the products are similar, because they have the same HTML structure in terms of the important HTML elements.
• Pages that display search results for different keywords are similar.
• If the structure of a page is a subset of another page's structure in terms of the important HTML elements, the two pages are considered similar.
• Two pages that both contain user comments are considered similar, even if the comments are about different products.
Algorithm 1 The Pseudocode for Extracting Web Application Pages From the Traffic
INPUT: HTTPtraffic = {HttpMessage1, HttpMessage2, HttpMessage3, ..., HttpMessagen}
OUTPUT: WebPages as a set of (Request_i, Response_i)
1 Begin
2 let WebPages = set of web pages
3 let HttpRequest = set of HTTP Requests
4 let HttpResponse = set of HTTP Responses
5 let MainRequest = set of main HTTP Requests
6 let MainResponse = set of main HTTP Responses
7 let i, k = 1 // counter for current HttpMessage
8 let LastReferer = ∅, NewReferer = ∅
9 HttpRequest = ExtractReq(HTTPtraffic) // extract HTTP Requests from HTTP traffic
10 HttpResponse = ExtractResp(HTTPtraffic) // extract HTTP Responses from HTTP traffic
11 n = extractNumber(HttpRequest) // extract total number of HTTP Requests
12 m = extractNumber(HttpResponse) // extract total number of HTTP Responses
13 // extract Main HTTP Requests from HttpRequest
14 for i = 1 to n do
15   if (i = 1) then
16     add MainRequest ← HttpRequest_i // the first request is a main request
17     LastReferer ← Referer of HttpRequest_i
18   else
19     NewReferer ← Referer of HttpRequest_i
20   end if
21   if (NewReferer ≠ LastReferer) then
22     add MainRequest ← HttpRequest_{i−1}
23     LastReferer ← NewReferer
24   end if
25 end for
26 // extract Main HTTP Responses from HttpResponse
27 for k = 1 to m do
28   if (content-type of HttpResponse_k = text/html) then
29     add MainResponse ← HttpResponse_k
30   end if
31 end for
32 // check whether the last request is a main request
33 if (HttpResponse_m ∈ MainResponse) then
34   add MainRequest ← HttpRequest_m
35 end if
36 // extract the set of web pages
37 size = extractNumber(MainRequest) // extract total number of Main HTTP Requests
38 for j = 1 to size do
39   WebPages ← (MainRequest_j, MainResponse_j)
40 end for
41 return WebPages
42 end
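A minimal Python sketch of Algorithm 1, under stated assumptions: each captured exchange is modeled by a hypothetical HttpExchange record that keeps a request together with its response (so main requests and main responses never need to be re-paired by index), and the content type is compared after stripping parameters such as charset. It applies the same rules as the pseudocode: the first request is a main request, a change of Referer marks the previous request as a main request, the last request is checked separately, and only text/html responses are kept. Using a set of indices also avoids adding the first request twice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HttpExchange:
    """One captured request together with its response (hypothetical model)."""
    uri: str
    referer: Optional[str]   # Referer header of the request, if any
    content_type: str        # Content-Type header of the response


def extract_web_pages(traffic: List[HttpExchange]) -> List[HttpExchange]:
    """Return the main (page-level) exchanges of a captured session, in order."""
    if not traffic:
        return []
    main = {0}                          # the first request is a main request
    last_referer = traffic[0].referer
    for i in range(1, len(traffic)):
        new_referer = traffic[i].referer
        if new_referer != last_referer:
            # Sub-resources of a page carry that page's URI as Referer, so a
            # Referer switch means the previous request loaded a new page.
            main.add(i - 1)
            last_referer = new_referer
    main.add(len(traffic) - 1)          # the last request may be a page too
    # Keep only exchanges whose response is an HTML document.
    return [traffic[i] for i in sorted(main)
            if traffic[i].content_type.split(";")[0].strip() == "text/html"]


session = [
    HttpExchange("/index", None, "text/html"),
    HttpExchange("/style.css", "/index", "text/css"),
    HttpExchange("/cart", "/index", "text/html; charset=utf-8"),
    HttpExchange("/logo.png", "/cart", "image/png"),
    HttpExchange("/checkout", "/cart", "text/html"),
]
pages = extract_web_pages(session)
print([p.uri for p in pages])  # ['/index', '/cart', '/checkout']
```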
Table 1 Attribute Vector of a Page in osCommerce Web Application
inputs: null
buttons: /html/body/button/Reviews
         /html/body/div/div/div/div/form/div/div/span/span/button/Add to Cart
anchors: null
image: /html/body/div/div/div/a/img
Figure 4 An Example of HTML Code
Definition of the Document Object Model (DOM) path of an HTML element: The DOM path of an element is the position of the element in the HTML code, given as the sequence of tags from the document root down to the element.
For example, in Figure 4, the DOM path of the button (DOM_button) is Document/Html/Body/P/Button.
Definition 1 [Similar Pages] Similar pages are pages on which the user can perform the same operations and which are identical in terms of the positions of the important HTML elements in the page. The important HTML elements in a page include buttons, images, inputs, and anchors.
The clustering process includes three steps:
1. Extracting the attribute vectors of the pages
2. Identifying the subset pages
3. Clustering pages
In the following, these steps are discussed in detail.
1. Extracting the attribute vectors of the pages
BLProM represents each page of the application as a pair (main request, response). In this step, BLProM extracts the corresponding attribute vector of each page by applying a data mining operation to this pair. BLProM models each page using the following attribute vector:
WebPages = the total pages in an application
$$\forall w \in WebPages:\quad w = (DOM_{inputs}, DOM_{buttons}, DOM_{anchors}, DOM_{imgs})$$
$$DOM_{inputs} = \prod_{i=1}^{n} DOM(input_i), \qquad DOM_{buttons} = \prod_{i=1}^{n} DOM(button_i)$$
$$DOM_{anchors} = \prod_{i=1}^{n} DOM(anchor_i), \qquad DOM_{imgs} = \prod_{i=1}^{n} DOM(img_i)$$
• DOM(input): the DOM path of the <input> tag in the page + the value of the type attribute of the <input> tag + the value of the name attribute of the <input> tag (in the absence of the name attribute, the value attribute is used).
• DOM(button): the DOM path of the button in the page + the title of the button.
• DOM(anchor): the DOM path of the <a> tag in the page.
• DOM(img): the DOM path of the image in the page.
If a web application page contains several buttons, the second element of the page attribute vector is the set of the DOM paths of the buttons in the page, separated by a delimiter. Figure 5 shows one of the osCommerce¹ web application pages, and Table 1 shows the attribute vector of that page. As shown, the input element and the anchor element are null, which means that the page does not contain these tags.
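A minimal Python sketch of the attribute-vector extraction, using only the standard-library html.parser module. Assumptions: DOM paths are written as slash-separated, lower-case tag names; the parts of DOM(input) are joined with "|"; and the button title is the text inside a <button> element. These encodings are illustrative choices, not taken from the paper. An empty set plays the role of null.

```python
from html.parser import HTMLParser
from typing import FrozenSet, List, Optional, Tuple

# Void elements never receive an end tag, so they are not pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class AttributeVectorParser(HTMLParser):
    """Collects the DOM paths of the important elements of one page."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []           # open tags from the root down
        self.inputs, self.buttons = set(), set()
        self.anchors, self.imgs = set(), set()
        self._button_path: Optional[str] = None
        self._button_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        path = "/" + "/".join(self.stack + [tag])
        if tag == "input":
            # DOM path + type attribute + name (value is used if name is absent)
            name = a.get("name") or a.get("value") or ""
            self.inputs.add(f"{path}|{a.get('type') or 'text'}|{name}")
        elif tag == "a":
            self.anchors.add(path)
        elif tag == "img":
            self.imgs.add(path)
        elif tag == "button":
            self._button_path, self._button_text = path, []
        if tag not in VOID:
            self.stack.append(tag)

    def handle_data(self, data):
        if self._button_path is not None:    # collect the button title
            self._button_text.append(data)

    def handle_endtag(self, tag):
        if tag == "button" and self._button_path is not None:
            title = "".join(self._button_text).strip()
            self.buttons.add(f"{self._button_path}/{title}")
            self._button_path = None
        if tag in self.stack:                # pop up to the matching open tag
            while self.stack.pop() != tag:
                pass


def attribute_vector(html: str) -> Tuple[FrozenSet[str], ...]:
    """Return (DOM_inputs, DOM_buttons, DOM_anchors, DOM_imgs); empty = null."""
    parser = AttributeVectorParser()
    parser.feed(html)
    parser.close()
    return (frozenset(parser.inputs), frozenset(parser.buttons),
            frozenset(parser.anchors), frozenset(parser.imgs))


vector = attribute_vector(
    "<html><body><button>Reviews</button>"
    "<div><a href='/p'><img src='p.png'></a></div>"
    "<form><input type='text' name='q'></form></body></html>")
print(vector)
```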
2. Identifying similar pages
After extracting the attribute vector of each page, it is necessary to identify similar pages. Pages whose attribute vectors are a subset of another page's attribute vector, or whose attribute vectors are identical, are considered similar.
According to Definition 1, the attribute vector of each web application page has four elements. The attribute vector of page 1 is considered the same as the attribute vector of page 2 if:
• All vector elements of page 1 are equal to the corresponding elements in the vector of page 2.
• All vector elements of page 1 are subsets of the corresponding elements in the vector of page 2.
• All vector elements of page 2 are subsets of the corresponding elements in the vector of page 1.
¹ https://www.oscommerce.com
Figure 5 A Page of osCommerce Application
Algorithm 2 The Pseudocode for Identifying the Similar Pages
INPUT: w1 = (DOM_w1(input), DOM_w1(button), DOM_w1(anchor), DOM_w1(img))
       w2 = (DOM_w2(input), DOM_w2(button), DOM_w2(anchor), DOM_w2(img))
OUTPUT: Boolean flag; true means the two pages are the same
1 Begin
2 Let flag, input, button, anchor, img = false
3 if DOM_w1(input) ⊆ DOM_w2(input) or DOM_w2(input) ⊆ DOM_w1(input) then
4   input = true
5 end if
6 if DOM_w1(button) ⊆ DOM_w2(button) or DOM_w2(button) ⊆ DOM_w1(button) then
7   button = true
8 end if
9 if DOM_w1(anchor) ⊆ DOM_w2(anchor) or DOM_w2(anchor) ⊆ DOM_w1(anchor) then
10   anchor = true
11 end if
12 if DOM_w1(img) ⊆ DOM_w2(img) or DOM_w2(img) ⊆ DOM_w1(img) then
13   img = true
14 end if
15 if input and button and anchor and img then
16   flag = true
17 end if
18 return flag
19 end
• If one or more vector elements of page 1 are subsets of their corresponding elements in the vector of page 2, the remaining vector elements of page 1 must be the same as their corresponding elements in the vector of page 2.
• The null element is a subset of every element.
Similar pages are identified according to the above-mentioned attributes. Algorithm 2 illustrates the pseudocode for identifying similar pages.
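The following hedged sketch expresses Algorithm 2 in Python over page attribute vectors modeled as tuples of four frozensets (an empty set stands for null). The default mode mirrors the pseudocode, which checks each element pair independently; the optional strict flag is our reading of the bulleted rules, which ask that one page's vector be contained in the other's, element by element, in a single direction.

```python
from typing import FrozenSet, Tuple

# (inputs, buttons, anchors, images): each element is a set of DOM-path strings.
AttributeVector = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def similar_pages(w1: AttributeVector, w2: AttributeVector,
                  strict: bool = False) -> bool:
    """Subset-based similarity test in the spirit of Algorithm 2.

    Default: for every element pair, one set must be a subset of the other
    (in either direction). strict=True requires the same direction for all
    four elements. The empty set is a subset of every set.
    """
    pairs = list(zip(w1, w2))
    if not strict:
        return all(a <= b or b <= a for a, b in pairs)
    return all(a <= b for a, b in pairs) or all(b <= a for a, b in pairs)


EMPTY: FrozenSet[str] = frozenset()
cart_one_item = (EMPTY, frozenset({"/html/body/form/button/Checkout"}), EMPTY, EMPTY)
cart_two_items = (EMPTY, frozenset({"/html/body/form/button/Checkout",
                                    "/html/body/form/button/Remove"}), EMPTY, EMPTY)
page_continue = (EMPTY, frozenset({"/html/body/button/continue"}), EMPTY, EMPTY)
page_save = (EMPTY, frozenset({"/html/body/button/save"}), EMPTY, EMPTY)

print(similar_pages(cart_one_item, cart_two_items))  # True
print(similar_pages(page_continue, page_save))       # False
```

The "continue" and "save" pages reproduce the example from the text: neither button set contains the other, so the pages are kept apart.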
3. Clustering web application pages
After the similar pages are identified, they are put in the same cluster. The pages in a cluster are similar to each other and refer to a unique page of the application. Algorithm 3 shows the pseudocode for clustering web pages. In line 7, it is checked whether the two pages wi and wj are the same; if so, they are put in the same cluster.
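A minimal Python sketch of the clustering step described above (Algorithm 3). It assumes a similarity predicate, such as the test of Algorithm 2, is passed in. Pages are compared against the first page of each cluster, as line 7 of the pseudocode compares w_i with w_j, and a page that joins a cluster is removed from further consideration. Because page similarity is not transitive, the resulting clusters can depend on the order of the pages.

```python
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")


def cluster_pages(pages: Sequence[Page],
                  similar: Callable[[Page, Page], bool]) -> List[List[Page]]:
    """Greedy single-pass clustering in the spirit of Algorithm 3.

    The first still-unclustered page seeds a new cluster; every later
    unclustered page that is similar to the seed joins that cluster.
    """
    remaining = list(pages)
    clusters: List[List[Page]] = []
    while remaining:
        seed, *others = remaining
        cluster, rest = [seed], []
        for page in others:
            (cluster if similar(seed, page) else rest).append(page)
        clusters.append(cluster)
        remaining = rest                 # clustered pages are not revisited
    return clusters


# Toy usage: integers stand in for pages, "similar" means same parity.
print(cluster_pages([1, 2, 3, 4, 5], lambda a, b: a % 2 == b % 2))
# [[1, 3, 5], [2, 4]]
```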
Algorithm 3 The Pseudocode for Clustering the Web Application Pages
INPUT: WebPages = {w1, w2, w3, ..., wn}
OUTPUT: the web application model M as a set of web page clusters C, where each cluster C is a set of web pages
1 Begin
2 Let M = ∅ // set of page clusters
3 Let k = 1 // number of web page clusters
4 for i = 1 to n do
5   C_k ← w_i
6   for j = i + 1 to n do
7     if SimilarWebPages(w_i, w_j) then
8       WebPages ← WebPages − w_i
9       n = length of WebPages
10       C_k ← w_j
11     end if
12   end for
13   k++
14 end for
15 return C
16 end

4) Extracting user navigation graph
In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster has a set of similar pages, and each of these pages has a URI and a Referer field. Thus, each cluster contains a set of URIs and a set of Referers for the pages in that cluster. It should be noted that the Referer field is the URI of the previous web application page that the user visited. To extract the edges of the user navigation graph, the produced clusters are checked to find which clusters' Referer sets intersect the URI set of a given cluster; when such a cluster is found, the two clusters are connected. Suppose that the URI set of cluster C1 intersects the Referer set of cluster C2; then the path from cluster C1 to cluster C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges of the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained in lines 5 and 6 of Algorithm 4, respectively. In line 10, if the intersection of the URI set of a cluster with another cluster's Referer set is not empty, the edge CiCj is added to the edge set of the graph.
The user navigation graph is created according to Algorithm 5, where the nodes are the set of clusters and the identified edges are the paths between clusters. In the created graph, each node indicates a unique page and the edges show the paths between pages.
Definition 2 [The User Navigation Graph] This graph is represented by the tuple <C0, C, E>, where C is the set of nodes in the graph, C0 ∈ C is the first (initial) node of the graph, and E ⊆ C × C is the set of edges of the graph.
Algorithm 5 shows the pseudocode for extracting the user navigation graph. Lines 5 and 6 select the cluster that contains the initial page of the application; this cluster is considered the initial node of the graph.
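A minimal Python sketch of Algorithms 4 and 5 together. Pages are reduced to a hypothetical Page record that holds only the URI and the Referer, and clusters are identified by their index. One deliberate assumption: Algorithm 4 as printed only compares cluster pairs with j > i, whereas the prose asks which clusters' Referer sets intersect a cluster's URI set, so this sketch checks every ordered pair of distinct clusters, which also keeps links back to earlier clusters.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Page:
    """A clustered web page reduced to the two fields used here (hypothetical)."""
    uri: str
    referer: Optional[str]


def navigation_graph(clusters: List[List[Page]], first_page: Page
                     ) -> Tuple[int, List[int], Set[Tuple[int, int]]]:
    """Return <C0, C, E>, with clusters identified by their index."""
    uris = [{p.uri for p in c} for c in clusters]          # URI set per cluster
    referers = [{p.referer for p in c if p.referer} for c in clusters]
    # Edge Ci -> Cj when a URI of Ci appears as the Referer of a page in Cj.
    edges = {(i, j)
             for i in range(len(clusters))
             for j in range(len(clusters))
             if i != j and uris[i] & referers[j]}
    # C0 is the cluster that contains the first visited page.
    c0 = next(i for i, c in enumerate(clusters) if first_page in c)
    return c0, list(range(len(clusters))), edges


home = Page("/index", None)
clusters = [
    [home],
    [Page("/product?id=1", "/index"), Page("/product?id=2", "/index")],
    [Page("/cart", "/product?id=1")],
]
c0, nodes, edges = navigation_graph(clusters, home)
print(c0, sorted(edges))  # 0 [(0, 1), (1, 2)]
```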
Algorithm 4 The Pseudocode for Extracting Edges From the User Navigation Graph
INPUT: C = {C1, C2, C3, ..., Ck} // web page clusters as graph nodes
       WebPages = {w1, w2, w3, ..., wn} // set of web pages
OUTPUT: the web application graph edges E as a set of edges
1 Begin
2 Let E = ∅ // set of web application graph edges
3 for j = 1 to k do
4   Let URI_Cj, Referer_Cj = ∅
5   URI_Cj = URI_Cj ∪ ExtractURI(w) for any w ∈ Cj
6   Referer_Cj = Referer_Cj ∪ ExtractReferer(w) for any w ∈ Cj
7 end for
8 for i = 1 to k do
9   for j = i + 1 to k do
10     if (URI_Ci ∩ Referer_Cj ≠ ∅) then
11       E ← E + CiCj
12     end if
13   end for
14 end for
15 return E
16 end

Algorithm 5 The Pseudocode for Extracting the User Navigation Graph
INPUT: C = {C1, C2, C3, ..., Ck} // web page clusters as graph nodes
       w1 // first web page
OUTPUT: the web application navigation graph <C0, C, E>
1 Begin
2 Let C0 = ∅
3 E = ExtractGraphEdges // set of web application graph edges
4 for j = 1 to k do
5   if Cj contains w1 then
6     C0 = C0 + Cj
7   end if
8 end for
9 return <C0, C, E>
10 end

3.2 Identifying Business Processes in the Application
To identify the web application business processes in the user navigation graph, it is necessary to define the process and the final node, and then to define the business process.
Definition 3 [The Application Process (P)] A process P in the application is a sequence of nodes and edges in the user navigation graph, $E_1, E_2, \ldots, E_k$, where $E_i \in E$ and $E_i = C_{i-1}C_i$.
Algorithm 6 shows the pseudocode for extracting processes.
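Algorithm 6 extends the edges leaving C0 recursively. A hedged Python sketch of the same idea is a depth-first enumeration of the paths that start at C0, each stored as a list of nodes. The cycle handling is an assumption of this sketch: a path that returns to a node it has already passed is recorded but not extended further, which keeps the enumeration finite on graphs with loops while preserving the loops that Definition 5 relies on. The number of paths can grow exponentially with the size of the graph.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Node = Hashable


def extract_processes(c0: Node, edges: Iterable[Tuple[Node, Node]]) -> List[List[Node]]:
    """Enumerate processes (edge sequences, stored as node lists) from C0."""
    succ: Dict[Node, List[Node]] = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)

    processes: List[List[Node]] = []

    def dfs(path: List[Node]) -> None:
        for nxt in succ.get(path[-1], []):
            candidate = path + [nxt]
            processes.append(candidate)   # every path from C0 is a process
            if nxt not in path:           # extend only while the path is simple
                dfs(candidate)

    dfs([c0])
    return processes


print(extract_processes(0, [(0, 1), (1, 2), (2, 0)]))
# [[0, 1], [0, 1, 2], [0, 1, 2, 0]]
```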
Definition 4 [The Final Nodes in the User Navigation Graph (F)] The final nodes mark the completion of a business process: a business process is complete when the application reaches one of them.
The final nodes can be detected by examining the HTTP responses. For example, in the process of buying a product, a phrase like "Thank you for your purchase" is displayed after the purchase is completed. By specifying a set of such phrases and searching for them in the responses, the final nodes can be identified. Some keywords used for identifying the final nodes are Thank, Congratulations, Successfully, Log Off, and Search Results. Additionally, some buttons in a web application page are good signs that help identify the final nodes in the graph. Examples of final nodes include the page after clicking the "save" button, the page after clicking the "creation" button, and the page after clicking the "submit" button.
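A minimal keyword-based detector for final nodes, in Python. The keyword list is the one given above. The crude tag stripping with a regular expression, the case-insensitive substring match, and the mapping from cluster index to response bodies are assumptions of this sketch. The button-based signals (save, creation, submit) depend on the request that led to a page and are not modeled here.

```python
import re
from typing import Dict, Iterable, List, Set

# Keywords listed in the text, matched case-insensitively.
FINAL_NODE_KEYWORDS = ("thank", "congratulations", "successfully",
                       "log off", "search results")


def is_final_page(html: str, keywords: Iterable[str] = FINAL_NODE_KEYWORDS) -> bool:
    text = re.sub(r"<[^>]+>", " ", html)        # crude tag stripping
    text = re.sub(r"\s+", " ", text).lower()    # normalize whitespace and case
    return any(k in text for k in keywords)


def final_nodes(cluster_bodies: Dict[int, List[str]]) -> Set[int]:
    """Clusters with at least one response body that contains a keyword."""
    return {c for c, bodies in cluster_bodies.items()
            if any(is_final_page(b) for b in bodies)}


bodies = {
    0: ["<h1>Welcome</h1>"],
    1: ["<h1>Your cart</h1>"],
    2: ["<h1>Thank you for your purchase!</h1>"],
}
print(final_nodes(bodies))  # {2}
```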
Definition 5 [The Application Business Process] A business process BP in the application is a process that satisfies at least one of the following conditions:
1. The first node of the process is the initial node of the user navigation graph (C0), and the end node of the process is a final node of the user navigation graph (F).
2. The process passes its first node again; that is, the first node and the end node of the process are the same, and the process length is greater than two. (If the process returns to a node it has already passed and the resulting loop is longer than two, it is a business process.)

Algorithm 6 The Pseudocode for Extracting Processes
INPUT: the web application first node C0
       the web application graph edges E
OUTPUT: the web application graph processes P as a set of web application processes
1 Begin
2 Let P0 = ∅ // set of web application processes
3 Let StartEdge = ∅ // set of graph edges that begin from C0
4 StartEdge = ExtractGraphEdges(C0) // extract all edges from C0
5 EndPoint = ExtractEndPoint(StartEdge) // extract the end points of the edges
6 if (ExtractProcess(EndPoint, E) ≠ null) then
7   return P = E + ExtractProcess(EndPoint, E) for any edge E ∈ StartEdge
8 else
9   return E
10 end if
11 end
All processes in the graph that run from the initial node (the application's initial page) to one of the identified final nodes, as well as the processes whose initial node and end node are the same and whose length is greater than two, are business processes of the application. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the application processes are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start with the initial node and end with a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, those whose initial node and end node are the same and whose length is greater than two are added to the variable BP as business processes.
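The two conditions of Definition 5 can be sketched as a Python filter over processes represented as node lists that start at C0. Assumptions: the length of a process is counted in edges, and, following the parenthetical remark in condition 2, a loop back to any already-passed node (not only the first one) is extracted and kept if it is longer than two edges.

```python
from typing import Hashable, List, Sequence, Set

Node = Hashable


def business_processes(processes: Sequence[List[Node]], c0: Node,
                       final: Set[Node]) -> List[List[Node]]:
    result: List[List[Node]] = []
    for p in processes:
        # Condition 1: starts at the initial node C0 and ends at a final node.
        if p[0] == c0 and p[-1] in final and p not in result:
            result.append(list(p))
        # Condition 2: the process returns to a node it has already passed;
        # the loop it closes counts if it is longer than two edges.
        if p[-1] in p[:-1]:
            loop = p[p.index(p[-1]):]
            if len(loop) - 1 > 2 and loop not in result:
                result.append(list(loop))
    return result


procs = [[0, 1], [0, 1, 2], [0, 1, 2, 0]]
print(business_processes(procs, c0=0, final={2}))
# [[0, 1, 2], [0, 1, 2, 0]]
```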
4 EXPERIMENTAL RESULTS
The testbed used in this section is a network consisting of a web server (the test target) and two clients (the BLProM system and a legal user). The web server and the clients each run on a virtual machine. The profiles of the web server and the clients are shown in Table 2.
The web applications listed in Table 3 are installed on the web server (the test target), and we then identify the business layer of these web applications.
The legal user first starts using the selected web applications and crawls all permitted parts of each web application. The HTTP traffic of the legal user is given to BLProM as its input.
5 EVALUATION
BLProM's goal is to identify the business layer of the web application, which in turn makes it possible to identify business logic vulnerabilities. BLProM detects the business processes of the web application, and identifying business processes is the main step in dynamic security testing of a web application at the business layer.
We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner that scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner that provides an API for extracting the web application graph, in which the web application pages are graph nodes and the relations among pages are graph edges. The main difference between BLProM and ZAP lies in detecting similar pages: ZAP cannot detect similar pages in the web application, but BLProM can. ZAP's graph only shows the relations among the scanned pages, whereas BLProM generates the optimal graph, in which similar pages are merged into a single node. In terms of the accuracy of the generated graph, BLProM and ZAP are the same.
To evaluate the proposed approach, we first measure the accuracy of clustering using the following criteria:
• True Positive: samples that are placed in their correct clusters.
• False Positive: samples that are placed in a cluster to which they do not belong.
• False Negative: samples that are not placed in a cluster to which they belong.
• Recall: It is calculated by the following formula:
$$\text{recall} = \frac{TruePositive}{TruePositive + FalseNegative}$$
• Precision: It is calculated by the following formula:
Table 2 Testbed Profiles
Profile | Web server (test target) | Client (BLProM machine) | Client (legal user)
CPU | Pentium dual core, 2.20 GHz | Intel Core i7, 2.20 GHz | Pentium dual core i3-3210 GHz
OS | Windows 8.1 | Windows 8.1 | Windows 7
VMware CPU | 1 GHz | 1 GHz | 1 GHz
VMware RAM | 1 GB | 1 GB | 1 GB
VMware OS | Windows 7 | Windows 7 | Windows 7
Table 3 Selected Web Applications for Evaluation
Web application | Description
TomatoCart-1.1.8.6.1 | e-commerce
osCommerce-2.3.4 | e-commerce
WackoPicko | web application for sharing pictures
Algorithm 7 The Pseudocode for Identifying the Application Business Processes
INPUT: the web application navigation graph <C0, C, E>
       the web application graph edges E
OUTPUT: the web application graph business processes BP as a set of web application processes
1 Begin
2 Let P = ∅ // set of web application processes
3 Let F = ∅ // set of web application final nodes
4 P = ExtractProcess // extract processes in the web application
5 F = ExtractFinalNodes // extract final nodes in the web application
6 for i = 1 to k do
7   if P_i starts with C0 and ends with a node in F then
8     BP = BP + P_i
9   end if
10 end for
11 R = ExtractProcessWithRepeatedNodes(P) // extract processes with repeated nodes
12 for j = 1 to m do
13   if R_j has the same initial and end node and length(R_j) > 2 then
14     BP = BP + R_j
15   end if
16 end for
17 return BP
18 end
Table 4 The Clusters of Selected Web Application Pages Evaluation
Criteria | WackoPicko | TomatoCart | osCommerce
samples | 89 | 150 | 210
clusters | 29 | 66 | 40
true positive | 65 | 146 | 205
false positive | 24 | 4 | 5
false negative | 23 | 3 | 4
recall | 0.74 | 0.98 | 0.98
precision | 0.73 | 0.97 | 0.98
f-measure | 0.73 | 0.98 | 0.98
Table 5 Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko
Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP
HTTP Message | 89 | 89 | –
Graph Nodes | 22 | 89 | 75.2
Graph Edges | 48 | 270 | 82.2
process (P) | 12 | 48 | 75
Average edge in each process (E) | 4 | 46 | 91.3
Average edges in all processes (PE) | 48 | 2208 | 97.8
business processes | 10 | N/A | –
$$\text{precision} = \frac{TruePositive}{TruePositive + FalsePositive}$$
• F-Measure: It is calculated by the following formula:
$$F\text{-}Measure = \frac{2 \cdot \text{recall} \cdot \text{precision}}{\text{recall} + \text{precision}}$$
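As a check on Table 4, the following Python sketch recomputes recall, precision, and F-measure from the true positive, false positive, and false negative counts reported there, rounding to two decimals. The recomputed values match the table, which is consistent with recall + precision in the F-measure denominator.

```python
def clustering_scores(tp: int, fp: int, fn: int):
    """Return (recall, precision, F-measure), each rounded to two decimals."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return round(recall, 2), round(precision, 2), round(f_measure, 2)


# (TP, FP, FN) per application, taken from Table 4.
table4 = {"WackoPicko": (65, 24, 23), "TomatoCart": (146, 4, 3), "osCommerce": (205, 5, 4)}
for app, counts in table4.items():
    print(app, clustering_scores(*counts))
# WackoPicko (0.74, 0.73, 0.73)
# TomatoCart (0.98, 0.97, 0.98)
# osCommerce (0.98, 0.98, 0.98)
```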
The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, "samples" is the total number of web pages in the HTTP traffic extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, "clusters" is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). In the remaining rows, the criteria listed above are calculated for each web application.
To evaluate the proposed approach, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. The BLProM output for WackoPicko is shown in Table 5. "Graph nodes" is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). "Graph edges" is the total number of edges extracted by the ExtractGraphEdges algorithm (Algorithm 4). "Process" is the total number of paths from the first node extracted by the ExtractProcess algorithm (Algorithm 6). "Average edge in each process (E)" is the sum of the edges of all processes divided by the total number of processes. "Average edges in all processes" is the product of the total number of processes (P) and the average number of edges in each process (E). "Business processes" is the total number of business processes in the web application.
The values in the first column of Table 5 are the output of the proposed approach, and the values in the second column are the scanning output of ZAP. The third column shows the percentage of scanning improvement of the proposed approach compared to the ZAP scan. The results indicate that ZAP performs a non-smart scan: it does not group similar pages, so every captured page becomes a separate graph node. As a result of this non-intelligent scan, ZAP is not able to identify business-layer vulnerabilities.
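The percentages in the third column of Table 5 are consistent with the relative reduction (ZAP − BLProM) / ZAP × 100, truncated (not rounded) to one decimal place; this formula is our reading of the numbers, as the text does not state it. The Python sketch below reproduces the column using integer arithmetic, so the truncation is not disturbed by floating-point error.

```python
def improvement(ours: int, zap: int) -> float:
    """Relative reduction with respect to ZAP, in percent, truncated to one
    decimal place (valid for zap > 0 and ours <= zap)."""
    return (1000 * (zap - ours) // zap) / 10


# Table 5 (WackoPicko): criterion -> (BLProM, ZAP)
table5 = {
    "Graph Nodes": (22, 89),
    "Graph Edges": (48, 270),
    "process (P)": (12, 48),
    "Average edge in each process (E)": (4, 46),
    "Average edges in all processes (PE)": (48, 2208),
}
for criterion, (ours, zap) in table5.items():
    print(f"{criterion}: {improvement(ours, zap)}")
# 75.2, 82.2, 75.0, 91.3, 97.8 in table order
```

Rounding instead of truncating would give 75.3 for the graph nodes, which does not match the published value.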
The BLProM output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to ZAP scanning. It is observed that web application scanning is improved by identifying the web application business layer.
Table 6 Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce
Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP
HTTP Message | 170 | 170 | –
Graph Nodes | 40 | 170 | 76.4
Graph Edges | 66 | 379 | 82.5
process (P) | 23 | 17 | 26
Average edge in each process (E) | 3 | 113 | 97.3
Average edges in all processes (PE) | 69 | 1921 | 96.4
business processes | 18 | N/A | –
Table 7 Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart
Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP
HTTP Message | 150 | 150 | –
Graph Nodes | 66 | 150 | 56
Graph Edges | 87 | 410 | 78.7
process (P) | 31 | 39 | 20.5
Average edge in each process (E) | 4 | 101 | 96
Average edges in all processes (PE) | 156 | 3131 | 95
business processes | 30 | N/A | –
The BLProM output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of the proposed approach compared to ZAP scanning.
Table 8 shows the average of the proposed approach, the average of OWASP ZAP, and the average percentage of improvement in the scanning of the selected web applications. For example, for the "Average edges in all processes" benchmark, our approach improves on OWASP ZAP by about 96 percent.
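The improvement column of Table 8 appears to be the mean of the per-application improvements in Tables 5 to 7, truncated to one decimal place, rather than the improvement computed from the averaged raw values (for the last row, (2420 − 91) / 2420 would give about 96.2%). This reading is an inference from the published numbers. The sketch below uses Python's Decimal type so that the averaging and truncation are exact.

```python
from decimal import Decimal, ROUND_DOWN
from typing import Iterable


def average_improvement(percentages: Iterable[str]) -> Decimal:
    """Mean of per-application improvements, truncated to one decimal place."""
    values = [Decimal(p) for p in percentages]
    return (sum(values) / len(values)).quantize(Decimal("0.1"), rounding=ROUND_DOWN)


# Third columns of Tables 5, 6 and 7 (WackoPicko, osCommerce, TomatoCart).
print(average_improvement(["97.8", "96.4", "95.0"]))  # 96.4  (average edges in all processes)
print(average_improvement(["75.2", "76.4", "56.0"]))  # 69.2  (graph nodes)
```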
The results presented in this table show that BLProM improves web application scanning. BLProM is aware of the web application business processes, and by identifying the web application business layer, web scanners can detect business-layer vulnerabilities.
6 CONCLUSION
Business logic vulnerabilities are strong vulnerabilities that compromise web application security. Web scanners cannot detect business logic vulnerabilities because they are unable to understand the business logic of the web application. Detecting business logic vulnerabilities requires understanding the business logic of the web application; these vulnerabilities are therefore specific to each application and difficult to identify.
In this paper, we proposed BLProM, a black-box approach for detecting the business processes of a web application. BLProM aims to ease the detection of business logic vulnerabilities. The BLProM output is used as input to dynamic security testing of web applications at the business layer in order to detect business logic vulnerabilities. BLProM consists of two main steps:
1- Extracting the user navigation graph
2- Detecting web application business processes
In the lab, we scanned three web applications with BLProM and with OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared to OWASP ZAP, measured by the average edges in all processes (Table 8). BLProM improves the scanning of web applications because it clusters web pages and prevents scanning similar web application pages.

Table 8 Comparison of the Average of the Proposed Approach and the Average of ZAP Approach in the Scanning of Selected Web Applications
Criteria | Proposed approach (average) | OWASP ZAP (average) | Percentage of improvement compared to OWASP ZAP
Graph Nodes | 42.6 | 136.3 | 69.2
Graph Edges | 67 | 353 | 81.1
process (P) | 20 | 36.6 | 40.5
Average edge in each process (E) | 3.6 | 86.6 | 94.8
Average edges in all processes (PE) | 91 | 2420 | 96.4
References
[1] Common vulnerabilities and exposures. https://cve.mitre.org/cve/cve.html.
[2] Identity Theft Resource Center (ITRC). 2015 ITRC Breach Report Hits Near Record High in 2015. http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html. Accessed Feb 2017.
[3] G. Pellegrino and D. Balzarotti. Toward Black-Box Detection of Logic Flaws in Web Applications. In Network and Distributed System Security Symposium 2014 (NDSS 2014), 2014. doi:10.14722/ndss.2014.23021.
[4] Testing for business logic. OWASP. https://www.owasp.org/index.php/Testing_for_business_logic. Accessed Feb 2017.
[5] D. Balzarotti, M. Cova, V. Felmetsger, and G. Vigna. Multi-module vulnerability analysis of web-based applications. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 25–35. ACM, 2007. doi:10.1145/1315245.1315250.
[6] A. Doupé, B. Boe, C. Kruegel, and G. Vigna. Fear the EAR: discovering and mitigating execution after redirect vulnerabilities. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 251–262. ACM, 2011. doi:10.1145/2046707.2046736.
[7] V. Felmetsger, L. Cavedon, C. Kruegel, and G. Vigna. Business Logic Attacks – Bots and BATs. OWASP 2009. In USENIX Security Symposium, 2010.
[8] E. Chai. Business Logic Attacks – Bots and BATs. OWASP, 2009. https://www.owasp.org/images/a/aa/OWASP_Cincinnati_Jan_2011.pdf. Accessed Feb 2017.
[9] X. Li and Y. Xue. BLOCK: a black-box approach for detection of state violation attacks towards web applications. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 247–256, 2011. doi:10.1145/2076732.2076767.
[10] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna. Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications. In Giovanni, pages 63–86. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-74319-4. doi:10.1007/978-3-540-74320-0_4.
[11] X. Li, W. Yan, and Y. Xue. SENTINEL: securing database from logic flaws in web applications. In Proceedings of the Second ACM Conference on Data and Application Security and Privacy, pages 25–36, 2012. doi:10.1145/2133601.2133605.
[12] M. Alidoosti and A. Nowroozi. BLDAST: business-layer Dynamic Application Security Tester of the web application in order to detect web application vulnerabilities against flooding DoS attacks. In Iran Society of Cryptology Conference, Shiraz, Iran (in Persian), 2017.
[13] M. Alidoosti, A. Nowroozi, and A. Nickabadi. Evaluating the Web-Application Resiliency to Business-Layer DoS Attacks. ETRI Journal, 2019. doi:10.4218/etrij.2019-0164.
[14] M. Alidoosti and A. Nowroozi. BLTOCTTOU: business-layer dynamic application security tester of the web application in order to detect web application vulnerabilities against Race Condition attacks. In Computer Society of Iran Conference, Tehran, Iran (in Persian), 2018.
[15] M. Alidoosti and A. Nowroozi. BL-ProM: Business-layer process miner of the web application. In ISCISC, pages 1–6. IEEE, 2018. ISBN 978-1-5386-7582-3. doi:10.1109/ISCISC.2018.8546899.
[16] V. Crescenzi, P. Merialdo, and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3):279–299, 2005. doi:10.1016/j.datak.2004.11.004.
Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. Currently, she is working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.
Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. In addition, he is a co-founder of four IT startups.
Ahmad Nickabadi received his B.S. degree in computer engineering in 2004 and the M.Sc. and Ph.D. degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.
Figure 1 The Three-Layer Architecture of a Web Application [8]
flow. MiMoSA detects multi-step attacks by analyzing the relationship between the web application and the database, as well as the connections within the web application.
SENTINEL [11] and Pellegrino et al. [3] use black-box prevention approaches to identify business logic vulnerabilities.
SENTINEL [11] is a black-box approach for detecting logical weaknesses in database access. SENTINEL generates a state machine of the web application and extracts a set of invariants from observed SQL queries, responses, and session variables as the application specification. Any SQL query that violates the defined invariants is identified as an attack.
Pellegrino et al. [3] propose a black-box technique to detect logic vulnerabilities in web applications. This technique extracts behavioral patterns from network traces in which the user interacts with a certain application's functionality. First, the web application is modeled, and then attack vectors are applied to the model.
Our previous works, BLDAST [12, 13] and BLTOCTTOU [14], use black-box prevention approaches to identify business logic vulnerabilities. BLProM [15] can be used as an input for BLTOCTTOU and BLDAST.
BLDAST [12, 13] is a dynamic, black-box vulnerability analysis approach that identifies business logic vulnerabilities of a web application against flooding DoS attacks. BLDAST assesses web application resiliency against flooding DoS attacks and takes into account the business processes of the web application. BLDAST selects critical pages in business processes, where a critical page is one with a considerable response time. Therefore, a critical process can impose a heavy load on the target and make the web server unresponsive. The goal of BLDAST is to find these critical processes within web applications.
BLTOCTTOU [14] is a black-box dynamic application security tester for detecting business logic vulnerabilities against race condition attacks. BLTOCTTOU identifies vulnerabilities by finding the business processes of the web application. BLTOCTTOU detects business processes that interact with each other: one process sets the value of a variable, and the other reads or writes that variable. To identify a race condition, BLTOCTTOU first executes the identified processes sequentially and then executes them in reverse order. Finally, it compares the outputs of these two modes; if they differ, the web application is vulnerable to a race condition.
2.2 Clustering Web Pages
Crescenzi [16] presented an approach to cluster web pages based on page structure. The structural similarity between web pages is defined by the DOM trees of their hyperlinks. The final clusters are used to build a model that describes the structure of the site in terms of classes of pages and their connectivity.
3 BUSINESS-LAYER PROCESS MINER
In this paper, BLProM is proposed to identify the business processes of a web application, and its outputs are used as input to web application security testing in the business layer. By analyzing the interactions between business processes, business layer vulnerabilities can then be detected.
BLProM first preprocesses the HTTP traffic of a normal user and then extracts the web application pages contained in that traffic. BLProM clusters similar pages to prevent unbounded growth of the user navigation graph: the detected clusters are the graph nodes, and the graph edges are the relations between the detected clusters. Based on these nodes and edges, the user navigation graph is extracted, and BLProM then extracts the business processes from the navigation graph. BLProM has two main steps:
1. Extracting the user navigation graph
2. Detecting business processes in the web application
Figure 2 shows the proposed steps for identifying the business processes of the web application. Each of these steps is explained in detail below.
3.1 Extracting the User Navigation Graph
First, the normal user starts to crawl the web application, and the traffic of the normal user is captured and stored. Note that the user's permission level can differ according to the user's role; consequently, the user navigation graph varies with the permission level. In this paper, the normal user has user-level permission and browses the pages that this permission allows. The normal user crawls all parts of the application that he or she is allowed to access.
Figure 2: Black-Box Approach to Detect Web Application Business Process
BLProM first extracts the user navigation graph from the stored traffic through the following steps:
1. Preprocessing of raw input data
2. Identifying the web application pages in the stored traffic
3. Clustering the web application pages
4. Extracting the user navigation graph
1) Preprocessing of raw input data
In the preprocessing step, BLProM cleans the data and removes irrelevant data samples. In this paper, only HTTP requests and responses are used. Of the responses, only those with the successful status codes 200 and 209 are employed; BLProM removes responses with failure status codes, together with their corresponding requests. Additionally, only GET and POST requests are needed, and the remaining requests are discarded.
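A minimal sketch of this filter, assuming each captured request has already been joined with its response status into one record. The `Exchange` record type, the `preprocess` function, and the constant names are hypothetical illustrations, not part of BLProM; the accepted status codes are the ones named in the text.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Status codes the text lists as successful; adjust for the traffic at hand.
OK_STATUSES = frozenset({200, 209})
KEPT_METHODS = frozenset({"GET", "POST"})


@dataclass(frozen=True)
class Exchange:
    """One captured request plus the status code of its response (hypothetical record)."""
    method: str
    url: str
    status: int


def preprocess(traffic: Iterable[Exchange]) -> List[Exchange]:
    """Keep GET/POST requests whose responses succeeded.

    Dropping the whole Exchange removes a failed response together with
    its request, as the preprocessing step requires.
    """
    return [ex for ex in traffic
            if ex.method.upper() in KEPT_METHODS and ex.status in OK_STATUSES]


raw = [
    Exchange("GET", "/index.php", 200),
    Exchange("OPTIONS", "/index.php", 200),   # method not needed
    Exchange("post", "/login.php", 200),      # methods compared case-insensitively
    Exchange("GET", "/missing.php", 404),     # failure status
]
print([ex.url for ex in preprocess(raw)])     # prints ['/index.php', '/login.php']
```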
2) Identifying the web application pages in the stored traffic
Each page of the application can be represented as a pair (main request, corresponding response to the main request). To identify the web application pages, it is first necessary to detect the main requests in the traffic. After the main requests are identified, their corresponding responses must also be identified.
Identifying the main HTTP requests in the traffic: The user's stored traffic contains both main HTTP requests, which load the web application pages, and secondary HTTP requests, which load the files, images, etc. of a page. BLProM must distinguish between the main and secondary requests; once the main requests are identified, the remaining requests are considered secondary. To do so, when the Referer header of a request differs from the Referer of the previous request, the current request is considered a secondary request and the previous one a main request. Additionally, the first request in the traffic is considered a main request, because the first request does not include a Referer. Note that the Referer header field of an HTTP request indicates the URL of the previous page the user visited.
Identifying the responses corresponding to the main requests: To identify the responses corresponding to the main requests, it is only necessary to select the responses whose Content-Type field is text/html, because the response to a secondary request is usually a file, image, etc., while the response to a main request is HTML text. Such responses are the main responses in the HTTP traffic. After the main requests and responses in the traffic are identified, each pair (main request, corresponding response) represents a page of the web application. Note that the response to a main request references all the secondary requests needed to load the web page.
Figure 3: Number of Consecutive Requests
Identifying whether the last request in the traffic is a main request: If the last response in the traffic is a main response, the last HTTP request is also a main request.
For example, in Figure 3, requests 1, 2, 6, and 8 are the main requests, shown in red.
The pseudocode of the algorithm that BLProM uses to identify web application pages is shown in Algorithm 1. As mentioned before, each web application page is represented by a pair (main request, corresponding response to the main request). The pseudocode in Algorithm 1 can be divided into four main parts:
1. Extracting the main requests in the traffic (lines 13–25)
2. Extracting the responses corresponding to the main requests (lines 26–31)
3. Identifying whether the last request in the traffic is a main request (lines 32–35)
4. Extracting the web application pages (lines 36–42)
In line 9, all requests in the traffic, whether main or secondary, are extracted and stored in the HttpRequest variable. In line 10, the responses in the traffic are stored in the HttpResponse variable. In lines 11 and 12, the total numbers of requests and responses are calculated. In line 15, it is checked whether the current request is the first request; if so, it is considered a main request and added to the MainRequest variable. In line 21, it is checked whether the Referer field of the current request differs from the Referer field of the previous request; if so, the previous request is considered a main request.
In line 28, it is checked whether the Content-Type field of the current response is text/html. If this condition is met, the current response is considered a main response.
In line 33, it is checked whether the last HTTP response is a main response; if so, the last request is also considered a main request.
In line 39, the main requests and main responses are combined into the web application pages, which are represented as pairs (request, response).
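A minimal Python sketch of the Referer-change heuristic, assuming the traffic is a list of hypothetical `Exchange` records (one per request, carrying the Content-Type of its response) and that Referer values have been normalized to the same form as the request URLs. Unlike Algorithm 1, which pairs the j-th main request with the j-th text/html response, this sketch pairs each main request with its own response, which avoids misalignment when the two lists differ in length. `extract_pages` and `is_html` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Exchange:
    """A captured request and the Content-Type of its response (hypothetical record)."""
    url: str
    referer: Optional[str]   # None when the request carries no Referer header
    content_type: str


def is_html(ex: Exchange) -> bool:
    return ex.content_type.split(";")[0].strip().lower() == "text/html"


def extract_pages(traffic: List[Exchange]) -> List[Exchange]:
    """Return the main exchanges (the pages) in capture order."""
    main: List[int] = []
    for i, ex in enumerate(traffic):
        if i == 0:
            main.append(0)                   # the first request is a main request
        elif ex.referer != traffic[i - 1].referer and (i - 1) not in main:
            main.append(i - 1)               # Referer changed: previous request was main
    last = len(traffic) - 1
    if last > 0 and last not in main and is_html(traffic[last]):
        main.append(last)                    # trailing page with no secondary requests
    # Pair each main request with its own response and keep only HTML pages.
    return [traffic[i] for i in main if is_html(traffic[i])]


traffic = [
    Exchange("/", None, "text/html; charset=utf-8"),
    Exchange("/style.css", "/", "text/css"),
    Exchange("/product?id=1", "/", "text/html"),
    Exchange("/img/1.png", "/product?id=1", "image/png"),
    Exchange("/cart", "/product?id=1", "text/html"),
]
print([p.url for p in extract_pages(traffic)])   # prints ['/', '/product?id=1', '/cart']
```

On this capture, the stylesheet and image requests are classified as secondary, and the `/cart` page, which has no secondary requests, is recovered by the last-request check.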
3) Clustering the web application pages
In this step, BLProM clusters the web application pages. Clustering aims to put similar pages in the same cluster, which helps prevent unbounded growth of the user navigation graph. The pages in the same cluster are similar to each other.
At this stage, all pages in the user's stored traffic have been extracted as pairs (main request, corresponding response to the main request), and each pair represents one page of the application. To extract the optimal user navigation graph, similar extracted pages must be identified and clustered. In the user navigation graph, nodes indicate the application's unique pages, and edges represent the links between pages. To identify similar pages, a criterion must be chosen according to the purpose of the clustering. The type of operations that the user can perform on a page is used as the measure for separating pages; in other words, two pages are similar if the user can perform the same operations on them. For example, consider two pages that both contain only a button, but with different button titles: on the first page the title is “continue”, and on the second it is “save”. These two pages are different because the user performs different operations on them. According to this criterion, the following pages are considered similar in web applications:
• In an online shop application, the shopping cart pages are considered similar even if they contain different items.
• In an online shop application, the product profile pages are similar because they have the same HTML structure in terms of the important HTML elements.
• Pages that display search results for different keywords are similar.
• If the structure of one page is a subset of the structure of another page in terms of the important HTML elements, the two pages are considered similar.
• Two pages that both contain user comments are considered similar even if the comments are about different products.
Algorithm 1: The Pseudocode for Extracting Web Application Pages From the Traffic
INPUT: HTTPtraffic = {HttpMessage1, HttpMessage2, HttpMessage3, ..., HttpMessagen}
OUTPUT: WebPages as a set of (Request_i, Response_i)
1  Begin
2      let WebPages = ∅ // set of web pages
3      let HttpRequest = ∅ // set of HTTP requests
4      let HttpResponse = ∅ // set of HTTP responses
5      let MainRequest = ∅ // set of main HTTP requests
6      let MainResponse = ∅ // set of main HTTP responses
7      let i, k = 1 // counters for the current HttpMessage
8      let LastReferer = empty, NewReferer = empty
9      HttpRequest ← ExtractReq(HTTPtraffic) // extract HTTP requests from the HTTP traffic
10     HttpResponse ← ExtractResp(HTTPtraffic) // extract HTTP responses from the HTTP traffic
11     n ← extractNumber(HttpRequest) // total number of HTTP requests
12     m ← extractNumber(HttpResponse) // total number of HTTP responses
13     // extract the main HTTP requests from HttpRequest
14     for i ← 1 to n do
15         if i = 1 then
16             add MainRequest ← HttpRequest_i // the first request is a main request
17             LastReferer ← Referer of HttpRequest_i
18         else
19             NewReferer ← Referer of HttpRequest_i
20         end if
21         if NewReferer ≠ LastReferer then
22             add MainRequest ← HttpRequest_{i−1}
23             LastReferer ← NewReferer
24         end if
25     end for
26     // extract the main HTTP responses from HttpResponse
27     for k ← 1 to m do
28         if content-type of HttpResponse_k = text/html then
29             add MainResponse ← HttpResponse_k
30         end if
31     end for
32     // check whether the last request is a main request
33     if HttpResponse_m ∈ MainResponse then
34         add MainRequest ← HttpRequest_m
35     end if
36     // extract the set of web pages
37     size ← extractNumber(MainRequest) // total number of main HTTP requests
38     for j ← 1 to size do
39         WebPages ← WebPages ∪ {(MainRequest_j, MainResponse_j)}
40     end for
41     return WebPages
42 end
Table 1: Attribute Vector of a Page in the osCommerce Web Application
| Element | DOM paths |
|---|---|
| inputs | null |
| buttons | html/body/button/Reviews |
| | html/body/div/div/div/div/form/div/div/span/span/button/Add to Cart |
| anchors | null |
| images | html/body/div/div/div/a/img |
Figure 4: An Example of HTML Code
Definition of the Document Object Model (DOM) path of an HTML element: The DOM path of an element is the position of the element in the HTML code, expressed as the sequence of nested tags that lead from the document root to the element.
For example, in Figure 4, the DOM path of the button (DOM_button) is Document/Html/Body/P/Button.
Definition 1 [Similar Pages]: Similar pages are pages on which the user can perform the same operations and which are identical in terms of the positions of the important HTML elements on the page. The important HTML elements of a page are buttons, images, inputs, and anchors.
The clustering process includes three steps:
1. Extracting the attribute vectors of the pages
2. Identifying the subset pages
3. Clustering pages
In the following, these steps are discussed in detail.
1. Extracting the attribute vectors of the pages
BLProM represents each page of the application as a pair (main request, response). In this step, BLProM extracts the attribute vector of each page by applying a data mining operation to this pair. BLProM models each page using the following attribute vector:
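WebPages = the set of all pages in an application
$$\forall w \in WebPages:\quad w = (DOM_{inputs},\ DOM_{buttons},\ DOM_{anchors},\ DOM_{imgs})$$
$$DOM_{inputs} = \prod_{i=1}^{n} DOM(input_i) \qquad DOM_{buttons} = \prod_{i=1}^{n} DOM(button_i)$$
$$DOM_{anchors} = \prod_{i=1}^{n} DOM(anchor_i) \qquad DOM_{imgs} = \prod_{i=1}^{n} DOM(img_i)$$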
• DOM(input): the DOM path of the <input> tag in the page + the value of the type attribute of the <input> tag + the value of the name attribute of the <input> tag (in the absence of a name attribute, the value attribute is used)
• DOM(button): the DOM path of the button in the page + the title of the button
• DOM(anchor): the DOM path of the <a> tag in the page
• DOM(img): the DOM path of the image in the page
If a web application page contains several buttons, the second element of the page's attribute vector is the set of DOM paths of the buttons in the page, separated by a delimiter. Figure 5 shows one of the pages of the osCommerce¹ web application, and Table 1 shows the attribute vector of that page. As shown, the input and anchor elements are null, which means that the page does not contain these tags.
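A minimal sketch of attribute-vector extraction with Python's standard `html.parser` module. The tag stack used to compute DOM paths, the "/" separator between tags, attribute values, and button titles, and the names `AttributeVectorParser` and `attribute_vector` are assumptions for illustration. The paths start at the `html` element, as in Table 1, and the four returned sets stand for DOM_inputs, DOM_buttons, DOM_anchors, and DOM_imgs.

```python
from html.parser import HTMLParser
from typing import FrozenSet, List, Optional, Tuple

# Elements that never receive an end tag, so they are never pushed on the path stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

AttributeVector = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str], FrozenSet[str]]


class AttributeVectorParser(HTMLParser):
    """Collects DOM(input), DOM(button), DOM(anchor) and DOM(img) strings for one page."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []            # open tags from the root to the current element
        self.inputs, self.buttons = set(), set()
        self.anchors, self.imgs = set(), set()
        self._button: Optional[str] = None    # DOM path of the <button> being read
        self._title: List[str] = []           # text collected inside that <button>

    def _path(self, tag: str) -> str:
        return "/".join(self.stack + [tag])

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input":
            # DOM path + type + name (the value attribute when name is absent)
            key = a.get("name") or a.get("value") or ""
            self.inputs.add(f"{self._path(tag)}/{a.get('type') or 'text'}/{key}")
        elif tag == "a":
            self.anchors.add(self._path(tag))
        elif tag == "img":
            self.imgs.add(self._path(tag))
        elif tag == "button":
            self._button, self._title = self._path(tag), []
        if tag not in VOID:
            self.stack.append(tag)

    def handle_data(self, data):
        if self._button is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "button" and self._button is not None:
            title = " ".join("".join(self._title).split())
            self.buttons.add(f"{self._button}/{title}")   # DOM path + button title
            self._button = None
        if tag in self.stack:                  # also closes any unclosed inner tags
            while self.stack.pop() != tag:
                pass


def attribute_vector(html: str) -> AttributeVector:
    parser = AttributeVectorParser()
    parser.feed(html)
    parser.close()
    return (frozenset(parser.inputs), frozenset(parser.buttons),
            frozenset(parser.anchors), frozenset(parser.imgs))


page = ("<html><body><form><input type='text' name='q'>"
        "<button> Add to Cart </button></form>"
        "<a href='/p'><img src='p.png'></a></body></html>")
for name, paths in zip(("inputs", "buttons", "anchors", "imgs"), attribute_vector(page)):
    print(name, sorted(paths))
```

Because each element is a set of path strings, two pages with the same structure but different text (for example, different products) produce identical vectors, which is what the similarity test relies on.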
2. Identifying similar pages
After the attribute vector of each page is extracted, similar pages must be identified. Pages whose attribute vectors are identical, or whose attribute vector is a subset of another page's attribute vector, are considered similar.
According to Definition 1, the attribute vector of each web application page has four elements. The attribute vector of page 1 is considered the same as the attribute vector of page 2 if any of the following holds:
• All vector elements of page 1 are equal to the corresponding elements in the vector of page 2.
• All vector elements of page 1 are subsets of the corresponding elements in the vector of page 2.
• All vector elements of page 2 are subsets of the corresponding elements in the vector of page 1.
• One or more vector elements of page 1 are subsets of their corresponding elements in the vector of page 2, and the remaining vector elements of page 1 are equal to their corresponding elements in the vector of page 2.
Here, the null element is treated as a subset of every element. Similar pages are identified according to these conditions. Algorithm 2 illustrates the pseudocode for identifying similar pages.
¹ https://www.oscommerce.com
Figure 5: A Page of the osCommerce Application
Algorithm 2: The Pseudocode for Identifying Similar Pages
INPUT: w1 = (DOM_w1(input), DOM_w1(button), DOM_w1(anchor), DOM_w1(img)), w2 = (DOM_w2(input), DOM_w2(button), DOM_w2(anchor), DOM_w2(img))
OUTPUT: Boolean flag; true means the two pages are the same
Begin
    let flag, input, button, anchor, img = false
    if DOM_w1(input) ⊆ DOM_w2(input) or DOM_w2(input) ⊆ DOM_w1(input) then
        input = true
    end if
    if DOM_w1(button) ⊆ DOM_w2(button) or DOM_w2(button) ⊆ DOM_w1(button) then
        button = true
    end if
    if DOM_w1(anchor) ⊆ DOM_w2(anchor) or DOM_w2(anchor) ⊆ DOM_w1(anchor) then
        anchor = true
    end if
    if DOM_w1(img) ⊆ DOM_w2(img) or DOM_w2(img) ⊆ DOM_w1(img) then
        img = true
    end if
    if input and button and anchor and img then
        flag = true
    end if
    return flag
end
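Algorithm 2 reduces to one subset test per vector element. A sketch, assuming each attribute vector is a 4-tuple of Python `frozenset`s in the order (inputs, buttons, anchors, images) and that the empty set stands for the null element; `similar_web_pages` and the sample vectors are hypothetical.

```python
from typing import FrozenSet, Tuple

AttributeVector = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def similar_web_pages(w1: AttributeVector, w2: AttributeVector) -> bool:
    """Each of the four elements must be nested one way or the other.

    frozenset() plays the role of the null element, a subset of every element.
    """
    return all(a <= b or b <= a for a, b in zip(w1, w2))


empty: FrozenSet[str] = frozenset()
cart_a = (empty, frozenset({"html/body/button/continue"}), empty, empty)
cart_b = (empty, frozenset({"html/body/button/continue"}), frozenset({"html/body/a"}), empty)
save_page = (empty, frozenset({"html/body/button/save"}), empty, empty)

print(similar_web_pages(cart_a, cart_b))     # True: every element of cart_a is nested in cart_b
print(similar_web_pages(cart_a, save_page))  # False: the button titles differ
```

As in Algorithm 2, the subset direction is checked independently for each element, so the test is symmetric in its two arguments.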
3. Clustering web application pages
After the similar pages are identified, they are put in the same cluster. The pages in a cluster are similar to each other and refer to a unique page of the application. Algorithm 3 shows the pseudocode for clustering web pages. In line 7, it is checked whether the two pages w_i and w_j are the same; if so, they are put in the same cluster.
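The clustering step can be sketched as a greedy single pass, assuming a similarity predicate such as the one in Algorithm 2 is supplied by the caller. Instead of deleting pages from WebPages while iterating, this sketch marks pages as assigned, which has the same effect without index bookkeeping. `cluster_pages` and the toy `same_path` predicate are hypothetical; membership is decided against the seed page only, so the result can depend on the order of the pages.

```python
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")


def cluster_pages(pages: Sequence[Page], similar: Callable[[Page, Page], bool]) -> List[List[Page]]:
    """Greedy clustering: each unassigned page seeds a cluster of later similar pages."""
    assigned = [False] * len(pages)
    clusters: List[List[Page]] = []
    for i, seed in enumerate(pages):
        if assigned[i]:
            continue                         # already placed in an earlier cluster
        assigned[i] = True
        cluster = [seed]
        for j in range(i + 1, len(pages)):
            if not assigned[j] and similar(seed, pages[j]):
                assigned[j] = True           # w_j is never clustered again
                cluster.append(pages[j])
        clusters.append(cluster)
    return clusters


def same_path(a: str, b: str) -> bool:
    """Toy similarity: URLs are 'similar' when they share the path before '?'."""
    return a.split("?")[0] == b.split("?")[0]


urls = ["/product?id=1", "/cart", "/product?id=2", "/search?q=a", "/cart?item=7"]
print(cluster_pages(urls, same_path))
```

Here the five URLs fall into three clusters: the two product pages, the two cart pages, and the search page.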
Algorithm 3: The Pseudocode for Clustering the Web Application Pages
INPUT: WebPages = {w1, w2, w3, ..., wn}
OUTPUT: the web application model M as a set of web page clusters C, where each cluster C is a set of web pages
1  Begin
2      let M = ∅ // set of page clusters
3      let k = 1 // number of web page clusters
4      for i ← 1 to n do
5          C_k ← {w_i}
6          for j ← i + 1 to n do
7              if SimilarWebPages(w_i, w_j) then
8                  WebPages ← WebPages − {w_j} // w_j is not clustered again
9                  n ← length of WebPages
10                 C_k ← C_k ∪ {w_j}
11             end if
12         end for
13         M ← M ∪ {C_k}; k ← k + 1
14     end for
15     return M
16 end
4) Extracting the user navigation graph
In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster has a set of similar pages, and each of these pages has a URI and a Referer field. Thus, each cluster contains a set of URIs and a set of Referers for the pages in that cluster. Note that the Referer field is the URI of the previous web application page that the user visited. To extract the edges of the user navigation graph, the produced clusters are checked to find which clusters' Referer sets intersect the URI set of a given cluster; when such a cluster is found, the two clusters are connected. Suppose that the URI set of cluster C1 intersects the Referer set of cluster C2; then the path from cluster C1 to cluster C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges in the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained in lines 5 and 6 of Algorithm 4, respectively. In line 10, if the intersection of the URI set of one cluster with the Referer set of another cluster is not empty, the edge CiCj is added to the edge set of the graph.
The user navigation graph is then created by Algorithm 5, where the nodes are the set of clusters and the identified edges are the paths between clusters. In the created graph, each node indicates a unique page, and the edges show the paths between pages.
Definition 2 [The User Navigation Graph]: This graph is represented by the tuple ⟨C0, C, E⟩, where C is the set of nodes in the graph, C0 ∈ C is the first (initial) node in the graph, and E ⊆ C × C is the set of edges in the graph.
Algorithm 5 shows the pseudocode for extracting the user navigation graph. Lines 5 and 6 select the cluster that contains the initial page of the application; this cluster is considered the initial node of the graph.
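Edge extraction and the choice of C0 can be sketched together, assuming each page is a hypothetical `Page` record whose URI and Referer have been normalized to the same form (for example, both as paths), and that clusters are identified by their position in a list. The function name `navigation_graph` and the sample clusters are illustrative. The sketch checks all ordered pairs (i, j) with i ≠ j, so an edge leading back to an earlier cluster, such as the cart-to-home edge below, is also found.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Page:
    uri: str
    referer: Optional[str]   # None for the first request, which has no Referer


def navigation_graph(clusters: List[List[Page]], first_page: Page) -> Tuple[int, Set[Tuple[int, int]]]:
    """Return (C0, E), with clusters identified by their index in `clusters`."""
    uris = [{p.uri for p in c} for c in clusters]
    referers = [{p.referer for p in c if p.referer is not None} for c in clusters]
    edges: Set[Tuple[int, int]] = set()
    for i in range(len(clusters)):
        for j in range(len(clusters)):
            # Ci -> Cj when a page in Cj was reached from a URI that belongs to Ci
            if i != j and uris[i] & referers[j]:
                edges.add((i, j))
    c0 = next(k for k, c in enumerate(clusters) if first_page in c)  # cluster holding w1
    return c0, edges


home = Page("/", None)
clusters = [
    [home, Page("/", "/cart")],                                # home, reached again from the cart
    [Page("/product?id=1", "/"), Page("/product?id=2", "/")],
    [Page("/cart", "/product?id=1")],
]
c0, edges = navigation_graph(clusters, home)
print(c0, sorted(edges))   # prints 0 [(0, 1), (1, 2), (2, 0)]
```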
3.2 Identifying Business Processes in the Application
To identify the web application business processes in the user navigation graph, it is necessary to define the process and the final node, and then to define the business process.
Definition 3 [The Application Process (P)]: The process P in the application is a sequence of nodes and edges in the user navigation graph, such as E1, E2, ..., Ek, where Ei ∈ E and Ei = C_{i−1}C_i.
Algorithm 4: The Pseudocode for Extracting Edges From the User Navigation Graph
INPUT: C = {C1, C2, C3, ..., Ck} // web page clusters as graph nodes; WebPages = {w1, w2, w3, ..., wn} // set of web pages
OUTPUT: the web application graph edges E as a set of edges
1  Begin
2      let E = ∅ // set of web application graph edges
3      for j ← 1 to k do
4          let URI_Cj, Referer_Cj = ∅
5          URI_Cj ← URI_Cj ∪ ExtractURI(w) for any w ∈ Cj
6          Referer_Cj ← Referer_Cj ∪ ExtractReferer(w) for any w ∈ Cj
7      end for
8      for i ← 1 to k do
9          for j ← 1 to k, j ≠ i do
10             if URI_Ci ∩ Referer_Cj ≠ ∅ then
11                 E ← E ∪ {CiCj}
12             end if
13         end for
14     end for
15     return E
16 end
Algorithm 5: The Pseudocode for Extracting the User Navigation Graph
INPUT: C = {C1, C2, C3, ..., Ck} // web page clusters as graph nodes; w1 // first web page
OUTPUT: the web application navigation graph ⟨C0, C, E⟩
1  Begin
2      let C0 = ∅
3      E ← ExtractGraphEdges // set of web application graph edges (Algorithm 4)
4      for j ← 1 to k do
5          if Cj contains w1 then
6              C0 ← Cj
7          end if
8      end for
9      return ⟨C0, C, E⟩
10 end
Algorithm 6 shows the pseudocode for extracting processes.
Definition 4 [The Final Nodes in the User Navigation Graph (F)]: The final nodes mark the completion of a business process; a business process is complete when the application reaches one of them.
The final nodes can be detected by examining the HTTP responses. For example, in the process of buying a product, a phrase such as “Thank you for your purchase” is displayed after the purchase is completed. By specifying a set of such phrases and searching for them in the responses, the final nodes can be identified. Some keywords used for identifying final nodes are “Thank”, “Congratulations”, “Successfully”, “Log Off”, and “Search Results”. Additionally, some buttons on a web application page are good signs that help identify the final nodes in the graph. Examples of final nodes include the page shown after clicking a “save” button, the page after clicking a “creation” button, and the page after clicking a “submit” button.
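A keyword heuristic for final nodes, as a sketch: the keyword list and the button titles come from the text, while the crude tag stripping, case-insensitive substring matching, and the optional `clicked_button` argument (the title of the button that led to the page) are assumptions. `is_final_page` is a hypothetical name.

```python
import re
from typing import Iterable, Optional

# Completion keywords listed in the text; matched case-insensitively.
FINAL_KEYWORDS = ("thank", "congratulations", "successfully", "log off", "search results")
# Buttons whose follow-up page is treated as final, per the examples in the text.
FINAL_BUTTONS = ("save", "creation", "submit")


def is_final_page(html: str, clicked_button: Optional[str] = None,
                  keywords: Iterable[str] = FINAL_KEYWORDS) -> bool:
    """Heuristic test for a final node: a completion phrase or a 'finishing' button."""
    text = re.sub(r"<[^>]+>", " ", html).lower()   # crude tag stripping
    if any(k in text for k in keywords):
        return True
    return clicked_button is not None and clicked_button.strip().lower() in FINAL_BUTTONS


print(is_final_page("<h1>Thank you for your purchase!</h1>"))                 # True
print(is_final_page("<h1>Your profile</h1>", clicked_button="Save"))          # True
print(is_final_page("<h1>Product list</h1>", clicked_button="Add to Cart"))   # False
```

Plain substring matching is deliberately loose; in practice a keyword such as “Log Off” may also appear in navigation menus, so the keyword set would need tuning per application.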
Definition 5 [The Application Business Process]: The business process BP in the application is a process that satisfies at least one of the following conditions:
1. The first node of the process is the initial node of the user navigation graph (C0), and the end node of the process is a final node of the user navigation graph (F).
2. The process passes its first node again; that is, the first node and the end node of the process are the same, and the length of the process is greater than two. (If the process returns to a node it has already passed and the resulting loop is longer than two, it is a business process.)
Algorithm 6: The Pseudocode for Extracting Processes
INPUT: the web application first node C0; the web application graph edges E
OUTPUT: the web application graph processes P as a set of web application processes
Begin
    let P0 = ∅ // set of web application processes
    let StartEdge = ∅ // set of graph edges that begin from C0
    StartEdge ← ExtractGraphEdges(C0) // extract all edges from C0
    EndPoint ← ExtractEndPoint(StartEdge) // extract the end points of the edges
    if ExtractProcess(EndPoint, E) ≠ ∅ then
        return P = e + ExtractProcess(EndPoint, E) for any edge e ∈ StartEdge
    else
        return e
    end if
end
All processes in the graph that run from the initial node (the initial page of the application) to the identified final nodes, as well as the processes whose initial node and end node are the same, are application business processes. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the application processes are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start at the initial node and end at a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, those detected processes whose initial node and end node are the same and whose length is greater than two are added to the variable BP as business processes.
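A sketch of Definition 5 over a navigation graph whose clusters are labeled with strings: a depth-first search enumerates simple paths from C0, records every path that ends at a final node (condition 1), and records every loop that closes back onto the current path with more than two edges (condition 2). Measuring length in edges, rotating each loop into a canonical form so it is reported once, and the name `business_processes` are assumptions. Path enumeration is exponential in the worst case, so this is meant for small graphs.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def business_processes(c0: str, edges: Set[Edge], finals: Set[str]) -> Set[Tuple[str, ...]]:
    """Enumerate simple paths from C0 and keep the two kinds of business process."""
    succ: Dict[str, List[str]] = {}
    for a, b in sorted(edges):
        succ.setdefault(a, []).append(b)
    found: Set[Tuple[str, ...]] = set()

    def dfs(path: List[str]) -> None:
        node = path[-1]
        if len(path) > 1 and node in finals:
            found.add(tuple(path))                 # condition 1: C0 ... F
        for nxt in succ.get(node, []):
            if nxt in path:                        # the path returns to a passed node
                loop = path[path.index(nxt):]      # loop nodes; the loop has len(loop) edges
                if len(loop) > 2:                  # condition 2: loop longer than two
                    k = loop.index(min(loop))      # rotate so each loop is stored once
                    loop = loop[k:] + loop[:k]
                    found.add(tuple(loop + [loop[0]]))
            else:
                dfs(path + [nxt])

    dfs([c0])
    return found


edges = {("C0", "C1"), ("C1", "C2"), ("C2", "C3"), ("C3", "C1"), ("C1", "C0")}
for bp in sorted(business_processes("C0", edges, finals={"C2"})):
    print(" -> ".join(bp))
```

In this example, C0 → C1 → C2 is reported because C2 is final, and C1 → C2 → C3 → C1 is reported as a loop of three edges, while the two-edge loop C0 → C1 → C0 is not.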
4 EXPERIMENTAL RESULTS
The testbed used in this section is a network consisting of a web server (the test target) and two clients (the BLProM system and a legal user). The web server and the clients each run in a virtual machine. The profiles of the web server and the clients are shown in Table 2.
The web applications listed in Table 3 are installed on the web server (test target), and BLProM is then used to identify the business layer of each of these web applications.
The legal user first starts using the selected web applications and crawls all permitted parts of each application. The HTTP traffic of the legal user is given to BLProM as its input.
5 EVALUATION
BLProM's goal is to identify the business layer of the web application; identifying the business layer makes it possible to identify business logic vulnerabilities. BLProM detects the business processes of the web application, and identifying business processes is the main step in dynamic security testing of the web application in the business layer.
We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner that scans web applications and automatically finds some security vulnerabilities. To the best of our knowledge, ZAP is the only free web scanner that has an API for extracting the web application graph, in which the web application pages are the graph nodes and the relations among pages are the graph edges. The main difference between BLProM and ZAP lies in detecting similar pages: ZAP cannot detect similar pages in a web application, but BLProM can. ZAP's graph only shows the relations among the scanned pages, whereas BLProM generates the optimal graph. In terms of the accuracy of the generated graph, BLProM and ZAP are the same.
To evaluate the proposed approach, we first measure the accuracy of clustering using the following criteria:
• True positive: samples that are placed in their correct cluster.
• False positive: samples that are placed in a cluster to which they do not belong.
• False negative: samples that are not placed in a cluster to which they belong.
• Recall: calculated by the following formula:
$$recall = \frac{TruePositive}{TruePositive + FalseNegative}$$
• Precision: calculated by the following formula:
Table 2: Testbed Profiles
| Machine | Host CPU | Host OS | VMware CPU | VMware RAM | VMware OS |
|---|---|---|---|---|---|
| Web server (test target) | Pentium dual core 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7 |
| Client (BLProM machine) | Intel Core i7 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7 |
| Client (legal user) | Pentium dual core i3-3210 GHz | Windows 7 | 1 GHz | 1 GB | Windows 7 |
Table 3: Selected Web Applications for Evaluation
| Web application | Description |
|---|---|
| TomatoCart-1.1.8.6.1 | e-commerce |
| osCommerce-2.3.4 | e-commerce |
| WackoPicko | web application for sharing pictures |
Algorithm 7: The Pseudocode for Identifying the Application Business Processes
INPUT: the web application navigation graph ⟨C0, C, E⟩; the web application graph edges E
OUTPUT: the web application business processes BP as a set of web application processes
1  Begin
2      let P, BP = ∅ // sets of web application processes and business processes
3      let F = ∅ // set of web application final nodes
4      P ← ExtractProcess // extract processes in the web application (Algorithm 6)
5      F ← ExtractFinalNodes // extract final nodes in the web application
6      for i ← 1 to |P| do
7          if P_i starts at C0 and ends at a node in F then
8              BP ← BP ∪ {P_i}
9          end if
10     end for
11     R ← ExtractProcessWithRepeatedNodes(P) // extract processes with repeated nodes
12     for j ← 1 to |R| do
13         if R_j has the same initial and end node and length(R_j) > 2 then
14             BP ← BP ∪ {R_j}
15         end if
16     end for
17     return BP
18 end
Table 4: Evaluation of the Clusters of the Selected Web Application Pages
| Criteria | WackoPicko | TomatoCart | osCommerce |
|---|---|---|---|
| samples | 89 | 150 | 210 |
| clusters | 29 | 66 | 40 |
| true positive | 65 | 146 | 205 |
| false positive | 24 | 4 | 5 |
| false negative | 23 | 3 | 4 |
| recall | 0.74 | 0.98 | 0.98 |
| precision | 0.73 | 0.97 | 0.98 |
| F-measure | 0.73 | 0.98 | 0.98 |
Table 5: Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko
| Criteria | Proposed approach | OWASP ZAP | Improvement over OWASP ZAP (%) |
|---|---|---|---|
| HTTP messages | 89 | 89 | – |
| Graph nodes | 22 | 89 | 75.2 |
| Graph edges | 48 | 270 | 82.2 |
| Processes (P) | 12 | 48 | 75 |
| Average edges in each process (E) | 4 | 46 | 91.3 |
| Average edges in all processes (PE) | 48 | 2208 | 97.8 |
| Business processes | 10 | N/A | – |
$$precision = \frac{TruePositive}{TruePositive + FalsePositive}$$
• F-Measure: calculated by the following formula:
$$F\text{-}Measure = \frac{2 \cdot recall \cdot precision}{recall + precision}$$
The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, “samples” is the total number of web pages in the HTTP traffic extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, “clusters” is the total number of clusters extracted by the ClusteringWebPages algorithm (Algorithm 3). The remaining rows give the criteria listed above for each web application.
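The criteria can be checked against Table 4 by recomputing them from the reported true-positive, false-positive, and false-negative counts, with the F-measure taken as the harmonic mean 2·recall·precision/(recall + precision). Rounded to two decimals, the recomputed values match the recall, precision, and F-measure rows of the table. `clustering_scores` is a hypothetical helper name.

```python
def clustering_scores(tp: int, fp: int, fn: int):
    """Recall, precision and F-measure of a clustering from its TP/FP/FN counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# (true positive, false positive, false negative) counts from Table 4
table4 = {"WackoPicko": (65, 24, 23), "TomatoCart": (146, 4, 3), "osCommerce": (205, 5, 4)}
for app, counts in table4.items():
    r, p, f = clustering_scores(*counts)
    print(f"{app}: recall={r:.2f} precision={p:.2f} F-measure={f:.2f}")
```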
To evaluate the proposed approach further, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. BLProM's output for WackoPicko is shown in Table 5. “Graph nodes” is the total number of clusters extracted by the ClusteringWebPages algorithm (Algorithm 3). “Graph edges” is the total number of edges extracted by the ExtractGraphEdges algorithm (Algorithm 4). “Processes” is the total number of paths from the first node extracted by the ExtractProcess algorithm (Algorithm 6). “Average edges in each process (E)” is the total number of edges over all processes divided by the total number of processes. “Average edges in all processes (PE)” is the product of the total number of processes (P) and the average number of edges in each process (E). “Business processes” is the total number of business processes in the web application.
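As a worked instance of these definitions, using the WackoPicko numbers in Table 5, where |p| is the number of edges in process p. The improvement formula in the last line is our reading of how the third column of Table 5 is computed; it is not stated in the text, but it reproduces this row and, for example, the graph-edge row, (270 − 48)/270 ≈ 82.2%.

```latex
\bar{E} = \frac{1}{|P|}\sum_{p \in P} |p|, \qquad PE = |P| \cdot \bar{E}

% WackoPicko (Table 5)
PE_{\mathrm{BLProM}} = 12 \times 4 = 48, \qquad PE_{\mathrm{ZAP}} = 48 \times 46 = 2208

\text{improvement} = \frac{PE_{\mathrm{ZAP}} - PE_{\mathrm{BLProM}}}{PE_{\mathrm{ZAP}}} \times 100
                   = \frac{2208 - 48}{2208} \times 100 \approx 97.8\%
```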
The values in the first column of Table 5 are the output of the proposed approach, and the values in the second column are the scanning output of ZAP. The third column shows the percentage by which our approach improves on the ZAP scan. The results indicate that ZAP scans without awareness of the application's business logic: because it does not merge similar pages, its graph and process counts grow much larger. As a result, ZAP is not able to identify business layer vulnerabilities.
BLProM's output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to the ZAP scan. These results indicate that identifying the business layer of the web application improves web application scanning.
Table 6: Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce
| Criteria | Proposed approach | OWASP ZAP | Improvement over OWASP ZAP (%) |
|---|---|---|---|
| HTTP messages | 170 | 170 | – |
| Graph nodes | 40 | 170 | 76.4 |
| Graph edges | 66 | 379 | 82.5 |
| Processes (P) | 23 | 17 | 26 |
| Average edges in each process (E) | 3 | 113 | 97.3 |
| Average edges in all processes (PE) | 69 | 1921 | 96.4 |
| Business processes | 18 | N/A | – |
Table 7: Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart
| Criteria | Proposed approach | OWASP ZAP | Improvement over OWASP ZAP (%) |
|---|---|---|---|
| HTTP messages | 150 | 150 | – |
| Graph nodes | 66 | 150 | 56 |
| Graph edges | 87 | 410 | 78.7 |
| Processes (P) | 31 | 39 | 20.5 |
| Average edges in each process (E) | 4 | 101 | 96 |
| Average edges in all processes (PE) | 156 | 3131 | 95 |
| Business processes | 30 | N/A | – |
BLProM's output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of our proposed approach compared to the ZAP scan.
Table 8 shows the averages for the proposed approach and for OWASP ZAP, together with the average percentage of improvement in scanning the selected web applications. For example, on the “Average edges in all processes” criterion, our approach improves on OWASP ZAP by about 96 percent.
The results presented in this table show that BLProM improves web application scanning. BLProM is aware of the business processes of the web application, and by identifying the business layer of the web application, web scanners can detect business layer vulnerabilities.
6 CONCLUSION
Business logic vulnerabilities are severe vulnerabilities that compromise web application security. Web scanners cannot detect them because they are unable to understand the business logic of the web application. Detecting business logic vulnerabilities requires understanding the business logic of the application; these vulnerabilities are therefore application-specific and difficult to identify.
In this paper, we proposed BLProM, a black-box approach for detecting the business processes of a web application. BLProM aims to ease the detection of business logic vulnerabilities: its output is used as input to business-layer dynamic security testing of web applications in order to detect business logic vulnerabilities. BLProM consists of two main steps:
1- Extracting the user navigation graph
2- Detecting the web application business processes
In the lab, we scanned three web applications with BLProM and with OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared with OWASP ZAP in the "Average edges in all processes" criterion. BLProM improves the scanning of web applications because it clusters web pages and thereby avoids scanning similar web application pages.
Table 8 Comparison of the Average of the Proposed Approach and the Average of ZAP Approach in the Scanning of Selected Web Applications
| Criteria (average of the selected web applications) | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%) |
|---|---|---|---|
| Graph nodes | 42.6 | 136.3 | 69.2 |
| Graph edges | 67 | 353 | 81.1 |
| Process (P) | 20 | 36.6 | 40.5 |
| Average edge in each process (E) | 3.6 | 86.6 | 94.8 |
| Average edges in all processes (PE) | 91 | 2420 | 96.4 |
References
[1] Common Vulnerabilities and Exposures. https://cve.mitre.org/cve/cve.html.
[2] 2015 ITRC. Identity Theft Resource Center Breach Report Hits Near Record High in 2015. http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html. Accessed Feb 2017.
[3] G. Pellegrino and D. Balzarotti. Toward Black-Box Detection of Logic Flaws in Web Applications. In Network and Distributed System Security Symposium 2014 (NDSS 2014), 2014. doi:10.14722/ndss.2014.23021.
[4] Testing for business logic. OWASP. https://www.owasp.org/index.php/Testing_for_business_logic. Accessed Feb 2017.
[5] D. Balzarotti, M. Cova, V. Felmetsger, and G. Vigna. Multi-module vulnerability analysis of web-based applications. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 25–35. ACM, 2007. doi:10.1145/1315245.1315250.
[6] A. Doupé, B. Boe, C. Kruegel, and G. Vigna. Fear the EAR: discovering and mitigating execution after redirect vulnerabilities. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 251–262. ACM, 2011. doi:10.1145/2046707.2046736.
[7] V. Felmetsger, L. Cavedon, C. Kruegel, and G. Vigna. Toward Automated Detection of Logic Vulnerabilities in Web Applications. In USENIX Security Symposium, 2010.
[8] E. Chai. Business Logic Attacks – Bots and BATs. OWASP, 2009. https://www.owasp.org/images/a/aa/OWASP_Cincinnati_Jan_2011.pdf. Accessed Feb 2017.
[9] X. Li and Y. Xue. BLOCK: a black-box approach for detection of state violation attacks towards web applications. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 247–256, 2011. doi:10.1145/2076732.2076767.
[10] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna. Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications. In Recent Advances in Intrusion Detection (RAID 2007), pages 63–86. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-74319-4. doi:10.1007/978-3-540-74320-0_4.
[11] X. Li, W. Yan, and Y. Xue. SENTINEL: securing database from logic flaws in web applications. In Proceedings of the Second ACM Conference on Data and Application Security and Privacy, pages 25–36, 2012. doi:10.1145/2133601.2133605.
[12] M. Alidoosti and A. Nowroozi. BLDAST: business-layer Dynamic Application Security Tester of the web application in order to detect web application vulnerabilities against flooding DoS attacks. In Iran Society of Cryptology Conference, Shiraz, Iran (in Persian), 2017.
[13] M. Alidoosti, A. Nowroozi, and A. Nickabadi. Evaluating the Web-Application Resiliency to Business-Layer DoS Attacks. ETRI Journal, 2019. doi:10.4218/etrij.2019-0164.
[14] M. Alidoosti and A. Nowroozi. BLTOCTTOU: business-layer dynamic application security tester of the web application in order to detect web application vulnerabilities against Race Condition attacks. In Computer Society of Iran Conference, Tehran, Iran (in Persian), 2018.
[15] M. Alidoosti and A. Nowroozi. BLProM: Business-layer process miner of the web application. In ISCISC, pages 1–6. IEEE, 2018. ISBN 978-1-5386-7582-3. doi:10.1109/ISCISC.2018.8546899.
[16] V. Crescenzi, P. Merialdo, and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3):279–299, 2005. doi:10.1016/j.datak.2004.11.004.
Mitra Alidoosti received her BS and MSc degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. She is currently working toward the PhD degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.
Alireza Nowroozi is a freelance consultant who advises government and private-sector industries on information technology. He has four years of experience as an academic staff member and held an IT post-doctoral position at Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. He is also a co-founder of four IT startups.
Ahmad Nickabadi received his BS degree in computer engineering in 2004 and his MSc and PhD degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.
Figure 2 Black-Box Approach to Detect Web Application Business Process
level can differ according to the role of the user, and consequently the user navigation graph varies with the permission level. In this paper, the normal user has user-level permission and browses the pages that this permission allows. The normal user crawls all the parts of the application that he or she is permitted to access.
BLProM first extracts the user navigation graph from the stored traffic through the following steps:
1. Preprocessing of raw input data
2. Identifying the existing web application pages in the stored traffic
3. Clustering the web application pages
4. Extracting the user navigation graph
1) Preprocessing of raw input data
In the preprocessing step, BLProM cleans the data and removes irrelevant samples. Only HTTP requests and responses are used. Of the responses, only those with the success status codes 200 and 209 are kept; BLProM removes responses with failure status codes together with their corresponding requests. In addition, only GET and POST requests are needed, and all other requests are discarded.
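A minimal sketch of this filter in Python, assuming the captured traffic has already been parsed into request/response exchanges. The `HttpExchange` record and its fields are hypothetical, and the kept status codes {200, 209} are taken as printed in the text rather than as the full 2xx class.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical record for one captured request/response exchange.
@dataclass
class HttpExchange:
    method: str
    url: str
    status: int
    headers: Dict[str, str]


# Status codes and methods kept during preprocessing, as stated in the text.
SUCCESS_CODES = {200, 209}
KEPT_METHODS = {"GET", "POST"}


def preprocess(traffic: List[HttpExchange]) -> List[HttpExchange]:
    """Drop failed responses (with their requests) and non-GET/POST requests."""
    return [
        ex for ex in traffic
        if ex.method.upper() in KEPT_METHODS and ex.status in SUCCESS_CODES
    ]


traffic = [
    HttpExchange("GET", "http://shop.example/index.php", 200, {"Content-Type": "text/html"}),
    HttpExchange("GET", "http://shop.example/missing.php", 404, {"Content-Type": "text/html"}),
    HttpExchange("OPTIONS", "http://shop.example/index.php", 200, {}),
    HttpExchange("POST", "http://shop.example/login.php", 200, {"Content-Type": "text/html"}),
]
print([ex.url for ex in preprocess(traffic)])
# ['http://shop.example/index.php', 'http://shop.example/login.php']
```

Keeping each request bundled with its response means that discarding a failed response automatically discards its request, as the step requires.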
2) Identifying the web application pages in the stored traffic
Each page of the application can be represented as a pair (main request, response to the main request). To identify the web application pages, the main requests in the traffic must first be detected; after that, their corresponding responses must be identified.
Identifying the main HTTP requests in the traffic. The user's stored traffic contains both the main HTTP requests, which load the web application pages, and the secondary HTTP requests, which load the files, images, etc. of a page. BLProM must distinguish between the two; once the main requests are identified, the remaining ones are considered secondary. To do so, any request whose Referer header differs from the Referer of the previous request is considered a secondary request, and the previous request is considered a main request. Additionally, the first request in the traffic is considered a main request, because it does not include a Referer. Note that the Referer header field of an HTTP request indicates the URL of the page from which the request was made; for a page request, this is the previous page the user visited.
Identifying the responses corresponding to the main requests. To identify these responses, it is only necessary to select the responses whose Content-Type field is text/html, because the response to a secondary request is usually a file, photo, etc., whereas the response to a main request is HTML text. Such responses are the main responses in the HTTP traffic. After the main requests and responses are identified, each pair (main request, response to the main request) represents one page of the web application. Note that the response to a main request references all the secondary resources whose requests load the web page.
Identifying whether the last request in the traffic is a main request. If the last response in the traffic is a main response, the last HTTP request is a main request as well.
For example, in Figure 3, requests 1, 2, 6, and 8 are the main requests (shown in red).
Figure 3 Number of Consecutive Requests
The pseudocode of the algorithm BLProM uses to identify web application pages is shown in Algorithm 1. As mentioned before, each web application page can be represented by a pair (main request, response to the main request). The pseudocode in Algorithm 1 can be divided into four main parts:
1. Extracting the main requests in the traffic (lines 13–25)
2. Extracting the responses corresponding to the main requests (lines 26–31)
3. Identifying whether the last request in the traffic is a main request (lines 32–35)
4. Extracting the web application pages (lines 36–41)
In line 9, all requests in the traffic, main or secondary, are extracted and stored in the HttpRequest variable. In line 10, the responses in the traffic are stored in the HttpResponse variable. In lines 11 and 12, the total numbers of requests and responses are calculated. In line 15, it is checked whether the current request is the first request; if so, it is considered a main request and added to the MainRequest variable. In line 21, it is checked whether the Referer field of the current request differs from the Referer field of the previous request; if so, the previous request is considered a main request.
In line 28, it is checked whether the Content-Type field of the current response is text/html. If so, the current response is considered a main response.
In line 33, it is checked whether the last HTTP response is a main response; if so, the last request is considered a main request as well.
In line 39, the main requests and main responses are paired as (request, response); these pairs are the web application pages.
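The page-extraction logic of Algorithm 1 can be sketched in Python as follows. This is a simplified sketch under our own assumptions, not the authors' implementation: the `Exchange` record is hypothetical, and each main request stays attached to its own response instead of being paired by position with the j-th HTML response as in line 39.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exchange:
    # Hypothetical capture record: a request and the response it received.
    url: str
    referer: Optional[str]
    content_type: str


def is_html(ex: Exchange) -> bool:
    # Main responses are text/html; secondary ones are images, scripts, styles, etc.
    return ex.content_type.split(";")[0].strip().lower() == "text/html"


def extract_web_pages(traffic: List[Exchange]) -> List[Exchange]:
    """Return exchanges whose request is a main request and whose response is HTML."""
    if not traffic:
        return []
    main = {0}  # the first request has no Referer and is treated as a main request
    last_referer = traffic[0].referer
    for i in range(1, len(traffic)):
        new_referer = traffic[i].referer
        if new_referer != last_referer:
            # Request i starts loading resources of the page fetched by
            # request i-1, so request i-1 is a main request.
            main.add(i - 1)
            last_referer = new_referer
    if is_html(traffic[-1]):
        # No later request can reveal the last one, so use its content type.
        main.add(len(traffic) - 1)
    return [traffic[i] for i in sorted(main) if is_html(traffic[i])]


traffic = [
    Exchange("/index", None, "text/html"),
    Exchange("/logo.png", "/index", "image/png"),
    Exchange("/style.css", "/index", "text/css"),
    Exchange("/product?id=1", "/index", "text/html; charset=utf-8"),
    Exchange("/p1.jpg", "/product?id=1", "image/jpeg"),
    Exchange("/cart", "/product?id=1", "text/html"),
]
print([p.url for p in extract_web_pages(traffic)])
# ['/index', '/product?id=1', '/cart']
```

Using a set of indices also removes the duplicate that the pseudocode can produce when the first request is added both in line 16 and again in line 22.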
3) Clustering the web application pages
In this step, BLProM clusters the web application pages. Clustering puts similar pages in the same cluster, which helps prevent unbounded growth of the user navigation graph. The pages in the same cluster are similar to each other.
At this stage, all pages in the user's stored traffic have been extracted as pairs (main request, response to the main request), and each pair represents one page of the application. To extract the optimal user navigation graph, similar pages must be identified and clustered. In the user navigation graph, nodes represent the application's unique pages and edges represent the links between pages. To identify similar pages, a similarity criterion must be chosen according to the purpose of the clustering. Here, the type of operations the user can perform on a page is used to separate pages: two pages are similar if the user can perform the same operations on them. For example, consider two pages that each contain only one button, where the button title is "continue" on the first page and "save" on the second. These two pages are different, because the user performs different operations on them. According to this criterion, the following pages are considered similar in web applications:
• In an online shop application, the shopping cart pages are considered similar, even if they contain different items.
• In an online shop application, the profile pages of the products are similar, because they have the same HTML structure in terms of the important HTML elements.
• Pages that display search results for different keywords are similar.
• If the structure of a page is a subset of another page's structure in terms of the important HTML elements, the two pages are considered similar.
• Two pages that both contain user comments are considered similar, even if the comments are about different products.
Algorithm 1 The Pseudocode for Extracting Web Application Pages From the Traffic
INPUT: HTTPtraffic = {HttpMessage_1, HttpMessage_2, HttpMessage_3, ..., HttpMessage_n}
OUTPUT: WebPages as a set of (Request_i, Response_i)
1  Begin
2    let WebPages = ∅ // set of web pages
3    let HttpRequest = ∅ // set of HTTP requests
4    let HttpResponse = ∅ // set of HTTP responses
5    let MainRequest = ∅ // set of main HTTP requests
6    let MainResponse = ∅ // set of main HTTP responses
7    let i, k = 1 // counters for the current HttpMessage
8    let LastReferer = ∅, NewReferer = ∅
9    HttpRequest = ExtractReq(HTTPtraffic) // extract HTTP requests from the HTTP traffic
10   HttpResponse = ExtractResp(HTTPtraffic) // extract HTTP responses from the HTTP traffic
11   n = extractNumber(HttpRequest) // total number of HTTP requests
12   m = extractNumber(HttpResponse) // total number of HTTP responses
13   // extract the main HTTP requests from HttpRequest
14   for i ← 1 to n do
15     if (i = 1) then
16       add MainRequest ← HttpRequest_i // the first request is a main request
17       LastReferer ← Referer of HttpRequest_i
18     else
19       NewReferer ← Referer of HttpRequest_i
20     end if
21     if (NewReferer ≠ LastReferer) then
22       add MainRequest ← HttpRequest_(i−1)
23       LastReferer ← NewReferer
24     end if
25   end for
26   // extract the main HTTP responses from HttpResponse
27   for k ← 1 to m do
28     if (content-type of HttpResponse_k = text/html) then
29       add MainResponse ← HttpResponse_k
30     end if
31   end for
32   // check whether the last request is a main request
33   if (HttpResponse_m ∈ MainResponse) then
34     add MainRequest ← HttpRequest_m
35   end if
36   // extract the set of web pages
37   size = extractNumber(MainRequest) // total number of main HTTP requests
38   for j ← 1 to size do
39     WebPages ← WebPages ∪ {(MainRequest_j, MainResponse_j)}
40   end for
41   return WebPages
42 end
Table 1 Attribute Vector of a Page in osCommerce Web Application
| Element | DOM path(s) |
|---|---|
| inputs | null |
| buttons | html/body/button/Reviews |
| | html/body/div/div/div/div/form/div/div/span/span/button/Add to Cart |
| anchors | null |
| image | html/body/div/div/div/a/img |
Figure 4 An Example of HTML Code
Definition of the Document Object Model (DOM) path of an HTML element. The DOM path of an element is the position of the element in the HTML code, i.e., the sequence of tags from the document root down to the element.
For example, in Figure 4, the DOM path of the button (DOM_button) is Document/Html/Body/P/Button.
Definition 1 [Similar Pages]: Similar pages are pages on which the user can perform the same operations and that are identical in terms of the positions of the important HTML elements in the page. The important HTML elements of a page are buttons, images, inputs, and anchors.
The clustering process includes three steps:
1. Extracting the attribute vectors of the pages
2. Identifying the subset pages
3. Clustering the pages
These steps are discussed in detail below.
1. Extracting the attribute vectors of the pages
BLProM represents each page of the application as a pair (main request, response). In this step, BLProM extracts the attribute vector of each page by applying a data mining operation to this pair. BLProM models each page using the following attribute vector:
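WebPages = the set of all pages in the application
$$\forall w \in WebPages:\quad w = (DOM_{inputs},\ DOM_{buttons},\ DOM_{anchors},\ DOM_{imgs})$$
$$DOM_{inputs} = \prod_{i=1}^{n} DOM(input_i), \qquad DOM_{buttons} = \prod_{i=1}^{n} DOM(button_i)$$
$$DOM_{anchors} = \prod_{i=1}^{n} DOM(anchor_i), \qquad DOM_{imgs} = \prod_{i=1}^{n} DOM(img_i)$$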
• DOM(input): the DOM path of the <input> tag in the page + the value of the type attribute of the <input> tag + the value of the name attribute of the <input> tag (if the name attribute is absent, the value attribute is used)
• DOM(button): the DOM path of the button in the page + the title of the button
• DOM(anchor): the DOM path of the <a> tag in the page
• DOM(img): the DOM path of the image in the page
If a page contains several buttons, the second element of its attribute vector is the set of the DOM paths of those buttons, separated by a delimiter. Figure 5 shows one of the pages of the osCommerce¹ web application, and Table 1 shows the attribute vector of that page. As shown, the input and anchor elements are null, which means the page contains no such tags.
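A sketch of attribute-vector extraction with Python's standard `html.parser` module is given below. The `/` separator inside paths, the `|` separator between an input's path, type, and name, and all helper names are our assumptions, not BLProM's actual encoding; a production tool would need more robust handling of malformed markup.

```python
from html.parser import HTMLParser
from typing import Dict, FrozenSet, List, Optional

# Elements that never have a closing tag, so they are not pushed on the path stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class AttributeVectorParser(HTMLParser):
    """Collects DOM paths of inputs, buttons, anchors and images (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.found: Dict[str, List[str]] = {
            "inputs": [], "buttons": [], "anchors": [], "imgs": []}
        self._button_path: Optional[str] = None
        self._button_text: List[str] = []

    def _path(self, tag: str) -> str:
        return "/".join(self.stack + [tag])

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input":
            # DOM path + type attribute + name attribute (value attribute if no name)
            ident = a.get("name") or a.get("value") or ""
            self.found["inputs"].append(f"{self._path(tag)}|{a.get('type') or ''}|{ident}")
        elif tag == "img":
            self.found["imgs"].append(self._path(tag))
        elif tag == "a":
            self.found["anchors"].append(self._path(tag))
        elif tag == "button":
            self._button_path, self._button_text = self._path(tag), []
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_data(self, data):
        if self._button_path is not None:
            self._button_text.append(data)

    def handle_endtag(self, tag):
        if tag == "button" and self._button_path is not None:
            # DOM path of the button + its whitespace-normalized title
            title = " ".join("".join(self._button_text).split())
            self.found["buttons"].append(f"{self._button_path}/{title}")
            self._button_path = None
        if tag in self.stack:
            # Pop up to and including the matching open tag (tolerates unclosed tags).
            while self.stack.pop() != tag:
                pass


def attribute_vector(html: str) -> Dict[str, FrozenSet[str]]:
    parser = AttributeVectorParser()
    parser.feed(html)
    parser.close()
    return {key: frozenset(paths) for key, paths in parser.found.items()}


page = ('<html><body><p><button>Save</button></p>'
        '<a href="/x"><img src="a.png"></a>'
        '<form><input type="text" name="q"></form></body></html>')
print(sorted(attribute_vector(page)["buttons"]))  # ['html/body/p/button/Save']
```

Storing each element as a frozen set makes the subset tests of the next step direct set comparisons.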
2. Identifying similar pages
After the attribute vector of each page is extracted, similar pages must be identified. Pages whose attribute vectors are identical, or whose attribute vector is a subset of another page's, are considered similar.
According to Definition 1, the attribute vector of each web application page has four elements. The attribute vector of page 1 is considered the same as the attribute vector of page 2 if:
• All vector elements of page 1 are equal to the corresponding elements in the vector of page 2
• All vector elements of page 1 are subsets of the corresponding elements in the vector of page 2
• All vector elements of page 2 are subsets of the corresponding elements in the vector of page 1
¹ https://www.oscommerce.com
Figure 5 A Page of osCommerce Application
Algorithm 2 The Pseudocode for Identifying the Similar Pages
INPUT: w1 = (DOM_w1(input), DOM_w1(button), DOM_w1(anchor), DOM_w1(img)),
       w2 = (DOM_w2(input), DOM_w2(button), DOM_w2(anchor), DOM_w2(img))
OUTPUT: Boolean flag; true means the two pages are the same
1  Begin
2    let flag, input, button, anchor, img = false
3    if DOM_w1(input) ⊆ DOM_w2(input) or DOM_w2(input) ⊆ DOM_w1(input) then
4      input = true
5    end if
6    if DOM_w1(button) ⊆ DOM_w2(button) or DOM_w2(button) ⊆ DOM_w1(button) then
7      button = true
8    end if
9    if DOM_w1(anchor) ⊆ DOM_w2(anchor) or DOM_w2(anchor) ⊆ DOM_w1(anchor) then
10     anchor = true
11   end if
12   if DOM_w1(img) ⊆ DOM_w2(img) or DOM_w2(img) ⊆ DOM_w1(img) then
13     img = true
14   end if
15   if input and button and anchor and img then
16     flag = true
17   end if
18   return flag
19 end
• If one or more vector elements of page 1 are subsets of their corresponding elements in the vector of page 2, the remaining vector elements of page 1 must be equal to their corresponding elements in the vector of page 2
• The null element is a subset of every element
Similar pages are identified according to these rules. Algorithm 2 shows the pseudocode for identifying similar pages.
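The element-wise test of Algorithm 2 can be sketched in Python as below, assuming each page vector is a dictionary of frozen sets of DOM-path strings (our own representation). Like the pseudocode, it checks each element independently, so it does not enforce the stricter rule that the remaining elements be equal when only some elements are subsets.

```python
from typing import Dict, FrozenSet

Vector = Dict[str, FrozenSet[str]]
ELEMENTS = ("inputs", "buttons", "anchors", "imgs")


def comparable(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    # Equal, or one is a subset of the other; the empty (null) set is a subset of any set.
    return a <= b or b <= a


def similar_web_pages(w1: Vector, w2: Vector) -> bool:
    """Algorithm 2: the pages are the same if every element pair is comparable."""
    return all(comparable(w1[e], w2[e]) for e in ELEMENTS)


reviews = "html/body/button/Reviews"
cart = "html/body/div/div/div/div/form/div/div/span/span/button/Add to Cart"
img = "html/body/div/div/div/a/img"
page_a = {"inputs": frozenset(), "buttons": frozenset({reviews}),
          "anchors": frozenset(), "imgs": frozenset({img})}
page_b = dict(page_a, buttons=frozenset({reviews, cart}))
page_c = dict(page_a, buttons=frozenset({"html/body/button/Save"}))
print(similar_web_pages(page_a, page_b), similar_web_pages(page_a, page_c))  # True False
```

Python's `<=` on sets is the subset test, which maps one-to-one onto the ⊆ conditions in lines 3, 6, 9, and 12.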
3. Clustering web application pages
After the similar pages are identified, they are put in the same cluster. The pages in a cluster are similar to each other and correspond to a single unique page of the application. Algorithm 3 shows the pseudocode for clustering web pages. In line 7, it is checked whether the two pages w_i and w_j are the same; if so, they are put in the same cluster.
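A greedy clustering loop in the spirit of Algorithm 3 can be sketched in Python as follows. It is a sketch, not the authors' code: the similarity predicate is passed in as a parameter, and the toy parity predicate in the example is purely illustrative.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def cluster_web_pages(pages: List[T], similar: Callable[[T, T], bool]) -> List[List[T]]:
    """Greedy clustering in the spirit of Algorithm 3: the first unclustered page seeds
    a cluster, and every remaining page similar to the seed joins it and is removed."""
    remaining = list(pages)
    clusters: List[List[T]] = []
    while remaining:
        seed, rest = remaining[0], remaining[1:]
        cluster, unclustered = [seed], []
        for page in rest:
            if similar(seed, page):
                cluster.append(page)
            else:
                unclustered.append(page)
        clusters.append(cluster)
        remaining = unclustered
    return clusters


# Toy similarity for illustration only: pages are "similar" if their numbers share parity.
print(cluster_web_pages([1, 2, 3, 4, 5], lambda a, b: a % 2 == b % 2))
# [[1, 3, 5], [2, 4]]
```

Because subset-based similarity is not transitive, every candidate is compared with the seed page w_i, as in line 7, rather than with other cluster members.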
4) Extracting the user navigation graph
In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster contains a set of similar pages, and each of these pages has a URI and a Referer field. Thus, each cluster has a set of URIs and a set of Referers for its pages. Note that the Referer field is the URI
Algorithm 3 The Pseudocode for Clustering the Web Application Pages
INPUT: WebPages = {w1, w2, w3, ..., wn}
OUTPUT: the web application model M as a set of web page clusters C; each cluster C is a set of web pages
1  Begin
2    let M = ∅ // set of page clusters
3    let k = 1 // number of web page clusters
4    for i ← 1 to n do
5      C_k ← {w_i}
6      for j ← i + 1 to n do
7        if SimilarWebPages(w_i, w_j) then
8          WebPages ← WebPages − w_j
9          n = length of WebPages
10         C_k ← C_k ∪ {w_j}
11       end if
12     end for
13     M ← M ∪ {C_k}; k++
14   end for
15   return M
16 end
of the previous web application page that the user visited. To extract the edges of the user navigation graph, the clusters are checked to find which clusters' Referer sets intersect the URI set of a given cluster; such clusters are connected to it. Suppose that the URI set of cluster C1 intersects the Referer set of cluster C2; then the path from cluster C1 to cluster C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges of the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained in lines 5 and 6 of Algorithm 4, respectively. In line 10, if the intersection of the URI set of a cluster with another cluster's Referer set is not empty, the edge CiCj is added to the edge set of the graph.
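The edge rule of Algorithm 4 can be sketched in Python as below, assuming each page is reduced to a hypothetical (URI, Referer) pair and each cluster is identified by its list index. Every ordered pair of distinct clusters is compared, matching the text's "another cluster's Referer set".

```python
from typing import List, Optional, Set, Tuple

Page = Tuple[str, Optional[str]]  # hypothetical (URI, Referer) pair for one page


def extract_graph_edges(clusters: List[List[Page]]) -> Set[Tuple[int, int]]:
    """Add edge i -> j when a URI of cluster i appears as the Referer of a page in cluster j."""
    uris = [{uri for uri, _ in c} for c in clusters]
    referers = [{ref for _, ref in c if ref is not None} for c in clusters]
    edges: Set[Tuple[int, int]] = set()
    for i in range(len(clusters)):
        for j in range(len(clusters)):
            if i != j and uris[i] & referers[j]:
                edges.add((i, j))
    return edges


clusters = [
    [("/index", None)],
    [("/product?id=1", "/index"), ("/product?id=2", "/index")],
    [("/cart", "/product?id=2")],
]
print(sorted(extract_graph_edges(clusters)))  # [(0, 1), (1, 2)]
```

The two product pages fall into one cluster, so the graph gets a single edge into that cluster instead of one per product page.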
The user navigation graph is then created by Algorithm 5: its nodes are the set of clusters, and the identified edges are the paths between clusters. In the resulting graph, each node represents a unique page, and each edge represents a path between pages.
Definition 2 [The User Navigation Graph]: This graph is represented by the tuple $\langle C_0, C, E \rangle$, where $C$ is the set of nodes in the graph, $C_0 \in C$ is the first (initial) node in the graph, and $E \subseteq C \times C$ is the set of edges in the graph.
Algorithm 5 shows the pseudocode for extracting the user navigation graph. Lines 5 and 6 select the cluster that contains the initial page of the application; this cluster is the initial node of the graph.
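Definition 2 and Algorithm 5 can be sketched in Python as a small immutable record, assuming clusters are lists of page URIs identified by index and the edges come from an edge-extraction step such as Algorithm 4. The names `NavigationGraph` and `build_navigation_graph` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class NavigationGraph:
    """Definition 2: the tuple <C0, C, E>, with clusters identified by their index."""
    initial: int                       # C0, the cluster that holds the first page
    nodes: FrozenSet[int]              # C
    edges: FrozenSet[Tuple[int, int]]  # E, a subset of C x C


def build_navigation_graph(clusters: List[List[str]], first_page: str,
                           edges: Set[Tuple[int, int]]) -> NavigationGraph:
    nodes = frozenset(range(len(clusters)))
    # Algorithm 5, lines 4-8: the initial node is the cluster containing the first page.
    initial = next((i for i, c in enumerate(clusters) if first_page in c), None)
    if initial is None:
        raise ValueError("no cluster contains the first page")
    if any(a not in nodes or b not in nodes for a, b in edges):
        raise ValueError("E must be a subset of C x C")
    return NavigationGraph(initial, nodes, frozenset(edges))


graph = build_navigation_graph(
    [["/index"], ["/product?id=1", "/product?id=2"], ["/cart"]],
    "/index", {(0, 1), (1, 2)})
print(graph.initial, sorted(graph.nodes))  # 0 [0, 1, 2]
```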
3.2 Identifying Business Processes in the Application
To identify the business processes of the web application in the user navigation graph, we first define processes and final nodes and then define business processes.
Definition 3 [The Application Process (P)] The
Algorithm 4 The Pseudocode for Extracting Edges From the User Navigation Graph
INPUT: C = {C1, C2, C3, ..., Ck} // web page clusters as graph nodes
       WebPages = {w1, w2, w3, ..., wn} // set of web pages
OUTPUT: the web application graph edges E as a set of edges
1  Begin
2    let E = ∅ // set of web application graph edges
3    for j ← 1 to k do
4      let URI_Cj, Referer_Cj = ∅
5      URI_Cj = URI_Cj ∪ ExtractURI(w) for any w ∈ C_j
6      Referer_Cj = Referer_Cj ∪ ExtractReferer(w) for any w ∈ C_j
7    end for
8    for i ← 1 to k do
9      for j ← 1 to k, j ≠ i do
10       if (URI_Ci ∩ Referer_Cj ≠ ∅) then
11         E ← E + CiCj
12       end if
13     end for
14   end for
15   return E
16 end
Algorithm 5 The Pseudocode for Extracting the User Navigation Graph
INPUT: C = {C1, C2, C3, ..., Ck} // web page clusters as graph nodes
       w1 // first web page
OUTPUT: the web application navigation graph ⟨C0, C, E⟩
1  Begin
2    let C0 = ∅
3    E = ExtractGraphEdges // set of web application graph edges
4    for j ← 1 to k do
5      if C_j contains w1 then
6        C0 = C_j
7      end if
8    end for
9    return ⟨C0, C, E⟩
10 end
process $P$ in the application is a sequence of nodes and edges in the user navigation graph, $E_1, E_2, \ldots, E_k$, where $E_i \in E$ and $E_i = C_{i-1}C_i$.
Algorithm 6 shows the pseudocode for extractingprocesses
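Algorithm 6 extracts processes recursively from the edges leaving C0. The Python sketch below is one possible reading, not the authors' implementation: it enumerates paths from C0 by depth-first search, writes each process as its node sequence C0, C1, ..., Ck (equivalent to the edges E_i = C_(i-1)C_i), and stops a path at a node without outgoing edges or at the first return to a node already on the path. The revisit stopping rule is our assumption, added so that cycles terminate.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def extract_processes(c0: int, edges: Set[Edge]) -> List[List[int]]:
    """Enumerate processes (paths) starting at the initial node c0.
    A path ends at a node without outgoing edges or when it returns to a node
    already on the path; that closing edge is kept so loops stay visible."""
    succ: Dict[int, List[int]] = {}
    for a, b in sorted(edges):
        succ.setdefault(a, []).append(b)

    processes: List[List[int]] = []

    def walk(path: List[int]) -> None:
        nexts = succ.get(path[-1], [])
        if not nexts:
            if len(path) > 1:
                processes.append(path)
            return
        for nxt in nexts:
            if nxt in path:
                processes.append(path + [nxt])  # revisiting a node closes a loop
            else:
                walk(path + [nxt])

    walk([c0])
    return processes


print(extract_processes(0, {(0, 1), (1, 2), (2, 0), (1, 3)}))
# [[0, 1, 2, 0], [0, 1, 3]]
```

The number of such paths can grow exponentially with graph size; clustering similar pages first, as BLProM does, keeps the graph small enough for this enumeration.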
Definition 4 [The Final Nodes of the User Navigation Graph (F)]: The final nodes mark the completion of a business process; a business process is complete when the application reaches one of them.
The final nodes can be detected by examining the HTTP responses. For example, in the process of buying a product, a phrase like "Thank you for your purchase" is displayed after the purchase is complete. By specifying a set of such phrases and searching for them in the responses, the final nodes can be identified. Some keywords used to identify final nodes are Thank, Congratulations, Successfully, Log Off, and Search Results. In addition, some buttons on a web application page are good signs that help identify final nodes in the graph. Examples of final nodes include the page reached after clicking the "save" button, the page after clicking the "creation" button, and the page after clicking the "submit" button.
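The keyword search for final nodes can be sketched in Python as follows, assuming each cluster is mapped to the HTML bodies of its pages. Case-insensitive substring matching and the helper names are our assumptions; the keyword list is the one given in the text.

```python
from typing import Dict, Iterable, Set

# Completion keywords listed in the text; case-insensitive matching is our assumption.
FINAL_KEYWORDS = ("thank", "congratulations", "successfully", "log off", "search results")


def extract_final_nodes(cluster_bodies: Dict[int, Iterable[str]],
                        keywords: Iterable[str] = FINAL_KEYWORDS) -> Set[int]:
    """Return the clusters in which some page's HTML response contains a keyword."""
    kws = [k.lower() for k in keywords]
    return {cid for cid, bodies in cluster_bodies.items()
            if any(k in body.lower() for body in bodies for k in kws)}


bodies = {
    0: ["<h1>Welcome to the shop</h1>"],
    3: ["<p>Thank you for your purchase!</p>"],
    4: ["<p>Item added to cart</p>"],
}
print(extract_final_nodes(bodies))  # {3}
```

Plain substring matching can produce false positives (for example, "thank" inside unrelated text), so the button-based signs mentioned above could be combined with it.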
Definition 5 [The Application Business Process]: A business process BP in the application is a process that satisfies at least one of the following conditions:
1 The first node of the process is the initial nodein the user navigation graph (C0) and the processend node is the final node in the user navigationgraph (F)
2 If the process passes its first node again it means
Algorithm 6 The Pseudocode for Extracting Processes
INPUT: the web application first node C0
       the web application graph edges E
OUTPUT: the web application graph processes P as a set of web application processes
1  Begin
2    let P0 = ∅ // set of web application processes
3    let StartEdge = ∅ // set of graph edges that begin at C0
4    StartEdge = ExtractGraphEdges(C0) // extract all edges leaving C0
5    EndPoint = ExtractEndPoint(StartEdge) // extract the end points of the edges
6    if (ExtractProcess(EndPoint, E) ≠ null) then
7      return P = e + ExtractProcess(EndPoint, E) for any edge e ∈ StartEdge
8    else
9      return e
10   end if
11 end
the first node and the end node of the process are the same, and the process length is greater than two. (If the process returns to a node it has already passed and the resulting loop is longer than two, the loop is a business process.)
All processes in the graph that lead from the initial node (the application's initial page) to the identified final nodes, as well as the processes whose initial and final nodes are the same, are business processes of the application. Algorithm 7 shows the pseudocode for identifying the application's business processes. In line 4, the processes of the application are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start at the initial node and end at a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, any detected process whose initial and final nodes are the same and whose length is greater than two is added to BP as a business process.
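The two conditions of Definition 5, as applied by Algorithm 7, can be sketched in Python as below. The sketch assumes processes are node sequences in which a repeated node can only appear as the last element (paths that stop at their first revisit); that representation and the function name are our assumptions.

```python
from typing import List, Set


def identify_business_processes(processes: List[List[int]], c0: int,
                                final_nodes: Set[int]) -> List[List[int]]:
    """Apply Definition 5 to processes written as node sequences [C0, C1, ..., Ck]."""
    business: List[List[int]] = []
    for p in processes:
        # Condition 1: starts at the initial node C0 and ends at a final node.
        if p[0] == c0 and p[-1] in final_nodes:
            business.append(p)
        # Condition 2: the process returns to a node it already passed; the loop
        # from that node back to itself counts if it has more than two edges.
        start = p.index(p[-1])
        if start < len(p) - 1:
            loop = p[start:]
            if len(loop) - 1 > 2 and loop not in business:
                business.append(loop)
    return business


print(identify_business_processes([[0, 1, 2, 0], [0, 1, 3], [0, 1, 0]], 0, {3}))
# [[0, 1, 2, 0], [0, 1, 3]]
```

The loop [0, 1, 0] is rejected because it has only two edges, which mirrors the "length greater than two" test in line 13.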
4 EXPERIMENTAL RESULTS
The testbed used in this section is a network consisting of a web server (the test target) and two clients (the BLProM system and a legal user). The web server and each client run in a VMware virtual machine; their profiles are shown in Table 2.
The web applications listed in Table 3 are installed on the web server (test target), and BLProM is then used to identify their business layers.
The legal user first uses the selected web applications, crawling all the parts of each application that he or she is permitted to access. The HTTP traffic of the legal user is given to BLProM as its input.
5 EVALUATION
BLProM's goal is to identify the business layer of the web application; identifying the business layer makes it possible to identify business logic vulnerabilities. BLProM detects the business processes of the web application, which is the main step in business-layer dynamic security testing of web applications.
We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner that scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner with an API for extracting the web application graph, in which the web application pages are the graph nodes and the relations among pages are the graph edges. The main difference between BLProM and ZAP lies in detecting similar pages: ZAP cannot detect similar pages in a web application, but BLProM can. ZAP's graph only shows the relations among the scanned pages, whereas BLProM generates the optimal graph. In terms of the accuracy of the generated graph, BLProM and ZAP are the same.
To evaluate the proposed approach, we first measure the accuracy of the clustering using the following criteria:
• True Positive: samples placed in their correct clusters
• False Positive: samples placed in a cluster to which they do not belong
• False Negative: samples not placed in a cluster to which they belong
• Recall: It is calculated by the following formula:
$$recall = \frac{TruePositive}{TruePositive + FalseNegative}$$
• Precision: It is calculated by the following formula:
Table 2 Testbed Profiles
| Machine | CPU | OS | VMware CPU | VMware RAM | VMware OS |
|---|---|---|---|---|---|
| Web server (test target) | Pentium dual core, 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7 |
| Client (BLProM machine) | Intel Core i7, 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7 |
| Client (legal user) | Pentium dual core i3-3210 GHz | Windows 7 | 1 GHz | 1 GB | Windows 7 |
Table 3 Selected Web Applications for Evaluation
| Web application | Description |
|---|---|
| TomatoCart 1.1.8.6.1 | E-commerce |
| osCommerce 2.3.4 | E-commerce |
| WackoPicko | Web application for sharing pictures |
Algorithm 7 The Pseudocode for Identifying the Application Business Processes
INPUT: the web application navigation graph ⟨C0, C, E⟩
       the web application graph edges E
OUTPUT: the web application business processes BP as a set of web application processes
1  Begin
2    let P, BP = ∅ // sets of web application processes and business processes
3    let F = ∅ // set of web application final nodes
4    P = ExtractProcess // extract the processes in the web application
5    F = ExtractFinalNodes // extract the final nodes in the web application
6    for i ← 1 to k do // k = number of processes in P
7      if P_i starts at C0 and ends at a node in F then
8        BP = BP + P_i
9      end if
10   end for
11   R = ExtractProcessWithRepeatedNodes(P) // extract processes with repeated nodes
12   for j ← 1 to m do // m = number of processes in R
13     if R_j has the same initial and end node and length(R_j) > 2 then
14       BP = BP + R_j
15     end if
16   end for
17   return BP
18 end
Table 4 The Clusters of Selected Web Application Pages Evaluation
| Criteria | WackoPicko | TomatoCart | osCommerce |
|---|---|---|---|
| Samples | 89 | 150 | 210 |
| Clusters | 29 | 66 | 40 |
| True positive | 65 | 146 | 205 |
| False positive | 24 | 4 | 5 |
| False negative | 23 | 3 | 4 |
| Recall | 0.74 | 0.98 | 0.98 |
| Precision | 0.73 | 0.97 | 0.98 |
| F-measure | 0.73 | 0.98 | 0.98 |
Table 5 Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko
| Criteria (WackoPicko) | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%) |
|---|---|---|---|
| HTTP message | 89 | 89 | – |
| Graph nodes | 22 | 89 | 75.2 |
| Graph edges | 48 | 270 | 82.2 |
| Process (P) | 12 | 48 | 75 |
| Average edge in each process (E) | 4 | 46 | 91.3 |
| Average edges in all processes (PE) | 48 | 2208 | 97.8 |
| Business processes | 10 | N/A | – |
precision = TruePositiveTruePositive+FalsePositive
bull F-Measure It is calculated by the following for-mulaF minusMeasure = 2lowastrecalllowastprecision
TruePositive+FalsePositive
The value of these criteria for the ClusteringWeb-Pages algorithm (Algorithm 3) is shown in Table 4In the first row samples shows the total numberof web pages in HTTP traffic extracted from the Ex-tractWebPages algorithm in Algorithm 1 In the sec-ond row clusters shows the total number of clustersextracted from the ClustreringWebPages algorithmin Algorithm 3 In the next rows the criteria listedabove are calculated for each web application
To evaluate the proposed approach we comparethe output of BLProM with the scanning output ofOWASP ZAP for selected web applications BLProMoutput for WackoPicko is shown in Table 5 graphnodes shows the total number of clusters extractedfrom the ClustreringWebPages algorithm in Algo-rithm 3 graph edges is the total number of edgesextracted from the ExtractGraphEdges algorithm in
Algorithm 4 process is the total number of pathsfrom the first node extracted from the ExtractPro-cess algorithm in Algorithm 6 Average edge in eachprocess (E) is calculated by the sum of edges of eachprocess divided by the total number of processes Av-erage edge in all process is calculated by the productof the total number of processes (P) in average edgesof each process (E) business processes is the totalnumber of business processes in the web application
In Table 5, the first column of values is the output of the proposed approach, the second column is the scanning output of ZAP, and the third column is the percentage by which our proposed approach improves on the ZAP scan. The results indicate that the ZAP scan is not intelligent: ZAP builds one graph node per HTTP message (89 nodes for 89 messages) because it does not merge similar pages. As a result of this non-intelligent scan, ZAP is not able to identify business-layer vulnerabilities.
BLProM output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to ZAP scanning. Again, web application scanning is improved by identifying the web application business layer.
Table 6 Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce
Criteria | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%)
HTTP messages | 170 | 170 | –
Graph nodes | 40 | 170 | 76.4
Graph edges | 66 | 379 | 82.5
Processes (P) | 23 | 17 | 26
Average edges in each process (E) | 3 | 113 | 97.3
Average edges in all processes (PE) | 69 | 1921 | 96.4
Business processes | 18 | N/A | –
Table 7 Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart
Criteria | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%)
HTTP messages | 150 | 150 | –
Graph nodes | 66 | 150 | 56
Graph edges | 87 | 410 | 78.7
Processes (P) | 31 | 39 | 20.5
Average edges in each process (E) | 4 | 101 | 96
Average edges in all processes (PE) | 156 | 3131 | 95
Business processes | 30 | N/A | –
BLProM output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of our proposed approach compared to ZAP scanning.
Table 8 shows, for each criterion, the average value for the proposed approach, the average value for OWASP ZAP, and the average percentage of improvement over the three selected web applications. For example, for the "Average edges in all processes" criterion, our approach achieves an improvement of about 96 percent compared to OWASP ZAP.
The results in this table show that BLProM improves web application scanning. BLProM is aware of the web application business processes, and once the web application business layer is identified, web scanners can detect business-layer vulnerabilities.
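As a worked check of how Table 8 relates to Tables 5, 6, and 7, the sketch below recomputes the "Average edges in all processes" (PE) row from the values in those tables. Reading the improvement column as the mean of the per-application percentages is an assumption; it reproduces the reported 96.4, whereas the improvement computed from the averaged PE values is slightly lower.

```python
from statistics import mean

# PE values copied from Tables 5, 6, 7 (WackoPicko, osCommerce, TomatoCart)
pe_proposed = [48, 69, 156]          # proposed approach
pe_zap = [2208, 1921, 3131]          # OWASP ZAP
pe_improvement = [97.8, 96.4, 95.0]  # per-application improvement (%)

avg_proposed = mean(pe_proposed)         # 91, as in Table 8
avg_zap = mean(pe_zap)                   # 2420, as in Table 8
avg_improvement = mean(pe_improvement)   # about 96.4, the value reported in Table 8

# Alternative reading: improvement computed from the averaged values (about 96.2)
improvement_of_averages = 100 * (avg_zap - avg_proposed) / avg_zap
```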
6 CONCLUSION
Business logic vulnerabilities are severe vulnerabilities that compromise web application security. Web scanners cannot detect business logic vulnerabilities because they are unable to understand the business logic of the web application. Detecting these vulnerabilities requires understanding the business logic of the web application; because this logic is specific to each application, these vulnerabilities are difficult to identify.
In this paper, we proposed BLProM, a black-box approach for detecting the business processes of web applications. BLProM aims to ease the detection of business logic vulnerabilities: its output is used as input to dynamic security testing of web applications at the business layer in order to detect business logic vulnerabilities. BLProM consists of two main steps:
1- Extracting the user navigation graph.
2- Detecting the web application business processes.
In the lab, we scanned three web applications with BLProM and with OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared to OWASP ZAP (average edges in all processes, Table 8). BLProM improves the scanning of web applications because it clusters web pages and prevents the scanning of similar web application pages.

Table 8 Comparison of the Average of the Proposed Approach and the Average of ZAP Approach in the Scanning of Selected Web Applications
Criteria (average over the selected web applications) | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%)
Graph nodes | 42.6 | 136.3 | 69.2
Graph edges | 67 | 353 | 81.1
Processes (P) | 20 | 36.6 | 40.5
Average edges in each process (E) | 3.6 | 86.6 | 94.8
Average edges in all processes (PE) | 91 | 2420 | 96.4
References
[1] Common Vulnerabilities and Exposures. https://cve.mitre.org/cve/cve.html.
[2] 2015 ITRC. Identity Theft Resource Center Breach Report Hits Near Record High in 2015. http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html. Accessed Feb. 2017.
[3] G. Pellegrino and D. Balzarotti. Toward Black-Box Detection of Logic Flaws in Web Applications. In Network and Distributed System Security Symposium 2014 (NDSS 2014), 2014. doi:10.14722/ndss.2014.23021.
[4] Testing for business logic. OWASP. https://www.owasp.org/index.php/Testing_for_business_logic. Accessed Feb. 2017.
[5] D. Balzarotti, M. Cova, V. Felmetsger, and G. Vigna. Multi-module vulnerability analysis of web-based applications. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 25–35. ACM, 2007. doi:10.1145/1315245.1315250.
[6] A. Doupé, B. Boe, C. Kruegel, and G. Vigna. Fear the EAR: discovering and mitigating execution after redirect vulnerabilities. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 251–262. ACM, 2011. doi:10.1145/2046707.2046736.
[7] V. Felmetsger, L. Cavedon, C. Kruegel, and G. Vigna. Business Logic Attacks – Bots and BATs. OWASP 2009. In USENIX Security Symposium, 2010.
[8] E. Chai. Business Logic Attacks – Bots and BATs. OWASP 2009. https://www.owasp.org/images/a/aa/OWASP_Cincinnati_Jan_2011.pdf. Accessed Feb. 2017.
[9] X. Li and Y. Xue. BLOCK: a black-box approach for detection of state violation attacks towards web applications. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 247–256, 2011. doi:10.1145/2076732.2076767.
[10] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna. Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications. In Giovanni, pages 63–86. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-74319-4. doi:10.1007/978-3-540-74320-0_4.
[11] X. Li, W. Yan, and Y. Xue. SENTINEL: securing database from logic flaws in web applications. In Proceedings of the Second ACM Conference on Data and Application Security and Privacy, pages 25–36, 2012. doi:10.1145/2133601.2133605.
[12] M. Alidoosti and A. Nowroozi. BLDAST: business-layer Dynamic Application Security Tester of the web application in order to detect web application vulnerabilities against flooding DoS attacks. In Iran Society of Cryptology Conference, Shiraz, Iran, 2017 (in Persian).
[13] M. Alidoosti, A. Nowroozi, and A. Nickabadi. Evaluating the Web-Application Resiliency to Business-Layer DoS Attacks. ETRI Journal, 2019. doi:10.4218/etrij.2019-0164.
[14] M. Alidoosti and A. Nowroozi. BLTOCTTOU: business-layer dynamic application security tester of the web application in order to detect web application vulnerabilities against Race Condition attacks. In Computer Society of Iran Conference, Tehran, Iran, 2018 (in Persian).
[15] M. Alidoosti and A. Nowroozi. BLProM: Business-layer process miner of the web application. In ISCISC, pages 1–6. IEEE, 2018. ISBN 978-1-5386-7582-3. doi:10.1109/ISCISC.2018.8546899.
[16] V. Crescenzi, P. Merialdo, and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3):279–299, 2005. doi:10.1016/j.datak.2004.11.004.
Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. She is currently working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.

Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and in an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. He is also a co-founder of four IT startups.

Ahmad Nickabadi received his B.S. degree in computer engineering in 2004 and his M.Sc. and Ph.D. degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.
Figure 3 Number of Consecutive Requests
should be noted that the responses to the main requests contain all the secondary requests that are needed to load the web page.
Identifying whether the last request in the traffic is the main request or not: if the last response in the traffic is a main response, the last HTTP request is also a main request.
For example, in Figure 3, requests 1, 2, 6, and 8 are the main requests; they are shown in red.
The pseudocode of the algorithm that BLProM uses to identify web application pages is shown in Algorithm 1. As mentioned before, each web application page can be represented by a pair (main request, corresponding response to the main request). The pseudocode in Algorithm 1 can be divided into four main parts:
1. Extracting the main requests in the traffic (lines 13–25)
2. Extracting the responses corresponding to the main requests (lines 26–31)
3. Identifying whether the last request in the traffic is the main request or not (lines 32–35)
4. Extracting the web application pages (lines 36–40)
In line 9, all requests in the traffic, whether main or secondary, are extracted and stored in the HttpRequest variable. In line 10, the responses in the traffic are stored in the HttpResponse variable. In lines 11 and 12, the total numbers of requests and responses are calculated. In line 15, it is checked whether the current request is the first request; if so, it is considered a main request and added to the MainRequest variable. In line 21, it is checked whether the Referer field of the current request differs from the Referer field of the previous request; if so, the previous request is considered a main request.
In line 28, it is checked whether the Content-Type field of the current response is text/html. If this condition is met, the current response is considered a main response.
In line 33, it is checked whether the last HTTP response is a main response; if so, the last request is also considered a main request.
In line 39, the main requests and main responses are paired; these (request, response) pairs are the web application pages.
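To make the request/response logic of Algorithm 1 concrete, here is a minimal Python sketch. It assumes a hypothetical `Exchange` record (request URL, Referer header, and the Content-Type of the matching response) with the traffic already in capture order, and it returns index pairs instead of full HTTP messages; all names are illustrative, not part of BLProM. One small deviation: parameters such as `; charset=utf-8` are stripped before the Content-Type is compared with text/html.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Exchange:
    """One captured request and its response (hypothetical representation)."""
    url: str
    referer: Optional[str]
    content_type: str  # Content-Type of the response to this request


def extract_web_pages(traffic: List[Exchange]) -> List[Tuple[int, int]]:
    """Return (main request index, main response index) pairs, following Algorithm 1."""
    if not traffic:
        return []
    n = len(traffic)

    # Lines 13-25: the first request is a main request; whenever the Referer
    # changes, the request just before the change is a main request.
    main_requests = {0}
    last_referer = traffic[0].referer
    for i in range(1, n):
        new_referer = traffic[i].referer
        if new_referer != last_referer:
            main_requests.add(i - 1)
            last_referer = new_referer

    # Lines 26-31: responses whose Content-Type is text/html are main responses.
    main_responses = [
        k for k, ex in enumerate(traffic)
        if ex.content_type.split(";")[0].strip().lower() == "text/html"
    ]

    # Lines 32-35: if the last response is a main response, so is the last request.
    if main_responses and main_responses[-1] == n - 1:
        main_requests.add(n - 1)

    # Lines 36-40: pair the j-th main request with the j-th main response.
    return list(zip(sorted(main_requests), main_responses))
```

Pairing by position, as line 39 does, silently drops unmatched entries if the two lists differ in length, so the sketch inherits that limitation.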
3) Clustering the web application pages
In this step, BLProM clusters the web application pages. Clustering puts similar pages in the same cluster, which prevents unbounded growth of the user navigation graph. The pages in the same cluster are similar to each other.
At this stage, all pages in the user's stored traffic have been extracted as pairs (main request, corresponding response to the main request), and each pair represents one page of the application. To extract the optimal user navigation graph, similar extracted pages must be identified and clustered. In the user navigation graph, nodes indicate the application's unique pages and edges represent the links between pages. To identify similar pages, a criterion must be chosen that fits the purpose of the clustering. The type of operations that the user can perform on a page is used as the measure for separating pages; in other words, two pages are similar if the user can perform the same operations on them. For example, consider two pages that both contain only a button, where the title of the button is "continue" on the first page and "save" on the second. These two pages are different because the user performs different operations on them. Thus, according to this criterion, the following pages are considered similar in web applications:
• In an online shop application, the pages related to the shopping cart are similar, even if they contain different items.
• In an online shop application, the profile pages of the products are similar, because they have the same HTML structure in terms of the important HTML elements.
• Pages that display search results for different keywords are similar.
• If the structure of one page is a subset of another page's structure in terms of the important HTML elements, the two pages are considered similar.
• Two pages that both contain user comments are considered similar, even if the comments are about different products.
Algorithm 1 The Pseudocode for Extracting Web Application Pages From the Traffic
INPUT: HTTPtraffic = {HttpMessage_1, HttpMessage_2, HttpMessage_3, …, HttpMessage_n}
OUTPUT: WebPages as a set of (Request_i, Response_i)
1 Begin
2 let WebPages = ∅ // set of web pages
3 let HttpRequest = ∅ // set of HTTP requests
4 let HttpResponse = ∅ // set of HTTP responses
5 let MainRequest = ∅ // set of main HTTP requests
6 let MainResponse = ∅ // set of main HTTP responses
7 let i, k = 1 // counters for the current HttpMessage
8 let LastReferer = empty, NewReferer = empty
9 HttpRequest = ExtractReq(HTTPtraffic) // extract HTTP requests from the HTTP traffic
10 HttpResponse = ExtractResp(HTTPtraffic) // extract HTTP responses from the HTTP traffic
11 n = extractNumber(HttpRequest) // total number of HTTP requests
12 m = extractNumber(HttpResponse) // total number of HTTP responses
13 // extract the main HTTP requests from HttpRequest
14 for i ← 1 to n do
15   if (i = 1) then
16     add MainRequest ← HttpRequest_i // the first request is a main request
17     LastReferer ← Referer of HttpRequest_i; NewReferer ← LastReferer
18   else
19     NewReferer ← Referer of HttpRequest_i
20   end if
21   if (NewReferer ≠ LastReferer) then
22     add MainRequest ← HttpRequest_(i−1)
23     LastReferer ← NewReferer
24   end if
25 end for
26 // extract the main HTTP responses from HttpResponse
27 for k ← 1 to m do
28   if (content-type of HttpResponse_k = text/html) then
29     add MainResponse ← HttpResponse_k
30   end if
31 end for
32 // check whether the last request is a main request
33 if (HttpResponse_m ∈ MainResponse) then
34   add MainRequest ← HttpRequest_n
35 end if
36 // extract the set of web pages
37 size = extractNumber(MainRequest) // total number of main HTTP requests
38 for j ← 1 to size do
39   WebPages ← (MainRequest_j, MainResponse_j)
40 end for
41 return WebPages
42 end
Table 1 Attribute Vector of a Page in osCommerce Web Application
Element | Value
inputs | null
buttons | html/body/button + Reviews; html/body/div/div/div/div/form/div/div/span/span/button + Add to Cart
anchors | null
images | html/body/div/div/div/a/img
Figure 4 An Example of HTML Code
Definition of the Document Object Model (DOM) path of an HTML element: the DOM path of an element is the position of the element in the HTML code.
For example, in Figure 4, the DOM path of the button ($DOM_{button}$) is Document/Html/Body/P/Button.
Definition 1 [Similar Pages] Similar pages are pages on which the user can perform the same operations and that are identical in terms of the positions of the important HTML elements in the page. The important HTML elements in a page are buttons, images, inputs, and anchors.
The clustering process includes three steps: (1) extracting the attribute vectors of the pages, (2) identifying similar (subset) pages, and (3) clustering the pages. In the following, these steps are discussed in detail.
1 Extracting the attribute vectors of the pages
BLProM represents each page of the application as a pair (main request, response). In this step, BLProM extracts the attribute vector of each page by applying a data mining operation to this pair. BLProM models each page using the following attribute vector:
$$
\begin{aligned}
&\forall w \in \mathit{WebPages}: \quad w = (\mathit{DOM}_{inputs}, \mathit{DOM}_{buttons}, \mathit{DOM}_{anchors}, \mathit{DOM}_{imgs})\\
&\mathit{DOM}_{inputs} = \prod_{i=1}^{n} \mathit{DOM}(input_i), \qquad \mathit{DOM}_{buttons} = \prod_{i=1}^{n} \mathit{DOM}(button_i)\\
&\mathit{DOM}_{anchors} = \prod_{i=1}^{n} \mathit{DOM}(anchor_i), \qquad \mathit{DOM}_{imgs} = \prod_{i=1}^{n} \mathit{DOM}(img_i)
\end{aligned}
$$
where WebPages is the set of all pages in the application, n is the number of elements of the corresponding type in the page, ∏ denotes concatenating the DOM descriptions of these elements, and:
• DOM(input): the DOM path of the <input> tag in the page + the value of the type attribute of the <input> tag + the value of the name attribute of the <input> tag (if the name attribute is absent, the value attribute is used instead)
• DOM(button): the DOM path of the button in the page + the title of the button
• DOM(anchor): the DOM path of the <a> tag in the page
• DOM(img): the DOM path of the image in the page
If a page contains several buttons, the second element of the page attribute vector is the set of DOM paths of the buttons in the page, separated by a delimiter. Figure 5 shows one of the osCommerce¹ web application pages, and Table 1 shows the attribute vector of this page. As shown, the input and anchor elements are null, which means that the page does not contain these tags.
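The attribute vector can be approximated with Python's standard `html.parser` module, as in the sketch below. Several choices are assumptions, not details from the paper: DOM paths are tag names joined by `/`, the parts of DOM(input) and DOM(button) are joined by `:`, a button's title is its text content, and each vector element is kept as a set of strings (rather than a concatenated string) so that the subset tests of the next step are direct. Class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import FrozenSet, List, Optional, Tuple

# Elements that never have a closing tag, so they are not pushed on the path stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# (inputs, buttons, anchors, images)
AttributeVector = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str], FrozenSet[str]]


class AttributeVectorParser(HTMLParser):
    """Collects DOM(input), DOM(button), DOM(anchor) and DOM(img) strings."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []  # tag names from the root to the current element
        self.inputs, self.buttons, self.anchors, self.imgs = set(), set(), set(), set()
        self._button_path: Optional[str] = None
        self._button_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        path = "/".join(self.stack + [tag])  # "/" as separator is an assumption
        if tag == "input":
            # type + name, falling back to value when the name attribute is absent
            label = attr.get("name") or attr.get("value") or ""
            self.inputs.add(f"{path}:{attr.get('type') or 'text'}:{label}")
        elif tag == "a":
            self.anchors.add(path)
        elif tag == "img":
            self.imgs.add(path)
        elif tag == "button":
            self._button_path, self._button_text = path, []
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag == "button" and self._button_path is not None:
            title = "".join(self._button_text).strip()
            self.buttons.add(f"{self._button_path}:{title}")
            self._button_path = None
        if tag in self.stack:
            # Pop up to the matching start tag; tolerates unclosed child elements.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self._button_path is not None:
            self._button_text.append(data)


def attribute_vector(html: str) -> AttributeVector:
    parser = AttributeVectorParser()
    parser.feed(html)
    parser.close()
    return (frozenset(parser.inputs), frozenset(parser.buttons),
            frozenset(parser.anchors), frozenset(parser.imgs))
```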
2 Identifying similar pages
After extracting the attribute vector of each page, similar pages must be identified. Pages whose attribute vectors are identical, or whose attribute vector is a subset of another page's attribute vector, are considered similar.
According to Definition 1, the attribute vector of each web application page has four elements. The attribute vector of page 1 is considered the same as the attribute vector of page 2 if:
• all vector elements of page 1 are equal to the corresponding elements in the vector of page 2, or
• all vector elements of page 1 are subsets of the corresponding elements in the vector of page 2, or
• all vector elements of page 2 are subsets of the corresponding elements in the vector of page 1, or
• one or more vector elements of page 1 are subsets of their corresponding elements in the vector of page 2, and the remaining vector elements of page 1 are the same as their corresponding elements in the vector of page 2.
The null element is a subset of every element.
Similar pages are identified according to the above conditions. Algorithm 2 illustrates the pseudocode for identifying similar pages.
¹ https://www.oscommerce.com
Figure 5 A Page of osCommerce Application
Algorithm 2 The Pseudocode for Identifying the Similar Pages
INPUT: w1 = (DOM_w1(input), DOM_w1(button), DOM_w1(anchor), DOM_w1(img)), w2 = (DOM_w2(input), DOM_w2(button), DOM_w2(anchor), DOM_w2(img))
OUTPUT: Boolean flag; true means that the two pages are the same
Begin
  Let flag, input, button, anchor, img = false
  if DOM_w1(input) ⊆ DOM_w2(input) or DOM_w2(input) ⊆ DOM_w1(input) then
    input = true
  end if
  if DOM_w1(button) ⊆ DOM_w2(button) or DOM_w2(button) ⊆ DOM_w1(button) then
    button = true
  end if
  if DOM_w1(anchor) ⊆ DOM_w2(anchor) or DOM_w2(anchor) ⊆ DOM_w1(anchor) then
    anchor = true
  end if
  if DOM_w1(img) ⊆ DOM_w2(img) or DOM_w2(img) ⊆ DOM_w1(img) then
    img = true
  end if
  if input and button and anchor and img then
    flag = true
  end if
  return flag
end
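With attribute vectors stored as four sets, Algorithm 2 reduces to a per-element containment test. The sketch below is illustrative (the function name and the set representation are assumptions). Like Algorithm 2, it allows the direction of containment to differ from one element to another, and the test reproduces the "continue"/"save" example from the clustering criterion.

```python
from typing import FrozenSet, Tuple

# (inputs, buttons, anchors, images), each a set of DOM strings
AttributeVector = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def similar_web_pages(w1: AttributeVector, w2: AttributeVector) -> bool:
    """Algorithm 2: two pages are similar when, for each of the four elements,
    one page's set is contained in the other's. An empty set (a null element)
    is a subset of every set, as the definition requires."""
    return all(a <= b or b <= a for a, b in zip(w1, w2))
```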
3 Clustering web application pages
After the similar pages are identified, they are put in the same cluster. The pages in a cluster are similar to each other and refer to a unique page of the application. Algorithm 3 shows the pseudocode for clustering web pages. In line 7, it is checked whether the two pages w_i and w_j are the same; if so, they are put in the same cluster.
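A minimal sketch of the clustering step, written against any pairwise similarity test (for example, the per-element subset check of Algorithm 2). It follows the greedy structure of Algorithm 3: each page that is not yet clustered seeds a new cluster, and later unclustered pages join it if they are similar to the seed (line 7). Marking pages as clustered stands in for removing them from WebPages; the names are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def cluster_web_pages(pages: Sequence[T], similar: Callable[[T, T], bool]) -> List[List[int]]:
    """Greedy clustering in the spirit of Algorithm 3; returns lists of page indices."""
    clustered = [False] * len(pages)
    clusters: List[List[int]] = []
    for i, wi in enumerate(pages):
        if clustered[i]:
            continue  # already absorbed into an earlier cluster
        cluster = [i]
        clustered[i] = True
        for j in range(i + 1, len(pages)):
            # Compare each remaining page with the seed w_i of the current cluster.
            if not clustered[j] and similar(wi, pages[j]):
                cluster.append(j)
                clustered[j] = True
        clusters.append(cluster)
    return clusters
```

Because every page is compared only with the seed of its cluster, the result depends on the order of the pages in the traffic; this mirrors Algorithm 3 rather than computing a transitive closure of the similarity relation.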
Algorithm 3 The Pseudocode for Clustering the Web Application Pages
INPUT: WebPages = {w1, w2, w3, …, wn}
OUTPUT: the web application model M as a set of web page clusters C, where each cluster is a set of web pages
1 Begin
2 Let M = ∅ // set of page clusters
3 Let k = 1 // number of web page clusters
4 for i ← 1 to n do
5   C_k ← {w_i}
6   for j ← i + 1 to n do
7     if SimilarWebPages(w_i, w_j) then
8       WebPages ← WebPages − {w_j}
9       n ← length of WebPages
10      C_k ← C_k ∪ {w_j}
11    end if
12  end for
13  M ← M ∪ {C_k}; k ← k + 1
14 end for
15 return M
16 end
4) Extracting the user navigation graph
In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster has a set of similar pages, and each of these pages has a URI and a Referer field. Thus, each cluster contains a set of URIs and a set of Referers for the pages in that cluster. Note that the Referer field is the URI of the previous web application page that the user visited. To extract the edges of the user navigation graph, the clusters are checked to find which clusters' Referer sets intersect the URI set of a given cluster; when such a cluster is found, the two clusters are connected. Suppose that the URI set of cluster C1 intersects the Referer set of cluster C2; then the path from cluster C1 to cluster C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges of the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained in lines 5 and 6 of Algorithm 4, respectively. In line 10, if the intersection of the URI set of a cluster with another cluster's Referer set is not empty, the edge CiCj is added to the edge set of the graph.
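The edge rule of Algorithm 4 can be sketched as follows, assuming (hypothetically) that each page is reduced to a (URI, Referer) pair and each cluster is a list of such pages. The sketch checks every ordered pair of distinct clusters, so that navigating back to an earlier page also yields an edge; the names are illustrative.

```python
from typing import List, Optional, Set, Tuple

# A page reduced to (URI, Referer); a hypothetical representation for this sketch.
Page = Tuple[str, Optional[str]]


def extract_graph_edges(clusters: List[List[Page]]) -> Set[Tuple[int, int]]:
    """Algorithm 4: add the edge Ci -> Cj when some URI of Ci is a Referer of Cj."""
    uris = [{uri for uri, _ in cluster} for cluster in clusters]              # line 5
    referers = [{ref for _, ref in cluster if ref} for cluster in clusters]   # line 6
    edges: Set[Tuple[int, int]] = set()
    for i in range(len(clusters)):
        for j in range(len(clusters)):
            if i != j and uris[i] & referers[j]:  # line 10: non-empty intersection
                edges.add((i, j))
    return edges
```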
The user navigation graph is then created by Algorithm 5, where the nodes are the set of clusters and the identified edges are the paths between clusters. In the created graph, each node indicates a unique page and each edge shows a path between pages.
Definition 2 [The User Navigation Graph] This graph is represented by the tuple $\langle C_0, C, E \rangle$, where $C$ is the set of nodes in the graph, $C_0 \in C$ is the first (initial) node of the graph, and $E \subseteq C \times C$ is the set of edges in the graph.
Algorithm 5 shows the pseudocode for extracting the user navigation graph. In lines 5 and 6, the cluster that contains the initial page of the application is found; this cluster is considered the initial node of the graph.
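Algorithm 5 only has to locate the cluster that contains the first visited page. A minimal sketch, assuming for illustration that each cluster is represented by its set of page URIs and that nodes are referred to by index; the type and function names are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class NavigationGraph(NamedTuple):
    """The tuple <C0, C, E> of Definition 2, with nodes referred to by index."""
    c0: int                        # index of the initial node C0 in nodes
    nodes: List[FrozenSet[str]]    # C: each cluster given as its set of page URIs
    edges: Set[Tuple[int, int]]    # E, a subset of C x C


def build_navigation_graph(clusters: List[FrozenSet[str]],
                           edges: Set[Tuple[int, int]],
                           first_page: str) -> NavigationGraph:
    """Algorithm 5: the cluster containing the first visited page becomes C0."""
    for k, cluster in enumerate(clusters):
        if first_page in cluster:
            return NavigationGraph(k, clusters, edges)
    raise ValueError("the first page does not belong to any cluster")
```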
3.2 Identifying Business Processes in the Application
To identify the web application business processes in the user navigation graph, we first define a process and a final node, and then define a business process.
Algorithm 4 The Pseudocode for Extracting Edges From the User Navigation Graph
INPUT: C = {C1, C2, C3, …, Ck} // web page clusters as graph nodes; WebPages = {w1, w2, w3, …, wn} // set of web pages
OUTPUT: the web application graph edges E as a set of edges
1 Begin
2 Let E = ∅ // set of web application graph edges
3 for j ← 1 to k do
4   Let URI_Cj, Referer_Cj = ∅
5   URI_Cj = URI_Cj ∪ ExtractURI(w) for every w ∈ Cj
6   Referer_Cj = Referer_Cj ∪ ExtractReferer(w) for every w ∈ Cj
7 end for
8 for i ← 1 to k do
9   for j ← 1 to k, j ≠ i do
10    if (URI_Ci ∩ Referer_Cj ≠ ∅) then
11      E ← E + CiCj
12    end if
13  end for
14 end for
15 return E
16 end
Algorithm 5 The Pseudocode for Extracting the User Navigation Graph
INPUT: C = {C1, C2, C3, …, Ck} // web page clusters as graph nodes; w1 // first web page
OUTPUT: the web application navigation graph ⟨C0, C, E⟩
1 Begin
2 Let C0 = ∅
3 E = ExtractGraphEdges // set of web application graph edges
4 for j ← 1 to k do
5   if Cj contains w1 then
6     C0 = Cj
7   end if
8 end for
9 return ⟨C0, C, E⟩
10 end
Definition 3 [The Application Process (P)] A process P in the application is a sequence of nodes and edges in the user navigation graph, $E_1, E_2, \ldots, E_k$, where $E_i \in E$ and $E_i = C_{i-1}C_i$.
Algorithm 6 shows the pseudocode for extracting processes.
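The following is a minimal sketch of process extraction by depth-first search from C0, returning each process as its sequence of nodes. Algorithm 6 does not state how loops are handled; here (an assumption) a path stops when an edge returns to a node already on it, and that closing edge is kept so that the loop can later be tested against Definition 5. Edges are sorted only to make the output order deterministic; the names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def extract_processes(c0: int, edges: Set[Tuple[int, int]]) -> List[List[int]]:
    """Enumerate processes (paths of the navigation graph) that start at C0."""
    successors: Dict[int, List[int]] = defaultdict(list)
    for a, b in sorted(edges):
        successors[a].append(b)

    processes: List[List[int]] = []

    def walk(path: List[int]) -> None:
        nexts = successors.get(path[-1], [])
        if not nexts:
            processes.append(path)              # dead end: a maximal process
        for nxt in nexts:
            if nxt in path:
                processes.append(path + [nxt])  # loop closed: stop this path here
            else:
                walk(path + [nxt])

    walk([c0])
    return processes
```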
Definition 4 [The Final Nodes in the User Navigation Graph (F)] The final nodes mark the completion of a business process, which occurs when the application reaches them.
The final nodes can be detected by examining the HTTP responses. For example, in the process of buying a product, a phrase like "Thank you for your purchase" is displayed after the purchase is completed. By specifying a set of such phrases and searching for them in the responses, the final nodes can be identified. Some keywords used for identifying the final nodes are Thank, Congratulations, Successfully, Log Off, and Search Results. Additionally, some buttons on a web application page are good signs that help identify the final nodes of the graph. Examples of final nodes include the page after clicking the "save" button, the page after clicking the "creation" button, and the page after clicking the "submit" button.
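The keyword test for final nodes can be sketched as below. It uses only the keywords listed above, matches them case-insensitively (an assumption), and treats a cluster as a final node if any of its responses contains one; the button-based hints are not modeled. All names are illustrative.

```python
from typing import Iterable, List, Set

# Keywords from the text; lower-cased for case-insensitive matching.
FINAL_NODE_KEYWORDS = ("thank", "congratulations", "successfully", "log off", "search results")


def is_final_node(responses: Iterable[str]) -> bool:
    """True if any HTTP response body of the cluster contains a final-node keyword."""
    return any(keyword in body.lower()
               for body in responses
               for keyword in FINAL_NODE_KEYWORDS)


def extract_final_nodes(cluster_responses: List[List[str]]) -> Set[int]:
    """Indices of the clusters (graph nodes) whose responses look like final nodes."""
    return {k for k, bodies in enumerate(cluster_responses) if is_final_node(bodies)}
```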
Algorithm 6 The Pseudocode for Extracting Processes
INPUT: the web application first node C0; the web application graph edges E
OUTPUT: the web application graph processes P as a set of web application processes
Begin
  Let P = ∅ // set of web application processes
  Let StartEdge = ∅ // set of graph edges that begin at C0
  StartEdge = ExtractGraphEdges(C0) // extract all edges that leave C0
  for each edge e ∈ StartEdge do
    EndPoint = ExtractEndPoint(e) // the end node of edge e
    if (ExtractProcess(EndPoint, E) ≠ ∅) then
      P = P ∪ {e + p : p ∈ ExtractProcess(EndPoint, E)}
    else
      P = P ∪ {e}
    end if
  end for
  return P
end
Definition 5 [The Application Business Process] A business process BP in the application is a process that satisfies at least one of the following conditions:
1. The first node of the process is the initial node of the user navigation graph (C0), and the end node of the process is a final node of the user navigation graph (F).
2. The process passes through its first node again; that is, the first node and the end node of the process are the same and the process length is greater than two. (If the process returns to a node it has already passed and the resulting loop is longer than two, it is a business process.)
All processes in the graph that run from the initial node (the application's initial page) to an identified final node, as well as the processes whose initial node and end node are the same, are business processes of the application. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the application processes are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start with the initial node and end with a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, among these processes, those whose initial node and end node are the same and whose length is greater than two are added to the variable BP as business processes.
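Given processes as node sequences (for example, from a depth-first enumeration of the navigation graph), the two conditions of Definition 5 can be checked as follows. Reading "length" as the number of edges, and recording the loop portion of a process as the business process for condition 2, are both assumptions made for this sketch; the function name is hypothetical.

```python
from typing import List, Set


def identify_business_processes(processes: List[List[int]], c0: int,
                                final_nodes: Set[int]) -> List[List[int]]:
    """Keep the processes that satisfy Definition 5 (cf. Algorithm 7)."""
    business: List[List[int]] = []
    for p in processes:
        if p[0] == c0 and p[-1] in final_nodes:
            business.append(p)                 # condition 1: C0 to a final node
        elif p[-1] in p[:-1]:
            loop = p[p.index(p[-1]):]          # from the first visit to the return
            if len(loop) - 1 > 2:              # condition 2: loop of more than two edges
                business.append(loop)
    return business
```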
4 EXPERIMENTAL RESULTS
The testbed used in this section is a network consisting of a web server (test target) and two clients (the BLProM system and a legal user). The web server and the clients each run on a virtual machine. The profiles of the web server and the clients are shown in Table 2.
The web applications listed in Table 3 are installed on the web server (test target), and we then identify the business layer of these web applications.
The legal user first starts using the selected web applications and crawls all permitted parts of each web application. The HTTP traffic of the legal user is given to BLProM as its input.
5 EVALUATION
BLProM's goal is to identify the business layer of the web application; identifying the business layer makes it possible to identify business logic vulnerabilities. BLProM detects the business processes of the web application, and identifying business processes is the main step in dynamic security testing of the web application at the business layer.
We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner that scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner that has an API for extracting the web application graph, in which the web application pages are graph nodes and the relations among pages are graph edges. The main difference between BLProM and ZAP is in detecting similar pages: ZAP cannot detect similar pages in the web application, but BLProM can. ZAP's graph only shows the relations among scanned pages, whereas BLProM generates the optimal graph. In terms of the accuracy of the generated graph, BLProM and ZAP are the same.
To evaluate the proposed approach, we first measure the accuracy of the clustering with the following criteria:
• True Positive: samples that are placed in their correct cluster.
• False Positive: samples that are placed in a cluster to which they do not belong.
• False Negative: samples that belong to a cluster but are not placed in it.
• Recall: calculated by the following formula:
$$\text{recall} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}}$$
• Precision: calculated by the following formula:
Table 2 Testbed Profiles
Machine | CPU | OS | VMware CPU | VMware RAM | VMware OS
Web server (test target) | Pentium dual core, 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7
Client (BLProM machine) | Intel Core i7, 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7
Client (legal user) | Pentium dual core i3-3210 GHz | Windows 7 | 1 GHz | 1 GB | Windows 7
Table 3 Selected Web Applications for Evaluation
Web application | Description
TomatoCart-1.1.8.6.1 | e-commerce
osCommerce-2.3.4 | e-commerce
WackoPicko | web application for sharing pictures
Algorithm 7 The Pseudocode for Identifying the Application Business Processes
INPUT: the web application navigation graph ⟨C0, C, E⟩; the web application graph edges E
OUTPUT: the web application business processes BP as a set of web application processes
1 Begin
2 Let P = ∅, BP = ∅ // sets of web application processes
3 Let F = ∅ // set of web application final nodes
4 P = ExtractProcess // extract the processes in the web application
5 F = ExtractFinalNodes // extract the final nodes in the web application
6 for i ← 1 to k do // k = number of processes in P
7   if P_i starts at C0 and ends at a node in F then
8     BP = BP + P_i
9   end if
10 end for
11 R = ExtractProcessWithRepeatedNodes(P) // extract the processes with repeated nodes
12 for j ← 1 to m do // m = number of processes in R
13   if R_j has the same initial and end node and length(R_j) > 2 then
14     BP = BP + R_j
15   end if
16 end for
17 return BP
18 end
Table 4 The Clusters of Selected Web Application Pages Evaluation
Criteria | WackoPicko | TomatoCart | osCommerce
samples | 89 | 150 | 210
clusters | 29 | 66 | 40
true positive | 65 | 146 | 205
false positive | 24 | 4 | 5
false negative | 23 | 3 | 4
recall | 0.74 | 0.98 | 0.98
precision | 0.73 | 0.97 | 0.98
f-measure | 0.73 | 0.98 | 0.98
Table 5 Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko
Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP
HTTP Message | 89 | 89 | –
Graph Nodes | 22 | 89 | 75.2
Graph Edges | 48 | 270 | 82.2
process (P) | 12 | 48 | 75
Average edge in each process (E) | 4 | 46 | 91.3
Average edges in all processes (PE) | 48 | 2208 | 97.8
business processes | 10 | NA | –
$$\text{precision} = \frac{\mathit{TruePositive}}{\mathit{TruePositive} + \mathit{FalsePositive}}$$
• F-Measure: it is calculated by the following formula:
$$F\text{-}\mathit{Measure} = \frac{2 \cdot \text{recall} \cdot \text{precision}}{\text{recall} + \text{precision}}$$
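The three formulas can be checked directly against the counts in Table 4. The short Python sketch below is our illustration rather than part of BLProM (the name `clustering_scores` is hypothetical); rounded to two decimals, it reproduces the recall, precision, and f-measure rows of Table 4 for all three applications.

```python
def clustering_scores(tp: int, fp: int, fn: int) -> tuple:
    """Return (recall, precision, F-measure) for one web application."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# (true positive, false positive, false negative) counts copied from Table 4.
TABLE_4_COUNTS = {
    "WackoPicko": (65, 24, 23),
    "TomatoCart": (146, 4, 3),
    "osCommerce": (205, 5, 4),
}

if __name__ == "__main__":
    for app, (tp, fp, fn) in TABLE_4_COUNTS.items():
        r, p, f = clustering_scores(tp, fp, fn)
        print(f"{app}: recall={r:.2f} precision={p:.2f} f-measure={f:.2f}")
```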
The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, samples is the total number of web pages in the HTTP traffic extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, clusters is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). The remaining rows give the criteria listed above for each web application.
To evaluate the proposed approach further, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. The BLProM output for WackoPicko is shown in Table 5. Graph nodes is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3); graph edges is the total number of edges extracted by the ExtractGraphEdges algorithm (Algorithm 4); process is the total number of paths from the first node extracted by the ExtractProcess algorithm (Algorithm 6). Average edge in each process (E) is the sum of the edges of all processes divided by the total number of processes. Average edges in all processes (PE) is the product of the total number of processes (P) and the average number of edges in each process (E). Business processes is the total number of business processes in the web application.
The first column of Table 5 contains the output of the proposed approach, and the second column contains the scanning output of ZAP. The third column shows the percentage of scanning improvement of the proposed approach compared to the ZAP scan. The results indicate that ZAP's scan is not intelligent: its graph contains one node for every HTTP message (89 nodes for 89 messages) because it does not merge similar pages. As a result of this non-intelligent scan, ZAP is not able to identify business-layer vulnerabilities.
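The text does not state an explicit formula for the improvement column. The WackoPicko values in Table 5 are consistent, to within 0.1 percentage point, with the relative reduction (ZAP − BLProM) / ZAP × 100 and with PE = P × E. The Python sketch below encodes that reading; the improvement formula is our inference from the table, and the helper names are hypothetical.

```python
def improvement(proposed: float, zap: float) -> float:
    """Assumed definition: percentage reduction of a metric relative to OWASP ZAP."""
    return (zap - proposed) / zap * 100


def edges_in_all_processes(processes: int, avg_edges_per_process: float) -> float:
    """PE = P * E, as described in the text."""
    return processes * avg_edges_per_process


# WackoPicko rows from Table 5, as (BLProM, OWASP ZAP) pairs.
WACKOPICKO = {
    "Graph Nodes": (22, 89),
    "Graph Edges": (48, 270),
    "process (P)": (12, 48),
    "Average edge in each process (E)": (4, 46),
}
pe = (
    edges_in_all_processes(12, 4),   # BLProM: 12 processes * 4 edges
    edges_in_all_processes(48, 46),  # ZAP: 48 processes * 46 edges
)
WACKOPICKO["Average edges in all processes (PE)"] = pe

if __name__ == "__main__":
    for criterion, (ours, zap) in WACKOPICKO.items():
        print(f"{criterion}: {ours} vs {zap} -> {improvement(ours, zap):.1f}%")
```

Averaging the per-application percentages in the same way gives the improvement column of Table 8; for example, (75.2 + 76.4 + 56) / 3 = 69.2 for graph nodes.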
The BLProM output for osCommerce is shown in Table 6, where the third column shows the percentage of scanning improvement compared to ZAP. These results show that identifying the business layer of the web application improves web application scanning.
Table 6 Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce
Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP
HTTP Message | 170 | 170 | –
Graph Nodes | 40 | 170 | 76.4
Graph Edges | 66 | 379 | 82.5
process (P) | 23 | 17 | 26
Average edge in each process (E) | 3 | 113 | 97.3
Average edges in all processes (PE) | 69 | 1921 | 96.4
business processes | 18 | NA | –
Table 7 Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart
Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP
HTTP Message | 150 | 150 | –
Graph Nodes | 66 | 150 | 56
Graph Edges | 87 | 410 | 78.7
process (P) | 31 | 39 | 20.5
Average edge in each process (E) | 4 | 101 | 96
Average edges in all processes (PE) | 156 | 3131 | 95
business processes | 30 | NA | –
The BLProM output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of the proposed approach compared to ZAP.
Table 8 shows the average values of the proposed approach and of OWASP ZAP, together with the average percentage of improvement, over the selected web applications. For example, on the "Average edges in all processes" criterion, our approach improves on OWASP ZAP by about 96 percent.
The results presented in this table show that BLProM improves web application scanning. BLProM is aware of the business processes of the web application, and once the business layer of the web application is identified, web scanners can detect business-layer vulnerabilities.
6 CONCLUSION
Business logic vulnerabilities are strong vulnerabilities that compromise web application security. Web scanners cannot detect them because they are unable to understand the business logic of the web application, and detecting business logic vulnerabilities requires understanding that logic. These vulnerabilities are therefore specific to each application and difficult to identify.
In this paper, we proposed BLProM, a black-box approach for detecting the business processes of a web application. BLProM aims to ease the detection of business logic vulnerabilities: its output is used as input to business-layer dynamic security testing of web applications in order to detect business logic vulnerabilities. BLProM consists of two main steps:
1- extracting the user navigation graph
2- detecting the web application business processes
In the lab, we scanned three web applications with BLProM and with OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared to OWASP ZAP on the "Average edges in all processes" criterion. BLProM improves the scanning of web applications because it clusters web pages and avoids scanning similar web application pages.

Table 8 Comparison of the Average of the Proposed Approach and the Average of ZAP Approach in the Scanning of Selected Web Applications
Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP
Graph Nodes | 42.6 | 136.3 | 69.2
Graph Edges | 67 | 353 | 81.1
process (P) | 20 | 36.6 | 40.5
Average edge in each process (E) | 3.6 | 86.6 | 94.8
Average edges in all processes (PE) | 91 | 2420 | 96.4
References
[1] Common Vulnerabilities and Exposures. https://cve.mitre.org/cve/cve.html
[2] 2015 ITRC, Identity Theft Resource Center. Breach Report Hits Near Record High in 2015. http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html. Accessed Feb 2017.
[3] G. Pellegrino and D. Balzarotti. Toward Black-Box Detection of Logic Flaws in Web Applications. In Network and Distributed System Security Symposium 2014 (NDSS 2014), 2014. doi:10.14722/ndss.2014.23021
[4] Testing for business logic. OWASP. https://www.owasp.org/index.php/Testing_for_business_logic. Accessed Feb 2017.
[5] D. Balzarotti, M. Cova, V. Felmetsger, and G. Vigna. Multi-module vulnerability analysis of web-based applications. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 25–35. ACM, 2007. doi:10.1145/1315245.1315250
[6] A. Doupé, B. Boe, C. Kruegel, and G. Vigna. Fear the EAR: discovering and mitigating execution after redirect vulnerabilities. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 251–262. ACM, 2011. doi:10.1145/2046707.2046736
[7] V. Felmetsger, L. Cavedon, C. Kruegel, and G. Vigna. Toward Automated Detection of Logic Vulnerabilities in Web Applications. In USENIX Security Symposium, 2010.
[8] E. Chai. Business Logic Attacks – Bots and BATs. OWASP, 2009. https://www.owasp.org/images/a/aa/OWASP_Cincinnati_Jan_2011.pdf. Accessed Feb 2017.
[9] X. Li and Y. Xue. BLOCK: a black-box approach for detection of state violation attacks towards web applications. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 247–256, 2011. doi:10.1145/2076732.2076767
[10] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna. Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications. In Recent Advances in Intrusion Detection (RAID 2007), pages 63–86. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-74319-4. doi:10.1007/978-3-540-74320-0_4
[11] X. Li, W. Yan, and Y. Xue. SENTINEL: securing database from logic flaws in web applications. In Proceedings of the Second ACM Conference on Data and Application Security and Privacy, pages 25–36, 2012. doi:10.1145/2133601.2133605
[12] M. Alidoosti and A. Nowroozi. BLDAST: business-layer Dynamic Application Security Tester of the web application in order to detect web application vulnerabilities against flooding DoS attacks. In Iran Society of Cryptology Conference, Shiraz, Iran (in Persian), 2017.
[13] M. Alidoosti, A. Nowroozi, and A. Nickabadi. Evaluating the Web-Application Resiliency to Business-Layer DoS Attacks. ETRI Journal, 2019. doi:10.4218/etrij.2019-0164
[14] M. Alidoosti and A. Nowroozi. BLTOCTTOU: business-layer dynamic application security tester of the web application in order to detect web application vulnerabilities against Race Condition attacks. In Computer Society of Iran Conference, Tehran, Iran (in Persian), 2018.
[15] M. Alidoosti and A. Nowroozi. BLProM: Business-layer process miner of the web application. In ISCISC, pages 1–6. IEEE, 2018. ISBN 978-1-5386-7582-3. doi:10.1109/ISCISC.2018.8546899
[16] V. Crescenzi, P. Merialdo, and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3):279–299, 2005. doi:10.1016/j.datak.2004.11.004
Mitra Alidoosti received her BS and MSc degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. She is currently working toward the PhD degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.

Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and in an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. He is also a co-founder of four IT startups.

Ahmad Nickabadi received his BS degree in computer engineering in 2004 and the MSc and PhD degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.
Algorithm 1 The Pseudocode for Extracting Web Application Pages From the Traffic
INPUT: HTTPtraffic = {HttpMessage_1, HttpMessage_2, HttpMessage_3, ..., HttpMessage_n}
OUTPUT: WebPages as a set of (Request_i, Response_i) pairs
Begin
  let WebPages = set of web pages
  let HttpRequest = set of HTTP requests
  let HttpResponse = set of HTTP responses
  let MainRequest = set of main HTTP requests
  let MainResponse = set of main HTTP responses
  let i, k = 1   // counters for the current HttpMessage
  let LastReferer = empty, NewReferer = empty
  HttpRequest = ExtractReq(HTTPtraffic)   // extract the HTTP requests from the HTTP traffic
  HttpResponse = ExtractResp(HTTPtraffic)   // extract the HTTP responses from the HTTP traffic
  n = ExtractNumber(HttpRequest)   // total number of HTTP requests
  m = ExtractNumber(HttpResponse)   // total number of HTTP responses
  // extract the main HTTP requests from HttpRequest
  for i ← 1 to n do
    if (i = 1) then
      add MainRequest ← HttpRequest_i   // the first request is a main request
      LastReferer ← Referer of HttpRequest_i
    else
      NewReferer ← Referer of HttpRequest_i
      if (NewReferer ≠ LastReferer) then
        add MainRequest ← HttpRequest_(i−1)
        LastReferer ← NewReferer
      end if
    end if
  end for
  // extract the main HTTP responses from HttpResponse
  for k ← 1 to m do
    if (content-type of HttpResponse_k = text/html) then
      add MainResponse ← HttpResponse_k
    end if
  end for
  // check whether the last request is a main request
  if (HttpResponse_m ∈ MainResponse) then
    add MainRequest ← HttpRequest_m
  end if
  // build the set of web pages
  size = ExtractNumber(MainRequest)   // total number of main HTTP requests
  for j ← 1 to size do
    add WebPages ← (MainRequest_j, MainResponse_j)
  end for
  return WebPages
End
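The Referer-change rule of Algorithm 1 can be illustrated with the Python sketch below. It is a simplified variant under our own assumptions, not BLProM's implementation: the traffic is assumed to be in chronological order, each request is kept together with its own response instead of pairing the j-th main request with the j-th main response, and a main request is kept only if its response is text/html. The `Exchange` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Exchange:
    """One HTTP request together with its response (hypothetical record type)."""
    uri: str
    referer: Optional[str]
    content_type: str


def extract_web_pages(traffic: List[Exchange]) -> List[Exchange]:
    """Simplified variant of Algorithm 1: keep the main, HTML-returning requests."""
    if not traffic:
        return []
    main_indices = [0]  # the first request is a main request
    last_referer = traffic[0].referer
    for i in range(1, len(traffic)):
        new_referer = traffic[i].referer
        if new_referer != last_referer:
            # The Referer changed, so the previous request loaded the page
            # that the following requests now refer to.
            main_indices.append(i - 1)
            last_referer = new_referer
    main_indices.append(len(traffic) - 1)  # the last request may also be a page

    pages: List[Exchange] = []
    seen = set()
    for idx in main_indices:
        exchange = traffic[idx]
        if idx not in seen and exchange.content_type.startswith("text/html"):
            pages.append(exchange)
        seen.add(idx)
    return pages
```

Sub-resources such as style sheets and images are dropped because they are never the request immediately before a Referer change, or because their responses are not HTML.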
Table 1 Attribute Vector of a Page in osCommerce Web Application
Element | Value
inputs | null
Buttons | html/body/button Reviews
        | html/body/div/div/div/div/form/div/div/span/span/button Add to Cart
anchors | null
image | html/body/div/div/div/a/img
Figure 4 An Example of HTML Code
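Definition of the Document Object Model (DOM) path of an HTML element: the DOM path of an element is the position of the element in the HTML code, given as the chain of its ancestor tags from the document root down to the element itself.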
For example, in Figure 4 the DOM path of the button (DOM_button) is Document/Html/Body/P/Button.
Definition 1 [Similar Pages]: Similar pages are pages on which the user can perform the same operations and which are identical in terms of the position of the important HTML elements in the page. The important HTML elements in a page are buttons, images, inputs, and anchors.
The clustering process includes three steps:
1. extracting the attribute vectors of the page;
2. identifying the subset pages;
3. clustering the pages.
In the following, these steps are discussed in detail.
1. Extracting the attribute vectors of the page
BLProM represents each page of the application as a pair (main request, response). In this step, BLProM extracts the attribute vector of each page by applying a data mining operation to this pair. BLProM models each page using the following attribute vector:
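$$\mathit{WebPages} = \text{the set of all pages in an application}$$
$$\forall w \in \mathit{WebPages}: \quad w = (\mathit{DOM}_{inputs}, \mathit{DOM}_{buttons}, \mathit{DOM}_{anchors}, \mathit{DOM}_{imgs})$$
$$\mathit{DOM}_{inputs} = \prod_{i=1}^{n} \mathit{DOM}(input_i), \qquad \mathit{DOM}_{buttons} = \prod_{i=1}^{n} \mathit{DOM}(button_i)$$
$$\mathit{DOM}_{anchors} = \prod_{i=1}^{n} \mathit{DOM}(anchor_i), \qquad \mathit{DOM}_{imgs} = \prod_{i=1}^{n} \mathit{DOM}(img_i)$$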
• DOM(input): the DOM path of the <input> tag in the page + the value of the type attribute of the <input> tag + the value of the name attribute of the <input> tag (if there is no name attribute, the value attribute is used instead).
• DOM(button): the DOM path of the button in the page + the title of the button.
• DOM(anchor): the DOM path of the <a> tag in the page.
• DOM(img): the DOM path of the image in the page.
If a page contains several buttons, the second element of the page attribute vector is the set of DOM paths of the buttons in the page, separated by a delimiter. Figure 5 shows one of the pages of the osCommerce¹ web application, and Table 1 shows the attribute vector of that page. The input and anchor elements are null, meaning that the page contains no such tags.
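One way to compute the attribute vector with Python's standard `html.parser` module is sketched below. It is an illustration under our own assumptions, not BLProM's implementation: the DOM path is the slash-separated chain of open tags (the parser does not add html or body if a page omits them), the parts of DOM(input) and DOM(button) are joined with "|", and an empty set stands for null. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import FrozenSet, List, Optional, Set, Tuple

# Elements that never have a closing tag, so they are not pushed on the path stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class AttributeVectorParser(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.inputs: Set[str] = set()
        self.buttons: Set[str] = set()
        self.anchors: Set[str] = set()
        self.imgs: Set[str] = set()
        self._button_path: Optional[str] = None
        self._button_text: List[str] = []

    def _path(self, tag: str) -> str:
        # DOM path = ancestor tags from the root down to the element itself.
        return "/".join(self.stack + [tag])

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "input":
            # Use the name attribute, or the value attribute if there is no name.
            name = attr.get("name") or attr.get("value") or ""
            self.inputs.add(f"{self._path(tag)}|{attr.get('type') or ''}|{name}")
        elif tag == "a":
            self.anchors.add(self._path(tag))
        elif tag == "img":
            self.imgs.add(self._path(tag))
        elif tag == "button":
            self._button_path = self._path(tag)
            self._button_text = []
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_data(self, data):
        if self._button_path is not None:
            self._button_text.append(data)  # collect the button title

    def handle_endtag(self, tag):
        if tag == "button" and self._button_path is not None:
            title = " ".join("".join(self._button_text).split())
            self.buttons.add(f"{self._button_path}|{title}")
            self._button_path = None
        if tag in self.stack:
            # Pop up to and including the matching open tag (tolerates unclosed tags).
            while self.stack.pop() != tag:
                pass


def attribute_vector(html: str) -> Tuple[FrozenSet[str], ...]:
    """Return (inputs, buttons, anchors, imgs); an empty set plays the role of null."""
    parser = AttributeVectorParser()
    parser.feed(html)
    parser.close()
    return (frozenset(parser.inputs), frozenset(parser.buttons),
            frozenset(parser.anchors), frozenset(parser.imgs))
```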
2. Identifying similar pages
After the attribute vector of each page is extracted, similar pages must be identified. Pages whose attribute vectors are fully identical, or whose attribute vector is a subset of another page's attribute vector, are considered similar.
According to Definition 1, the attribute vector of each web application page has four elements. The attribute vector of page 1 is considered the same as the attribute vector of page 2 if:
• All vector elements of page 1 are equal to the corresponding elements in the vector of page 2.
• All vector elements of page 1 are subsets of the corresponding elements in the vector of page 2.
• All vector elements of page 2 are subsets of the corresponding elements in the vector of page 1.
1 https://www.oscommerce.com
Figure 5 A Page of osCommerce Application
Algorithm 2 The Pseudocode for Identifying the Similar Pages
INPUT: w1 = (DOM_w1(input), DOM_w1(button), DOM_w1(anchor), DOM_w1(img)),
       w2 = (DOM_w2(input), DOM_w2(button), DOM_w2(anchor), DOM_w2(img))
OUTPUT: Boolean flag; true means that the two pages are the same
Begin
  Let flag, input, button, anchor, img = false
  if DOM_w1(input) ⊆ DOM_w2(input) or DOM_w2(input) ⊆ DOM_w1(input) then
    input = true
  end if
  if DOM_w1(button) ⊆ DOM_w2(button) or DOM_w2(button) ⊆ DOM_w1(button) then
    button = true
  end if
  if DOM_w1(anchor) ⊆ DOM_w2(anchor) or DOM_w2(anchor) ⊆ DOM_w1(anchor) then
    anchor = true
  end if
  if DOM_w1(img) ⊆ DOM_w2(img) or DOM_w2(img) ⊆ DOM_w1(img) then
    img = true
  end if
  if input and button and anchor and img then
    flag = true
  end if
  return flag
End
• If one or more vector elements of page 1 are subsets of their corresponding elements in the vector of page 2, the remaining vector elements of page 1 must be the same as their corresponding elements in the vector of page 2.
• The null element is a subset of every element.
Similar pages are identified according to the above-mentioned attributes. Algorithm 2 illustrates the pseudocode for identifying similar pages.
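With each attribute-vector element held as a Python frozenset, Algorithm 2 reduces to a per-element subset test, sketched below. The sketch follows Algorithm 2 as written: each element is checked independently and in either direction, so, like Algorithm 2, it does not additionally enforce the stricter rule in the first bullet above that the remaining elements be identical. The type alias and function name are hypothetical.

```python
from typing import FrozenSet, Tuple

# (inputs, buttons, anchors, imgs); an empty set represents a null element.
AttributeVector = Tuple[FrozenSet[str], FrozenSet[str], FrozenSet[str], FrozenSet[str]]


def similar_web_pages(w1: AttributeVector, w2: AttributeVector) -> bool:
    """Algorithm 2: each element of w1 must be a subset or superset of its counterpart in w2."""
    # The empty set is a subset of every set, which implements the rule that
    # the null element is a subset of every element.
    return all(a <= b or b <= a for a, b in zip(w1, w2))
```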
3. Clustering web application pages
After the similar pages are identified, they are put in the same cluster. The pages in a cluster are similar to each other and represent a single unique page of the application. Algorithm 3 shows the pseudocode for clustering web pages. Line 7 checks whether two pages w_i and w_j are similar; if they are, they are put in the same cluster.
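A compact Python rendering of the clustering loop is sketched below. Each page that is not yet clustered opens a new cluster, and every later page similar to it joins that cluster and is not considered again. The similarity test is passed in as a function (for example, the subset test of Algorithm 2); the names are hypothetical. Because later pages are compared only with the page that opened the cluster, as in Algorithm 3, the result can depend on the order of the pages in the traffic.

```python
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")


def cluster_web_pages(pages: Sequence[Page],
                      similar: Callable[[Page, Page], bool]) -> List[List[Page]]:
    """Greedy single-pass clustering in the spirit of Algorithm 3."""
    remaining = list(pages)
    clusters: List[List[Page]] = []
    while remaining:
        seed = remaining.pop(0)          # w_i opens a new cluster C_k
        cluster = [seed]
        unclustered = []
        for page in remaining:           # compare w_i with every later page w_j
            if similar(seed, page):
                cluster.append(page)     # w_j joins C_k and is not visited again
            else:
                unclustered.append(page)
        remaining = unclustered
        clusters.append(cluster)
    return clusters
```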
4. Extracting the user navigation graph
In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster contains a set of similar pages, and each of these pages has a URI and a Referer field. Thus, each cluster has a set of URIs and a set of Referers for the pages in that cluster. Note that the Referer field is the URI
Algorithm 3 The Pseudocode for Clustering the Web Application Pages
INPUT: WebPages = {w1, w2, w3, ..., wn}
OUTPUT: the web application model M as a set of web page clusters C, where each cluster is a set of web pages
1  Begin
2    Let M = empty   // set of page clusters
3    Let k = 1   // number of web page clusters
4    for i ← 1 to n do
5      C_k ← w_i
6      for j ← i + 1 to n do
7        if SimilarWebPages(w_i, w_j) then
8          WebPages ← WebPages − w_j   // w_j is clustered and is not visited again
9          n = length of WebPages
10         C_k ← C_k + w_j
11       end if
12     end for
13     k++
14   end for
15   return C
16 End
of the previous web application page that the user visited. To extract the edges of the user navigation graph, the produced clusters are checked to find which cluster's Referer set intersects the URI set of a given cluster; when such a cluster is found, the two clusters are connected. Suppose that the URI set of cluster C1 intersects the Referer set of cluster C2; then the path from cluster C1 to cluster C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges of the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained in lines 5 and 6 of Algorithm 4, respectively. In line 10, if the intersection of the URI set of a cluster with another cluster's Referer set is not null, the edge CiCj is added to the edge set of the graph.
The user navigation graph is then created by Algorithm 5: its nodes are the set of clusters, and its edges are the identified paths between clusters. In the created graph, each node represents a unique page and each edge represents a path between pages.
Definition 2 [The User Navigation Graph]: This graph is represented by the tuple <C0, C, E>, where C is the set of nodes in the graph, C0 ∈ C is the first (initial) node of the graph, and E ⊆ C × C is the set of edges in the graph.
Algorithm 5 shows the pseudocode for extracting the user navigation graph. Lines 5 and 6 select the cluster that contains the initial page of the application; this cluster is taken as the initial node of the graph.
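The Python sketch below combines the edge rule of Algorithm 4 with the initial-node selection of Algorithm 5. Clusters are identified by their index, and each page carries only its URI and Referer (a hypothetical minimal record). Following the textual description ("another cluster's Referer set"), every ordered pair of distinct clusters is tested, so edges can point in either direction; edges from a cluster to itself are ignored. The function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Page:
    uri: str
    referer: Optional[str]


def extract_graph_edges(clusters: List[List[Page]]) -> Set[Tuple[int, int]]:
    """Edge i -> j when a URI of cluster i appears as a Referer in cluster j."""
    uris = [{p.uri for p in c} for c in clusters]
    referers = [{p.referer for p in c if p.referer is not None} for c in clusters]
    edges: Set[Tuple[int, int]] = set()
    for i in range(len(clusters)):
        for j in range(len(clusters)):
            if i != j and uris[i] & referers[j]:
                edges.add((i, j))
    return edges


def navigation_graph(clusters: List[List[Page]],
                     first_page: Page) -> Tuple[int, List[int], Set[Tuple[int, int]]]:
    """Return <C0, C, E>: the initial cluster, all cluster indices, and the edges."""
    c0 = next(i for i, c in enumerate(clusters) if first_page in c)
    return c0, list(range(len(clusters))), extract_graph_edges(clusters)
```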
3.2 Identifying Business Processes in the Application
To identify the business processes of the web application in the user navigation graph, we first define a process and a final node, and then define a business process.
Definition 3 [The Application Process (P)]: The
Algorithm 4 The Pseudocode for Extracting Edges From the User Navigation Graph
INPUT: C = {C1, C2, C3, ..., Ck}, web page clusters as graph nodes;
       WebPages = {w1, w2, w3, ..., wn}, set of web pages
OUTPUT: the web application graph edges E as a set of edges
1  Begin
2    Let E = empty   // set of web application graph edges
3    for j ← 1 to k do
4      Let URI_Cj, Referer_Cj = empty
5      URI_Cj = URI_Cj ∪ ExtractURI(w) for every w ∈ Cj
6      Referer_Cj = Referer_Cj ∪ ExtractReferer(w) for every w ∈ Cj
7    end for
8    for i ← 1 to k do
9      for j ← 1 to k, j ≠ i do
10       if (URI_Ci ∩ Referer_Cj ≠ null) then
11         E ← E + CiCj
12       end if
13     end for
14   end for
15   return E
16 End
Algorithm 5 The Pseudocode for Extracting the User Navigation Graph
INPUT: C = {C1, C2, C3, ..., Ck}, web page clusters as graph nodes; w1, the first web page
OUTPUT: the web application navigation graph <C0, C, E>
1  Begin
2    Let C0 = empty
3    E = ExtractGraphEdges(C)   // set of web application graph edges (Algorithm 4)
4    for j ← 1 to k do
5      if Cj contains w1 then
6        C0 = Cj
7      end if
8    end for
9    return <C0, C, E>
10 End
process P in the application is a sequence of nodes and edges in the user navigation graph, $E_1, E_2, \ldots, E_k$, where $E_i \in E$ and $E_i = C_{i-1}C_i$.
Algorithm 6 shows the pseudocode for extracting processes.
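Algorithm 6 builds processes recursively by following the edges that leave C0. One reasonable reading, sketched below in Python, is a depth-first enumeration in which a process ends either at a node with no outgoing edges or as soon as it returns to a node already on the path, which keeps loops finite. These stopping rules and the names are our assumptions; nodes are cluster labels, and each process is returned as its node sequence C0, C1, ..., Ck, whose consecutive pairs are the edges E_i = C_{i-1}C_i.

```python
from typing import Dict, List, Set, Tuple


def extract_processes(c0: str, edges: Set[Tuple[str, str]]) -> List[List[str]]:
    """Enumerate processes as node sequences starting at the initial node c0."""
    successors: Dict[str, List[str]] = {}
    for src, dst in sorted(edges):
        successors.setdefault(src, []).append(dst)

    processes: List[List[str]] = []

    def extend(path: List[str]) -> None:
        next_nodes = successors.get(path[-1], [])
        if not next_nodes:
            processes.append(path)              # dead end: the process is complete
            return
        for nxt in next_nodes:
            if nxt in path:
                processes.append(path + [nxt])  # returns to a visited node: stop here
            else:
                extend(path + [nxt])

    extend([c0])
    return processes
```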
Definition 4 [The final nodes in the user navigation graph (F)]: The final nodes mark the completion of a business process; a business process is complete when the application reaches one of them.
The final nodes can be detected by examining the HTTP responses. For example, in the process of buying a product, a phrase like "Thank you for your purchase" is displayed after the purchase is completed. By specifying a set of such phrases and searching for them in the responses, the final nodes can be identified. Some keywords used for identifying final nodes are Thank, Congratulations, Successfully, Log Off, and Search Results. In addition, some buttons in a web application page are good signs that help identify the final nodes of the graph: examples of final nodes include the page after clicking the "save" button, the page after clicking the "creation" button, and the page after clicking the "submit" button.
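The keyword cue can be sketched as a plain substring search over the HTML responses of each cluster, as below. The sketch covers only the keywords listed above; the button-based cue (the page reached after clicking "save", "creation", or "submit") is not modelled. Case-insensitive matching, the mapping from cluster labels to response bodies, and the function name are our assumptions.

```python
from typing import Dict, Iterable, Set

# Keywords quoted in the text; matching them case-insensitively is our assumption.
FINAL_KEYWORDS = ("thank", "congratulations", "successfully", "log off", "search results")


def extract_final_nodes(responses: Dict[str, Iterable[str]]) -> Set[str]:
    """Return the clusters whose HTML responses contain a completion keyword."""
    final_nodes: Set[str] = set()
    for cluster, bodies in responses.items():
        for body in bodies:
            text = body.lower()
            if any(keyword in text for keyword in FINAL_KEYWORDS):
                final_nodes.add(cluster)
                break
    return final_nodes
```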
Definition 5 [The application business process]: A business process BP in the application is a process that satisfies at least one of the following conditions:
1. The first node of the process is the initial node of the user navigation graph (C0), and the end node of the process is a final node of the user navigation graph (F).
2. The process passes through its first node again, meaning that
Algorithm 6 The Pseudocode for Extracting Processes
INPUT: the web application first node C0; the web application graph edges E
OUTPUT: the web application processes P as a set of web application processes
Begin
  Let P0 = empty   // set of web application processes
  Let StartEdge = empty   // set of graph edges that begin at C0
  StartEdge = ExtractGraphEdges(C0)   // extract all edges leaving C0
  EndPoint = ExtractEndPoint(StartEdge)   // extract the end point of each edge
  if (ExtractProcess(EndPoint, E) ≠ null) then
    return P = e + ExtractProcess(EndPoint, E) for every edge e ∈ StartEdge
  else
    return e
  end if
End
the first node and the end node of the process are the same, and the process length is greater than two. (If the process returns to a node it has already passed and the resulting loop is longer than two, the loop is a business process.)
All processes in the graph that run from the initial node (the initial page of the application) to an identified final node, as well as all processes whose initial node and end node are the same and whose length is greater than two, are business processes of the application. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the application processes are extracted, and in line 5, the final nodes of the graph are extracted. In line 7, the processes that start with the initial node and end with a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, those detected processes whose initial node and end node are the same and whose length is greater than two are added to BP as business processes.
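Given processes as node sequences (for instance, from an enumeration such as Algorithm 6) and the set of final nodes, the two conditions of Definition 5 can be sketched in Python as follows. Measuring "length" in edges and cutting the loop out at the first earlier occurrence of the repeated node are our interpretations; the function name is hypothetical.

```python
from typing import List, Sequence, Set


def identify_business_processes(processes: Sequence[Sequence[str]], c0: str,
                                final_nodes: Set[str]) -> List[List[str]]:
    """Apply the two conditions of Definition 5 to processes given as node sequences."""
    business: List[List[str]] = []

    def add(candidate: List[str]) -> None:
        if candidate not in business:
            business.append(candidate)

    for process in processes:
        nodes = list(process)
        if not nodes:
            continue
        # Condition 1: starts at the initial node and ends at a final node.
        if nodes[0] == c0 and nodes[-1] in final_nodes:
            add(nodes)
        # Condition 2: the process returns to an earlier node; the loop from that
        # node back to itself counts if it has more than two edges.
        if nodes[-1] in nodes[:-1]:
            loop = nodes[nodes.index(nodes[-1]):]
            if len(loop) - 1 > 2:
                add(loop)
    return business
```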
4 EXPERIMENTAL RESULTS
The testbed used in this section is a network consisting of a web server (the test target) and two clients (the BLProM system and a legal user). The web server and each client run in a virtual machine; their profiles are shown in Table 2.
The web applications listed in Table 3 are installed on the web server (test target), and BLProM is then used to identify the business layer of each web application.
The legal user first starts using the selected web applications and crawls all permitted parts of each application. The HTTP traffic of the legal user is given to BLProM as its input.
5 EVALUATION
BLProM's goal is to identify the business layer of the web application; identifying the business layer makes it possible to identify business logic vulnerabilities. BLProM detects the business processes of the web application, and identifying these business processes is the main step in business-layer dynamic security testing of the web application.
We compare BLProM with OWASP ZAP TheOWASP Zed Attack Proxy (ZAP) is a free web scan-ner It scans web applications and automatically findssome security vulnerabilities ZAP is the only free webscanner that has API for extracting web applicationgraph The web application pages are graph nodes andthe relations among pages are shown as graph edgesThe main difference between BLPRoM and ZAP isin detecting similar pages ZAP cannot detect similarpages in the web application but BLPRoM can ZAPrsquosgraph only shows the relation among scanned pagesbut BLProM generates the optimal graph About theaccuracy of the generated graph both BLProM andZAP are the same
To evaluate the proposed approach we first showthe accuracy of clustering by the following criteria
bull True Positive Samples that fit well into theircorrect clusters
bull False Positive Samples that fit in a cluster thatdo not belong to that cluster
bull False Negative Samples that do not fit in a clusterbut they belong to that cluster
bull Recall It is calculated by the following formula
recall = TruePositiveTruePositive+FalseNegative
bull Precision It is calculated by the following for-mula
76 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al
Table 2 Testbed Profiles
Web server (test target)
CPU Pentium dual core-220 GHZ
OS windows 81
VMware cpu 1GHZ
VMware RAM 1G
VMware OS windows 7
Client (BLProM machine)
CPU Intel corei7 220 GHZ
OS windows 81
VMware cpu 1GHZ
VMware RAM 1G
VMware OS windows 7
Client (legal user)
CPU Pentium dual core i3-3210 GHZ
OS windows 7
VMware cpu 1GHZ
VMware RAM 1G
VMware OS windows 7
Table 3 Selected Web Applications for Evaluation
Web application Description
TomatoCart-11861 e-commerce
osCommerce-234 e-commerce
WackoPicko Web application for Sharing picture
Algorithm 7 The Pseudocode for Identifying the Application Business Processes
INPUT the web application navigation graph lt C0 CE gtthe web application Graph edges EOUTPUT the web application graph business process PB as a set of web application process
1 Begin2 Let P = empty set of web application processes3 Let F = empty set of web application final nodes4 P= ExtractProcess Extract processes in the web application5 F= ExtractFinalNodes Extract Final nodes in the web application6 for i 1 k do7 if Pk start by C0 and ends by F then8 BP = BP + Pk
9 end if10 end for11 R=ExtractProcessWithRepeatedNodes(P) Extract Process with Repeated Nodes12 for j 1 m do13 if Rj has same initial and end node and length(Rj) =2 then14 BP = BP +Rj
15 end if16 end for17 return BP18 end
July 2019 Volume 6 Number 2 (pp 65ndash80) 77
Table 4 The Clusters of Selected Web Application Pages Evaluation
Criteria
Web applicationWackoPicko Tomatocart osCommerce
samples 89 150 210
clusters 29 66 40
true positive 65 146 205
false positive 24 4 5
false negative 23 3 4
recall 074 098 098
precision 073 097 098
f-measure 073 098 098
Table 5 Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko
WackoPicko
Criteria
ApproachesProposed approach OWASP ZAP Percentage of improvement
compared to OWASP ZAP
HTTP Message 89 89 ndash
Graph Nodes 22 89 752
Graph Edges 48 270 822
process (P) 12 48 75
Average edge in each process (E) 4 46 913
Average edges in all processes (PE) 48 2208 978
business processes 10 NA ndash
precision = TruePositiveTruePositive+FalsePositive
bull F-Measure It is calculated by the following for-mulaF minusMeasure = 2lowastrecalllowastprecision
TruePositive+FalsePositive
The value of these criteria for the ClusteringWeb-Pages algorithm (Algorithm 3) is shown in Table 4In the first row samples shows the total numberof web pages in HTTP traffic extracted from the Ex-tractWebPages algorithm in Algorithm 1 In the sec-ond row clusters shows the total number of clustersextracted from the ClustreringWebPages algorithmin Algorithm 3 In the next rows the criteria listedabove are calculated for each web application
To evaluate the proposed approach we comparethe output of BLProM with the scanning output ofOWASP ZAP for selected web applications BLProMoutput for WackoPicko is shown in Table 5 graphnodes shows the total number of clusters extractedfrom the ClustreringWebPages algorithm in Algo-rithm 3 graph edges is the total number of edgesextracted from the ExtractGraphEdges algorithm in
Algorithm 4 process is the total number of pathsfrom the first node extracted from the ExtractPro-cess algorithm in Algorithm 6 Average edge in eachprocess (E) is calculated by the sum of edges of eachprocess divided by the total number of processes Av-erage edge in all process is calculated by the productof the total number of processes (P) in average edgesof each process (E) business processes is the totalnumber of business processes in the web application
Existing values in the first column of Table 5 arethe output obtained from the proposed approach Thevalues in the second column are the scanning out-put of ZAP The third column shows the percentageof scanning improvement of our proposed approachcompared to the ZAP scan The results of the tableindicate that the ZAP scan is a non-smart scan As aresult of this non-intelligent scan ZAP is not able toidentify business layer vulnerabilities
BLPRoM output for osCommerce is shown in Ta-ble 6 The third column shows the percentage of scan-ning improvement compared to ZAP scanning It isobserved that web application scanning is improvedby identifying the web application business layer
Table 6 Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce
Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP
HTTP messages | 170 | 170 | –
Graph nodes | 40 | 170 | 76.4
Graph edges | 66 | 379 | 82.5
process (P) | 23 | 17 | 26
Average edge in each process (E) | 3 | 113 | 97.3
Average edges in all processes (P·E) | 69 | 1921 | 96.4
business processes | 18 | N/A | –
Table 7 Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart
Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP
HTTP messages | 150 | 150 | –
Graph nodes | 66 | 150 | 56
Graph edges | 87 | 410 | 78.7
process (P) | 31 | 39 | 20.5
Average edge in each process (E) | 4 | 101 | 96
Average edges in all processes (P·E) | 156 | 3131 | 95
business processes | 30 | N/A | –
BLProM output for TomatoCart is shown in Table 7. The third column shows the percentage improvement in scanning achieved by the proposed approach compared with ZAP.
Table 8 shows, for the selected web applications, the average values of the proposed approach, the average values of OWASP ZAP, and the average percentage of improvement in scanning. For example, on the "Average edges in all processes" criterion, our approach improves on OWASP ZAP by about 96 percent.
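Table 8's improvement column appears to be the mean of the per-application percentages rather than the improvement recomputed from the averaged counts. The Python sketch below checks this for the P·E row, using only values copied from Tables 5 to 7 (it reproduces the table; it does not re-run any experiment). `PE_ROWS` is an illustrative name, and the reduction formula for `gain_of_avgs` is the same inferred formula (ZAP − BLProM) / ZAP × 100.

```python
from statistics import mean

# "Average edges in all processes (P·E)" rows of Tables 5, 6 and 7:
# (proposed approach, OWASP ZAP, percentage of improvement).
PE_ROWS = {
    "WackoPicko": (48, 2208, 97.8),
    "osCommerce": (69, 1921, 96.4),
    "TomatoCart": (156, 3131, 95.0),
}

avg_ours = mean(row[0] for row in PE_ROWS.values())
avg_zap = mean(row[1] for row in PE_ROWS.values())
# Mean of the per-application improvement percentages.
avg_gain = mean(row[2] for row in PE_ROWS.values())
# Improvement recomputed from the averaged counts, for comparison.
gain_of_avgs = (avg_zap - avg_ours) / avg_zap * 100

print(f"{avg_ours} {avg_zap} {avg_gain:.1f} {gain_of_avgs:.1f}")
```

The averaged counts (91 and 2420) and the mean percentage (about 96.4) match Table 8, while recomputing the reduction from the averaged counts gives about 96.2.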
According to the results presented in this table, BLProM improves web application scanning. BLProM is aware of the business processes of the web application, and by identifying the business layer of the web application, web scanners can detect business-layer vulnerabilities.
6 CONCLUSION
Business logic vulnerabilities are severe vulnerabilities that compromise web application security. Web scanners cannot detect them because they are unable to understand the business logic of the web application. Detecting business logic vulnerabilities requires understanding the business logic of the application; these vulnerabilities are therefore application-specific and difficult to identify.
In this paper, we proposed BLProM, a black-box approach for detecting the business processes of a web application. BLProM aims to ease the detection of business logic vulnerabilities: its output is used as input to business-layer dynamic security testing of web applications in order to detect business logic vulnerabilities. BLProM consists of two main steps:
1- Extracting the user navigation graph
2- Detecting the business processes of the web application
In the lab, we scanned three web applications with BLProM and with OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared with OWASP ZAP. BLProM improves the scanning of web applications because it clusters web pages and avoids scanning similar web application pages.

Table 8 Comparison of the Average of the Proposed Approach and the Average of ZAP Approach in the Scanning of Selected Web Applications
Criteria (average of selected web applications) | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP
Graph nodes | 42.6 | 136.3 | 69.2
Graph edges | 67 | 353 | 81.1
process (P) | 20 | 36.6 | 40.5
Average edge in each process (E) | 3.6 | 86.6 | 94.8
Average edges in all processes (P·E) | 91 | 2420 | 96.4
References
[1] Common vulnerabilities and exposures. https://cve.mitre.org/cve/cve.html.
[2] 2015 ITRC. Identity Theft Resource Center Breach Report Hits Near Record High in 2015. http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html. Accessed Feb. 2017.
[3] G. Pellegrino and D. Balzarotti. Toward Black-Box Detection of Logic Flaws in Web Applications. In Network and Distributed System Security Symposium 2014 (NDSS 2014), 2014. doi:10.14722/ndss.2014.23021.
[4] Testing for business logic. OWASP. https://www.owasp.org/index.php/Testing_for_business_logic. Accessed Feb. 2017.
[5] D. Balzarotti, M. Cova, V. Felmetsger, and G. Vigna. Multi-module vulnerability analysis of web-based applications. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 25–35. ACM, 2007. doi:10.1145/1315245.1315250.
[6] A. Doupé, B. Boe, C. Kruegel, and G. Vigna. Fear the EAR: discovering and mitigating execution after redirect vulnerabilities. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 251–262. ACM, 2011. doi:10.1145/2046707.2046736.
[7] V. Felmetsger, L. Cavedon, C. Kruegel, and G. Vigna. Toward Automated Detection of Logic Vulnerabilities in Web Applications. In USENIX Security Symposium, 2010.
[8] E. Chai. Business Logic Attacks – Bots and BATs. OWASP, 2009. https://www.owasp.org/images/a/aa/OWASP_Cincinnati_Jan_2011.pdf. Accessed Feb. 2017.
[9] X. Li and Y. Xue. BLOCK: a black-box approach for detection of state violation attacks towards web applications. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 247–256, 2011. doi:10.1145/2076732.2076767.
[10] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna. Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications. In Recent Advances in Intrusion Detection (RAID 2007), pages 63–86. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-74319-4. doi:10.1007/978-3-540-74320-0_4.
[11] X. Li, W. Yan, and Y. Xue. SENTINEL: securing database from logic flaws in web applications. In Proceedings of the Second ACM Conference on Data and Application Security and Privacy, pages 25–36, 2012. doi:10.1145/2133601.2133605.
[12] M. Alidoosti and A. Nowroozi. BLDAST: business-layer Dynamic Application Security Tester of the web application in order to detect web application vulnerabilities against flooding DoS attacks. In Iran Society of Cryptology Conference, Shiraz, Iran (in Persian), 2017.
[13] M. Alidoosti, A. Nowroozi, and A. Nickabadi. Evaluating the Web-Application Resiliency to Business-Layer DoS Attacks. ETRI Journal, 2019. doi:10.4218/etrij.2019-0164.
[14] M. Alidoosti and A. Nowroozi. BLTOCTTOU: business-layer dynamic application security tester of the web application in order to detect web application vulnerabilities against Race Condition attacks. In Computer Society of Iran Conference, Tehran, Iran (in Persian), 2018.
[15] M. Alidoosti and A. Nowroozi. BLProM: Business-layer process miner of the web application. In ISCISC, pages 1–6. IEEE, 2018. ISBN 978-1-5386-7582-3. doi:10.1109/ISCISC.2018.8546899.
[16] V. Crescenzi, P. Merialdo, and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3):279–299, 2005. doi:10.1016/j.datak.2004.11.004.
Mitra Alidoosti received her BS and MSc degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. She is currently working toward the PhD degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.

Alireza Nowroozi is a freelance consultant who advises government and private-sector industries on information technology. He has four years of experience as an academic staff member and held an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. He is also a co-founder of four IT startups.
Ahmad Nickabadi received his BS degree in computer engineering in 2004 and his MSc and PhD degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.
Table 1 Attribute Vector of a Page in the osCommerce Web Application
inputs | null
buttons | html/body/button/Reviews
 | html/body/div/div/div/div/form/div/div/span/span/button/Add to Cart
anchors | null
image | html/body/div/div/div/a/img
Figure 4 An Example of HTML Code
Definition of Document Object Model (DOM) path for an HTML element: the DOM path of an element is the position of the element in the HTML code, expressed as the sequence of tags from the document root down to the element.
For example, in Figure 4, the DOM path of the button (DOM_button) is Document/Html/Body/P/Button.
Definition 1 [Similar Pages] Similar pages are pages on which the user can perform the same operations and which are identical in terms of the positions of the important HTML elements on the page. The important HTML elements on a page are buttons, images, inputs, and anchors.
The clustering process includes three steps:
1. Extracting the attribute vectors of the pages
2. Identifying the subset pages
3. Clustering the pages
In the following, these steps are discussed in detail.
1) Extracting attribute vectors of the page
BLProM represents each page of the application as a pair (main request, response). In this step, BLProM extracts the attribute vector of each page by applying a data mining operation to this pair. BLProM models each page using the following attribute vector:
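WebPages = the set of all pages in the application
$$\forall w \in WebPages:\quad w = (DOM_{inputs},\ DOM_{buttons},\ DOM_{anchors},\ DOM_{imgs})$$
$$DOM_{inputs} = \prod_{i=1}^{n} DOM(input_i) \qquad DOM_{buttons} = \prod_{i=1}^{n} DOM(button_i)$$
$$DOM_{anchors} = \prod_{i=1}^{n} DOM(anchor_i) \qquad DOM_{imgs} = \prod_{i=1}^{n} DOM(img_i)$$
Here $\prod$ denotes collecting the DOM(·) values of all $n$ elements of that type on the page into one element of the vector, and DOM(·) is defined as follows: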
• DOM(input): the DOM path of the <input> tag in the page + the value of the type attribute of the <input> tag + the value of the name attribute of the <input> tag (if the name attribute is absent, the value attribute is used)
• DOM(button): the DOM path of the button in the page + the title of the button
• DOM(anchor): the DOM path of the <a> tag in the page
• DOM(img): the DOM path of the image in the page
Suppose a web application page contains several buttons. In this case, the second element of the page attribute vector is the set of the DOM paths of all buttons in the page, separated by a delimiter. Figure 5 shows one of the pages of the osCommerce¹ web application, and Table 1 shows the attribute vector of that page. As shown, the input and anchor elements are null, which means the page does not contain these tags.
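A minimal sketch of attribute-vector extraction using Python's standard `html.parser` module. It assumes the DOM path is the chain of open tag names joined with "/", and that the "+" concatenations in the definitions above are also joined with "/"; both separators are assumptions, since the paper does not specify them. `AttributeVectorParser` and `attribute_vector` are hypothetical names, and the button title is taken to be the button's visible text.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Void elements never receive a closing tag, so they are not pushed on the path stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class AttributeVectorParser(HTMLParser):
    """Collects DOM paths of inputs, buttons, anchors and images (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.vector: dict[str, set[str]] = {
            "inputs": set(), "buttons": set(), "anchors": set(), "imgs": set()}
        self._button_path: str | None = None
        self._button_text: list[str] = []

    def _path(self, tag: str) -> str:
        return "/".join(self.stack + [tag])

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input":
            # DOM path + type attribute + name attribute (value attribute if name is absent).
            name = attributes.get("name") or attributes.get("value") or ""
            kind = attributes.get("type") or ""
            self.vector["inputs"].add(f"{self._path(tag)}/{kind}/{name}")
        elif tag == "a":
            self.vector["anchors"].add(self._path(tag))
        elif tag == "img":
            self.vector["imgs"].add(self._path(tag))
        elif tag == "button":
            self._button_path, self._button_text = self._path(tag), []
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_data(self, data):
        if self._button_path is not None:
            self._button_text.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "button" and self._button_path is not None:
            # DOM path + button title (taken here as the button's visible text).
            title = " ".join(t for t in self._button_text if t)
            self.vector["buttons"].add(f"{self._button_path}/{title}")
            self._button_path = None
        if tag in self.stack:
            # Pop up to the matching open tag, tolerating unclosed children.
            while self.stack.pop() != tag:
                pass


def attribute_vector(html: str) -> dict[str, frozenset[str]]:
    """Return the four-element attribute vector of one page."""
    parser = AttributeVectorParser()
    parser.feed(html)
    parser.close()
    return {key: frozenset(paths) for key, paths in parser.vector.items()}


page = "<html><body><button>Reviews</button><div><a href='/p'><img src='x.png'></a></div></body></html>"
print(attribute_vector(page))
```

An empty frozenset plays the role of a null element. `html.parser` does not build a browser-grade DOM, so pages whose structure is produced by JavaScript would need a real browser to render them first.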
2) Identifying similar pages
After extracting the attribute vector of each page, similar pages must be identified. Pages whose attribute vectors are subsets of another page's attribute vector, or whose attribute vectors are identical, are considered similar.
According to Definition 1, the attribute vector of each web application page has four elements. The attribute vector of page 1 is considered the same as the attribute vector of page 2 if:
• All vector elements of page 1 are equal to the corresponding elements in the vector of page 2
• All vector elements of page 1 are subsets of the corresponding elements in the vector of page 2
• All vector elements of page 2 are subsets of the corresponding elements in the vector of page 1
• If one or more vector elements of page 1 are subsets of their corresponding elements in the vector of page 2, the remaining vector elements of page 1 must be equal to their corresponding elements in the vector of page 2
• The null element is a subset of every element
Similar pages are identified according to the above-mentioned conditions. Algorithm 2 illustrates the pseudocode for identifying similar pages.

Figure 5 A Page of osCommerce Application

Algorithm 2 The Pseudocode for Identifying the Similar Pages
INPUT: w1 = (DOM_w1(input), DOM_w1(button), DOM_w1(anchor), DOM_w1(img)); w2 = (DOM_w2(input), DOM_w2(button), DOM_w2(anchor), DOM_w2(img))
OUTPUT: Boolean flag; true means the two pages are the same
Begin
  Let flag, input, button, anchor, img = false
  if DOM_w1(input) ⊆ DOM_w2(input) or DOM_w2(input) ⊆ DOM_w1(input) then
    input = true
  end if
  if DOM_w1(button) ⊆ DOM_w2(button) or DOM_w2(button) ⊆ DOM_w1(button) then
    button = true
  end if
  if DOM_w1(anchor) ⊆ DOM_w2(anchor) or DOM_w2(anchor) ⊆ DOM_w1(anchor) then
    anchor = true
  end if
  if DOM_w1(img) ⊆ DOM_w2(img) or DOM_w2(img) ⊆ DOM_w1(img) then
    img = true
  end if
  if input and button and anchor and img then
    flag = true
  end if
  return flag
end

¹ https://www.oscommerce.com
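Algorithm 2 as printed tests each of the four elements independently, so the subset direction may differ from one element to another. The bullet conditions instead reduce to requiring that one page's vector be element-wise contained in the other's. The Python sketch below shows both readings, assuming each attribute vector is a dict of four frozensets (a hypothetical representation; the paper does not fix one). The function names are illustrative.

```python
FIELDS = ("inputs", "buttons", "anchors", "imgs")


def similar_web_pages(w1: dict, w2: dict) -> bool:
    """Algorithm 2 as printed: each element must be a subset in either direction."""
    for field in FIELDS:
        a, b = w1[field], w2[field]
        # `<=` is the subset test on frozensets; an empty (null) element is a subset of any set.
        if not (a <= b or b <= a):
            return False
    return True


def similar_web_pages_strict(w1: dict, w2: dict) -> bool:
    """Prose reading: w1 is element-wise contained in w2, or w2 in w1."""
    def contained(x: dict, y: dict) -> bool:
        return all(x[f] <= y[f] for f in FIELDS)

    return contained(w1, w2) or contained(w2, w1)


page_a = {"inputs": frozenset(), "buttons": frozenset({"html/body/button/Reviews"}),
          "anchors": frozenset(), "imgs": frozenset({"html/body/div/a/img"})}
# page_b has one extra button and is otherwise identical to page_a.
page_b = dict(page_a, buttons=page_a["buttons"] | {"html/body/form/button/Add to Cart"})
print(similar_web_pages(page_a, page_b), similar_web_pages_strict(page_a, page_b))
```

The two readings differ only when one element is larger on page 1 while another is larger on page 2; the element-wise test accepts such a pair, the strict one rejects it.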
3) Clustering web application pages
After the similar pages are identified, they are put in the same cluster. The pages in a cluster are similar to each other and correspond to a single unique page of the application. Algorithm 3 shows the pseudocode for clustering web pages. Line 7 checks whether two pages wi and wj are similar; if they are, they are put in the same cluster.
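The clustering step can be sketched in Python as a greedy pass: the first unclustered page opens a cluster, every later page similar to it joins that cluster and leaves the pool, and the loop repeats on what remains. This mirrors the structure of Algorithm 3 but is an illustrative sketch, not the authors' code; `cluster_web_pages` is a hypothetical name, and the similarity test is passed in as a predicate (for example, an element-wise subset test over attribute vectors).

```python
from typing import Callable, List, Sequence, TypeVar

Page = TypeVar("Page")


def cluster_web_pages(pages: Sequence[Page],
                      similar: Callable[[Page, Page], bool]) -> List[List[Page]]:
    """Greedy clustering in the spirit of Algorithm 3 (illustrative sketch)."""
    remaining = list(pages)          # plays the role of WebPages
    clusters: List[List[Page]] = []  # plays the role of M
    while remaining:
        seed = remaining.pop(0)      # w_i opens a new cluster C_k
        cluster = [seed]
        unmatched = []
        for other in remaining:      # compare w_i with each later page w_j
            if similar(seed, other):
                cluster.append(other)  # C_k <- C_k ∪ {w_j}
            else:
                unmatched.append(other)
        remaining = unmatched        # clustered pages leave WebPages
        clusters.append(cluster)
    return clusters


# Toy predicate: pages are integers, similar when they share the same tens digit.
print(cluster_web_pages([1, 2, 11, 12, 3], lambda a, b: a // 10 == b // 10))
# [[1, 2, 3], [11, 12]]
```

Because every page is compared only with the cluster's seed, and the subset relation is not transitive, the resulting clusters can depend on the order in which pages appear in the traffic.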
Algorithm 3 The Pseudocode for Clustering the Web Application Pages
INPUT: WebPages = {w1, w2, w3, ..., wn}
OUTPUT: the web application model M as a set of web page clusters; each web page cluster Ck is a set of web pages
1 Begin
2   Let M = ∅ (empty set of page clusters)
3   Let k = 1 (number of web page clusters)
4   for i = 1 .. n do
5     Ck ← {wi}
6     for j = i + 1 .. n do
7       if SimilarWebPages(wi, wj) then
8         WebPages ← WebPages − {wj}
9         n = length of WebPages
10        Ck ← Ck ∪ {wj}
11      end if
12    end for
13    M ← M ∪ {Ck}; k++
14  end for
15  return M
16 end

4) Extracting user navigation graph
In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster contains a set of similar pages, and each of these pages has a URI and a Referer field; thus, each cluster has a set of URIs and a set of Referers for the pages in that cluster. Note that the Referer field is the URI of the previous web application page that the user visited. To extract the edges of the user navigation graph, the clusters are checked to find those whose Referer set intersects the URI set of a given cluster; such pairs of clusters are connected. Suppose that the URI set of cluster C1 intersects the Referer set of cluster C2; then the path from cluster C1 to cluster C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges in the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained in lines 5 and 6 of Algorithm 4, respectively. In line 10, if the intersection of the URI set of a cluster with another cluster's Referer set is not empty, the edge CiCj is added to the edge set of the graph.
The user navigation graph is created according to Algorithm 5, where the nodes are the set of clusters and the identified edges are the paths between clusters. In the created graph, each node represents a unique page and the edges show the paths between pages.
Definition 2 [The User Navigation Graph] This graph is represented by the tuple $\langle C_0, C, E\rangle$, where $C$ is the set of nodes in the graph, $C_0 \in C$ is the first (initial) node of the graph, and $E \subseteq C \times C$ is the set of edges of the graph.
Algorithm 5 shows the pseudocode for extracting the user navigation graph. Lines 5 and 6 find the cluster that contains the initial page of the application; this cluster is taken as the initial node of the graph.
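The Referer-based edge rule and the choice of the initial node can be sketched together in Python. This sketch assumes each page is reduced to a hypothetical `Page(uri, referer)` record and that URIs and Referer values have already been normalized to the same form (real Referer headers usually carry absolute URLs). Clusters are identified by their index, and `extract_graph_edges` / `navigation_graph` are illustrative names. Following the prose, every ordered pair of distinct clusters is tested, not only pairs with j > i.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    uri: str               # request URI of the page
    referer: str | None    # Referer of the request, if any


def extract_graph_edges(clusters: list[list[Page]]) -> set[tuple[int, int]]:
    """Edge Ci -> Cj whenever a URI of Ci appears as a Referer in Cj."""
    uris = [{p.uri for p in c} for c in clusters]
    referers = [{p.referer for p in c if p.referer} for c in clusters]
    edges = set()
    for i in range(len(clusters)):
        for j in range(len(clusters)):
            if i != j and uris[i] & referers[j]:
                edges.add((i, j))
    return edges


def navigation_graph(clusters: list[list[Page]], first_page: Page):
    """Return (C0, C, E), with each cluster identified by its index."""
    edges = extract_graph_edges(clusters)
    c0 = next(i for i, c in enumerate(clusters) if first_page in c)
    return c0, list(range(len(clusters))), edges


home = Page("/index.php", None)
clusters = [[home],
            [Page("/list.php?id=1", "/index.php"), Page("/list.php?id=2", "/index.php")],
            [Page("/item.php?id=7", "/list.php?id=1")]]
print(navigation_graph(clusters, home))
```

Self-loops (a page reached from a similar page in the same cluster) are skipped, as in the pseudocode.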
Algorithm 4 The Pseudocode for Extracting Edges From the User Navigation Graph
INPUT: C = {C1, C2, C3, ..., Ck}, web page clusters as graph nodes; WebPages = {w1, w2, w3, ..., wn}, the set of web pages
OUTPUT: the web application graph edges E as a set of edges
1 Begin
2   Let E = ∅ (empty set of web application graph edges)
3   for j = 1 .. k do
4     Let URI_Cj = ∅, Referer_Cj = ∅
5     URI_Cj = URI_Cj ∪ ExtractURI(w) for any w ∈ Cj
6     Referer_Cj = Referer_Cj ∪ ExtractReferer(w) for any w ∈ Cj
7   end for
8   for i = 1 .. k do
9     for j = 1 .. k, j ≠ i do
10      if (URI_Ci ∩ Referer_Cj ≠ ∅) then
11        E ← E ∪ {CiCj}
12      end if
13    end for
14  end for
15  return E
16 end

Algorithm 5 The Pseudocode for Extracting the User Navigation Graph
INPUT: C = {C1, C2, C3, ..., Ck}, web page clusters as graph nodes; w1, the first web page
OUTPUT: the web application navigation graph ⟨C0, C, E⟩
1 Begin
2   Let C0 = ∅
3   E = ExtractGraphEdges (set of web application graph edges)
4   for j = 1 .. k do
5     if Cj contains w1 then
6       C0 = Cj
7     end if
8   end for
9   return ⟨C0, C, E⟩
10 end

3.2 Identifying Business Processes in the Application
To identify the business processes of the web application in the user navigation graph, it is necessary to define a process and a final node, and then to define a business process.
Definition 3 [The Application Process (P)] A process P in the application is a sequence of nodes and edges in the user navigation graph, $E_1, E_2, \ldots, E_k$, where $E_i \in E$ and $E_i = C_{i-1}C_i$.
Algorithm 6 shows the pseudocode for extractingprocesses
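Algorithm 6 enumerates processes recursively from the initial node but, as printed, has no guard against cycles, so it would not terminate on a graph with loops. The Python sketch below is one way to make the idea concrete under two stated assumptions: every path (list of edges) that starts at C0 counts as a process, and a path stops being extended once its last edge returns to a node already on the path, which keeps the loops needed by the loop condition of Definition 5. `extract_processes` is a hypothetical name, and the enumeration is exponential in the worst case.

```python
from __future__ import annotations

from collections import defaultdict


def extract_processes(c0: int, edges: set[tuple[int, int]]) -> list[list[tuple[int, int]]]:
    """Every path (as a list of edges) that starts at the initial node c0."""
    succ: dict[int, list[int]] = defaultdict(list)
    for a, b in sorted(edges):      # sorted for a deterministic order
        succ[a].append(b)
    processes: list[list[tuple[int, int]]] = []

    def walk(node: int, path: list[tuple[int, int]], on_path: frozenset[int]) -> None:
        for nxt in succ[node]:
            step = path + [(node, nxt)]
            processes.append(step)
            # Stop extending once the path returns to a node it already passed;
            # without this guard a cyclic graph would be walked forever.
            if nxt not in on_path:
                walk(nxt, step, on_path | {nxt})

    walk(c0, [], frozenset({c0}))
    return processes


print(extract_processes(0, {(0, 1), (1, 2), (2, 0), (1, 3)}))
# [[(0, 1)], [(0, 1), (1, 2)], [(0, 1), (1, 2), (2, 0)], [(0, 1), (1, 3)]]
```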
Definition 4 [The Final Nodes in the User Navigation Graph (F)] The final nodes mark the completion of a business process: a business process is complete when the application reaches one of them.
The final nodes can be detected by examining the HTTP responses. For example, in the process of buying a product, a phrase like "Thank you for your purchase" is displayed after the purchase is completed. By specifying a set of such phrases and searching for them in the responses, the final nodes can be identified. Some keywords used for identifying the final nodes are Thank, Congratulations, Successfully, Log Off, and Search Results. In addition, some buttons on a web application page are good signs that help identify the final nodes in the graph. Examples of final nodes include the page shown after clicking the "save" button, the page shown after clicking the "creation" button, and the page shown after clicking the "submit" button.
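The keyword-based part of final-node detection can be sketched in Python as a case-insensitive substring search over the responses of each cluster. The keyword list is the one quoted in the text; the matching strategy, the function names, and the mapping from cluster index to response bodies are assumptions for illustration. The button-based signals ("save", "creation", "submit") are not covered by this sketch.

```python
from __future__ import annotations

# Keywords quoted in the text; matching is case-insensitive substring search (an assumption).
FINAL_KEYWORDS = ("thank", "congratulations", "successfully", "log off", "search results")


def is_final_response(body: str, keywords: tuple[str, ...] = FINAL_KEYWORDS) -> bool:
    """True if the response body contains any completion keyword."""
    text = body.lower()
    return any(k in text for k in keywords)


def extract_final_nodes(responses: dict[int, list[str]]) -> set[int]:
    """Clusters (by index) where at least one page's response looks like a completion page."""
    return {c for c, bodies in responses.items()
            if any(is_final_response(b) for b in bodies)}


print(extract_final_nodes({0: ["<h1>Welcome</h1>"],
                           3: ["<p>Your order was placed successfully.</p>"]}))
```

Plain substring matching can misfire (for example, a "Thanks for visiting" footer on every page), so a practical implementation would likely restrict the search to the main content of the response.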
Definition 5 [The Application Business Process] A business process BP in the application is a process that satisfies at least one of the following conditions:
1. The first node of the process is the initial node of the user navigation graph (C0), and the end node of the process is a final node of the user navigation graph (F).
2. The process passes its first node again, i.e., the first node and the end node of the process are the same, and the length of the process is greater than two. (If the process returns to a node it has already passed and the resulting loop is longer than two, the loop is a business process.)

Algorithm 6 The Pseudocode for Extracting Processes
INPUT: the web application first node C0; the web application graph edges E
OUTPUT: the web application graph processes P as a set of web application processes
Begin
  Let P = ∅ (empty set of web application processes)
  Let StartEdge = ∅ (set of graph edges that begin at C0)
  StartEdge = ExtractGraphEdges(C0) (extract all edges leaving C0)
  for each edge e ∈ StartEdge do
    EndPoint = ExtractEndPoint(e) (extract the end point of the edge)
    if (ExtractProcess(EndPoint, E) ≠ ∅) then
      P = P ∪ {e + p : p ∈ ExtractProcess(EndPoint, E)}
    else
      P = P ∪ {e}
    end if
  end for
  return P
end
All processes in the graph from the initial node (the initial page of the application) to the identified final nodes, as well as the processes whose initial node and final node are the same, are business processes of the application. Algorithm 7 shows the pseudocode for identifying the business processes of the application. In line 4, the application processes are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start with the initial node and end with a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, each detected process whose initial node and final node are the same and whose length is greater than two is added to BP as a business process.
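The two conditions of Definition 5, as applied in Algorithm 7, can be sketched in Python over processes represented as lists of edges. The sketch assumes processes come from an enumeration like that of Algorithm 6 in which a path stops right after it returns to a node it already passed, and it extracts the closed loop from such a path. `identify_business_processes` is a hypothetical name.

```python
from __future__ import annotations


def identify_business_processes(processes: list[list[tuple[int, int]]],
                                c0: int, finals: set[int]) -> list[list[tuple[int, int]]]:
    """Apply both business-process conditions to a list of edge paths."""
    business = []
    for proc in processes:
        if not proc:
            continue
        nodes = [proc[0][0]] + [dst for _, dst in proc]
        # Condition 1: starts at the initial node C0 and ends at a final node.
        if nodes[0] == c0 and nodes[-1] in finals:
            business.append(proc)
            continue
        # Condition 2: the path returns to a node it already passed; the loop
        # from that node back to itself counts if it has more than two edges.
        last = nodes[-1]
        if last in nodes[:-1]:
            start = nodes.index(last)
            loop = proc[start:]
            if len(loop) > 2:
                business.append(loop)
    return business


processes = [[(0, 1)], [(0, 1), (1, 2)], [(0, 1), (1, 2), (2, 0)], [(0, 1), (1, 3)]]
print(identify_business_processes(processes, c0=0, finals={3}))
# [[(0, 1), (1, 2), (2, 0)], [(0, 1), (1, 3)]]
```

Here the path 0 → 1 → 3 qualifies under condition 1, and the loop 0 → 1 → 2 → 0 (three edges) qualifies under condition 2.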
4 EXPERIMENTAL RESULTS
The testbed used in this section is a network consisting of a web server (the test target) and two clients (the BLProM system and a legal user). The web server and each client run inside a virtual machine; their profiles are shown in Table 2.
The web applications listed in Table 3 are installed on the web server (test target), and BLProM is then used to identify the business layer of these web applications.
The legal user first starts using the selected web applications and crawls all permitted parts of each application. The HTTP traffic of the legal user is given to BLProM as its input.
5 EVALUATION
BLProM's goal is to identify the business layer of the web application; by identifying the business layer, business logic vulnerabilities can be identified. BLProM detects the business processes of the web application, which is the main step in business-layer dynamic security testing of the web application.
We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner that scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner that has an API for extracting the web application graph, in which the web application pages are graph nodes and the relations among pages are graph edges. The main difference between BLProM and ZAP lies in detecting similar pages: ZAP cannot detect similar pages in the web application, but BLProM can. ZAP's graph only shows the relations among the scanned pages, whereas BLProM generates a compact graph in which similar pages are merged into a single node. In terms of the accuracy of the generated graph, BLProM and ZAP are the same.
To evaluate the proposed approach, we first measure the accuracy of clustering using the following criteria:
• True Positive: samples that are placed in their correct cluster
• False Positive: samples that are placed in a cluster to which they do not belong
• False Negative: samples that are not placed in a cluster to which they belong
• Recall: It is calculated by the following formula:
$$recall = \frac{TruePositive}{TruePositive + FalseNegative}$$
• Precision: It is calculated by the following formula:
$$precision = \frac{TruePositive}{TruePositive + FalsePositive}$$

Table 2 Testbed Profiles
Machine | CPU | OS | VMware CPU | VMware RAM | VMware OS
Web server (test target) | Pentium dual core 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7
Client (BLProM machine) | Intel Core i7 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7
Client (legal user) | Pentium dual core i3-3210 GHz | Windows 7 | 1 GHz | 1 GB | Windows 7

Table 3 Selected Web Applications for Evaluation
Web application | Description
TomatoCart 1.1.8.6.1 | e-commerce
osCommerce 2.3.4 | e-commerce
WackoPicko | Web application for sharing pictures

Algorithm 7 The Pseudocode for Identifying the Application Business Processes
INPUT: the web application navigation graph ⟨C0, C, E⟩; the web application graph edges E
OUTPUT: the web application graph business processes BP as a set of web application processes
1 Begin
2   Let P = ∅ (empty set of web application processes); Let BP = ∅
3   Let F = ∅ (empty set of web application final nodes)
4   P = ExtractProcess (extract the processes of the web application)
5   F = ExtractFinalNodes (extract the final nodes of the web application)
6   for i = 1 .. k do
7     if Pi starts with C0 and ends with a node in F then
8       BP = BP + Pi
9     end if
10  end for
11  R = ExtractProcessWithRepeatedNodes(P) (extract the processes with repeated nodes)
12  for j = 1 .. m do
13    if Rj has the same initial and end node and length(Rj) > 2 then
14      BP = BP + Rj
15    end if
16  end for
17  return BP
18 end

Table 4 The Clusters of Selected Web Application Pages Evaluation
Criteria | WackoPicko | TomatoCart | osCommerce
samples | 89 | 150 | 210
clusters | 29 | 66 | 40
true positive | 65 | 146 | 205
false positive | 24 | 4 | 5
false negative | 23 | 3 | 4
recall | 0.74 | 0.98 | 0.98
precision | 0.73 | 0.97 | 0.98
f-measure | 0.73 | 0.98 | 0.98

Table 5 Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko
Criteria | Proposed approach | OWASP ZAP | Percentage of improvement compared to OWASP ZAP
HTTP messages | 89 | 89 | –
Graph nodes | 22 | 89 | 75.2
Graph edges | 48 | 270 | 82.2
process (P) | 12 | 48 | 75
Average edge in each process (E) | 4 | 46 | 91.3
Average edges in all processes (P·E) | 48 | 2208 | 97.8
business processes | 10 | N/A | –
bull F-Measure It is calculated by the following for-mulaF minusMeasure = 2lowastrecalllowastprecision
TruePositive+FalsePositive
The value of these criteria for the ClusteringWeb-Pages algorithm (Algorithm 3) is shown in Table 4In the first row samples shows the total numberof web pages in HTTP traffic extracted from the Ex-tractWebPages algorithm in Algorithm 1 In the sec-ond row clusters shows the total number of clustersextracted from the ClustreringWebPages algorithmin Algorithm 3 In the next rows the criteria listedabove are calculated for each web application
To evaluate the proposed approach we comparethe output of BLProM with the scanning output ofOWASP ZAP for selected web applications BLProMoutput for WackoPicko is shown in Table 5 graphnodes shows the total number of clusters extractedfrom the ClustreringWebPages algorithm in Algo-rithm 3 graph edges is the total number of edgesextracted from the ExtractGraphEdges algorithm in
Algorithm 4 process is the total number of pathsfrom the first node extracted from the ExtractPro-cess algorithm in Algorithm 6 Average edge in eachprocess (E) is calculated by the sum of edges of eachprocess divided by the total number of processes Av-erage edge in all process is calculated by the productof the total number of processes (P) in average edgesof each process (E) business processes is the totalnumber of business processes in the web application
Existing values in the first column of Table 5 arethe output obtained from the proposed approach Thevalues in the second column are the scanning out-put of ZAP The third column shows the percentageof scanning improvement of our proposed approachcompared to the ZAP scan The results of the tableindicate that the ZAP scan is a non-smart scan As aresult of this non-intelligent scan ZAP is not able toidentify business layer vulnerabilities
BLPRoM output for osCommerce is shown in Ta-ble 6 The third column shows the percentage of scan-ning improvement compared to ZAP scanning It isobserved that web application scanning is improvedby identifying the web application business layer
78 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al
Table 6 Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce
osCommerce
Criteria
ApproachesProposed approach OWASP ZAP Percentage of improvement
compared to OWASP ZAP
HTTP Message 170 170 ndash
Graph Nodes 40 170 764
Graph Edges 66 379 825
process (P) 23 17 26
Average edge in each process (E) 3 113 973
Average edges in all processes (PE) 69 1921 964
business processes 18 NA ndash
Table 7 Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart
TomatoCart
Criteria
ApproachesProposed approach OWASP ZAP Percentage of improvement
compared to OWASP ZAP
HTTP Message 150 150 ndash
Graph Nodes 66 150 56
Graph Edges 87 410 787
process (P) 31 39 205
Average edge in each process (E) 4 101 96
Average edges in all processes (PE) 156 3131 95
business processes 30 NA ndash
BLPRoM output for TomatoCart is shown in Ta-ble 7 The third column shows the percentage of scan-ning improvement of our proposed approach comparedto ZAP scanning
Table 8 shows the average of the proposed approachthe average of OWASP ZAP and the average percent-age of improvement in the scanning of selected webapplications For example in the rdquoAverage edges inall processesrdquo benchmark our approach has been im-proved by about 96 percent compared to OWASPZAP
According to the results presented in this table canbe observed that BLProM has improved web applica-tion scanning BLProM is aware of web applicationbusiness processes By identifying the web applicationbusiness layer web scanners can detect business layervulnerabilities
6 CONCLUSION
Business logic vulnerabilities are strong vulnerabili-ties that compromise web application security Web
scanners cannot detect business logic vulnerabilitiesbecause they are unable to understand business logicof the web application For detecting business logicvulnerabilities the business logic of the web applica-tion needs to be understood Therefore these vulner-abilities are specific to the application and difficult toidentify
In this paper we proposed BLProM a black-boxapproach for detecting business processes of the webapplication BLProM aims to ease detecting businesslogic vulnerabilities BLPRoM output is used as inputin dynamic security testing of the web applicationsin the business layer in order to detect business logicvulnerabilities BLProM consists of two main steps
1- extracting user navigation graph2- Detecting web application business processes
At the lab we scanned three web applications byBLProM and OWASP ZAP an open-source web appli-cation We showed that BLProM improved scanningabout 96 compared to OWASP ZAP BLProM im-proved the scanning of web applications because itclusters web pages and prevents scanning similar web
July 2019 Volume 6 Number 2 (pp 65ndash80) 79
Table 8 Comparison of the Average of the Proposed Approach and the Average of ZAP Approach in the Scanning of Selected
Web Applications
Average of selected web application
Criteria
ApproachesProposed approach OWASP ZAP Percentage of improvement
compared to OWASP ZAP
Graph Nodes 426 1363 692
Graph Edges 67 353 811
process (P) 20 366 405
Average edge in each process (E) 36 866 948
Average edges in all processes (PE) 91 2420 964
application pages
References
[1] Common vulnerabilities and exposures. https://cve.mitre.org/cve/cve.html
[2] 2015 ITRC (Identity Theft Resource Center) Breach Report Hits Near Record High in 2015. http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html. Accessed Feb 2017.
[3] G. Pellegrino and D. Balzarotti. Toward Black-Box Detection of Logic Flaws in Web Applications. In Network and Distributed System Security Symposium 2014 (NDSS 2014), 2014. doi:10.14722/ndss.2014.23021.
[4] Testing for business logic. OWASP. https://www.owasp.org/index.php/Testing_for_business_logic. Accessed Feb 2017.
[5] D. Balzarotti, M. Cova, V. Felmetsger, and G. Vigna. Multi-module vulnerability analysis of web-based applications. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 25–35. ACM, 2007. doi:10.1145/1315245.1315250.
[6] A. Doupé, B. Boe, C. Kruegel, and G. Vigna. Fear the EAR: discovering and mitigating execution after redirect vulnerabilities. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 251–262. ACM, 2011. doi:10.1145/2046707.2046736.
[7] V. Felmetsger, L. Cavedon, C. Kruegel, and G. Vigna. Toward Automated Detection of Logic Vulnerabilities in Web Applications. In USENIX Security Symposium, 2010.
[8] E. Chai. Business Logic Attacks – Bots and BATs. OWASP, 2009. https://www.owasp.org/images/a/aa/OWASP_Cincinnati_Jan_2011.pdf. Accessed Feb 2017.
[9] X. Li and Y. Xue. BLOCK: a black-box approach for detection of state violation attacks towards web applications. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 247–256, 2011. doi:10.1145/2076732.2076767.
[10] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna. Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications. In Giovanni, pages 63–86. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-74319-4. doi:10.1007/978-3-540-74320-0_4.
[11] X. Li, W. Yan, and Y. Xue. SENTINEL: securing database from logic flaws in web applications. In Proceedings of the Second ACM Conference on Data and Application Security and Privacy, pages 25–36, 2012. doi:10.1145/2133601.2133605.
[12] M. Alidoosti and A. Nowroozi. BLDAST: business-layer Dynamic Application Security Tester of the web application in order to detect web application vulnerabilities against flooding DoS attacks. In Iran Society of Cryptology Conference, Shiraz, Iran (in Persian), 2017.
[13] M. Alidoosti, A. Nowroozi, and A. Nickabadi. Evaluating the Web-Application Resiliency to Business-Layer DoS Attacks. ETRI Journal, 2019. doi:10.4218/etrij.2019-0164.
[14] M. Alidoosti and A. Nowroozi. BLTOCTTOU: business-layer dynamic application security tester of the web application in order to detect web application vulnerabilities against Race Condition attacks. In Computer Society of Iran Conference, Tehran, Iran (in Persian), 2018.
[15] M. Alidoosti and A. Nowroozi. BLProM: Business-layer process miner of the web application. In ISCISC, pages 1–6. IEEE, 2018. ISBN 978-1-5386-7582-3. doi:10.1109/ISCISC.2018.8546899.
[16] V. Crescenzi, P. Merialdo, and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3):279–299, 2005. doi:10.1016/j.datak.2004.11.004.
Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. Currently, she is working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.
Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and in an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. He is also a co-founder of four IT startups.
Ahmad Nickabadi received his B.S. degree in computer engineering in 2004 and his M.Sc. and Ph.D. degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.
Figure 5 A Page of osCommerce Application
Algorithm 2 The Pseudocode for Identifying the Similar Pages
INPUT: w1 = (DoMw1(input), DoMw1(button), DoMw1(anchor), DoMw1(img)), w2 = (DoMw2(input), DoMw2(button), DoMw2(anchor), DoMw2(img))
OUTPUT: Boolean flag; true means the two pages are the same
1 Begin
2   Let flag, input, button, anchor, img = false
3   if DoMw1(input) ⊆ DoMw2(input) or DoMw2(input) ⊆ DoMw1(input) then
4     input = true
5   end if
6   if DoMw1(button) ⊆ DoMw2(button) or DoMw2(button) ⊆ DoMw1(button) then
7     button = true
8   end if
9   if DoMw1(anchor) ⊆ DoMw2(anchor) or DoMw2(anchor) ⊆ DoMw1(anchor) then
10    anchor = true
11  end if
12  if DoMw1(img) ⊆ DoMw2(img) or DoMw2(img) ⊆ DoMw1(img) then
13    img = true
14  end if
15  if input and button and anchor and img then
16    flag = true
17  end if
18  return flag
19 End
• If one or more vector elements of page 1 are a subset of their corresponding elements in the vector of page 2, the rest of the vector elements of page 1 must be the same as their corresponding elements in the vector of page 2.
• The null element is a subset of every element.
Similar pages are identified according to the above-mentioned attributes. Algorithm 2 illustrates the pseudocode for identifying similar pages.
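The similarity test of Algorithm 2 can be sketched in Python as below. This is a minimal sketch, not the authors' implementation: it assumes each page has already been reduced to four sets of serialized DOM elements (input, button, anchor, img), and the `Page` type and `similar_pages` function are hypothetical names. How individual elements are serialized into comparable strings is left open.

```python
from dataclasses import dataclass, field


# Hypothetical representation of one page's DOM vector (not from the paper):
# each field holds the serialized elements of that tag type found on the page.
@dataclass(frozen=True)
class Page:
    inputs: frozenset = field(default_factory=frozenset)
    buttons: frozenset = field(default_factory=frozenset)
    anchors: frozenset = field(default_factory=frozenset)
    imgs: frozenset = field(default_factory=frozenset)


def _subset_either_way(a: frozenset, b: frozenset) -> bool:
    # The empty set is a subset of every set, so a page that lacks a tag type
    # never fails this check on that component alone.
    return a <= b or b <= a


def similar_pages(w1: Page, w2: Page) -> bool:
    """True when every DOM component of one page is a subset of the
    corresponding component of the other, in either direction
    (the four checks of Algorithm 2)."""
    return (
        _subset_either_way(w1.inputs, w2.inputs)
        and _subset_either_way(w1.buttons, w2.buttons)
        and _subset_either_way(w1.anchors, w2.anchors)
        and _subset_either_way(w1.imgs, w2.imgs)
    )
```

For example, a page with anchors {"/cart", "/home"} is judged similar to a page with anchors {"/home"} when their input sets match, but a page whose only anchor is "/about" is not similar to the latter.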
3) Clustering web application pages
After the similar pages are identified, they are put in the same cluster. The pages in a cluster are similar to each other and refer to a unique page of the application. Algorithm 3 shows the pseudocode for clustering web pages. In line 7, it is checked whether two pages wi and wj are the same; if so, they are put in the same cluster.
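A greedy Python sketch of the clustering step, assuming pages can be compared pairwise with a predicate such as the similarity test of Algorithm 2 (passed in as `similar`). The function name `cluster_pages` is hypothetical. Each page that is not yet clustered seeds a new cluster and absorbs every later page similar to it, which is the intent of the loop in Algorithm 3; like the pseudocode, candidates are compared only with the seed page.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def cluster_pages(pages: Sequence[T], similar: Callable[[T, T], bool]) -> List[List[T]]:
    """Greedy clustering: each remaining page opens a cluster, and every
    later page similar to it joins that cluster and is removed from the pool."""
    remaining = list(pages)
    clusters: List[List[T]] = []
    while remaining:
        seed = remaining.pop(0)
        cluster = [seed]
        rest: List[T] = []
        for other in remaining:
            # Compare against the seed only, as Algorithm 3 compares w_i with w_j.
            if similar(seed, other):
                cluster.append(other)
            else:
                rest.append(other)
        remaining = rest  # clustered pages are not considered again
        clusters.append(cluster)
    return clusters
```

For instance, with a toy predicate that compares first characters, `cluster_pages(["a1", "a2", "b1", "a3"], lambda x, y: x[0] == y[0])` groups the three "a" pages together and leaves "b1" in its own cluster.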
Algorithm 3 The Pseudocode for Clustering the Web Application Pages
INPUT: WebPages = {w1, w2, w3, ..., wn}
OUTPUT: the web application model M as a set of web page clusters C, where each cluster C is a set of web pages
1 Begin
2   Let M = ∅   // set of page clusters
3   Let k = 1   // number of web page clusters
4   for i = 1 to n do
5     Ck ← {wi}
6     for j = i + 1 to n do
7       if SimilarWebPages(wi, wj) then
8         Ck ← Ck ∪ {wj}
9         WebPages ← WebPages − {wj}
10        n = length of WebPages
11      end if
12    end for
13    M ← M ∪ {Ck}; k++
14  end for
15  return M
16 End
4) Extracting user navigation graph
In this step, BLProM connects the obtained clusters, each of which represents a unique web application page. Each cluster has a set of similar pages, and each of these pages has a URI and a Referer field. Thus, each cluster contains a set of URIs and a set of Referers for the pages in that cluster. Note that the Referer field is the URI of the previous web application page that the user visited. To extract the edges of the user navigation graph, the produced clusters are checked to find which clusters' Referer sets intersect the URI set of a given cluster; when such a cluster is found, the two clusters are connected. Suppose that the URI set of cluster C1 intersects the Referer set of cluster C2; then the path from cluster C1 to cluster C2 (C1 → C2) is created. In other words, the edge C1C2 is one of the edges of the user navigation graph. Algorithm 4 shows the pseudocode for extracting the edges of the user navigation graph. The URI set and the Referer set of each cluster are obtained in lines 5 and 6 of Algorithm 4, respectively. In line 10, if the intersection of the URI set of a cluster with another cluster's Referer set is not empty, the edge CiCj is added to the edge set of the graph. The user navigation graph is then created by Algorithm 5, where the nodes are the clusters and the identified edges are the paths between clusters. In the created graph, each node indicates a unique page and the edges show the paths between pages.
Definition 2 [The User Navigation Graph] This graph is represented by the tuple ⟨C0, C, E⟩, where C is the set of nodes in the graph, C0 ∈ C is the first (initial) node of the graph, and E ⊆ C × C is the set of edges in the graph.
Algorithm 5 shows the pseudocode for extracting the user navigation graph. Lines 5 and 6 select the cluster that contains the initial page of the application; this cluster is considered the initial node of the graph.
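Algorithms 4 and 5 can be combined into one short Python sketch. It assumes each captured page is reduced to a hypothetical `Request` record holding its URI and Referer header, that clusters are lists of such records, and that graph nodes are cluster indices; `build_navigation_graph` is likewise a hypothetical name. An edge (i, j) is added when a URI of cluster i appears as a Referer in cluster j, checking every ordered pair of distinct clusters, and C0 is the cluster that contains the first captured page.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Request:
    # Hypothetical record of one captured page: its URI and its Referer header.
    uri: str
    referer: Optional[str]


def build_navigation_graph(
    clusters: List[List[Request]], first_page: Request
) -> Tuple[Optional[int], Set[int], Set[Tuple[int, int]]]:
    """Return (c0, nodes, edges), where nodes are cluster indices."""
    # Per-cluster URI and Referer sets (the first loop of Algorithm 4).
    uris: Dict[int, Set[str]] = {}
    referers: Dict[int, Set[str]] = {}
    for idx, cluster in enumerate(clusters):
        uris[idx] = {r.uri for r in cluster}
        referers[idx] = {r.referer for r in cluster if r.referer is not None}

    # Edge Ci -> Cj when a URI of Ci was the Referer of some page in Cj.
    edges: Set[Tuple[int, int]] = set()
    for i in uris:
        for j in referers:
            if i != j and uris[i] & referers[j]:
                edges.add((i, j))

    # C0 is the cluster that contains the first captured page (Algorithm 5).
    c0 = next((idx for idx, c in enumerate(clusters) if first_page in c), None)
    return c0, set(uris), edges
```

With clusters for a home page, a cart page reached from "/", and a payment page reached from "/cart" (plus a return to "/" from "/pay"), the sketch yields the cycle 0 → 1 → 2 → 0 with C0 = 0.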
3.2 Identifying Business Processes in the Application
To identify the web application business processes in the user navigation graph, it is necessary to first define a process and a final node, and then define a business process.
Definition 3 [The Application Process (P)] A process P in the application is a sequence of nodes and edges in the user navigation graph, E1, E2, ..., Ek, where Ei ∈ E and Ei = Ci−1Ci.
Algorithm 4 The Pseudocode for Extracting Edges From the User Navigation Graph
INPUT: C = {C1, C2, C3, ..., Ck}, web page clusters as graph nodes; WebPages = {w1, w2, w3, ..., wn}, the set of web pages
OUTPUT: the web application graph edges E as a set of edges
1 Begin
2   Let E = ∅   // set of web application graph edges
3   for j = 1 to k do
4     Let URIcj, Referercj = ∅
5     URIcj = URIcj ∪ ExtractURI(w) for any w ∈ Cj
6     Referercj = Referercj ∪ ExtractReferer(w) for any w ∈ Cj
7   end for
8   for i = 1 to k do
9     for j = 1 to k, j ≠ i do
10      if (URIci ∩ Referercj ≠ ∅) then
11        E ← E ∪ {CiCj}
12      end if
13    end for
14  end for
15  return E
16 End
Algorithm 5 The Pseudocode for Extracting the User Navigation Graph
INPUT: C = {C1, C2, C3, ..., Ck}, web page clusters as graph nodes; w1, the first web page
OUTPUT: the web application navigation graph ⟨C0, C, E⟩
1 Begin
2   Let C0 = ∅
3   E = ExtractGraphEdges(C)   // set of web application graph edges (Algorithm 4)
4   for j = 1 to k do
5     if Cj contains w1 then
6       C0 = Cj
7     end if
8   end for
9   return ⟨C0, C, E⟩
10 End
Algorithm 6 shows the pseudocode for extracting processes.
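Algorithm 6 is recursive; the sketch below is one way to realize Definition 3 as a depth-first enumeration. It assumes a process is a maximal edge sequence from C0 that stops either at a node with no outgoing edges or at an edge that returns to a node already on the path; the returning edge is kept so that processes with repeated nodes remain available to Algorithm 7. This stopping rule is our assumption, since Algorithm 6 does not state how cycles are handled. Edges are represented as (source, target) pairs of cluster indices, and `extract_processes` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def extract_processes(c0: int, edges: Set[Edge]) -> List[List[Edge]]:
    """Enumerate processes E1, ..., Ek starting at c0, with Ei = (C_{i-1}, C_i)."""
    succ: Dict[int, List[int]] = {}
    for a, b in sorted(edges):  # sorted for a deterministic order
        succ.setdefault(a, []).append(b)

    processes: List[List[Edge]] = []

    def walk(node: int, path: List[Edge], visited: Set[int]) -> None:
        nexts = succ.get(node, [])
        if not nexts and path:
            processes.append(list(path))  # dead end: the process is complete
            return
        for nxt in nexts:
            step = path + [(node, nxt)]
            if nxt in visited:
                processes.append(step)  # edge returns to an earlier node: stop here
            else:
                walk(nxt, step, visited | {nxt})

    walk(c0, [], {c0})
    return processes
```

On the edge set {(0, 1), (1, 2), (2, 0), (1, 3)} with C0 = 0, this yields two processes: the loop 0 → 1 → 2 → 0 and the path 0 → 1 → 3.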
Definition 4 [The Final Nodes in the User Navigation Graph (F)] The final nodes mark the completion of a business process: a business process is complete when the application reaches one of them.
The final nodes can be detected by examining the HTTP responses. For example, in the process of buying a product, a phrase like "Thank you for your purchase" is displayed after the purchase is completed. By specifying a set of such phrases and searching for them in the responses, the final nodes can be identified. Some keywords used for identifying final nodes are Thank, Congratulations, Successfully, Log Off, and Search Results. Additionally, some buttons in a web application page are good signs that help identify the final nodes of the graph. Examples of final nodes include the page after clicking the "save" button, the page after clicking the "creation" button, and the page after clicking the "submit" button.
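A minimal sketch of the keyword test for final nodes, assuming the HTTP response bodies of each cluster are available as strings keyed by a cluster id. The keywords are the ones listed above; case-insensitive substring matching is an assumption, the name `find_final_nodes` is hypothetical, and the button-based signals (save, creation, submit) are not modelled.

```python
from typing import Dict, Iterable, Set

# Completion keywords taken from the text; matching is case-insensitive (assumption).
FINAL_NODE_KEYWORDS = ("thank", "congratulations", "successfully", "log off", "search results")


def find_final_nodes(
    responses: Dict[int, Iterable[str]],
    keywords: Iterable[str] = FINAL_NODE_KEYWORDS,
) -> Set[int]:
    """Return the ids of clusters whose response bodies contain a completion keyword."""
    kws = [k.lower() for k in keywords]
    final: Set[int] = set()
    for cluster_id, bodies in responses.items():
        if any(k in body.lower() for body in bodies for k in kws):
            final.add(cluster_id)
    return final
```

A substring search is deliberately simple; a real deployment would likely need per-application keyword lists, since phrases such as "Thank" can also appear on pages that do not complete a process.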
Definition 5 [The Application Business Process] A business process BP in the application is a process that satisfies at least one of the following conditions:
1. The first node of the process is the initial node of the user navigation graph (C0), and the end node of the process is a final node of the user navigation graph (F).
2. The process passes its first node again; that is, the first node and the end node of the process are the same and the process length is greater than two. (If the process returns to an already visited node and the resulting loop is longer than two, it is a business process.)
Algorithm 6 The Pseudocode for Extracting Processes
INPUT: the web application first node C0; the web application graph edges E
OUTPUT: the web application graph processes P as a set of web application processes
1 Begin
2   Let P = ∅   // set of web application processes
3   Let StartEdge = ∅   // set of graph edges that begin at C0
4   StartEdge = ExtractGraphEdges(C0)   // extract all edges leaving C0
5   EndPoint = ExtractEndPoint(StartEdge)   // extract the end points of these edges
6   if (ExtractProcess(EndPoint, E) ≠ ∅) then
7     return P = e + ExtractProcess(EndPoint, E) for any edge e ∈ StartEdge
8   else
9     return e for any edge e ∈ StartEdge
10  end if
11 End
All processes in the graph that run from the initial node (the application's initial page) to an identified final node, as well as all processes whose initial node and final node are the same (with length greater than two), are application business processes. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the application processes are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start with the initial node and end with a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, each detected process whose initial node and final node are the same and whose length is greater than two is added to BP as a business process.
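The selection performed by Algorithm 7 can be sketched as a filter over processes represented as lists of (source, target) edges. Two readings are assumptions here: the repeated-node case is detected when the last node of a process already occurs earlier on it, and "length greater than two" is read as the loop containing more than two edges. The function name `business_processes` is hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def business_processes(
    processes: List[List[Edge]], c0: int, final_nodes: Set[int]
) -> List[List[Edge]]:
    """Keep processes that satisfy condition 1 or condition 2 of Definition 5."""
    selected: List[List[Edge]] = []
    for p in processes:
        if not p:
            continue
        nodes = [p[0][0]] + [b for _, b in p]  # node sequence C0, C1, ..., Ck
        # Condition 1: starts at the initial node and ends at a final node.
        if nodes[0] == c0 and nodes[-1] in final_nodes:
            selected.append(p)
            continue
        # Condition 2: the last node repeats an earlier node, closing a loop.
        last = nodes[-1]
        if last in nodes[:-1]:
            loop_edges = len(nodes) - 1 - nodes.index(last)
            if loop_edges > 2:
                selected.append(p)
    return selected
```

A two-edge round trip such as 0 → 1 → 0 is therefore rejected, while the three-edge loop 0 → 1 → 2 → 0 is kept.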
4 EXPERIMENTAL RESULTS
The testbed used in this section is a network consisting of a web server (the test target) and two clients (the BLProM system and a legal user). The web server and each client run on a virtual machine. The profiles of the web server and the clients are shown in Table 2.
The web applications listed in Table 3 are installed on the web server (the test target), and we then identify the business layer of each web application.
The legal user first starts using the selected web applications and crawls all permitted parts of each application. The HTTP traffic of the legal user is given to BLProM as its input.
5 EVALUATION
BLProM's goal is to identify the business layer of the web application. By identifying the business layer of the web application, we can identify business logic vulnerabilities. BLProM detects the business processes of the web application, and identifying business processes is the main step in dynamic security testing of the web application in the business layer.
We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner that scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner that has an API for extracting the web application graph, in which the web application pages are graph nodes and the relations among pages are graph edges. The main difference between BLProM and ZAP lies in detecting similar pages: ZAP cannot detect similar pages in the web application, but BLProM can. ZAP's graph only shows the relations among scanned pages, whereas BLProM generates a more compact graph in which similar pages are merged into a single node. In terms of the accuracy of the generated graph, BLProM and ZAP are the same.
To evaluate the proposed approach, we first measure the accuracy of clustering with the following criteria:
• True Positive: samples that are placed in their correct clusters.
• False Positive: samples that are placed in a cluster to which they do not belong.
• False Negative: samples that belong to a cluster but are not placed in it.
• Recall: calculated by the following formula:
$$\mathrm{recall} = \frac{TruePositive}{TruePositive + FalseNegative}$$
• Precision: calculated by the following formula:
Table 2 Testbed Profiles
Machine | CPU | OS | VMware CPU | VMware RAM | VMware OS
Web server (test target) | Pentium dual-core 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7
Client (BLProM machine) | Intel Core i7 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7
Client (legal user) | Pentium dual core i3-3210 GHz | Windows 7 | 1 GHz | 1 GB | Windows 7
Table 3 Selected Web Applications for Evaluation
Web application | Description
TomatoCart-1.1.8.6.1 | e-commerce
osCommerce-2.3.4 | e-commerce
WackoPicko | web application for sharing pictures
Algorithm 7 The Pseudocode for Identifying the Application Business Processes
INPUT: the web application navigation graph ⟨C0, C, E⟩; the web application graph edges E
OUTPUT: the web application graph business processes BP as a set of web application processes
1 Begin
2   Let P, BP = ∅   // sets of web application processes and business processes
3   Let F = ∅   // set of web application final nodes
4   P = ExtractProcess(C0, E)   // extract processes in the web application
5   F = ExtractFinalNodes(C)   // extract final nodes in the web application
6   for k = 1 to |P| do
7     if Pk starts at C0 and ends at a node in F then
8       BP = BP ∪ {Pk}
9     end if
10  end for
11  R = ExtractProcessWithRepeatedNodes(P)   // extract processes with repeated nodes
12  for j = 1 to |R| do
13    if Rj has the same initial and end node and length(Rj) > 2 then
14      BP = BP ∪ {Rj}
15    end if
16  end for
17  return BP
18 End
Table 4 Evaluation of the Clustering of Selected Web Application Pages
Criteria | WackoPicko | TomatoCart | osCommerce
samples | 89 | 150 | 210
clusters | 29 | 66 | 40
true positive | 65 | 146 | 205
false positive | 24 | 4 | 5
false negative | 23 | 3 | 4
recall | 0.74 | 0.98 | 0.98
precision | 0.73 | 0.97 | 0.98
f-measure | 0.73 | 0.98 | 0.98
Table 5 Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko
Criteria | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%)
HTTP messages | 89 | 89 | –
Graph nodes | 22 | 89 | 75.2
Graph edges | 48 | 270 | 82.2
Processes (P) | 12 | 48 | 75
Average edges in each process (E) | 4 | 46 | 91.3
Average edges in all processes (PE) | 48 | 2208 | 97.8
Business processes | 10 | N/A | –
$$\mathrm{precision} = \frac{TruePositive}{TruePositive + FalsePositive}$$
• F-Measure: calculated by the following formula:
$$F\text{-}Measure = \frac{2 \cdot \mathrm{recall} \cdot \mathrm{precision}}{\mathrm{recall} + \mathrm{precision}}$$
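As a quick check of the formulas above, the following snippet computes recall, precision, and F-measure from raw counts. With the WackoPicko counts reported in Table 4 (TP = 65, FP = 24, FN = 23) it reproduces the reported values 0.74, 0.73, and 0.73. The function name `clustering_scores` is illustrative.

```python
from typing import Tuple


def clustering_scores(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Recall, precision and F-measure as defined in the criteria above."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# WackoPicko counts from Table 4.
r, p, f = clustering_scores(65, 24, 23)
print(f"{r:.2f} {p:.2f} {f:.2f}")  # prints: 0.74 0.73 0.73
```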
The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, samples is the total number of web pages in the HTTP traffic extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, clusters is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). The remaining rows give the criteria listed above for each web application.
To evaluate the proposed approach, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. BLProM's output for WackoPicko is shown in Table 5. Graph nodes is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). Graph edges is the total number of edges produced by the ExtractGraphEdges algorithm (Algorithm 4). Processes is the total number of paths from the first node produced by the ExtractProcess algorithm (Algorithm 6). Average edges in each process (E) is the sum of the edges of all processes divided by the total number of processes. Average edges in all processes (PE) is the product of the total number of processes (P) and the average number of edges in each process (E). Business processes is the total number of business processes in the web application.
The values in the first data column of Table 5 are the output of the proposed approach, and the values in the second column are the scanning output of ZAP. The third column shows the percentage of scanning improvement of our approach compared to the ZAP scan. The results indicate that ZAP performs a non-smart scan; as a result of this non-intelligent scan, ZAP is not able to identify business-layer vulnerabilities.
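The text does not spell out how the improvement percentage is computed. The values in Table 5 are consistent with a relative reduction with respect to ZAP, (ZAP − proposed) / ZAP × 100, truncated to one decimal place, with PE taken as the product P × E for each tool. The sketch below checks that reading against the WackoPicko rows; the formula is inferred from the reported numbers, not stated in the paper, and `improvement` is a hypothetical helper.

```python
import math


def improvement(proposed: float, zap: float) -> float:
    """Relative reduction with respect to ZAP, in percent, truncated to one
    decimal (truncation is inferred from the reported values, e.g. 75.28 -> 75.2)."""
    return math.floor((zap - proposed) / zap * 1000) / 10


# WackoPicko rows from Table 5 as (proposed, ZAP).
rows = {
    "Graph nodes": (22, 89),
    "Graph edges": (48, 270),
    "Average edges in each process (E)": (4, 46),
    # PE is P x E for each tool: 12 x 4 and 48 x 46.
    "Average edges in all processes (PE)": (12 * 4, 48 * 46),
}

for name, (ours, zap) in rows.items():
    print(f"{name}: {improvement(ours, zap)}")
# Graph nodes: 75.2
# Graph edges: 82.2
# Average edges in each process (E): 91.3
# Average edges in all processes (PE): 97.8
```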
BLProM's output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to ZAP scanning. It can be observed that web application scanning is improved by identifying the web application business layer.
Table 6 Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce
Criteria | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%)
HTTP messages | 170 | 170 | –
Graph nodes | 40 | 170 | 76.4
Graph edges | 66 | 379 | 82.5
Processes (P) | 23 | 17 | 26
Average edges in each process (E) | 3 | 113 | 97.3
Average edges in all processes (PE) | 69 | 1921 | 96.4
Business processes | 18 | N/A | –
Table 7 Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart
Criteria | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%)
HTTP messages | 150 | 150 | –
Graph nodes | 66 | 150 | 56
Graph edges | 87 | 410 | 78.7
Processes (P) | 31 | 39 | 20.5
Average edges in each process (E) | 4 | 101 | 96
Average edges in all processes (PE) | 156 | 3131 | 95
Business processes | 30 | N/A | –
BLProM's output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of our proposed approach compared to ZAP scanning.
Table 8 shows the average of the proposed approach, the average of OWASP ZAP, and the average percentage of improvement in scanning the selected web applications. For example, on the "Average edges in all processes" criterion, our approach improves on OWASP ZAP by about 96 percent.
According to the results presented in this table, BLProM improves web application scanning. BLProM is aware of the web application business processes, and by identifying the web application business layer, web scanners can detect business-layer vulnerabilities.
6 CONCLUSION
Business logic vulnerabilities are strong vulnerabilities that compromise web application security. Web scanners cannot detect business logic vulnerabilities because they are unable to understand the business logic of the web application. Detecting business logic vulnerabilities requires understanding the business logic of the web application; therefore, these vulnerabilities are specific to each application and difficult to identify.
In this paper, we proposed BLProM, a black-box approach for detecting the business processes of the web application. BLProM aims to ease the detection of business logic vulnerabilities. BLProM's output is used as input to dynamic security testing of web applications in the business layer in order to detect business logic vulnerabilities. BLProM consists of two main steps:
1- extracting the user navigation graph;
2- detecting the web application business processes.
In the lab, we scanned three web applications with BLProM and with OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared to OWASP ZAP on the average-edges-in-all-processes criterion. BLProM improves the scanning of web applications because it clusters web pages and prevents scanning similar web application pages.
Table 8 Comparison of the Average of the Proposed Approach and the Average of ZAP Approach in the Scanning of Selected Web Applications
Criteria (average of selected web applications) | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%)
Graph nodes | 42.6 | 136.3 | 69.2
Graph edges | 67 | 353 | 81.1
Processes (P) | 20 | 36.6 | 40.5
Average edges in each process (E) | 3.6 | 86.6 | 94.8
Average edges in all processes (PE) | 91 | 2420 | 96.4
References
[1] Common vulnerabilities and exposures httpscvemitreorgcvecvehtml
[2] 2015 ITRC Identity Theft Resource Cen-ter Breach Report Hits Near Record Highin 2015 httpwwwidtheftcenterorg
ITRC-Surveys-Studies2015databreaches
html Accessed Feb 2017[3] G Pellegrino and D Balzarotti Toward Black-
Box Detection of Logic Flaws in Web Appli-cations In Network and Distributed SystemSecurity symposium 2014 (NDSS2014) 2014doi1014722ndss201423021
[4] Testing for business logic OWASP https
wwwowasporgindexphpTesting_for_
business_logic Accessed Feb 2017[5] D Balzarotti M Cova V Felmetsger and G Vi-
gna Multi-module vulnerability analysis ofweb-based applications In Proceedings of the14th ACM conference on Computer and com-munications security page 25ndash35 ACM 2007doi10114513152451315250
[6] A Doupe B Boe C Kruegel and G VignaFear the EAR discovering and mitigating execu-tion after redirect vulnerabilities In Proceedingsof the 18th ACM conference on Computer andcommunications security page 251ndash262 ACM2011 doi10114520467072046736
[7] V Felmetsger L Cavedon C Kruegel and G Vi-gna Business Logic Attacks ndash Bots and BATsOWASP 2009 In USENIX Security Symposium2010
[8] E Chai Business Logic Attacks ndash Bots andBATs OWASP 2009 httpswwwowasp
orgimagesaaaOWASP_Cincinnati_Jan_
2011pdf Accessed Feb 2017[9] X Li and Y Xue BLOCK a black-box
approach for detection of state violation at-
tacks towards web applications In Proceed-ings of the 27th Annual Computer SecurityApplications Conference page 247ndash256 2011doi10114520767322076767
[10] M Cova D Balzarotti V Felmetsger and G Vi-gna Swaddler An Approach for the Anomaly-Based Detection of State Violations in Web Ap-plications In Giovanni pages 63ndash86 SpringerBerlin Heidelberg 2007 ISBN 978-3-540-74319-4 doi101007978-3-540-74320-0 4
[11] X Li W Yan and Y Xue SENTINEL securingdatabase from logic flaws in web applicationsIn Proceedings of the second ACM conference onData and Application Security and Privacy page25ndash36 2012 doi10114521336012133605
[12] M Alidoosti and A Nowroozi BLDASTbusiness-layer Dynamic Application SecurityTester of the web application in order to detectweb application vulnerabilities against floodingDoS attacks In Iran Society of Cryptology Con-ference shiraz Iran in Persian 2017
[13] M Alidoosti A Nowroozi and A NickabadiEvaluating the Web-Application Resiliency toBusiness-Layer DoS Attacks ETRI Journal 2019doi104218etrij2019-0164
[14] M Alidoosti and A Nowroozi BLTOCTTOUbusiness-layer dynamic application security testerof the web application in order to detect web ap-plication vulnerabilities against Race Conditionattacks In Computer Society of Iran ConferenceTehran Iran in Persian 2018
[15] M Alidoosti and A Nowroozi BL-ProM Business-layer process miner ofthe web application In ISCISC pages 1ndash6 IEEE 2018 ISBN 978-1-5386-7582-3doi101109ISCISC20188546899
[16] V Crescenzi P Merialdo and P Missier Clus-tering Web pages based on their structure Dataamp Knowledge Engineering 54(3)279ndash299 2005doi101016jdatak200411004
80 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al
Mitra Alidoosti received her BS and MSc
degrees in computer engineering from the De-partment of Computer Engineering Iran Uni-
versity of Science and Technology Tehran
Iran in 2009 and 2012 respectively Currentlyshe is working toward the PhD degree in
computer engineering at Malek-e-Ashtar Uni-versity of Technology Tehran Iran Her re-
search interests are computer network security VoIP and SIP
security and web-application security
Alireza Nowroozi is a freelance consultantwho advises government and private-sector-
related industries on information technologyHe has four-year experience as an academicstaff member and an IT post-doctoral position
with Sharif University of Technology TehranIran He is a specialist in artificial intelligencecognitive science software engineering and
IT security Besides he is a co-founder of four IT startups
Ahmad Nickabadi received his BS degreein computer engineering in 2004 and the
MSc and PhD degrees in artificial intelli-
gence in 2006 and 2011 respectively fromAmirkabir University of Technology TehranIran He is currently an Assistant Professor
in the Department of Computer EngineeringAmirkabir University of Technology His re-
search interests include statistical machine learning and softcomputing
July 2019 Volume 6 Number 2 (pp 65ndash80) 73
Algorithm 3 The Pseudocode for Clustering the Web Application Pages
INPUT WebPages = w1 w2 w3 wnOUTPUT the web application model M as a set of web page clusters Cthe web page clusters C as a set of web pages
1 Begin2 Let M = empty set of page clusters3 Let k=1 number of web page clusters4 for i 1 n do5 Ck larr wi
6 for j = i+ 1 n do7 if SimilarWebPages(wi wj) then8 WebPageslarrWebPagesminus wi9 n= length of WebPages
10 Ck larr wj
11 end if12 end for13 k + +14 end for15 return C16 end
of the previous web application page that the uservisited For extracting the edges of user navigationgraph the produced clusters are checked to findwhich Referer set of clusters has the intersectionwith the URI set of the cluster When the cluster isfound these two clusters are connected Supposethat the URI set of cluster C1 has an intersectionwith the Referer set of cluster C2 then the pathfrom cluster C1 to cluster C2 (C1rarr C2) is createdIn other words the edge C1C2 is one of the edges inthe user navigation graph Algorithm 4 shows thepseudocode for extracting the edges from the usernavigation graph The URI set and the Referer setof each cluster are respectively obtained accordingto lines 5 and 6 of the pseudocode in Algorithm 3In line 10 if the intersection of the URI set of eachcluster with other clustersrsquo Referer set is not nullthe CiCj edge is added to the edge set of the graph
In this step BLProM connects the obtainedclusters that each one represents a unique web ap-plication page Each cluster actually has a set ofsimilar pages each of these pages has URI andReferer field Thus each cluster contains a set ofURIs and a set of Referers for pages in the clusterIt should be noted that the Referer field is actu-ally the URI of the previous web application pagethat the user visited For extracting the edges ofuser navigation graph the produced clusters arechecked to find which Referer sets of clusters hasthe intersection with the URI set of the clusterwhen the cluster is found these two clusters areconnected Suppose that the URI set of cluster C1has intersection with the Referer set of cluster C2then the path from cluster C1 to the cluster C2 (C1
rarr C2) is created In other words the edge C1C2is one of the edges in the user navigation graphAlgorithm 4 shows the pseudocode for extractingthe edges from the user navigation graph The URIset and the Referer set of each cluster are respec-tively obtained according to the lines 5 and 6 ofthe pseudocode in Algorithm 3 In line 10 if theintersection of URI set of each cluster with otherclustersrsquo Referer set is not null CiCj edge is addedto the edge set of the graph The user navigationgraph is created according to the algorithm in Al-gorithm 5 where nodes are the clusters set and theidentified edges are the path between clusters Inthe created graph each node indicates the uniquepage and the edges show the path between pages
Definition 2 [The User Navigation Graph] Thisgraph is shown with tupleltC0 C Egt where C isthe set of nodes in the graph C0 isin C is the first(initial) node in the graph and E sube C times C is theset of edges in the graph
Algorithm 5 shows the pseudocode for extractingthe user navigation graph Line 7 indicates a clusterthat contains the initial page of the application Itis considered as the initial node of the graph
32 Identifying Business Processes in the Ap-plication
To identify the web application business processes inthe user navigation graph it is necessary to define theprocess and final node and then define the businessprocess
Definition 3 [The Application Process (P)] The
74 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al
Algorithm 4 The Pseudocode for Extracting Edges From the User Navigation Graph
INPUT C = C1 C2 C3 Ck web pages clusters as graph nodesWebPages= w1 w2 w3 wn set of web pagesOUTPUT the web application graph edges E as a set of edges
1 Begin2 Let E = empty set of web application graph Edges3 for j 1 k do4 Let URIcj Referercj = empty5 URIcj = URIcjcup ExtractURI (w) for any w isin Cj
6 Refererck = Refererck cup ExtractReferer(w) for any w isin Cj
7 end for8 for i k do9 for j i+ 1 k do
10 if (URIci capReferercj 6= null) then11 E larr E + CiCj
12 end if13 end for14 end for15 return E16 end
Algorithm 5 The Pseudocode for Extracting the User Navigation Graph
INPUT C= C1 C2 C3 Ck web pages clusters as graph nodesw1 First web pageOUTPUT the web application navigation graph lt C0 CE gt
1 Begin2 Let C0 = empty3 E=ExtractGraphEdges set of web application graph Edges4 for j 1 k do5 if Ck contains w1 then6 C0 = Co + Ck
7 end if8 end for9 return C0 CE
10 end
process P in the application is a sequence ofnodes and edges in the user navigation graph likeE1 E2 Ek whereEi isin E andEi = Ciminus1Ci
Algorithm 6 shows the pseudocode for extractingprocesses
Definition 4 [the final nodes in the user navigationgraph (F)] The final nodes refer to the completion ofa business process that occurs when the applicationreaches there
The final nodes can be detected by examining theHTTP responses For example in the process of buy-ing a product a phrase like rdquoThank you for your pur-chaserdquo is displayed after completion of the purchaseBy specifying a set of these phrases and searchingthem in the responses the final nodes can be identi-
fied Some keywords used for identifying the final nodeare Thank Congratulations Successfully Log Off andSearch Results Additionally some buttons in the webapplication page are good signs that help the identifi-cation of the final node in the graph The examples offinal nodes include the page after clicking the ldquosaverdquobutton the page after clicking the ldquocreationrdquo buttonand the page after clicking the ldquosubmitrdquo button
Definition 5 [the application business process]: A business process BP in the application is a process that satisfies at least one of the following conditions:
1. The first node of the process is the initial node in the user navigation graph (C0), and the end node of the process is a final node in the user navigation graph (F).
2. The process passes through its first node again; that is, the first node and the end node of the process are the same, and the process length is greater than two. (If the process returns to a node it has already passed and the resulting loop is longer than two, the process is a business process.)
Algorithm 6 The Pseudocode for Extracting Processes
INPUT: the web application first node C0; the web application graph edges E
OUTPUT: the web application graph processes P, as a set of web application processes
1  Begin
2    Let P0 = ∅                                       // set of web application processes
3    Let StartEdge = ∅                                // set of graph edges that begin at C0
4    StartEdge ← ExtractGraphEdges(C0)               // extract all edges leaving C0
5    EndPoint ← ExtractEndPoint(StartEdge)           // extract the end points of these edges
6    if ExtractProcess(EndPoint, E) ≠ null then
7      return P = e + ExtractProcess(EndPoint, E) for every edge e ∈ StartEdge
8    else
9      return e
10   end if
11 end
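Algorithm 6 is stated recursively. A hypothetical Python counterpart is sketched below, assuming a process is represented by its node list ⟨C0, C1, ..., Ck⟩ (k edges) and that a path stops either at a node with no outgoing edge or as soon as it revisits a node, so loops in the navigation graph terminate. The stopping rule is an assumption, not something the pseudocode states.

```python
def extract_processes(c0, edges):
    """Enumerate processes (node lists) that start at c0.

    A path ends at a node with no outgoing edge, or as soon as it revisits a
    node, so cycles in the navigation graph cannot cause endless recursion.
    """
    succ = {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)

    processes = []

    def walk(path):
        nexts = succ.get(path[-1], [])
        if not nexts:
            processes.append(path)
            return
        for nxt in nexts:
            if nxt in path:
                processes.append(path + [nxt])  # closing a loop ends this process
            else:
                walk(path + [nxt])

    walk([c0])
    return processes


# Hypothetical navigation graph with one loop back to C0
edges = [("C0", "C1"), ("C1", "C2"), ("C1", "C3"), ("C2", "C0")]
processes = extract_processes("C0", edges)
```

The number of such paths can grow exponentially with the graph size, which is one reason the clustering step that shrinks the graph matters.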
All processes in the graph that run from the initial node (the application's initial page) to an identified final node, together with all processes whose initial node and final node are the same and whose length is greater than two, are business processes of the application. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the application processes are extracted. In line 5, the final nodes of the graph are extracted. In line 7, the processes that start at the initial node and end at a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in line 13, each detected process whose initial node and final node are the same and whose length is greater than two is added to BP as a business process.
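The two tests of Algorithm 7 can be sketched in Python as follows, assuming processes are node lists and that the length of a process is its number of edges (one less than its number of nodes). The function names are hypothetical; the comments map each loop to the pseudocode lines.

```python
def has_repeated_node(process):
    """True if some node occurs more than once in the process."""
    return len(set(process)) < len(process)


def identify_business_processes(processes, c0, final_nodes):
    """Sketch of Algorithm 7 over processes given as node lists."""
    bp = []
    # Lines 6-10: processes that start at C0 and end at a final node
    for p in processes:
        if p[0] == c0 and p[-1] in final_nodes:
            bp.append(p)
    # Lines 11-16: processes with a repeated node whose first and last nodes
    # coincide and whose length exceeds two edges
    for r in (p for p in processes if has_repeated_node(p)):
        if r[0] == r[-1] and len(r) - 1 > 2 and r not in bp:
            bp.append(r)
    return bp


# Hypothetical processes; C3 is the only final node
processes = [["C0", "C1", "C2", "C0"], ["C0", "C1", "C3"], ["C0", "C4", "C0"], ["C0", "C5"]]
business = identify_business_processes(processes, "C0", {"C3"})
```

The loop ⟨C0, C4, C0⟩ has only two edges, so it is not reported, while ⟨C0, C1, C2, C0⟩ qualifies under the second condition.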
4 EXPERIMENTAL RESULTS
The testbed used in this section is a network consisting of a web server (the test target) and two clients (the BLProM system and a legal user). The web server and each client run inside a VMware virtual machine. The profiles of the web server and the clients are shown in Table 2.
The web applications listed in Table 3 are installed on the web server (the test target), and BLProM is then used to identify the business layer of these web applications.
First, the legal user uses the selected web applications, crawling every part of each application that the user is permitted to access. The legal user's HTTP traffic is given to BLProM as its input.
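The text does not say how BLProM captures or parses this traffic. As a hypothetical illustration of what ExtractURI and ExtractReferer (Algorithm 4) need from each recorded request, the sketch below pulls the request target and the Referer header out of a raw HTTP/1.1 request head and reduces both to path plus query string so that they can be compared directly. The function names and sample request are assumptions.

```python
from urllib.parse import urlsplit


def to_path(url):
    """Reduce an absolute URL or a request target to path plus query string."""
    parts = urlsplit(url)
    return parts.path + ("?" + parts.query if parts.query else "")


def parse_request(raw):
    """Return (uri, referer) from the head of a raw HTTP/1.1 request."""
    lines = raw.split("\r\n")
    _method, target, _version = lines[0].split(" ", 2)
    referer = None
    for line in lines[1:]:
        if not line:  # a blank line ends the header block
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "referer":  # header names are case-insensitive
            referer = to_path(value.strip())
    return to_path(target), referer


raw = ("GET /cart.php?add=2 HTTP/1.1\r\n"
       "Host: shop.example\r\n"
       "Referer: http://shop.example/product.php?id=2\r\n\r\n")
uri, referer = parse_request(raw)
```

Normalizing the absolute Referer URL to a path is what lets the intersection test URI_Ci ∩ Referer_Cj in Algorithm 4 match a page's own request target.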
5 EVALUATION
BLProM’s goal is to identify the business layer of the web application, since identifying the business layer is what makes it possible to find business logic vulnerabilities. BLProM does this by detecting the business processes of the web application. Identifying business processes is the main step in dynamic security testing of the web application at the business layer.
We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner that scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner that has an API for extracting the web application graph, in which the web application pages are the graph nodes and the relations among pages are the graph edges. The main difference between BLProM and ZAP lies in detecting similar pages: ZAP cannot detect similar pages in the web application, but BLProM can. ZAP’s graph only shows the relations among the scanned pages, with one node per page (in Tables 5 to 7, ZAP's node count equals the number of HTTP messages), whereas BLProM merges similar pages into a single cluster node and therefore generates a more compact graph. In terms of the accuracy of the generated graph, BLProM and ZAP perform the same.
Table 2 Testbed Profiles
Web server (test target): CPU Pentium dual-core, 2.20 GHz; OS Windows 8.1; VMware CPU 1 GHz; VMware RAM 1 GB; VMware OS Windows 7
Client (BLProM machine): CPU Intel Core i7, 2.20 GHz; OS Windows 8.1; VMware CPU 1 GHz; VMware RAM 1 GB; VMware OS Windows 7
Client (legal user): CPU Pentium dual core i3-3210 GHz; OS Windows 7; VMware CPU 1 GHz; VMware RAM 1 GB; VMware OS Windows 7

Table 3 Selected Web Applications for Evaluation
Web application         Description
TomatoCart 1.1.8.6.1    e-commerce
osCommerce 2.3.4        e-commerce
WackoPicko              web application for sharing pictures

Algorithm 7 The Pseudocode for Identifying the Application Business Processes
INPUT: the web application navigation graph ⟨C0, C, E⟩; the web application graph edges E
OUTPUT: the web application business processes BP, as a set of web application processes
1  Begin
2    Let P = ∅, BP = ∅                                // sets of web application processes and business processes
3    Let F = ∅                                        // set of web application final nodes
4    P ← ExtractProcess                               // extract processes in the web application
5    F ← ExtractFinalNodes                            // extract final nodes in the web application
6    for i ← 1 to |P| do
7      if Pi starts at C0 and ends at a node in F then
8        BP ← BP ∪ {Pi}
9      end if
10   end for
11   R ← ExtractProcessWithRepeatedNodes(P)          // extract processes with repeated nodes
12   for j ← 1 to |R| do
13     if Rj has the same initial and end node and length(Rj) > 2 then
14       BP ← BP ∪ {Rj}
15     end if
16   end for
17   return BP
18 end

Table 4 The Clusters of Selected Web Application Pages: Evaluation
Criteria        WackoPicko   TomatoCart   osCommerce
samples         89           150          210
clusters        29           66           40
true positive   65           146          205
false positive  24           4            5
false negative  23           3            4
recall          0.74         0.98         0.98
precision       0.73         0.97         0.98
f-measure       0.73         0.98         0.98

Table 5 Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko
Criteria                              Proposed approach   OWASP ZAP   Improvement over OWASP ZAP (%)
HTTP messages                         89                  89          –
Graph nodes                           22                  89          75.2
Graph edges                           48                  270         82.2
Processes (P)                         12                  48          75
Average edges in each process (E)     4                   46          91.3
Average edges in all processes (PE)   48                  2208        97.8
Business processes                    10                  N/A         –

To evaluate the proposed approach, we first measure the accuracy of clustering with the following criteria:
• True Positive: samples that are placed in their correct cluster.
• False Positive: samples that are placed in a cluster to which they do not belong.
• False Negative: samples that are not placed in a cluster although they belong to it.
• Recall: calculated by the following formula:
$$\mathrm{recall} = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalseNegative}}$$
• Precision: calculated by the following formula:
$$\mathrm{precision} = \frac{\mathrm{TruePositive}}{\mathrm{TruePositive} + \mathrm{FalsePositive}}$$
• F-Measure: calculated by the following formula:
$$\mathrm{F\text{-}Measure} = \frac{2 \cdot \mathrm{recall} \cdot \mathrm{precision}}{\mathrm{recall} + \mathrm{precision}}$$
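As a check on Table 4, a short Python sketch computes the three scores from the true positive, false positive, and false negative counts; the function name is hypothetical. Substituting the definitions shows that the F-measure also equals 2·TP / (2·TP + FP + FN).

```python
def clustering_scores(tp, fp, fn):
    """Return (recall, precision, F-measure) for clustering counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


# WackoPicko counts from Table 4
r, p, f = clustering_scores(tp=65, fp=24, fn=23)
print(f"{r:.2f} {p:.2f} {f:.2f}")  # 0.74 0.73 0.73
```

The printed values agree with the WackoPicko column of Table 4; the TomatoCart and osCommerce columns can be reproduced the same way.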
The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. The first row, samples, is the total number of web pages in the HTTP traffic, as extracted by the ExtractWebPages algorithm (Algorithm 1). The second row, clusters, is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). The remaining rows give the criteria listed above for each web application.
To evaluate the proposed approach, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. BLProM’s output for WackoPicko is shown in Table 5. Graph nodes is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). Graph edges is the total number of edges produced by the ExtractGraphEdges algorithm (Algorithm 4). Process (P) is the total number of paths from the first node produced by the ExtractProcess algorithm (Algorithm 6). Average edges in each process (E) is the sum of the edges of all processes divided by the total number of processes. Average edges in all processes (PE) is the product of the total number of processes (P) and the average number of edges in each process (E); it therefore equals the total number of edges over all processes. Business processes is the total number of business processes in the web application.
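The derived columns of Tables 5 to 7 can be checked with a few lines of Python. This is a sketch, not the authors' code: the function names are hypothetical, and the improvement is assumed to be the relative reduction (ZAP − BLProM) / ZAP × 100. That assumption matches the tabulated percentages when they are truncated to one decimal place (for example, 22 versus 89 graph nodes gives 75.28, reported as 75.2).

```python
import math


def scan_cost(process_edge_counts):
    """Return (P, E, PE) given the number of edges in each process."""
    p = len(process_edge_counts)
    e = sum(process_edge_counts) / p  # average edges in each process
    return p, e, p * e                # PE = P * E = total edges over all processes


def improvement_pct(ours, zap):
    """Percentage reduction of a metric relative to OWASP ZAP."""
    return (zap - ours) / zap * 100


def truncate_1dp(x):
    """Cut (do not round) a value to one decimal place."""
    return math.floor(x * 10) / 10


# Graph nodes and PE for WackoPicko, from Table 5
print(truncate_1dp(improvement_pct(22, 89)))    # 75.2
print(truncate_1dp(improvement_pct(48, 2208)))  # 97.8
```

The same arithmetic reproduces Table 5's PE entries themselves: 12 processes × 4 edges = 48 for BLProM and 48 × 46 = 2208 for ZAP.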
The first column of Table 5 gives the output of the proposed approach, the second column gives the scanning output of ZAP, and the third column gives the percentage of scanning improvement of the proposed approach over the ZAP scan. The table indicates that ZAP scans without any knowledge of the application's structure: it treats each of the 89 HTTP messages as a separate graph node. Because of this non-selective scanning, ZAP is not able to identify business layer vulnerabilities.
BLProM’s output for osCommerce is shown in Table 6. The third column shows the percentage of scanning improvement compared to ZAP scanning. Again, identifying the web application's business layer improves web application scanning.
Table 6 Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce
Criteria                              Proposed approach   OWASP ZAP   Improvement over OWASP ZAP (%)
HTTP messages                         170                 170         –
Graph nodes                           40                  170         76.4
Graph edges                           66                  379         82.5
Processes (P)                         23                  17          26
Average edges in each process (E)     3                   113         97.3
Average edges in all processes (PE)   69                  1921        96.4
Business processes                    18                  N/A         –
Table 7 Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart
Criteria                              Proposed approach   OWASP ZAP   Improvement over OWASP ZAP (%)
HTTP messages                         150                 150         –
Graph nodes                           66                  150         56
Graph edges                           87                  410         78.7
Processes (P)                         31                  39          20.5
Average edges in each process (E)     4                   101         96
Average edges in all processes (PE)   156                 3131        95
Business processes                    30                  N/A         –
BLProM’s output for TomatoCart is shown in Table 7. The third column shows the percentage of scanning improvement of the proposed approach compared to ZAP scanning.
Table 8 shows, for the selected web applications, the average values of the proposed approach, the average values of OWASP ZAP, and the average percentage of improvement. For example, on the “Average edges in all processes” criterion, our approach improves on OWASP ZAP by about 96 percent.
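Table 8's third column appears to be the mean of the per-application improvement percentages from Tables 5 to 7 (for PE, (97.8 + 96.4 + 95) / 3 ≈ 96.4) rather than the improvement recomputed from the averaged counts, which would be about 96.2 percent. The sketch below reproduces both figures for the PE row from the tabulated values; the variable names are illustrative only.

```python
from statistics import mean

# (proposed approach, OWASP ZAP, improvement %) for PE, from Tables 5-7
pe_rows = {
    "WackoPicko": (48, 2208, 97.8),
    "osCommerce": (69, 1921, 96.4),
    "TomatoCart": (156, 3131, 95.0),
}
avg_ours = mean(row[0] for row in pe_rows.values())
avg_zap = mean(row[1] for row in pe_rows.values())
# Mean of the per-application percentages (what Table 8 appears to report)
avg_improvement = mean(row[2] for row in pe_rows.values())
# Improvement recomputed from the averaged counts instead
pooled_improvement = (avg_zap - avg_ours) / avg_zap * 100
```

The first two averages match Table 8's PE entries (91 and 2420); either way of averaging supports the "about 96 percent" figure.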
The results presented in Table 8 show that BLProM improves web application scanning. BLProM is aware of the web application's business processes, and once the business layer of a web application has been identified, web scanners can detect business layer vulnerabilities.
6 CONCLUSION
Business logic vulnerabilities are severe vulnerabilities that compromise web application security. Web scanners cannot detect them because they are unable to understand the business logic of the web application. Because these vulnerabilities are specific to each application, detecting them requires understanding that application's business logic, which makes them difficult to identify.
In this paper, we proposed BLProM, a black-box approach for detecting the business processes of a web application. BLProM aims to make business logic vulnerabilities easier to detect: its output is used as input to dynamic security testing of web applications at the business layer in order to detect business logic vulnerabilities. BLProM consists of two main steps:
1. Extracting the user navigation graph.
2. Detecting the web application business processes.
In the lab, we scanned three web applications with BLProM and with OWASP ZAP, an open-source web application scanner. On the average-edges-in-all-processes (PE) criterion, BLProM improved scanning by about 96% compared to OWASP ZAP. BLProM improves the scanning of web applications because it clusters web pages and thereby avoids scanning similar web application pages.

Table 8 Comparison of the Average of the Proposed Approach and the Average of ZAP Approach in the Scanning of Selected Web Applications
Criteria                              Proposed approach   OWASP ZAP   Improvement over OWASP ZAP (%)
Graph nodes                           42.6                136.3       69.2
Graph edges                           67                  353         81.1
Processes (P)                         20                  36.6        40.5
Average edges in each process (E)     3.6                 86.6        94.8
Average edges in all processes (PE)   91                  2420        96.4
References
[1] Common Vulnerabilities and Exposures. https://cve.mitre.org/cve/cve.html
[2] 2015 ITRC. Identity Theft Resource Center Breach Report Hits Near Record High in 2015. http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html. Accessed Feb 2017.
[3] G. Pellegrino and D. Balzarotti. Toward Black-Box Detection of Logic Flaws in Web Applications. In Network and Distributed System Security Symposium 2014 (NDSS 2014), 2014. doi:10.14722/ndss.2014.23021
[4] Testing for business logic. OWASP. https://www.owasp.org/index.php/Testing_for_business_logic. Accessed Feb 2017.
[5] D. Balzarotti, M. Cova, V. Felmetsger, and G. Vigna. Multi-module vulnerability analysis of web-based applications. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 25–35. ACM, 2007. doi:10.1145/1315245.1315250
[6] A. Doupe, B. Boe, C. Kruegel, and G. Vigna. Fear the EAR: discovering and mitigating execution after redirect vulnerabilities. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 251–262. ACM, 2011. doi:10.1145/2046707.2046736
[7] V. Felmetsger, L. Cavedon, C. Kruegel, and G. Vigna. Toward automated detection of logic vulnerabilities in web applications. In USENIX Security Symposium, 2010.
[8] E. Chai. Business Logic Attacks – Bots and BATs. OWASP, 2009. https://www.owasp.org/images/a/aa/OWASP_Cincinnati_Jan_2011.pdf. Accessed Feb 2017.
[9] X. Li and Y. Xue. BLOCK: a black-box approach for detection of state violation attacks towards web applications. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 247–256, 2011. doi:10.1145/2076732.2076767
[10] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna. Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications. In Recent Advances in Intrusion Detection (RAID 2007), pages 63–86. Springer, Berlin, Heidelberg, 2007. ISBN 978-3-540-74319-4. doi:10.1007/978-3-540-74320-0_4
[11] X. Li, W. Yan, and Y. Xue. SENTINEL: securing database from logic flaws in web applications. In Proceedings of the Second ACM Conference on Data and Application Security and Privacy, pages 25–36, 2012. doi:10.1145/2133601.2133605
[12] M. Alidoosti and A. Nowroozi. BLDAST: business-layer Dynamic Application Security Tester of the web application in order to detect web application vulnerabilities against flooding DoS attacks. In Iran Society of Cryptology Conference, Shiraz, Iran, 2017 (in Persian).
[13] M. Alidoosti, A. Nowroozi, and A. Nickabadi. Evaluating the Web-Application Resiliency to Business-Layer DoS Attacks. ETRI Journal, 2019. doi:10.4218/etrij.2019-0164
[14] M. Alidoosti and A. Nowroozi. BLTOCTTOU: business-layer dynamic application security tester of the web application in order to detect web application vulnerabilities against Race Condition attacks. In Computer Society of Iran Conference, Tehran, Iran, 2018 (in Persian).
[15] M. Alidoosti and A. Nowroozi. BLProM: Business-layer process miner of the web application. In ISCISC, pages 1–6. IEEE, 2018. ISBN 978-1-5386-7582-3. doi:10.1109/ISCISC.2018.8546899
[16] V. Crescenzi, P. Merialdo, and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3):279–299, 2005. doi:10.1016/j.datak.2004.11.004
Mitra Alidoosti received her BS and MSc degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. She is currently working toward the PhD degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.
Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and in an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. He is also a co-founder of four IT startups.
Ahmad Nickabadi received his BS degree in computer engineering in 2004 and his MSc and PhD degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.
74 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al
Algorithm 4 The Pseudocode for Extracting Edges From the User Navigation Graph
INPUT C = C1 C2 C3 Ck web pages clusters as graph nodesWebPages= w1 w2 w3 wn set of web pagesOUTPUT the web application graph edges E as a set of edges
1 Begin2 Let E = empty set of web application graph Edges3 for j 1 k do4 Let URIcj Referercj = empty5 URIcj = URIcjcup ExtractURI (w) for any w isin Cj
6 Refererck = Refererck cup ExtractReferer(w) for any w isin Cj
7 end for8 for i k do9 for j i+ 1 k do
10 if (URIci capReferercj 6= null) then11 E larr E + CiCj
12 end if13 end for14 end for15 return E16 end
Algorithm 5 The Pseudocode for Extracting the User Navigation Graph
INPUT C= C1 C2 C3 Ck web pages clusters as graph nodesw1 First web pageOUTPUT the web application navigation graph lt C0 CE gt
1 Begin2 Let C0 = empty3 E=ExtractGraphEdges set of web application graph Edges4 for j 1 k do5 if Ck contains w1 then6 C0 = Co + Ck
7 end if8 end for9 return C0 CE
10 end
process P in the application is a sequence ofnodes and edges in the user navigation graph likeE1 E2 Ek whereEi isin E andEi = Ciminus1Ci
Algorithm 6 shows the pseudocode for extractingprocesses
Definition 4 [the final nodes in the user navigationgraph (F)] The final nodes refer to the completion ofa business process that occurs when the applicationreaches there
The final nodes can be detected by examining theHTTP responses For example in the process of buy-ing a product a phrase like rdquoThank you for your pur-chaserdquo is displayed after completion of the purchaseBy specifying a set of these phrases and searchingthem in the responses the final nodes can be identi-
fied Some keywords used for identifying the final nodeare Thank Congratulations Successfully Log Off andSearch Results Additionally some buttons in the webapplication page are good signs that help the identifi-cation of the final node in the graph The examples offinal nodes include the page after clicking the ldquosaverdquobutton the page after clicking the ldquocreationrdquo buttonand the page after clicking the ldquosubmitrdquo button
Definition 5 [the application business process] Thebusiness process BP in the application is a processthat has at least one of the following conditions
1 The first node of the process is the initial nodein the user navigation graph (C0) and the processend node is the final node in the user navigationgraph (F)
2 If the process passes its first node again it means
July 2019 Volume 6 Number 2 (pp 65ndash80) 75
Algorithm 6 The Pseudocode for Extracting Processes
INPUT the web application first node C0
the web application Graph edges EOUTPUT the web application graph process P as a set of web application process
1 Begin2 Let P0 = empty set of web application processes3 Let StartEdge= empty set of graph edge that begin from C0
4 StartEdge= ExtractGraphEdges(C0) Extract all edges from C0
5 EndPoint = ExtractEndPoint(StartEdge) Extract the end point of edges6 if (ExtractProcess(EndPoint E) 6= null) then7 return P=E+ExteractProcess(Endpoint E) for any edge E isin StartEdge8 else9 return E
10 end if11 end
the first node and the end node of the process arethe same and the process length is greater thantwo (If there is a return to the passed node in theprocess and the created loop length is more thantwo this is a business process)
All processes in the graph from the initial node(the application initial page) to the identified finalnodes as well as the processes of their initial nodeand the final node are the same and all of themare the application business processes Algorithm 7shows the pseudocode for identifying the applicationbusiness process In line 4 the application businessprocesses are extracted In line 5 the final nodes ofthe graph are extracted In line 7 the processes thatstart with the initial node and end with the final nodeare identified as the business process are stored in thevariable BP In line 9 the processes with repeatednodes are detected and in line 11 among the detectedprocesses if their initial node and their final node arethe same and their length is greater than two theyare added to the variable BP as the business process
4 EXPERIMENTAL RESULTS
The testbed used in this section is a network consistingof a web server (test target) and two clients (BLProMsystem and legal user) The web server and the clientsare loaded on a virtual machine The web server andthe clientsrsquo profiles are shown in Table 2
The web applications listed in Table 3 are installedon the web server (test target) and then we plan toidentify the business layer of the web applications
The legal user first starts using the selected webapplications The user crawls all permitted parts ofthe web application HTTP traffic of the legal user isgiven to BLProM as its input
5 EVALUATION
BLProMrsquos goal is to identify the business layer of theweb application We can identify business logic vul-nerability by identifying the business layer of the webapplication BLProM detects the business processesof the web application Identifying business processesis the main step in dynamic security testing of theweb application in the business layer
We compare BLProM with OWASP ZAP TheOWASP Zed Attack Proxy (ZAP) is a free web scan-ner It scans web applications and automatically findssome security vulnerabilities ZAP is the only free webscanner that has API for extracting web applicationgraph The web application pages are graph nodes andthe relations among pages are shown as graph edgesThe main difference between BLPRoM and ZAP isin detecting similar pages ZAP cannot detect similarpages in the web application but BLPRoM can ZAPrsquosgraph only shows the relation among scanned pagesbut BLProM generates the optimal graph About theaccuracy of the generated graph both BLProM andZAP are the same
To evaluate the proposed approach we first showthe accuracy of clustering by the following criteria
bull True Positive Samples that fit well into theircorrect clusters
bull False Positive Samples that fit in a cluster thatdo not belong to that cluster
bull False Negative Samples that do not fit in a clusterbut they belong to that cluster
bull Recall It is calculated by the following formula
recall = TruePositiveTruePositive+FalseNegative
bull Precision It is calculated by the following for-mula
76 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al
Table 2 Testbed Profiles
Web server (test target)
CPU Pentium dual core-220 GHZ
OS windows 81
VMware cpu 1GHZ
VMware RAM 1G
VMware OS windows 7
Client (BLProM machine)
CPU Intel corei7 220 GHZ
OS windows 81
VMware cpu 1GHZ
VMware RAM 1G
VMware OS windows 7
Client (legal user)
CPU Pentium dual core i3-3210 GHZ
OS windows 7
VMware cpu 1GHZ
VMware RAM 1G
VMware OS windows 7
Table 3 Selected Web Applications for Evaluation
Web application Description
TomatoCart-11861 e-commerce
osCommerce-234 e-commerce
WackoPicko Web application for Sharing picture
Algorithm 7 The Pseudocode for Identifying the Application Business Processes
INPUT the web application navigation graph lt C0 CE gtthe web application Graph edges EOUTPUT the web application graph business process PB as a set of web application process
1 Begin2 Let P = empty set of web application processes3 Let F = empty set of web application final nodes4 P= ExtractProcess Extract processes in the web application5 F= ExtractFinalNodes Extract Final nodes in the web application6 for i 1 k do7 if Pk start by C0 and ends by F then8 BP = BP + Pk
9 end if10 end for11 R=ExtractProcessWithRepeatedNodes(P) Extract Process with Repeated Nodes12 for j 1 m do13 if Rj has same initial and end node and length(Rj) =2 then14 BP = BP +Rj
15 end if16 end for17 return BP18 end
July 2019 Volume 6 Number 2 (pp 65ndash80) 77
Table 4 The Clusters of Selected Web Application Pages Evaluation
Criteria
Web applicationWackoPicko Tomatocart osCommerce
samples 89 150 210
clusters 29 66 40
true positive 65 146 205
false positive 24 4 5
false negative 23 3 4
recall 074 098 098
precision 073 097 098
f-measure 073 098 098
Table 5 Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko
WackoPicko
Criteria
ApproachesProposed approach OWASP ZAP Percentage of improvement
compared to OWASP ZAP
HTTP Message 89 89 ndash
Graph Nodes 22 89 752
Graph Edges 48 270 822
process (P) 12 48 75
Average edge in each process (E) 4 46 913
Average edges in all processes (PE) 48 2208 978
business processes 10 NA ndash
precision = TruePositiveTruePositive+FalsePositive
bull F-Measure It is calculated by the following for-mulaF minusMeasure = 2lowastrecalllowastprecision
TruePositive+FalsePositive
The value of these criteria for the ClusteringWeb-Pages algorithm (Algorithm 3) is shown in Table 4In the first row samples shows the total numberof web pages in HTTP traffic extracted from the Ex-tractWebPages algorithm in Algorithm 1 In the sec-ond row clusters shows the total number of clustersextracted from the ClustreringWebPages algorithmin Algorithm 3 In the next rows the criteria listedabove are calculated for each web application
To evaluate the proposed approach we comparethe output of BLProM with the scanning output ofOWASP ZAP for selected web applications BLProMoutput for WackoPicko is shown in Table 5 graphnodes shows the total number of clusters extractedfrom the ClustreringWebPages algorithm in Algo-rithm 3 graph edges is the total number of edgesextracted from the ExtractGraphEdges algorithm in
Algorithm 4 process is the total number of pathsfrom the first node extracted from the ExtractPro-cess algorithm in Algorithm 6 Average edge in eachprocess (E) is calculated by the sum of edges of eachprocess divided by the total number of processes Av-erage edge in all process is calculated by the productof the total number of processes (P) in average edgesof each process (E) business processes is the totalnumber of business processes in the web application
Existing values in the first column of Table 5 arethe output obtained from the proposed approach Thevalues in the second column are the scanning out-put of ZAP The third column shows the percentageof scanning improvement of our proposed approachcompared to the ZAP scan The results of the tableindicate that the ZAP scan is a non-smart scan As aresult of this non-intelligent scan ZAP is not able toidentify business layer vulnerabilities
BLPRoM output for osCommerce is shown in Ta-ble 6 The third column shows the percentage of scan-ning improvement compared to ZAP scanning It isobserved that web application scanning is improvedby identifying the web application business layer
78 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al
Table 6 Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce
osCommerce
Criteria
ApproachesProposed approach OWASP ZAP Percentage of improvement
compared to OWASP ZAP
HTTP Message 170 170 ndash
Graph Nodes 40 170 764
Graph Edges 66 379 825
process (P) 23 17 26
Average edge in each process (E) 3 113 973
Average edges in all processes (PE) 69 1921 964
business processes 18 NA ndash
Table 7 Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart
TomatoCart
Criteria
ApproachesProposed approach OWASP ZAP Percentage of improvement
compared to OWASP ZAP
HTTP Message 150 150 ndash
Graph Nodes 66 150 56
Graph Edges 87 410 787
process (P) 31 39 205
Average edge in each process (E) 4 101 96
Average edges in all processes (PE) 156 3131 95
business processes 30 NA ndash
BLPRoM output for TomatoCart is shown in Ta-ble 7 The third column shows the percentage of scan-ning improvement of our proposed approach comparedto ZAP scanning
Table 8 shows the average of the proposed approachthe average of OWASP ZAP and the average percent-age of improvement in the scanning of selected webapplications For example in the rdquoAverage edges inall processesrdquo benchmark our approach has been im-proved by about 96 percent compared to OWASPZAP
According to the results presented in this table canbe observed that BLProM has improved web applica-tion scanning BLProM is aware of web applicationbusiness processes By identifying the web applicationbusiness layer web scanners can detect business layervulnerabilities
6 CONCLUSION
Business logic vulnerabilities are strong vulnerabili-ties that compromise web application security Web
scanners cannot detect business logic vulnerabilitiesbecause they are unable to understand business logicof the web application For detecting business logicvulnerabilities the business logic of the web applica-tion needs to be understood Therefore these vulner-abilities are specific to the application and difficult toidentify
In this paper we proposed BLProM a black-boxapproach for detecting business processes of the webapplication BLProM aims to ease detecting businesslogic vulnerabilities BLPRoM output is used as inputin dynamic security testing of the web applicationsin the business layer in order to detect business logicvulnerabilities BLProM consists of two main steps
1- extracting user navigation graph2- Detecting web application business processes
At the lab we scanned three web applications byBLProM and OWASP ZAP an open-source web appli-cation We showed that BLProM improved scanningabout 96 compared to OWASP ZAP BLProM im-proved the scanning of web applications because itclusters web pages and prevents scanning similar web
July 2019 Volume 6 Number 2 (pp 65ndash80) 79
Table 8 Comparison of the Average of the Proposed Approach and the Average of ZAP Approach in the Scanning of Selected
Web Applications
Average of selected web application
Criteria
ApproachesProposed approach OWASP ZAP Percentage of improvement
compared to OWASP ZAP
Graph Nodes 426 1363 692
Graph Edges 67 353 811
process (P) 20 366 405
Average edge in each process (E) 36 866 948
Average edges in all processes (PE) 91 2420 964
application pages
References
80  BLProM: A black-box approach for detecting business-layer (M. Alidoosti, A. Nowroozi et al.)
July 2019, Volume 6, Number 2 (pp. 65–80)  75
Algorithm 6 The Pseudocode for Extracting Processes
INPUT: the web application first node C0; the web application graph edges E
OUTPUT: the web application graph processes P, as a set of web application processes
1: Begin
2: Let P = empty set of web application processes
3: Let StartEdge = empty set of graph edges that begin from C0
4: StartEdge = ExtractGraphEdges(C0)   // extract all edges from C0
5: EndPoint = ExtractEndPoint(StartEdge)   // extract the end points of the edges
6: if (ExtractProcess(EndPoint, E) ≠ null) then
7:     return P = e + ExtractProcess(EndPoint, E) for each edge e ∈ StartEdge
8: else
9:     return e
10: end if
11: end
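Algorithm 6 enumerates processes recursively but does not say what happens when an edge leads back to a node already on the current path. The Python sketch below is one possible reading, not the authors' implementation. It assumes the navigation graph is available as a hypothetical adjacency list (node to successor nodes), ends a process at a node with no outgoing edges, and also ends it when an edge revisits a node on the current path, keeping the revisited node so that processes with repeated nodes (used by Algorithm 7) remain visible. The function name extract_processes is ours.

```python
from typing import Dict, List

# Hypothetical representation: each node (page cluster) maps to its successor nodes.
Graph = Dict[str, List[str]]


def extract_processes(graph: Graph, first_node: str) -> List[List[str]]:
    """Enumerate processes (node paths) starting at first_node, in the spirit of Algorithm 6."""
    processes: List[List[str]] = []

    def walk(path: List[str]) -> None:
        successors = graph.get(path[-1], [])
        if not successors:
            # No outgoing edge: the path ends at a final node.
            processes.append(path)
            return
        for nxt in successors:
            if nxt in path:
                # Assumption: an edge back to a visited node closes a loop; record it and stop.
                processes.append(path + [nxt])
            else:
                # Otherwise extend the path along this edge and keep exploring.
                walk(path + [nxt])

    walk([first_node])
    return processes


if __name__ == "__main__":
    demo = {"C0": ["C1", "C2"], "C1": ["C3"], "C2": ["C0"], "C3": []}
    print(extract_processes(demo, "C0"))
```

For the small demo graph this prints [['C0', 'C1', 'C3'], ['C0', 'C2', 'C0']]. Because every path is enumerated, the number of processes can grow exponentially with the size of the graph, which is one reason to merge similar pages into clusters before extracting processes.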
the first node and the end node of the process are the same, and the process length is greater than two. (That is, if a process returns to a node it has already passed and the resulting loop is longer than two, the process is a business process.)
All processes in the graph that run from the initial node (the application's initial page) to one of the identified final nodes, as well as the processes whose initial node and final node are the same, are application business processes. Algorithm 7 shows the pseudocode for identifying the application business processes. In line 4, the processes of the application are extracted, and in line 5, the final nodes of the graph are extracted. In lines 6–10, the processes that start at the initial node and end at a final node are identified as business processes and stored in the variable BP. In line 11, the processes with repeated nodes are detected, and in lines 12–16, each detected process whose initial node and final node are the same and whose length is greater than two is added to BP as a business process.
4 EXPERIMENTAL RESULTS
The testbed used in this section is a network consisting of a web server (the test target) and two clients (the BLProM system and a legal user). The web server and the clients each run on a virtual machine. The profiles of the web server and the clients are shown in Table 2.
The web applications listed in Table 3 are installed on the web server (the test target), and we then identify the business layer of these web applications.
The legal user first starts using the selected web applications and crawls all permitted parts of each one. The HTTP traffic of the legal user is given to BLProM as its input.
5 EVALUATION
BLProM's goal is to identify the business layer of the web application. Identifying the business layer of the web application makes it possible to identify business logic vulnerabilities. BLProM detects the business processes of the web application, and identifying business processes is the main step in dynamic security testing of the web application at the business layer.
We compare BLProM with OWASP ZAP. The OWASP Zed Attack Proxy (ZAP) is a free web scanner that scans web applications and automatically finds some security vulnerabilities. ZAP is the only free web scanner that has an API for extracting the web application graph, in which the web application pages are graph nodes and the relations among pages are graph edges. The main difference between BLProM and ZAP lies in detecting similar pages: ZAP cannot detect similar pages in the web application, but BLProM can. ZAP's graph only shows the relations among the scanned pages, whereas BLProM generates a reduced graph in which similar pages are merged. In terms of the accuracy of the generated graph, BLProM and ZAP perform the same.
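The difference between a page-level graph (one node per scanned page, as in ZAP) and a cluster-level graph (one node per group of similar pages, as in BLProM) can be illustrated with a minimal Python sketch. It assumes the page-to-cluster mapping is already known; in BLProM that mapping comes from the ClusteringWebPages algorithm (Algorithm 3), which this sketch does not reproduce. The URLs, cluster names, and the function collapse_to_clusters are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def collapse_to_clusters(
    page_edges: List[Edge], cluster_of: Dict[str, str]
) -> Tuple[Set[str], Set[Edge]]:
    """Replace every page by its cluster; duplicate edges between clusters merge."""
    nodes = {cluster_of[page] for edge in page_edges for page in edge}
    edges = {(cluster_of[src], cluster_of[dst]) for src, dst in page_edges}
    return nodes, edges


# Hypothetical crawl: three product pages share one template, so they form one cluster.
page_edges = [
    ("/index", "/product?id=1"), ("/index", "/product?id=2"), ("/index", "/product?id=3"),
    ("/product?id=1", "/cart"), ("/product?id=2", "/cart"), ("/product?id=3", "/cart"),
]
cluster_of = {
    "/index": "Home", "/cart": "Cart",
    "/product?id=1": "Product", "/product?id=2": "Product", "/product?id=3": "Product",
}

page_nodes = {page for edge in page_edges for page in edge}
nodes, edges = collapse_to_clusters(page_edges, cluster_of)
print(len(page_nodes), len(page_edges), len(nodes), len(edges))  # 5 6 3 2
```

The page-level graph has 5 nodes and 6 edges, while the cluster-level graph has 3 nodes and 2 edges; every process through the product pages is scanned once instead of three times.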
To evaluate the proposed approach, we first measure the accuracy of the clustering using the following criteria:
• True Positive: samples that are placed in their correct cluster.
• False Positive: samples that are placed in a cluster to which they do not belong.
• False Negative: samples that are not placed in a cluster to which they belong.
• Recall: calculated by the following formula:
$$\mathrm{recall} = \frac{\mathit{TruePositive}}{\mathit{TruePositive} + \mathit{FalseNegative}}$$
• Precision: calculated by the following formula:
Table 2. Testbed Profiles
Machine | CPU | OS | VMware CPU | VMware RAM | VMware OS
Web server (test target) | Pentium dual core, 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7
Client (BLProM machine) | Intel Core i7, 2.20 GHz | Windows 8.1 | 1 GHz | 1 GB | Windows 7
Client (legal user) | Pentium dual core i3-3210 GHz | Windows 7 | 1 GHz | 1 GB | Windows 7
Table 3. Selected Web Applications for Evaluation
Web application | Description
TomatoCart-1.1.8.6.1 | E-commerce
osCommerce-2.3.4 | E-commerce
WackoPicko | Web application for sharing pictures
Algorithm 7 The Pseudocode for Identifying the Application Business Processes
INPUT: the web application navigation graph <C0, C, E>; the web application graph edges E
OUTPUT: the web application business processes BP, as a set of web application processes
1: Begin
2: Let P = empty set of web application processes; Let BP = empty set of business processes
3: Let F = empty set of web application final nodes
4: P = ExtractProcess   // extract the processes in the web application
5: F = ExtractFinalNodes   // extract the final nodes in the web application
6: for i = 1 to k do   // k = number of processes in P
7:     if Pi starts at C0 and ends at a node in F then
8:         BP = BP + Pi
9:     end if
10: end for
11: R = ExtractProcessWithRepeatedNodes(P)   // extract the processes with repeated nodes
12: for j = 1 to m do   // m = number of processes in R
13:     if Rj has the same initial and end node and length(Rj) > 2 then
14:         BP = BP + Rj
15:     end if
16: end for
17: return BP
18: end
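A minimal Python sketch of Algorithm 7, assuming the processes are given as node lists (for instance from a path enumeration like Algorithm 6) and that the final nodes are supplied by the caller, since ExtractFinalNodes is not specified here. Two further assumptions: a process's length is counted in edges (the paper does not say whether nodes or edges are meant), and a process that satisfies both rules is added only once. The names business_processes and edge_length are ours.

```python
from typing import List, Set

Process = List[str]  # a process is a sequence of nodes


def edge_length(process: Process) -> int:
    """Assumption: process length is the number of edges, i.e. nodes minus one."""
    return len(process) - 1


def business_processes(processes: List[Process], c0: str, final_nodes: Set[str]) -> List[Process]:
    """Select business processes following the two rules of Algorithm 7."""
    bp: List[Process] = []
    # Lines 6-10: processes that start at C0 and end at a final node.
    for p in processes:
        if p[0] == c0 and p[-1] in final_nodes:
            bp.append(p)
    # Line 11: processes in which some node appears more than once.
    repeated = [p for p in processes if len(set(p)) < len(p)]
    # Lines 12-16: keep those that return to their initial node and are longer than two.
    for r in repeated:
        if r[0] == r[-1] and edge_length(r) > 2 and r not in bp:
            bp.append(r)
    return bp
```

For example, with processes C0→C1→C3, C0→C2→C0, C0→C1→C2→C0, and C0→C1→C2→C1 and final node C3, the sketch keeps C0→C1→C3 (first rule) and C0→C1→C2→C0 (a loop of three edges); the two-edge loop and the loop that does not return to its initial node are rejected.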
Table 4. The Clusters of Selected Web Application Pages: Evaluation
Criteria | WackoPicko | TomatoCart | osCommerce
Samples | 89 | 150 | 210
Clusters | 29 | 66 | 40
True positive | 65 | 146 | 205
False positive | 24 | 4 | 5
False negative | 23 | 3 | 4
Recall | 0.74 | 0.98 | 0.98
Precision | 0.73 | 0.97 | 0.98
F-measure | 0.73 | 0.98 | 0.98
Table 5. Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko
Criteria | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%)
HTTP messages | 89 | 89 | –
Graph nodes | 22 | 89 | 75.2
Graph edges | 48 | 270 | 82.2
Processes (P) | 12 | 48 | 75
Average edges in each process (E) | 4 | 46 | 91.3
Average edges in all processes (PE) | 48 | 2208 | 97.8
Business processes | 10 | N/A | –
$$\mathrm{precision} = \frac{\mathit{TruePositive}}{\mathit{TruePositive} + \mathit{FalsePositive}}$$
• F-Measure: calculated by the following formula:
$$\mathrm{F\text{-}Measure} = \frac{2 \cdot \mathrm{recall} \cdot \mathrm{precision}}{\mathrm{recall} + \mathrm{precision}}$$
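The three clustering criteria can be checked directly against the counts in Table 4. A minimal Python sketch that uses only the definitions above; the names ClusteringScores and clustering_scores are ours.

```python
from typing import NamedTuple


class ClusteringScores(NamedTuple):
    recall: float
    precision: float
    f_measure: float


def clustering_scores(tp: int, fp: int, fn: int) -> ClusteringScores:
    """Recall, precision and F-measure from true positive, false positive and false negative counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return ClusteringScores(recall, precision, f_measure)


# (true positive, false positive, false negative) counts taken from Table 4.
table4 = {"WackoPicko": (65, 24, 23), "TomatoCart": (146, 4, 3), "osCommerce": (205, 5, 4)}
for app, counts in table4.items():
    scores = clustering_scores(*counts)
    print(app, *(f"{value:.2f}" for value in scores))
```

Rounded to two decimals, the printed values reproduce the recall, precision, and F-measure rows of Table 4 (for WackoPicko: 0.74, 0.73, 0.73).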
The values of these criteria for the ClusteringWebPages algorithm (Algorithm 3) are shown in Table 4. In the first row, samples is the total number of web pages in the HTTP traffic extracted by the ExtractWebPages algorithm (Algorithm 1). In the second row, clusters is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). The remaining rows give the criteria listed above for each web application.
To evaluate the proposed approach, we compare the output of BLProM with the scanning output of OWASP ZAP for the selected web applications. BLProM's output for WackoPicko is shown in Table 5. Graph nodes is the total number of clusters produced by the ClusteringWebPages algorithm (Algorithm 3). Graph edges is the total number of edges extracted by the ExtractGraphEdges algorithm (Algorithm 4). Processes (P) is the total number of paths from the first node extracted by the ExtractProcess algorithm (Algorithm 6). Average edges in each process (E) is the sum of the edges of all processes divided by the total number of processes. Average edges in all processes (PE) is the product of the total number of processes (P) and the average number of edges in each process (E). Business processes is the total number of business processes in the web application.
The first column of Table 5 contains the output of the proposed approach, the second column contains the scanning output of ZAP, and the third column shows the percentage of scanning improvement of the proposed approach over the ZAP scan. The results indicate that ZAP's scan is not aware of the application's structure: because it does not merge similar pages, its graph has far more nodes, edges, and edges per process. As a result, ZAP is not able to identify business-layer vulnerabilities.
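The derived columns of Table 5 can be recomputed from the definitions in the text. The text does not define the "percentage of improvement" column; the sketch below assumes it is the relative reduction of the ZAP value, (ZAP − BLProM) / ZAP × 100, which matches the WackoPicko rows. This is our reading, not a formula stated by the authors, and it does not reproduce the processes row of Table 6 (osCommerce), where BLProM reports more processes than ZAP. The function names are ours.

```python
def all_process_edges(num_processes: int, avg_edges_per_process: float) -> float:
    """PE: total number of processes (P) multiplied by average edges in each process (E)."""
    return num_processes * avg_edges_per_process


def improvement_pct(blprom: float, zap: float) -> float:
    """Assumed definition: reduction of the ZAP value, as a percentage of the ZAP value."""
    return (zap - blprom) / zap * 100


# WackoPicko rows of Table 5: (criterion, BLProM value, ZAP value).
rows = [
    ("Graph nodes", 22, 89),
    ("Graph edges", 48, 270),
    ("Processes (P)", 12, 48),
    ("Average edges in each process (E)", 4, 46),
    ("Average edges in all processes (PE)", all_process_edges(12, 4), all_process_edges(48, 46)),
]
for name, ours, zap in rows:
    print(f"{name}: {ours} vs {zap} -> {improvement_pct(ours, zap):.1f}%")
```

The printed percentages agree with Table 5 to within 0.1 percentage point; the table appears to truncate rather than round (75.28% for graph nodes is listed as 75.2).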
BLProM's output for osCommerce is shown in Table 6, where the third column shows the percentage of scanning improvement compared to ZAP. Again, identifying the web application's business layer improves web application scanning.
Table 6. Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce
Criteria | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%)
HTTP messages | 170 | 170 | –
Graph nodes | 40 | 170 | 76.4
Graph edges | 66 | 379 | 82.5
Processes (P) | 23 | 17 | 26
Average edges in each process (E) | 3 | 113 | 97.3
Average edges in all processes (PE) | 69 | 1921 | 96.4
Business processes | 18 | N/A | –
Table 7. Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart
Criteria | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%)
HTTP messages | 150 | 150 | –
Graph nodes | 66 | 150 | 56
Graph edges | 87 | 410 | 78.7
Processes (P) | 31 | 39 | 20.5
Average edges in each process (E) | 4 | 101 | 96
Average edges in all processes (PE) | 156 | 3131 | 95
Business processes | 30 | N/A | –
BLProM's output for TomatoCart is shown in Table 7; the third column shows the percentage of scanning improvement of the proposed approach compared to ZAP.
Table 8 shows the averages of the proposed approach and of OWASP ZAP, together with the average percentage of improvement, over the scans of the selected web applications. For example, on the "Average edges in all processes" criterion, our approach improves on OWASP ZAP by about 96 percent.
The results in this table show that BLProM improves web application scanning. BLProM is aware of the web application's business processes, and by identifying the web application's business layer, web scanners can detect business-layer vulnerabilities.
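Table 8 does not state how its columns are aggregated. The arithmetic below, a sketch under that open question, checks one consistent reading: the first two columns are plain means of the per-application values in Tables 5–7, and the improvement column is the mean of the per-application percentages rather than the improvement between the averaged counts. For graph nodes the two readings differ (69.2% versus about 68.7%), and the table lists 69.2. The processes row of Table 8 (20) does not fit this reading, since the mean of 12, 23, and 31 is 22.

```python
from statistics import mean

# Per-application values from Tables 5-7, in the order WackoPicko, osCommerce, TomatoCart.
graph_nodes = {"blprom": [22, 40, 66], "zap": [89, 170, 150], "improvement": [75.2, 76.4, 56.0]}
pe = {"blprom": [48, 69, 156], "zap": [2208, 1921, 3131], "improvement": [97.8, 96.4, 95.0]}

for name, row in {"Graph nodes": graph_nodes, "Average edges in all processes (PE)": pe}.items():
    avg_ours, avg_zap = mean(row["blprom"]), mean(row["zap"])
    print(
        f"{name}: {avg_ours:.1f} vs {avg_zap:.1f}; "
        f"mean of improvements {mean(row['improvement']):.1f}%, "
        f"improvement of the means {(avg_zap - avg_ours) / avg_zap * 100:.1f}%"
    )
```

For the PE criterion, the mean of the three per-application improvements is 96.4%, which is the "about 96 percent" figure quoted above.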
6 CONCLUSION
Business logic vulnerabilities are serious vulnerabilities that compromise web application security. Web scanners cannot detect them because they are unable to understand the business logic of the web application, and detecting business logic vulnerabilities requires that this logic be understood. These vulnerabilities are therefore specific to each application and difficult to identify.
In this paper, we proposed BLProM, a black-box approach for detecting the business processes of a web application. BLProM aims to ease the detection of business logic vulnerabilities: its output is used as input to dynamic security testing of web applications at the business layer in order to detect business logic vulnerabilities. BLProM consists of two main steps:
1- Extracting the user navigation graph.
2- Detecting the web application's business processes.
In the lab, we scanned three web applications with BLProM and with OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared to OWASP ZAP, measured by the average edges in all processes. BLProM improves the scanning of web applications because it clusters web pages and thereby avoids scanning similar web application pages.

Table 8. Comparison of the Average of the Proposed Approach and the Average of the ZAP Approach in the Scanning of the Selected Web Applications
Criteria (average over the selected web applications) | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%)
Graph nodes | 42.6 | 136.3 | 69.2
Graph edges | 67 | 353 | 81.1
Processes (P) | 20 | 36.6 | 40.5
Average edges in each process (E) | 3.6 | 86.6 | 94.8
Average edges in all processes (PE) | 91 | 2420 | 96.4
References
[1] Common Vulnerabilities and Exposures. https://cve.mitre.org/cve/cve.html.
[2] 2015 ITRC: Identity Theft Resource Center Breach Report Hits Near Record High in 2015. http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html. Accessed Feb 2017.
[3] G. Pellegrino and D. Balzarotti. Toward Black-Box Detection of Logic Flaws in Web Applications. In Network and Distributed System Security Symposium 2014 (NDSS 2014), 2014. doi:10.14722/ndss.2014.23021.
[4] Testing for business logic. OWASP. https://www.owasp.org/index.php/Testing_for_business_logic. Accessed Feb 2017.
[5] D. Balzarotti, M. Cova, V. Felmetsger, and G. Vigna. Multi-module vulnerability analysis of web-based applications. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 25–35. ACM, 2007. doi:10.1145/1315245.1315250.
[6] A. Doupe, B. Boe, C. Kruegel, and G. Vigna. Fear the EAR: discovering and mitigating execution after redirect vulnerabilities. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 251–262. ACM, 2011. doi:10.1145/2046707.2046736.
[7] V. Felmetsger, L. Cavedon, C. Kruegel, and G. Vigna. Business Logic Attacks – Bots and BATs, OWASP 2009. In USENIX Security Symposium, 2010.
[8] E. Chai. Business Logic Attacks – Bots and BATs. OWASP, 2009. https://www.owasp.org/images/a/aa/OWASP_Cincinnati_Jan_2011.pdf. Accessed Feb 2017.
[9] X. Li and Y. Xue. BLOCK: a black-box approach for detection of state violation attacks towards web applications. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 247–256, 2011. doi:10.1145/2076732.2076767.
[10] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna. Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications. In Giovanni, pages 63–86. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-74319-4. doi:10.1007/978-3-540-74320-0_4.
[11] X. Li, W. Yan, and Y. Xue. SENTINEL: securing database from logic flaws in web applications. In Proceedings of the Second ACM Conference on Data and Application Security and Privacy, pages 25–36, 2012. doi:10.1145/2133601.2133605.
[12] M. Alidoosti and A. Nowroozi. BLDAST: business-layer Dynamic Application Security Tester of the web application in order to detect web application vulnerabilities against flooding DoS attacks. In Iran Society of Cryptology Conference, Shiraz, Iran, 2017. (in Persian)
[13] M. Alidoosti, A. Nowroozi, and A. Nickabadi. Evaluating the Web-Application Resiliency to Business-Layer DoS Attacks. ETRI Journal, 2019. doi:10.4218/etrij.2019-0164.
[14] M. Alidoosti and A. Nowroozi. BLTOCTTOU: business-layer dynamic application security tester of the web application in order to detect web application vulnerabilities against Race Condition attacks. In Computer Society of Iran Conference, Tehran, Iran, 2018. (in Persian)
[15] M. Alidoosti and A. Nowroozi. BLProM: Business-layer process miner of the web application. In ISCISC, pages 1–6. IEEE, 2018. ISBN 978-1-5386-7582-3. doi:10.1109/ISCISC.2018.8546899.
[16] V. Crescenzi, P. Merialdo, and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3):279–299, 2005. doi:10.1016/j.datak.2004.11.004.
Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. She is currently working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.

Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and in an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. He is also a co-founder of four IT startups.

Ahmad Nickabadi received his B.S. degree in computer engineering in 2004 and his M.Sc. and Ph.D. degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.
76 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al
Table 2 Testbed Profiles
Web server (test target)
CPU Pentium dual core-220 GHZ
OS windows 81
VMware cpu 1GHZ
VMware RAM 1G
VMware OS windows 7
Client (BLProM machine)
CPU Intel corei7 220 GHZ
OS windows 81
VMware cpu 1GHZ
VMware RAM 1G
VMware OS windows 7
Client (legal user)
CPU Pentium dual core i3-3210 GHZ
OS windows 7
VMware cpu 1GHZ
VMware RAM 1G
VMware OS windows 7
Table 3 Selected Web Applications for Evaluation
Web application Description
TomatoCart-11861 e-commerce
osCommerce-234 e-commerce
WackoPicko Web application for Sharing picture
Algorithm 7 The Pseudocode for Identifying the Application Business Processes
INPUT the web application navigation graph lt C0 CE gtthe web application Graph edges EOUTPUT the web application graph business process PB as a set of web application process
1 Begin2 Let P = empty set of web application processes3 Let F = empty set of web application final nodes4 P= ExtractProcess Extract processes in the web application5 F= ExtractFinalNodes Extract Final nodes in the web application6 for i 1 k do7 if Pk start by C0 and ends by F then8 BP = BP + Pk
9 end if10 end for11 R=ExtractProcessWithRepeatedNodes(P) Extract Process with Repeated Nodes12 for j 1 m do13 if Rj has same initial and end node and length(Rj) =2 then14 BP = BP +Rj
15 end if16 end for17 return BP18 end
July 2019 Volume 6 Number 2 (pp 65ndash80) 77
Table 4 The Clusters of Selected Web Application Pages Evaluation
Criteria
Web applicationWackoPicko Tomatocart osCommerce
samples 89 150 210
clusters 29 66 40
true positive 65 146 205
false positive 24 4 5
false negative 23 3 4
recall 074 098 098
precision 073 097 098
f-measure 073 098 098
Table 5 Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko
WackoPicko
Criteria
ApproachesProposed approach OWASP ZAP Percentage of improvement
compared to OWASP ZAP
HTTP Message 89 89 ndash
Graph Nodes 22 89 752
Graph Edges 48 270 822
process (P) 12 48 75
Average edge in each process (E) 4 46 913
Average edges in all processes (PE) 48 2208 978
business processes 10 NA ndash
precision = TruePositiveTruePositive+FalsePositive
bull F-Measure It is calculated by the following for-mulaF minusMeasure = 2lowastrecalllowastprecision
TruePositive+FalsePositive
The value of these criteria for the ClusteringWeb-Pages algorithm (Algorithm 3) is shown in Table 4In the first row samples shows the total numberof web pages in HTTP traffic extracted from the Ex-tractWebPages algorithm in Algorithm 1 In the sec-ond row clusters shows the total number of clustersextracted from the ClustreringWebPages algorithmin Algorithm 3 In the next rows the criteria listedabove are calculated for each web application
To evaluate the proposed approach we comparethe output of BLProM with the scanning output ofOWASP ZAP for selected web applications BLProMoutput for WackoPicko is shown in Table 5 graphnodes shows the total number of clusters extractedfrom the ClustreringWebPages algorithm in Algo-rithm 3 graph edges is the total number of edgesextracted from the ExtractGraphEdges algorithm in
Algorithm 4 process is the total number of pathsfrom the first node extracted from the ExtractPro-cess algorithm in Algorithm 6 Average edge in eachprocess (E) is calculated by the sum of edges of eachprocess divided by the total number of processes Av-erage edge in all process is calculated by the productof the total number of processes (P) in average edgesof each process (E) business processes is the totalnumber of business processes in the web application
Existing values in the first column of Table 5 arethe output obtained from the proposed approach Thevalues in the second column are the scanning out-put of ZAP The third column shows the percentageof scanning improvement of our proposed approachcompared to the ZAP scan The results of the tableindicate that the ZAP scan is a non-smart scan As aresult of this non-intelligent scan ZAP is not able toidentify business layer vulnerabilities
BLPRoM output for osCommerce is shown in Ta-ble 6 The third column shows the percentage of scan-ning improvement compared to ZAP scanning It isobserved that web application scanning is improvedby identifying the web application business layer
78 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al
Table 6 Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce
osCommerce
Criteria
ApproachesProposed approach OWASP ZAP Percentage of improvement
compared to OWASP ZAP
HTTP Message 170 170 ndash
Graph Nodes 40 170 764
Graph Edges 66 379 825
process (P) 23 17 26
Average edge in each process (E) 3 113 973
Average edges in all processes (PE) 69 1921 964
business processes 18 NA ndash
Table 7 Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart
TomatoCart
Criteria
ApproachesProposed approach OWASP ZAP Percentage of improvement
compared to OWASP ZAP
HTTP Message 150 150 ndash
Graph Nodes 66 150 56
Graph Edges 87 410 787
process (P) 31 39 205
Average edge in each process (E) 4 101 96
Average edges in all processes (PE) 156 3131 95
business processes 30 NA ndash
BLPRoM output for TomatoCart is shown in Ta-ble 7 The third column shows the percentage of scan-ning improvement of our proposed approach comparedto ZAP scanning
Table 8 shows the average of the proposed approachthe average of OWASP ZAP and the average percent-age of improvement in the scanning of selected webapplications For example in the rdquoAverage edges inall processesrdquo benchmark our approach has been im-proved by about 96 percent compared to OWASPZAP
According to the results presented in this table canbe observed that BLProM has improved web applica-tion scanning BLProM is aware of web applicationbusiness processes By identifying the web applicationbusiness layer web scanners can detect business layervulnerabilities
6 CONCLUSION
Business logic vulnerabilities are strong vulnerabili-ties that compromise web application security Web
scanners cannot detect business logic vulnerabilitiesbecause they are unable to understand business logicof the web application For detecting business logicvulnerabilities the business logic of the web applica-tion needs to be understood Therefore these vulner-abilities are specific to the application and difficult toidentify
In this paper we proposed BLProM a black-boxapproach for detecting business processes of the webapplication BLProM aims to ease detecting businesslogic vulnerabilities BLPRoM output is used as inputin dynamic security testing of the web applicationsin the business layer in order to detect business logicvulnerabilities BLProM consists of two main steps
1- extracting user navigation graph2- Detecting web application business processes
At the lab we scanned three web applications byBLProM and OWASP ZAP an open-source web appli-cation We showed that BLProM improved scanningabout 96 compared to OWASP ZAP BLProM im-proved the scanning of web applications because itclusters web pages and prevents scanning similar web
July 2019 Volume 6 Number 2 (pp 65ndash80) 79
Table 8 Comparison of the Average of the Proposed Approach and the Average of ZAP Approach in the Scanning of Selected
Web Applications
Average of selected web application
Criteria
ApproachesProposed approach OWASP ZAP Percentage of improvement
compared to OWASP ZAP
Graph Nodes 426 1363 692
Graph Edges 67 353 811
process (P) 20 366 405
Average edge in each process (E) 36 866 948
Average edges in all processes (PE) 91 2420 964
application pages
References
[1] Common vulnerabilities and exposures httpscvemitreorgcvecvehtml
[2] 2015 ITRC Identity Theft Resource Cen-ter Breach Report Hits Near Record Highin 2015 httpwwwidtheftcenterorg
ITRC-Surveys-Studies2015databreaches
html Accessed Feb 2017[3] G Pellegrino and D Balzarotti Toward Black-
Box Detection of Logic Flaws in Web Appli-cations In Network and Distributed SystemSecurity symposium 2014 (NDSS2014) 2014doi1014722ndss201423021
[4] Testing for business logic OWASP https
wwwowasporgindexphpTesting_for_
business_logic Accessed Feb 2017[5] D Balzarotti M Cova V Felmetsger and G Vi-
gna Multi-module vulnerability analysis ofweb-based applications In Proceedings of the14th ACM conference on Computer and com-munications security page 25ndash35 ACM 2007doi10114513152451315250
[6] A Doupe B Boe C Kruegel and G VignaFear the EAR discovering and mitigating execu-tion after redirect vulnerabilities In Proceedingsof the 18th ACM conference on Computer andcommunications security page 251ndash262 ACM2011 doi10114520467072046736
[7] V Felmetsger L Cavedon C Kruegel and G Vi-gna Business Logic Attacks ndash Bots and BATsOWASP 2009 In USENIX Security Symposium2010
[8] E Chai Business Logic Attacks ndash Bots andBATs OWASP 2009 httpswwwowasp
orgimagesaaaOWASP_Cincinnati_Jan_
2011pdf Accessed Feb 2017[9] X Li and Y Xue BLOCK a black-box
approach for detection of state violation at-
tacks towards web applications In Proceed-ings of the 27th Annual Computer SecurityApplications Conference page 247ndash256 2011doi10114520767322076767
[10] M Cova D Balzarotti V Felmetsger and G Vi-gna Swaddler An Approach for the Anomaly-Based Detection of State Violations in Web Ap-plications In Giovanni pages 63ndash86 SpringerBerlin Heidelberg 2007 ISBN 978-3-540-74319-4 doi101007978-3-540-74320-0 4
[11] X Li W Yan and Y Xue SENTINEL securingdatabase from logic flaws in web applicationsIn Proceedings of the second ACM conference onData and Application Security and Privacy page25ndash36 2012 doi10114521336012133605
[12] M Alidoosti and A Nowroozi BLDASTbusiness-layer Dynamic Application SecurityTester of the web application in order to detectweb application vulnerabilities against floodingDoS attacks In Iran Society of Cryptology Con-ference shiraz Iran in Persian 2017
[13] M Alidoosti A Nowroozi and A NickabadiEvaluating the Web-Application Resiliency toBusiness-Layer DoS Attacks ETRI Journal 2019doi104218etrij2019-0164
[14] M Alidoosti and A Nowroozi BLTOCTTOUbusiness-layer dynamic application security testerof the web application in order to detect web ap-plication vulnerabilities against Race Conditionattacks In Computer Society of Iran ConferenceTehran Iran in Persian 2018
[15] M Alidoosti and A Nowroozi BL-ProM Business-layer process miner ofthe web application In ISCISC pages 1ndash6 IEEE 2018 ISBN 978-1-5386-7582-3doi101109ISCISC20188546899
[16] V Crescenzi P Merialdo and P Missier Clus-tering Web pages based on their structure Dataamp Knowledge Engineering 54(3)279ndash299 2005doi101016jdatak200411004
80 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al
Mitra Alidoosti received her BS and MSc
degrees in computer engineering from the De-partment of Computer Engineering Iran Uni-
versity of Science and Technology Tehran
Iran in 2009 and 2012 respectively Currentlyshe is working toward the PhD degree in
computer engineering at Malek-e-Ashtar Uni-versity of Technology Tehran Iran Her re-
search interests are computer network security VoIP and SIP
security and web-application security
Alireza Nowroozi is a freelance consultantwho advises government and private-sector-
related industries on information technologyHe has four-year experience as an academicstaff member and an IT post-doctoral position
with Sharif University of Technology TehranIran He is a specialist in artificial intelligencecognitive science software engineering and
IT security Besides he is a co-founder of four IT startups
Ahmad Nickabadi received his BS degreein computer engineering in 2004 and the
MSc and PhD degrees in artificial intelli-
gence in 2006 and 2011 respectively fromAmirkabir University of Technology TehranIran He is currently an Assistant Professor
in the Department of Computer EngineeringAmirkabir University of Technology His re-
search interests include statistical machine learning and softcomputing
July 2019 Volume 6 Number 2 (pp 65ndash80) 77
Table 4 The Clusters of Selected Web Application Pages Evaluation
Criteria
Web applicationWackoPicko Tomatocart osCommerce
samples 89 150 210
clusters 29 66 40
true positive 65 146 205
false positive 24 4 5
false negative 23 3 4
recall 074 098 098
precision 073 097 098
f-measure 073 098 098
Table 5 Comparing the Proposed Approach With OWASP ZAP in Scanning WackoPicko
WackoPicko
Criteria
ApproachesProposed approach OWASP ZAP Percentage of improvement
compared to OWASP ZAP
HTTP Message 89 89 ndash
Graph Nodes 22 89 752
Graph Edges 48 270 822
process (P) 12 48 75
Average edge in each process (E) 4 46 913
Average edges in all processes (PE) 48 2208 978
business processes 10 NA ndash
precision = TruePositiveTruePositive+FalsePositive
bull F-Measure It is calculated by the following for-mulaF minusMeasure = 2lowastrecalllowastprecision
TruePositive+FalsePositive
The value of these criteria for the ClusteringWeb-Pages algorithm (Algorithm 3) is shown in Table 4In the first row samples shows the total numberof web pages in HTTP traffic extracted from the Ex-tractWebPages algorithm in Algorithm 1 In the sec-ond row clusters shows the total number of clustersextracted from the ClustreringWebPages algorithmin Algorithm 3 In the next rows the criteria listedabove are calculated for each web application
To evaluate the proposed approach we comparethe output of BLProM with the scanning output ofOWASP ZAP for selected web applications BLProMoutput for WackoPicko is shown in Table 5 graphnodes shows the total number of clusters extractedfrom the ClustreringWebPages algorithm in Algo-rithm 3 graph edges is the total number of edgesextracted from the ExtractGraphEdges algorithm in
Algorithm 4 process is the total number of pathsfrom the first node extracted from the ExtractPro-cess algorithm in Algorithm 6 Average edge in eachprocess (E) is calculated by the sum of edges of eachprocess divided by the total number of processes Av-erage edge in all process is calculated by the productof the total number of processes (P) in average edgesof each process (E) business processes is the totalnumber of business processes in the web application
The values in the Proposed approach column of Table 5 are the output of BLProM, and the values in the OWASP ZAP column are the scanning output of ZAP. The last column shows the percentage of scanning improvement of our approach compared with the ZAP scan. The results indicate that the ZAP scan is not aware of the application's business processes; as a result, ZAP is not able to identify business-layer vulnerabilities.
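The improvement percentages in Table 5 appear consistent with improvement = (ZAP − BLProM) / ZAP × 100, truncated rather than rounded to one decimal place. This formula is inferred from the table values and is not stated explicitly in the text; truncation also matches, for example, 1 − 40/170 = 76.47% being reported as 76.4 in Table 6. The sketch below recomputes the WackoPicko rows.

```python
import math


def improvement(ours, zap):
    """Percentage reduction relative to ZAP, truncated to one decimal (inferred rule)."""
    pct = (zap - ours) * 100 / zap
    return math.floor(pct * 10) / 10


# (criterion, BLProM value, ZAP value) taken from Table 5 (WackoPicko).
TABLE5 = [
    ("Graph nodes", 22, 89),
    ("Graph edges", 48, 270),
    ("Process (P)", 12, 48),
    ("Average edge in each process (E)", 4, 46),
    ("Average edges in all processes (PE)", 48, 2208),
]

for name, ours, zap in TABLE5:
    print(f"{name}: {improvement(ours, zap)}%")
```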
BLProM output for osCommerce is shown in Table 6. The last column shows the percentage of scanning improvement compared with ZAP scanning. It can be observed that identifying the web application's business layer improves web application scanning.
78 BLProM A black-box approach for detecting business-layer mdashM Alidoosti A Nowroozi et al
Table 6. Comparing the Proposed Approach With OWASP ZAP in Scanning osCommerce

Criteria | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%)
---------|-------------------|-----------|--------------------------------------
HTTP messages | 170 | 170 | –
Graph nodes | 40 | 170 | 76.4
Graph edges | 66 | 379 | 82.5
Process (P) | 23 | 17 | 26
Average edge in each process (E) | 3 | 113 | 97.3
Average edges in all processes (PE) | 69 | 1921 | 96.4
Business processes | 18 | N/A | –
Table 7. Comparing the Proposed Approach With OWASP ZAP in Scanning TomatoCart

Criteria | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%)
---------|-------------------|-----------|--------------------------------------
HTTP messages | 150 | 150 | –
Graph nodes | 66 | 150 | 56
Graph edges | 87 | 410 | 78.7
Process (P) | 31 | 39 | 20.5
Average edge in each process (E) | 4 | 101 | 96
Average edges in all processes (PE) | 156 | 3131 | 95
Business processes | 30 | N/A | –
BLProM output for TomatoCart is shown in Table 7. The last column shows the percentage of scanning improvement of our approach compared with ZAP scanning.
Table 8 shows, for the selected web applications, the average of the proposed approach, the average of OWASP ZAP, and the average percentage of improvement. For example, on the "Average edges in all processes" criterion, our approach improves on OWASP ZAP by about 96%.
The results in Table 8 show that BLProM improves web application scanning. BLProM is aware of the web application's business processes, and by identifying the web application's business layer it enables web scanners to detect business-layer vulnerabilities.
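The last column of Table 8 can be reproduced as the arithmetic mean of the per-application improvement percentages reported in Tables 5 to 7, including the "Average edges in all processes" row behind the roughly 96% figure. The averaging rule is inferred from the numbers rather than stated in the text, and the sketch rounds to one decimal, which happens to agree with the table for the rows shown.

```python
from statistics import mean

# Improvement percentages reported in Tables 5, 6 and 7
# (WackoPicko, osCommerce, TomatoCart), per criterion.
REPORTED = {
    "Graph nodes": [75.2, 76.4, 56.0],
    "Graph edges": [82.2, 82.5, 78.7],
    "Process (P)": [75.0, 26.0, 20.5],
    "Average edges in all processes (PE)": [97.8, 96.4, 95.0],
}

AVERAGES = {name: round(mean(values), 1) for name, values in REPORTED.items()}
for name, avg in AVERAGES.items():
    print(f"{name}: {avg}%")
```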
6 CONCLUSION
Business logic vulnerabilities are severe vulnerabilities that compromise web application security. Web scanners cannot detect them because they are unable to understand the business logic of the web application. Because these vulnerabilities are specific to each application, detecting them requires understanding the application's business logic, which makes them difficult to identify.
In this paper, we proposed BLProM, a black-box approach for detecting the business processes of a web application. BLProM aims to ease the detection of business logic vulnerabilities: its output is used as input to dynamic security testing of the web application's business layer in order to detect business logic vulnerabilities. BLProM consists of two main steps:
1- Extracting the user navigation graph
2- Detecting the web application's business processes
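To make the first step concrete, here is a deliberately simplified Python sketch. It is not BLProM's Algorithms 1 to 4: as a hypothetical stand-in for structural page clustering, it keys each page by its URL path (query string dropped) plus its sequence of HTML start tags, and it adds an edge between consecutive clusters in one recorded browsing session. The trace is invented for illustration. Two product pages that differ only in content fall into one cluster, which shows how clustering shrinks the graph a scanner has to visit.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class TagSequence(HTMLParser):
    """Collects the start-tag names of an HTML document in order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def page_signature(url, html):
    # Hypothetical cluster key: URL path plus the page's tag sequence.
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return urlsplit(url).path, tuple(parser.tags)


def navigation_graph(trace):
    """Build (nodes, edges) from an ordered list of (url, html) responses."""
    cluster_of = {}   # signature -> node index
    nodes = []        # representative URL of each cluster
    edges = set()     # (from_node, to_node) pairs; self-loops are skipped
    previous = None
    for url, html in trace:
        key = page_signature(url, html)
        if key not in cluster_of:
            cluster_of[key] = len(nodes)
            nodes.append(url)
        node = cluster_of[key]
        if previous is not None and previous != node:
            edges.add((previous, node))
        previous = node
    return nodes, edges


TRACE = [
    ("/index.php", "<html><body><a href='/product.php?id=1'>A</a></body></html>"),
    ("/product.php?id=1", "<html><body><h1>A</h1><form></form></body></html>"),
    ("/product.php?id=2", "<html><body><h1>B</h1><form></form></body></html>"),
    ("/cart.php", "<html><body><table></table></body></html>"),
]

nodes, edges = navigation_graph(TRACE)
print(nodes)          # one representative URL per cluster
print(sorted(edges))
```

Four responses collapse into three nodes and two edges; a scanner driven by this graph would visit one product page instead of both.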
In the lab, we scanned three web applications with BLProM and with OWASP ZAP, an open-source web application scanner. We showed that BLProM improved scanning by about 96% compared with OWASP ZAP. BLProM improved the scanning of web applications because it clusters web pages and avoids scanning similar web application pages.

Table 8. Comparison of the Average of the Proposed Approach and the Average of ZAP Approach in the Scanning of Selected Web Applications (average over the selected web applications)

Criteria | Proposed approach | OWASP ZAP | Improvement compared to OWASP ZAP (%)
---------|-------------------|-----------|--------------------------------------
Graph nodes | 42.6 | 136.3 | 69.2
Graph edges | 67 | 353 | 81.1
Process (P) | 20 | 36.6 | 40.5
Average edge in each process (E) | 3.6 | 86.6 | 94.8
Average edges in all processes (PE) | 91 | 2420 | 96.4
References
[1] Common Vulnerabilities and Exposures. https://cve.mitre.org/cve/cve.html
[2] ITRC. Identity Theft Resource Center Breach Report Hits Near Record High in 2015. 2015. http://www.idtheftcenter.org/ITRC-Surveys-Studies/2015databreaches.html. Accessed Feb. 2017.
[3] G. Pellegrino and D. Balzarotti. Toward Black-Box Detection of Logic Flaws in Web Applications. In Network and Distributed System Security Symposium (NDSS 2014), 2014. doi:10.14722/ndss.2014.23021.
[4] Testing for business logic. OWASP. https://www.owasp.org/index.php/Testing_for_business_logic. Accessed Feb. 2017.
[5] D. Balzarotti, M. Cova, V. Felmetsger, and G. Vigna. Multi-module vulnerability analysis of web-based applications. In Proceedings of the 14th ACM Conference on Computer and Communications Security, pages 25–35. ACM, 2007. doi:10.1145/1315245.1315250.
[6] A. Doupé, B. Boe, C. Kruegel, and G. Vigna. Fear the EAR: discovering and mitigating execution after redirect vulnerabilities. In Proceedings of the 18th ACM Conference on Computer and Communications Security, pages 251–262. ACM, 2011. doi:10.1145/2046707.2046736.
[7] V. Felmetsger, L. Cavedon, C. Kruegel, and G. Vigna. Toward Automated Detection of Logic Vulnerabilities in Web Applications. In USENIX Security Symposium, 2010.
[8] E. Chai. Business Logic Attacks – Bots and BATs. OWASP, 2009. https://www.owasp.org/images/a/aa/OWASP_Cincinnati_Jan_2011.pdf. Accessed Feb. 2017.
[9] X. Li and Y. Xue. BLOCK: a black-box approach for detection of state violation attacks towards web applications. In Proceedings of the 27th Annual Computer Security Applications Conference, pages 247–256, 2011. doi:10.1145/2076732.2076767.
[10] M. Cova, D. Balzarotti, V. Felmetsger, and G. Vigna. Swaddler: An Approach for the Anomaly-Based Detection of State Violations in Web Applications. In Recent Advances in Intrusion Detection (RAID 2007), pages 63–86. Springer, Berlin, Heidelberg, 2007. ISBN 978-3-540-74319-4. doi:10.1007/978-3-540-74320-0_4.
[11] X. Li, W. Yan, and Y. Xue. SENTINEL: securing database from logic flaws in web applications. In Proceedings of the Second ACM Conference on Data and Application Security and Privacy, pages 25–36, 2012. doi:10.1145/2133601.2133605.
[12] M. Alidoosti and A. Nowroozi. BLDAST: business-layer Dynamic Application Security Tester of the web application in order to detect web application vulnerabilities against flooding DoS attacks. In Iran Society of Cryptology Conference, Shiraz, Iran, 2017 (in Persian).
[13] M. Alidoosti, A. Nowroozi, and A. Nickabadi. Evaluating the Web-Application Resiliency to Business-Layer DoS Attacks. ETRI Journal, 2019. doi:10.4218/etrij.2019-0164.
[14] M. Alidoosti and A. Nowroozi. BLTOCTTOU: business-layer dynamic application security tester of the web application in order to detect web application vulnerabilities against Race Condition attacks. In Computer Society of Iran Conference, Tehran, Iran, 2018 (in Persian).
[15] M. Alidoosti and A. Nowroozi. BLProM: Business-layer process miner of the web application. In ISCISC, pages 1–6. IEEE, 2018. ISBN 978-1-5386-7582-3. doi:10.1109/ISCISC.2018.8546899.
[16] V. Crescenzi, P. Merialdo, and P. Missier. Clustering Web pages based on their structure. Data & Knowledge Engineering, 54(3):279–299, 2005. doi:10.1016/j.datak.2004.11.004.
Mitra Alidoosti received her B.S. and M.Sc. degrees in computer engineering from the Department of Computer Engineering, Iran University of Science and Technology, Tehran, Iran, in 2009 and 2012, respectively. She is currently working toward the Ph.D. degree in computer engineering at Malek-e-Ashtar University of Technology, Tehran, Iran. Her research interests are computer network security, VoIP and SIP security, and web-application security.
Alireza Nowroozi is a freelance consultant who advises government and private-sector-related industries on information technology. He has four years of experience as an academic staff member and in an IT post-doctoral position with Sharif University of Technology, Tehran, Iran. He is a specialist in artificial intelligence, cognitive science, software engineering, and IT security. He is also a co-founder of four IT startups.
Ahmad Nickabadi received his B.S. degree in computer engineering in 2004 and the M.Sc. and Ph.D. degrees in artificial intelligence in 2006 and 2011, respectively, from Amirkabir University of Technology, Tehran, Iran. He is currently an Assistant Professor in the Department of Computer Engineering, Amirkabir University of Technology. His research interests include statistical machine learning and soft computing.