( 12 ) United States Patent ( 10 ) Patent No . : US 10 , 346 , 291 B2...

Date post: 12-Jun-2020
Page 1: ( 12 ) United States Patent ( 10 ) Patent No . : US 10 , 346 , 291 B2 …ilanben/resources/US10346291.pdf · 2020-01-05 · US010346291B2 ( 12 ) United States Patent Ben - Bassat

US010346291B2

(12) United States Patent
Ben-Bassat et al.

(10) Patent No.: US 10,346,291 B2
(45) Date of Patent: Jul. 9, 2019

(54) TESTING WEB APPLICATIONS USING CLUSTERS

(71) Applicant: International Business Machines Corporation, Armonk, NY (US)

(72) Inventors: Ilan Ben-Bassat, Haifa (IL); Daniel Dubnikov, Savyon (IL); Sagi Kedmi, Raanana (IL); Erez Rokah, Tel Aviv (IL)

(73) Assignee: International Business Machines Corporation, Armonk, NY (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 2 days.

(21) Appl. No.: 15/438,527

(22) Filed: Feb. 21, 2017

(65) Prior Publication Data

US 2018/0239693 A1  Aug. 23, 2018

(51) Int. Cl.

G06F 9/44  (2018.01)
G06F 11/36  (2006.01)
H04L 29/08  (2006.01)
G06F 16/951  (2019.01)

(52) U.S. Cl.

CPC: G06F 11/3676 (2013.01); G06F 11/3672 (2013.01); G06F 16/951 (2019.01); H04L 67/02 (2013.01)

(58) Field of Classification Search

None. See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

7,627,613 B1 *  12/2009  Dulitz  G06F 17/30864
7,676,465 B2  3/2010  Poola
8,136,025 B1 *  3/2012  Zhu  G06F 17/30864; 707/634
8,140,505 B1  3/2012  Jain et al.
8,738,749 B2 *  5/2014  Brock  G06F 21/10; 709/223
9,734,149 B2 *  8/2017  Barouni Ebrahimi  G06F 17/30011
2003/0061490 A1 *  3/2003  Abajian  G06F 17/30038; 713/176
(Continued)

FOREIGN PATENT DOCUMENTS

WO 2013009713 A2  1/2013

OTHER PUBLICATIONS

Andrea Stocco, et al.; "Clustering-Aided Page Object Generation for Web Testing"; Jun. 2016, 19 pages. https://www.researchgate.net/publication/296520249_Clustering-Aided_Page_Object_Generation_for_Web_Testing.
(Continued)

Primary Examiner: Insun Kang

(74) Attorney, Agent, or Firm: Alexander G. Jochym

(57) ABSTRACT

An example system includes a processor to crawl a plurality of web pages of a web application to be tested. The processor is also configured to receive an intercepted input to the web application and an output from a web application associated with each crawled web page. The processor is further configured to detect testable elements in the intercepted input and the output. The processor is also configured to generate a fingerprint for each web page based on the detected testable elements. The processor is also configured to generate a list of clusters comprising one or more similar web pages based on the fingerprints. The processor is configured to test a single web page from each cluster.

14 Claims, 6 Drawing Sheets

[Representative drawing: flowchart of method 200. 202: Crawl plurality of web pages of a web application to be tested. 204: Receive intercepted input to web server application and output from web application associated with each crawled web page. 206: Detect testable elements in intercepted input and output. 208: Generate fingerprint for each web page based on detected testable elements. 210: Generate list of clusters including one or more similar web pages based on fingerprints. 212: Test single web page from each cluster.]
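The fingerprint, cluster, and test steps of the flowchart can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes, purely for illustration, that a fingerprint is a bit vector with one bit per distinct testable-element name seen during the crawl, that similarity is one minus the normalized Hamming distance to the first page of a cluster, and that the first page of each cluster is the one tested. The names `Page`, `cluster_pages`, and `pages_to_test` and the default threshold of 0.8 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    url: str
    elements: frozenset  # names of testable elements detected for this page


def build_vocabulary(pages):
    """Assign one bit position to every distinct element name (assumption)."""
    names = sorted({name for page in pages for name in page.elements})
    return {name: position for position, name in enumerate(names)}


def fingerprint(page, vocabulary):
    """Bit vector (stored as an int) with one bit set per element on the page."""
    bits = 0
    for name in page.elements:
        bits |= 1 << vocabulary[name]
    return bits


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def cluster_pages(pages, threshold=0.8):
    """Greedy clustering: join the first cluster whose representative
    fingerprint is similar enough, otherwise start a new cluster."""
    vocabulary = build_vocabulary(pages)
    width = max(len(vocabulary), 1)
    clusters = []  # list of (representative fingerprint, member pages)
    for page in pages:
        fp = fingerprint(page, vocabulary)
        for representative, members in clusters:
            similarity = 1.0 - hamming(fp, representative) / width
            if similarity > threshold:
                members.append(page)
                break
        else:
            clusters.append((fp, [page]))
    return [members for _, members in clusters]


def pages_to_test(clusters):
    """Pick a single page (here simply the first) from each cluster."""
    return [members[0] for members in clusters]
```

With four crawled pages of which three share nearly the same element names, this sketch yields two clusters, so only two pages would be handed to the tester.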


(56) References Cited (continued)

U.S. PATENT DOCUMENTS

2003/0126151 A1 *  7/2003  Jung  G06F 17/30917
2004/0268303 A1 *  12/2004  Abe  G06F 8/74; 717/108
2005/0060643 A1 *  3/2005  Glass  G06F 17/241; 715/205
2007/0050755 A1 *  3/2007  Mizrachi  G06F 21/55; 717/116
2007/0198635 A1 *  8/2007  Lindner  G06F 17/30011; 709/203
2007/0208703 A1 *  9/2007  Shi  G06F 17/30864
2007/0299869 A1 *  12/2007  Clary  G06F 11/3452
2008/0120305 A1 *  5/2008  Sima  G06F 17/30864
2008/0235163 A1 *  9/2008  Balasubramanian  G06F 17/30864; 706/12
2008/0263026 A1 *  10/2008  Sasturkar  G06F 17/2211
2008/0288509 A1 *  11/2008  Mysen  G06F 17/30864
2008/0289047 A1 *  11/2008  Benea  G06F 21/64; 726/27
2009/0063538 A1 *  3/2009  Chitrapura  G06F 17/30887
2009/0150381 A1 *  6/2009  Dasdan  G06F 17/30864
2009/0157597 A1 *  6/2009  Tiyyagura  G06F 17/30705
2009/0164411 A1 *  6/2009  Dasdan  G06F 17/30882
2009/0164502 A1 *  6/2009  Dasgupta  G06F 17/30887
2010/0064281 A1 *  3/2010  Kimball  H04L 41/0853; 717/124
2010/0169311 A1 *  7/2010  Tengli  G06F 17/30864; 707/736
2010/0242028 A1 *  9/2010  Weigert  G06F 21/105; 717/131
2011/0307436 A1 *  12/2011  Cai  G06F 17/30625; 706/48
2012/0016897 A1 *  1/2012  Tulumbas  G06F 17/30887; 707/759
2012/0284270 A1 *  11/2012  Lee  G06F 17/30011; 707/737
2014/0207743 A1 *  7/2014  Quinn  G06F 3/0608; 707/692
2015/0067839 A1 *  3/2015  Wardman  G01F 11/263; 726/22
2016/0048849 A1 *  2/2016  Shiftan  G06F 17/30247; 705/7.29
2016/0092591 A1 *  3/2016  Barouni Ebrahimi  G06F 17/30011; 707/709
2016/0117347 A1 *  4/2016  Nielsen  G06F 17/30256; 707/738
2016/0283229 A1 *  9/2016  Rogers  G06F 8/751
2016/0335333 A1 *  11/2016  Desineni  G06F 17/30156
2017/0257383 A1 *  9/2017  Ficarra  H04L 63/1408
2018/0011919 A1 *  1/2018  Warren  G06F 17/30011

OTHER PUBLICATIONS

Gurmeet Singh Manku, et al.; "Detecting Near Duplicates for Web Crawling"; WWW 2007 / Track: Data Mining, 9 pages. http://www2007.cpsc.ucalgary.ca/papers/paper215.pdf.

* cited by examiner

[FIG. 1 (Sheet 1 of 6): block diagram of a computing device with a processor, memory device, I/O device interface, display interface, NIC, and a storage device holding the crawler module, receiver module, detector module, fingerprint generator module, cluster generator module, tester module, and predictor module; the device connects to I/O devices, a display device, and, through a network, a client computing device.]

[FIG. 2 (Sheet 2 of 6): flowchart of method 200. 202: Crawl plurality of web pages of a web application to be tested. 204: Receive intercepted input to web server application and output from web application associated with each crawled web page. 206: Detect testable elements in intercepted input and output. 208: Generate fingerprint for each web page based on detected testable elements. 210: Generate list of clusters including one or more similar web pages based on fingerprints. 212: Test single web page from each cluster.]

[FIG. 3 (Sheet 3 of 6): flowchart of method 300. 302: Receive list of clusters. 304: Generate maximal distance for each cluster in list of clusters. 306: Receive request to be sent to web application. 308 (decision): Request to be sent would result in web page that belongs to cluster based on maximal distances? 310: Send request. 312: Do not send request.]
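The decision in blocks 304 through 312 can be sketched as follows. This is a hedged illustration rather than the method as claimed: it assumes requests are already reduced to integer bit-vector fingerprints, takes a cluster's maximal distance to be the largest pairwise Hamming distance among its members, and predicts membership when a new request lies within that distance of every member. The mapping of the decision outcome to "do not send" for predicted members is also an assumption, chosen because such a request would only add a redundant page to an existing cluster. The function names are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def maximal_distances(clusters):
    """Block 304: largest pairwise distance inside each cluster
    (0 for a cluster with a single member)."""
    return [
        max((hamming(a, b) for a, b in combinations(members, 2)), default=0)
        for members in clusters
    ]


def predicted_cluster(request_fp, clusters, limits):
    """Block 308: index of the first cluster the request is predicted
    to fall into, or None if it fits no cluster."""
    for index, (members, limit) in enumerate(zip(clusters, limits)):
        if all(hamming(request_fp, member) <= limit for member in members):
            return index
    return None


def should_send(request_fp, clusters, limits):
    """Blocks 310/312: send only requests not predicted to be redundant."""
    return predicted_cluster(request_fp, clusters, limits) is None
```

Computing the limits once, after the list of clusters is available, keeps the per-request check to a handful of XOR and bit-count operations.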

[FIG. 4 (Sheet 4 of 6): example cloud computing environment.]

[FIG. 5 (Sheet 5 of 6): abstraction model layers. Hardware and Software 500: mainframes, RISC architecture servers, BladeCenter systems, storage, networking, network application server software, database software. Virtualization 502: virtual servers, virtual networks, virtual applications, virtual clients. Management 504: resource provisioning, metering and pricing, user portal, service level management, SLA planning and fulfillment. Workloads 506: mapping and navigation, software development and lifecycle management, virtual classroom education delivery, data analytics processing, transaction processing, web application testing.]

[FIG. 6 (Sheet 6 of 6): tangible, non-transitory computer-readable medium 600, accessed by a processor, containing crawler module 606, receiver module 608, detector module 610, fingerprint generator module 612, cluster generator module 614, tester module 616, and a predictor module.]


TESTING WEB APPLICATIONS USING CLUSTERS

BACKGROUND

The present techniques relate to testing web applications. More specifically, the techniques relate to testing web applications using clusters.

SUMMARY

In a particular implementation described herein, a system can include a processor, configured by code executing therein, to crawl (traverse and/or parse the contents of) a plurality of web pages of a web application to be tested. The suitably configured processor can further receive an intercepted input to the web application and an output from the web application associated with each crawled web page. The suitably configured processor can also detect testable elements in the intercepted input and the output. The processor can also generate a fingerprint for each web page based on the detected testable elements. The suitably configured processor can further generate a list of clusters comprising one or more similar web pages based on the fingerprints. The suitably configured processor can further test a single web page from each cluster.

A further implementation described herein is directed to a method that includes crawling, via a processor configured by code executing therein, a plurality of web pages of a web application to be tested. The method also includes receiving, via the processor, an intercepted input to the web application and an output from the web application associated with each crawled web page. The method can also include detecting, via the processor, testable elements in the intercepted input and the output. The method can further include generating, via the processor, a fingerprint for each web page based on the detected testable elements. The method can also include generating, via the processor, a list of clusters comprising one or more similar web pages based on the fingerprints. The method can further include testing, via the processor, a single web page from each cluster.

According to another embodiment described herein, a computer program product for testing web applications can include a computer-readable storage medium having program code embodied therewith. The computer-readable storage medium is not a transitory signal per se. The program code is executable by a processor to cause the processor to crawl a plurality of web pages of a web application to be tested. The program code can also cause the processor to receive an intercepted input to the web application and an output from the web application associated with each crawled web page. The program code can also cause the processor to detect testable elements in the intercepted input and the output. The program code can also cause the processor to generate a fingerprint for each web page based on the detected testable elements. The program code can also cause the processor to further generate a list of clusters comprising one or more similar web pages based on the fingerprints. The program code can also cause the processor to test a single web page from each cluster.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 is a block diagram of an example system that can test web applications using clusters;

FIG. 2 is an information flow diagram of an example method for testing web application pages using clusters;

FIG. 3 is a process flow diagram of an example method for testing web applications using cluster predictions;

FIG. 4 is a block diagram of an example cloud computing environment according to embodiments described herein;

FIG. 5 is an example of abstraction model layers according to embodiments described herein; and

FIG. 6 is an example tangible, non-transitory computer-readable medium that can test web applications using clusters.

DETAILED DESCRIPTION

Dynamic web application testers can test web applications by crawling web applications in their entirety and testing all the elements of each web application. For example, dynamic testing of web applications may include two phases: crawling and testing. One objective of the crawling phase is to identify all testable elements in a web application. For example, the testable elements may include parameters, cookie values, etc. In the testing phase, one role of the dynamic web application tester is to attack testable elements and validate tests on the testable elements. However, such dynamic web scanning may not be able to scan large web applications due to the vast number of web pages to crawl and the number of testable elements per page, combined with the physical constraints of machines.

In one implementation, a suitably configured processor tests web applications using clusters. For example, the processor is configured to crawl a plurality of web pages of a web application to be tested. The processor receives an intercepted input to a web application and an output from the web application associated with each crawled web page. In a non-limiting implementation, the input includes a Hypertext Transfer Protocol (HTTP) request and the output includes an HTTP response. The suitably configured processor is further configured to detect testable elements in the intercepted input and the output. In one arrangement, the processor generates a fingerprint for each web page based on the detected testable elements. The processor then generates a list of clusters comprising one or more similar web pages based on the fingerprints. Here, the processor tests a single web page from each cluster. For example, since the pages in each cluster may have been created by the same server-side application functionality, only one page from each cluster is tested. Thus, the present techniques are able to reduce the number of web pages to be tested and increase the efficiency of the dynamic web application tester. In some experiments, the number of web pages that needed to be tested could be reduced by a factor of 10 to 20 while still achieving an equivalent level of coverage. In a non-limiting implementation, the present techniques further increase the efficiency of the testing procedure by predicting which requests would result in additional web pages in a cluster that would be redundant for purposes of testing. For example, the processor generates a maximal distance between requests for each cluster in the list of clusters and detects that a request would not result in a web page that belongs to any cluster based on the maximal distances for the clusters. Thus, the techniques may enable additional efficiency for a web application tester once a list of clusters has been generated. Furthermore, in some examples, the processor is further configured to detect a security vulnerability based on the testing and modify the web application to prevent the security vulnerability. For example, the processor removes characters from user input that result in the execution of unauthorized scripts.


ources are

In some scenarios , the techniques described herein are cloud infrastructure including networks , servers , operating implemented in a cloud computing environment . As dis - systems , or storage , but has control over the deployed cussed in more detail below in reference to at least FIGS . 4 , applications and possibly application hosting environment 5 , and 6 , a computing device configured to test web appli - configurations . cations using clusters are implemented in a cloud computing 5 Infrastructure as a Service ( laaS ) : the capability provided environment . It is understood in advance that although this to the consumer is to provision processing , storage , net disclosure may include a description on cloud computing , works , and other fundamental computing resources where implementation of the teachings recited herein are not the consumer is able to deploy and run arbitrary software , limited to a cloud computing environment . Rather , embodi - which can include operating systems and applications . The ments of the present invention are capable of being imple - 10 consumer does not manage or control the underlying cloud mented in conjunction with any other type of computing infrastructure but has control over operating systems ; stor environment now known or later developed . age , deployed applications , and possibly limited control of

Cloud computing is a model of service delivery for select networking components ( e . g . , host firewalls ) . enabling convenient , on - demand network access to a shared Deployment Models are as follows : pool of configurable computing resources ( e . g . networks , 15 Private cloud : the cloud infrastructure is operated solely network bandwidth , servers , processing , memory , storage , for an organization . It is managed by the organization or a applications , virtual machines , and services ) that is rapidly third party and may be located on - premises or off - premises . provisioned and released with minimal management effort Community cloud : the cloud infrastructure is shared by or interaction with a provider of the service . This cloud several organizations and supports a specific community that model may include at least five characteristics , at least three 20 has shared concerns ( e . g . , mission , security requirements , service models , and at least four deployment models . policy , and compliance considerations ) . It is managed by the

Characteristics are as follows : organizations or a third party and exist on - premises or On - demand self - service : a cloud consumer can unilater - off - premises .

ally provision computing capabilities , such as server time Public cloud : the cloud infrastructure is made available to and network storage , as needed automatically without 25 the general public or a large industry group and is owned by requiring human interaction with the service ' s provider . an organization selling cloud services .

Broad network access : capabilities are available over a Hybrid cloud : the cloud infrastructure is a composition of network and accessed through standard mechanisms that two or more clouds ( private , community , or public ) that promote use by heterogeneous thin or thick client platforms remain unique entities but are bound together by standard ( e . g . , mobile phones , laptops , and PDAs ) . 30 ized or proprietary technology that enables data and appli

cation portability ( e . g . , cloud bursting for load balancing pooled to serve multiple consumers using a multi - tenant between clouds ) . model , with different physical and virtual resources dynami - A cloud computing environment is service oriented with cally assigned and reassigned according to demand . There is a focus on statelessness , low coupling , modularity , and a sense of location independence in that the consumer 35 semantic interoperability . At the heart of cloud computing is generally has no control or knowledge over the exact an infrastructure comprising a network of interconnected location of the provided resources but is able to specify nodes . location at a higher level of abstraction ( e . g . , country , state , With reference now to FIG . 1 , an example computing or datacenter ) . device can test web applications using clusters . The com

Rapid elasticity : capabilities is rapidly and elastically 40 puting device 100 is for example , a server , a network device , provisioned , in some cases automatically , to quickly scale desktop computer , laptop computer , tablet computer , or out and rapidly released to quickly scale in . To the consumer , smartphone . In some examples , computing device 100 is a the capabilities available for provisioning often appear to be cloud computing node . Computing device 100 is described unlimited and can be purchased in any quantity at any time . in the general context of computer system executable Measured service : cloud systems automatically control 45 instructions , such as program modules , being executed by a

and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service.

Service Models are as follows:

Software as a Service (SaaS): the capability provided to the consumer is to use the provider's applications running on a cloud infrastructure. The applications are accessible from various client devices through a thin client interface such as a web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings.

Platform as a Service (PaaS): the capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages and tools supported by the provider. The consumer does not manage or control the underlying

computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. Computing device 100 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.

The computing device 100 may include a processor 102 configured to execute stored instructions and a memory device 104 to provide temporary memory space for operations of said instructions during operation. The processor can be a single-core processor, a multi-core processor, a collection of single-core or multi-core processors, a computing cluster, or any number of other configurations. The memory 104 can include random access memory (RAM), read only memory, flash memory, EPROMs, or any other suitable memory systems.

The processor 102 may be connected through a system interconnect 106 (e.g., PCI®, PCI-Express®, etc.) to an input/
output (I/O) device interface 108 adapted to connect the computing device 100 to one or more I/O devices 110. The I/O devices 110 may include, for example, a keyboard and a pointing device, wherein the pointing device may include a touchpad or a touchscreen, among others. The I/O devices 110 may be built-in components of the computing device 100. Alternatively, the I/O devices 110 may be devices that are externally connected to the computing device 100.

In one implementation, the processor 102 is also linked through the system interconnect 106 to a display interface 112 adapted to connect the computing device 100 to a display device 114. The display device 114 may include a display screen that is a built-in component of the computing device 100. The display device 114 may also include a computer monitor, television, or projector, among others, that is externally connected to the computing device 100. In addition, a network interface controller (NIC) 116 may be adapted to connect the computing device 100 through the system interconnect 106 to the network 118. In some embodiments, the NIC 116 can transmit data using any suitable interface or protocol, such as the internet small computer system interface, among others. The network 118 may be a cellular network, a radio network, a wide area network (WAN), a local area network (LAN), or the Internet, among others.

An external computing device 120 may connect to the computing device 100 through the network 118. In some examples, external computing device 120 is an external webserver 120. In some examples, external computing device 120 is a cloud computing node.

In one implementation, the processor 102 is linked through the system interconnect 106 to a storage device 122 that can include a hard drive, an optical drive, a USB flash drive, an array of drives, or any combinations thereof. In some examples, the storage device may include a crawler module 124, a receiver module 126, a detector module 128, a fingerprint generator module 130, a cluster generator module 132, a tester module 134, and a predictor module 136. In some examples, one or more of the modules 124-136 are implemented as an application or a web browser plugin.

The crawler module 124 configures the processor to crawl a plurality of web pages of a web application to be tested. For example, given one or more seed uniform resource locators (URLs), the crawler module 124 downloads the web pages associated with the URLs, extracts any hyperlinks contained in those pages, and adds the hyperlinks to a list of URLs to visit (also known as a 'crawl frontier'). URLs from the crawl frontier are then recursively visited according to the crawler's policy. The receiver module 126 can then receive an intercepted input to a web application and an output from the web application associated with each crawled web page. For example, the input may include an HTTP request and the output may include an HTTP response. In some examples, the input may include a GET parameter and the output may include a document object model. For example, the document object model is a tree structure of a web page. The detector module 128 configures the processor to detect testable elements in the intercepted input and the output. The fingerprint generator module 130 configures the processor to generate a fingerprint for each web page based on the detected testable elements. For example, the fingerprint for each web page may include a plurality of response element fingerprints and a plurality of request element fingerprints. For example, the response element fingerprints may include concatenated fingerprints of a number of extracted response elements in a response. The request element fingerprints may include concatenated fingerprints of a number of extracted request elements in a request. In some examples, the fingerprint generator 130 configures the processor to calculate a similarity score between the fingerprint for each web page and each cluster in the list of clusters based on a calculated hamming distance.

The cluster generator module 132 configures the processor to generate a list of clusters comprising one or more similar web pages based on the fingerprints. For example, the cluster generator 132 configures the processor to calculate a similarity score between a fingerprint for a web page from the plurality of web pages and a cluster from the list of clusters. The cluster generator 132 then configures the processor to add the web page to the cluster in response to detecting that the similarity score exceeds a similarity threshold. In some examples, the cluster generator module 132 configures the processor to calculate a similarity score between the fingerprint for each web page and each cluster in the list of clusters based on a calculated hamming distance or any other suitable linear block code technique. In some examples, the cluster generator module 132 configures the processor to calculate a hamming distance for a fingerprint of a web page and a cluster by detecting the number of positions in two compared fingerprints at which corresponding symbols are different. For example, the hamming distance may indicate the number of substitutions to be made to change one fingerprint into the other.

The tester module 134 configures the processor to test a single web page from each cluster. Thus, the tester module 134 configures the processor to efficiently test web applications by testing a single web page from each cluster and not every web page, while maintaining testing coverage.

In some examples, the predictor module 136 configures the processor to generate a maximal distance between requests for each cluster in the list of clusters. In some examples, the predictor module 136 configures the processor to receive a request to be sent to the web application. The predictor module 136 then configures the processor to detect that the request would not result in a web page that belongs to any cluster based on the maximal distances for the clusters. In some examples, the crawler module 124 configures the processor to send the request to a web application in response to detecting that the request would not result in a web page that belongs to any cluster. In some examples, the crawler module 124 configures the processor to not send the request to a web application in response to detecting that the request would result in a web page that belongs to a cluster. Thus, the predictor module 136 enables the crawler module 124 to configure the processor to perform more efficiently by sending a reduced number of requests, and enables the receiver module 126 to cause the processor to perform more efficiently by not receiving web pages that would not be used in testing.

It is to be understood that the block diagram of FIG. 1 is not intended to indicate that the computing device 100 is to include all of the components shown in FIG. 1. Rather, the computing device 100 can include fewer or additional components not illustrated in FIG. 1 (e.g., additional memory components, embedded controllers, modules, additional network interfaces, etc.). Furthermore, any of the functionalities of the crawler module 124, the receiver module 126, the detector module 128, the fingerprint generator module 130, the cluster generator module 132, the tester module 134, and the predictor module 136 may be partially, or entirely, implemented in hardware and/or in the processor 102. For example, the functionality described herein may be implemented with an application specific integrated circuit, logic implemented in an embedded controller, or in logic implemented in the processor 102, among others. In
some embodiments, the functionalities of the crawler module 124, the receiver module 126, the detector module 128, the fingerprint generator module 130, the cluster generator module 132, the tester module 134, and the predictor module 136 are implemented with logic, wherein the logic, as referred to herein, can include any suitable hardware (e.g., a processor, among others), software (e.g., an application, among others), firmware, or any suitable combination of hardware, software, and firmware.

FIG. 2 is a process flow diagram of an exemplary method for testing web applications using clusters. The method 200 is implemented with any suitable computing device, such as the computing device 100 of FIG. 1. For example, the method is implemented via the processor 102 of computing device 100.

At block 202, a suitably configured processor crawls a plurality of web pages of a web application to be tested. For example, the web pages of the web application are recursively visited according to a crawling policy. In some examples, the crawling policy is an instruction set, rule, or selection policy that identifies the pages to be downloaded. For example, a selection policy is a focused crawling policy that looks for similarity of web pages to a given query. In some examples, a selection policy is a URL normalization policy that tries to normalize a URL to avoid redundant crawling. In some examples, the crawling policy is a parallelization policy that uses many crawlers in parallel to increase crawling efficiency. For example, the parallelization policy can state how to coordinate a plurality of distributed web crawlers.

At block 204, the processor receives an intercepted input to the web application and an output from the web application associated with each crawled web page. For example, the intercepted input is an HTTP request. In some examples, the output is an HTTP response. In some examples, the processor can intercept input to a web application, send the input to the web application if the input would not result in a web page that belongs to any cluster based on the maximal distances calculated for the list of clusters, and receive an output from the web application in response to the input.

At block 206, the processor detects testable elements in the intercepted input and the output. For example, testable elements in the input can include schemes, ports, parameters, cookies, headers, etc. In some examples, the testable elements in the output can include document object model (DOM) elements of HTTP responses.

At block 208, the processor generates a fingerprint for each web page based on the detected testable elements. For example, the processor can generate a fingerprint for each element in a request resulting in the web page and a fingerprint for each element in the web page, and combine the fingerprints for the elements to generate the fingerprint for each web page. In some examples, the fingerprint for a web page is generated using a dictionary or vector that counts the number of occurrences of testable elements in the requests and responses of a web page. For example, the processor may count the number of query or body parameters in the request. In some examples, the processor can count the number of specific HTML elements in the response. Thus, the fingerprint is based on a concatenation of both the fingerprints generated for input elements and the fingerprints generated for output elements.

At block 210, the processor generates a list of clusters comprising one or more similar web pages based on the fingerprints. For example, the clusters may represent web pages with similar server-side functionality. In some examples, the web pages in a cluster are the output of the same server-side functionality or script. In some examples, the processor may take two of the fingerprints and calculate a similarity based on an average of the ratios of each of the testable elements. In some examples, the processor can calculate a similarity score between a fingerprint for a web page from the plurality of web pages and a cluster from the list of clusters and add the web page to the cluster in response to detecting that the similarity score exceeds a similarity threshold. In some examples, the processor can calculate a similarity score between the fingerprint for each web page and each cluster in the list of clusters based on a calculated hamming distance. For example, the hamming distance may indicate the number of positions in two compared fingerprints at which corresponding symbols are different. In some examples, the processor can calculate a similarity score by calculating min-wise independent permutations (MinHash). For example, MinHash is used to calculate the ratio of the number of elements of an intersection between two fingerprints and the number of elements of their union without explicitly computing the intersection and the union. In some examples, the processor may calculate a similarity score using a SimHash hashing function. For example, similar elements are hashed to similar hash values having low hamming distances.

At block 212, the processor tests a single web page from each cluster. For example, since each cluster is believed to have been created by the same server-side functionality, only one page from each cluster is tested. In some examples, the processor can then detect a security vulnerability based on the testing and modify the web application to prevent the security vulnerability. For example, the processor may remove characters from user input that can result in the execution of unauthorized scripts. In some examples, if security vulnerabilities are detected from the single page tested, the processor is configured to modify the web application to affect each of the pages of the cluster and prevent the security vulnerability.

The process flow diagram of FIG. 2 is not intended to indicate that the operations of the method 200 are to be executed in any particular order, or that all of the operations of the method 200 are to be included in every case. Additionally, the method 200 can include any suitable number of additional operations.

FIG. 3 is a process flow diagram of an example method for testing web applications using cluster predictions. The method 300 is implemented with any suitable computing device, such as the computing device 100 of FIG. 1. For example, the method is implemented via the processor 102 of computing device 100.

At block 302, the processor receives a list of clusters. For example, the list of clusters may have been generated using the method 200 above.

At block 304, the processor generates a maximal distance for each cluster in the list of clusters. For example, the processor can generate a maximal distance between requests for a cluster by calculating similarity scores between requests resulting in the web pages in the cluster. The maximal distance is a similarity score that is lower than the other similarity scores in the cluster. In some examples, the similarity scores are calculated using any suitable form of hamming distance. In some examples, the maximal distance between requests in a cluster is stored as a variable of the cluster. Thus, for every web page that is inserted into the cluster, the processor can consider the request that preceded the web page, and iteratively compute the similarity between the request and other requests in the cluster. For example, a similarity score is calculated using any suitable variation of
a hamming distance. In some examples, the hamming distance is performed on a portion of the request, rather than every part of the request.

At block 306, the processor receives a request to be sent to a web application. For example, the request to be sent may be a textual representation of a request that may have been intercepted before reaching the web application. In some examples, the request is sent to the target web application in a test environment. For example, the request to be sent is generated from a received test input. In some examples, the request is generated automatically. In one implementation, a suitably configured processor may generate the request by parsing previous responses and extracting new links. In some examples, the request is given directly. For example, the processor is configured to receive starting URLs, also known as seeds, and generate the request based on the received starting URLs. In some examples, the request is generated by intercepting data from a client to the web application. For example, the processor may intercept the data at a browser.

With respect to decision diamond 308, the processor is configured to determine whether the request to be sent would result in a web page that belongs to a cluster based on the maximal distances. For example, if the request is within the maximal distance of a cluster, then the processor detects that the request would result in a web page that would belong to the cluster. If the request would not result in a web page that belongs to a cluster, then the method advances to the step provided in block 310. If the request would result in a web page that belongs to a cluster, then the method advances to the step provided in block 312.

At block 310, a suitably configured processor sends the request. For example, the processor may send the request to a target web application. Here, the requests are sent to the web application in a testing environment. The processor may then receive a response that is processed according to the method 200 of FIG. 2 above.

At block 312, the processor refrains from sending the request. For example, since the request may not return any useful web page for testing, both the resources used to send the request and the resources used to receive the web page can be used for other processing. Thus, the processor is able to efficiently test the web application by reducing the number of web pages to be clustered.

The process flow diagram of FIG. 3 is not intended to indicate that the operations of the method 300 are to be executed in any particular order, or that all of the operations of the method 300 are to be included in every case. Additionally, the method 300 can include any suitable number of additional operations.

Referring now to FIG. 4, an illustrative cloud computing environment 400 is depicted. As shown, cloud computing environment 400 comprises one or more cloud computing nodes 402 with which local computing devices used by cloud consumers, such as, for example, personal digital assistant (PDA) or cellular telephone 404A, desktop computer 404B, laptop computer 404C, and/or automobile computer system 404N may communicate. Nodes 402 may communicate with one another. Here, the cloud computing elements are grouped (not shown) physically or virtually, in one or more networks, such as Private, Community, Public, or Hybrid clouds as described hereinabove, or a combination thereof. This allows cloud computing environment 400 to offer infrastructure, platforms, and/or software as services for which a cloud consumer does not need to maintain resources on a local computing device. It is understood that the types of computing devices 404A-N shown in FIG. 4 are intended to be illustrative only and that computing nodes 402 and cloud computing environment 400 can communicate with any type of computerized device over any type of network and/or network addressable connection (e.g., using a web browser).

Referring now to FIG. 5, a set of functional abstraction layers provided by cloud computing environment 400 (FIG. 4) is shown. It should be understood in advance that the components, layers, and functions shown in FIG. 5 are intended to be illustrative only and embodiments of the invention are not limited thereto. As depicted, the following layers and corresponding functions are provided.

Hardware and software layer 500 includes hardware and software components. Examples of hardware components include mainframes, such as but not limited to IBM® zSeries® systems; RISC (Reduced Instruction Set Computer) architecture based servers, such as but not limited to IBM pSeries® systems; IBM xSeries® systems; IBM BladeCenter systems; storage devices; networks and networking components. Examples of software components include network application server software, such as but not limited to IBM WebSphere® application server software; and database software, such as but not limited to IBM DB2 database software. (IBM, zSeries, pSeries, xSeries, BladeCenter, WebSphere, and DB2 are trademarks of International Business Machines Corporation registered in many jurisdictions worldwide).

Virtualization layer 502 provides an abstraction layer from which the following examples of virtual entities are provided: virtual servers; virtual storage; virtual networks, including virtual private networks; virtual applications and operating systems; and virtual clients. In one example, management layer 504 may provide the functions described below. Resource provisioning provides dynamic procurement of computing resources and other resources that are utilized to perform tasks within the cloud computing environment. Metering and pricing provide cost tracking as resources are utilized within the cloud computing environment, and billing or invoicing for consumption of these resources. In one example, these resources may comprise application software licenses. Security provides identity verification for cloud consumers and tasks, as well as protection for data and other resources. User portal provides access to the cloud computing environment for consumers and system administrators. Service level management provides cloud computing resource allocation and management such that required service levels are met. Service Level Agreement (SLA) planning and fulfillment provides pre-arrangement for, and procurement of, cloud computing resources for which a future requirement is anticipated in accordance with an SLA.

Workloads layer 506 provides examples of functionality for which the cloud computing environment may be utilized. Examples of workloads and functions which are provided from this layer include: mapping and navigation; software development and lifecycle management; virtual classroom education delivery; data analytics processing; transaction processing; and web application testing.

The present techniques include a system, a method, or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium is a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage
medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

The computer readable program instructions described herein are, in one implementation, downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network, and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present techniques may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer is connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection is made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present techniques.

Aspects of the present techniques are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the techniques. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

In a non-limiting example, the computer readable program instructions are provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions are, in one implementation, stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions are, in one implementation, loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

FIG. 6 provides a block diagram depicting an example of a tangible, non-transitory computer-readable medium 600 that can test web applications using clusters. The tangible, non-transitory, computer-readable medium 600 is, in one implementation, accessed by a processor 602 over a computer interconnect 604. Furthermore, the tangible, non-transitory, computer-readable medium 600 may include code to direct the processor 602 to perform the operations of the methods 200 and 300 of FIGS. 2 and 3 above.

In a non-limiting example, the various software components discussed herein are stored on the tangible, non-transitory, computer-readable medium 600, as indicated in FIG. 6. For example, a crawler module 606 includes code to crawl a plurality of web pages of a web application to be tested. A receiver module 608 includes code to configure the processor to receive an intercepted input to the web application and an output from the web application associated with each crawled web page. For example, the input may include an HTTP request or a GET parameter and the output may include an HTTP response or a document object model. A detector module 610 includes code to configure the processor to detect testable elements in the intercepted input and the output. A fingerprint generator module 612 includes code to configure the processor to generate a fingerprint for each web page based on the detected testable elements. For example, the fingerprint generator module 612 can include code instructing the processor to generate a fingerprint for each element in a request resulting in the web page and a fingerprint for each element in the web page. The fingerprint generator module 612 also includes code instructing the processor to combine the fingerprints for the elements to generate the fingerprint for each web page. A cluster generator module 614 includes code instructing the processor to generate a list of clusters including one or more similar web
14 pages based on the fingerprints . For example , the cluster nologies found in the marketplace , or to enable others of generator module 614 can include code instructing the ordinary skill in the art to understand the embodiments processor to calculate a similarity score between a finger - disclosed herein . print for a web page from the plurality of web pages and a The terminology used herein is for the purpose of describ cluster from the list of clusters and add the web page to the 5 ing particular embodiments only and is not intended to be cluster in response to detecting that the similarity score limiting of the invention . As used herein , the singular forms exceeds a similarity threshold . In some examples , the cluster “ a ” , “ an ” and “ the ” are intended to include the plural forms generator module 614 can include code instructing the as well , unless the context clearly indicates otherwise . It will processor to calculate a similarity score between the finger - be further understood that the terms " comprises ” and / or print for each web pages and each cluster in the list of 10 " comprising ” , when used in this specification , specify the clusters based on a calculated hamming distance . A tester presence of stated features , integers , steps , operations , ele module 616 includes code instructing the processor to test a ments , and / or components , but do not preclude the presence single web page from each cluster . A predictor module 618 or addition of one or more other features , integers , steps , includes code to generate a maximal distance between operations , elements , components , and / or groups thereof . requests for each cluster in the list of clusters . In some 15 It should be noted that use of ordinal terms such as " first , " examples , the predictor module 618 can include instructing “ second , ” “ third , ” etc . 
, in the claims to modify a claim the processor to code to generate a maximal distance element does not by itself connote any priority , precedence , between requests for a cluster by calculating similarity or order of one claim element over another or the temporal scores between requests resulting in the web pages in the order in which acts of a method are performed , but are used cluster . For example , the maximal distance is a similarity 20 merely as labels to distinguish one claim element having a score that is lower than the other similarity scores . The certain name from another element having a same name ( but predictor module 618 can also include code instructing the for use of the ordinal term ) to distinguish the claim elements . processor to receive a request to be sent to the web appli - Also , the phraseology and terminology used herein is for the cation . The predictor module 618 can include code instruct - purpose of description and should not be regarded as lim ing the processor to detect that the request would not result 25 iting . The use of “ including , " " comprising , ” or “ having , " in a web page that belongs to any cluster based on the " containing , ” “ involving , " and variations thereof herein , is maximal distances for the clusters . The crawler module 606 meant to encompass the items listed thereafter and equiva can configures the processor to send the request in response lents thereof as well as additional items . to detecting that the request would not result in the web page Particular embodiments of the subject matter described in that belongs to any cluster . Thus , the predictor module 618 30 this specification have been described . Other embodiments configures the processor to utilize the crawler module 606 to are within the scope of the following claims . 
For example , operate more efficiently by reducing unnecessary requests the actions recited in the claims can be performed in a from being sent and also prevent unnecessary web pages different order and still achieve desirable results . As one from being received at the receiver module 608 . It is to be example , the processes depicted in the accompanying fig understood that any number of additional software compo - 35 ures do not necessarily require the particular order shown , or nents not shown in FIG . 6 are included within the tangible , sequential order , to achieve desirable results . In certain non - transitory , computer - readable medium 600 , depending embodiments , multitasking and parallel processing can be on the particular application . advantageous .

The flowchart and block diagrams in the Figures illustrate All references cited herein , if any , are incorporated by the architecture , functionality , and operation of possible 40 reference to the same extent as if each individual publication implementations of systems , methods , and computer pro - and references were specifically and individually indicated gram products according to various embodiments of the to be incorporated by reference . present techniques . In this regard , each block in the flow - While the invention has been particularly shown and chart or block diagrams may represent a module , segment , described with reference to a preferred embodiment thereof , or portion of instructions , which comprises one or more 45 it will be understood by those skilled in the art that various executable instructions for implementing the specified logi - changes in form and details may be made therein without cal function ( s ) . In some alternative implementations , the departing from the spirit and scope of the invention . As such , functions noted in the block may occur out of the order noted the invention is not defined by the discussion that appears in the figures . For example , two blocks shown in succession above , but rather is defined by the points that follow , the may , in fact , be executed substantially concurrently , or the 50 respective features recited in those points , and by equiva blocks may sometimes be executed in the reverse order , lents of such features . depending upon the functionality involved . It will also be noted that each block of the block diagrams and / or flowchart What is claimed is : illustration , and combinations of blocks in the block dia 1 . 
A system , comprising : grams and / or flowchart illustration , is implemented by spe - 55 a memory ; and cial purpose hardware - based systems that perform the speci - a processor coupled to the memory configured with code fied functions or acts or carry out combinations of special to : purpose hardware and computer instructions . receive a list of web page clusters , each of the web page

The descriptions of the various embodiments of the clusters representing a grouping of a plurality of web present techniques have been presented for purposes of 60 pages of a web application to be tested , where each illustration , but are not intended to be exhaustive or limited web page cluster includes an associated input maxi to the embodiments disclosed . Many modifications and mal distance score generated by calculating the simi variations will be apparent to those of ordinary skill in the larity of each of one or more web page requests that art without departing from the scope and spirit of the resulted in a web page associated with a given web described embodiments . The terminology used herein was 65 page cluster of the web page clusters , the maximal chosen to best explain the principles of the embodiments , the distance score is lower than other similarity scores in practical application or technical improvement over tech a cluster of the list ;


receive an intercepted input to the web application, where the input is configured to cause the web application to provide a responsive web page as an output for the input;
obtain a similarity score between the input and one or more web page requests;
cause the input to be sent to the web application as a request input where the similarity score between the input and any one of the one or more web page requests is less than the maximal distance score of the given web cluster;
receive the output for the input from the web application;
detect testable elements in the input and output;
generate a combined fingerprint for the input and output based on the detected testable elements from the input and output by counting the number of occurrences of the detected testable elements in the input and output, wherein fingerprints generated for input elements of the input and fingerprints generated for output elements of the output are concatenated;
add the output to one cluster of the list of web page clusters based on similarity between the combined fingerprint and the one cluster; and
test a single web page from each cluster of the list of web page clusters.

2. The system of claim 1, wherein the input comprises a hypertext transfer protocol (HTTP) request and the output comprises a hypertext transfer protocol (HTTP) response.

3. The system of claim 1, wherein the test of the single web page from each cluster includes the processor being further configured to detect a security vulnerability based on the intercepted input; and modify the web application to prevent the detected security vulnerability in each web page in each web page cluster.

4. The system of claim 1, wherein the processor is configured to calculate a similarity score between a fingerprint for a respective web page from the plurality of web pages and a respective cluster from the list of clusters and add the respective web page to the respective cluster in response to detecting that the similarity score exceeds a similarity threshold.

5. The system of claim 1, wherein the processor is configured to calculate a similarity score between the fingerprint and each cluster in the list of clusters based on a calculated hamming distance.

6. A computer-implemented method, carried out by one or more processors executing with code, comprising the steps of:
receiving a list of web page clusters, each of the web page clusters representing a grouping of a plurality of web pages of a web application to be tested, where each web page cluster includes an associated input maximal distance score generated by calculating the similarity of each of one or more web page requests that resulted in a web page associated with a given web page cluster of the web page clusters, the maximal distance score is lower than other similarity scores in a cluster of the list;
receiving, via a processor, an intercepted hypertext transfer protocol (HTTP) request to the web application where the request is configured to cause the web application to provide a hypertext transfer protocol (HTTP) response from the web application;
obtaining a similarity score between the HTTP request and one or more web page requests;
causing the intercepted HTTP request to be sent to the web application as a request input where the similarity score between the HTTP request and any one of the one or more web page requests is less than the maximal distance score of the given web cluster;
receiving the HTTP response for the HTTP request from the web application;
detecting, via a processor, testable elements in the received request and response;
generating, via a processor, a combined fingerprint for each web page based on a first fingerprint generated from the detected testable elements from the intercepted request and a second fingerprint generated from the detected testable elements of the response by counting the number of occurrences of the detected testable elements in the HTTP request and response, wherein fingerprints generated for input elements of the HTTP request and fingerprints generated for output elements of the HTTP response are concatenated;
adding, via a processor, the HTTP response to one cluster of the list of web page clusters based on similarity between the combined fingerprint and the one cluster; and
testing, via a processor, a single web page from each cluster of the list of web page clusters.

7. The computer-implemented method of claim 6, further comprising the steps of:
calculating, via the processor, a similarity score between the combined fingerprint for a web page from the plurality of web pages and a cluster from the list of clusters; and
adding the web page to the cluster in response to detecting that the similarity score exceeds a similarity threshold.

8. The computer-implemented method of claim 6, further comprising the step of calculating, via a processor, a similarity score between the fingerprint and each cluster in the list of clusters based on a calculated hamming distance.

9. The computer-implemented method of claim 6, wherein the step of generating the combined fingerprint for each web page comprises the steps of: generating a first fingerprint representing each element in a GET request and a second fingerprint for each element of a document object model returned as a response to the GET request; and combining the first and second fingerprints for the elements to generate the combined fingerprint for each web page.

10. The computer-implemented method of claim 6, further comprising steps of: sending the HTTP request to the web application if the HTTP request would not result in a web page that belongs to any cluster based on maximal distances calculated for the list of clusters, and receiving an output from the web application in response to the HTTP request.

11. A computer program product for testing web applications, the computer program product comprising a computer-readable storage medium having program code embodied therewith, wherein the computer readable storage medium is not a transitory signal per se and the program code is executable by one or more processors to:
receive a list of web page clusters, each of the web page clusters representing a grouping of a plurality of web pages of a web application to be tested, where each web page cluster includes an associated input maximal distance score generated by calculating the similarity of each of one or more web page requests that resulted in a web page associated with a given web page cluster of the web page clusters, the maximal distance score is lower than other similarity scores in a cluster of the list;


receive an intercepted input to the web application, where the input is configured to cause the web application to provide a responsive web page as an output for the input;
obtain a similarity score between the input and one or more web page requests;
cause the input to be sent to the web application as a request input where the similarity score between the input and any one of the one or more web page requests is less than the maximal distance score of the given web cluster;
receive the output for the input from the web application;
detect testable elements in the input and output;
generate a combined fingerprint for each web page based on a first fingerprint of the detected testable elements from the intercepted input and the second fingerprint of the detected testable element from the output by counting the number of occurrences of the detected testable elements in the input and output, wherein fingerprints generated for input elements of the input and fingerprints generated for output elements of the output are concatenated;
add the output to one cluster of the list of web page clusters comprising one or more similar web pages based on similarity between the combined fingerprint and the one cluster; and
test a single web page from each cluster of the list of web page clusters.

12. The computer program product of claim 11, further comprising program code executable by a processor to calculate a similarity score between a fingerprint for a web page from the plurality of web pages and a cluster from the list of clusters and to add the web page to the cluster in response to detecting that the similarity score exceeds a similarity threshold.

13. The computer program product of claim 11, further comprising program code executable by a processor to calculate a similarity score between the fingerprint for each web page and each cluster in the list of clusters based on a calculated hamming distance.

14. The computer program product of claim 11, further comprising program code, executable by a processor to:
generate a maximal distance between requests for each cluster in the list of clusters;
receive a request to be sent to the web application;
detect that the request would not result in a web page that belongs to any cluster based on the maximal distances for the clusters; and
send the request in response to detection that the request would not result in the web page that belongs to any cluster.
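As an illustration only, the combined fingerprint recited in the claims (counting occurrences of detected testable elements in the input and in the output, then concatenating the two results) might be sketched in Python as follows. The element vocabularies `INPUT_ELEMENTS` and `OUTPUT_ELEMENTS` and the function names are hypothetical; the specification does not enumerate the testable element types or prescribe a data layout.

```python
from collections import Counter

# Hypothetical vocabularies of testable element types (assumed for illustration;
# the specification does not list them).
INPUT_ELEMENTS = ("get_param", "post_param", "cookie", "header")
OUTPUT_ELEMENTS = ("form", "input", "script", "a")


def count_fingerprint(detected, vocabulary):
    """Count how often each vocabulary element occurs among the detected elements."""
    counts = Counter(detected)
    return tuple(counts[name] for name in vocabulary)


def combined_fingerprint(input_elements, output_elements):
    """Concatenate the input (request) fingerprint and the output (response) fingerprint."""
    return (count_fingerprint(input_elements, INPUT_ELEMENTS)
            + count_fingerprint(output_elements, OUTPUT_ELEMENTS))


# Elements detected in one intercepted request and its response (made-up data).
request_elements = ["get_param", "get_param", "cookie"]
response_elements = ["form", "input", "input", "input", "script"]

fingerprint = combined_fingerprint(request_elements, response_elements)
print(fingerprint)  # (2, 0, 1, 0, 1, 3, 1, 0)
```

A fixed-order tuple is used here so that fingerprints from different pages have equal length and can be compared position by position; that is a design choice of this sketch, not something the claims require.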

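The clustering described for the cluster generator module 614 and the tester module 616 (a similarity score based on a hamming distance, adding a page to a cluster when the score exceeds a similarity threshold, and testing a single page from each cluster) could look roughly like the sketch below. Several points are assumptions made only for illustration: each cluster is compared through a single representative fingerprint, the similarity is normalized to the range 0 to 1, the threshold value is 0.8, and a page that matches no cluster starts a new one.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length fingerprints differ."""
    if len(a) != len(b) or not a:
        raise ValueError("fingerprints must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b))


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the fingerprints are identical."""
    return 1.0 - hamming_distance(a, b) / len(a)


def assign_to_cluster(clusters, page, fingerprint, threshold=0.8):
    """Add the page to the first cluster whose score exceeds the threshold.

    Opening a new cluster when nothing matches is an assumption of this sketch.
    """
    for cluster in clusters:
        if similarity(fingerprint, cluster["representative"]) > threshold:
            cluster["pages"].append(page)
            return cluster
    new_cluster = {"representative": fingerprint, "pages": [page]}
    clusters.append(new_cluster)
    return new_cluster


def pages_to_test(clusters):
    """Pick a single page from each cluster, here simply the first one added."""
    return [cluster["pages"][0] for cluster in clusters]


clusters = []
assign_to_cluster(clusters, "/item?id=1", (2, 0, 1, 0, 1, 3, 1, 0))
assign_to_cluster(clusters, "/item?id=2", (2, 0, 1, 0, 1, 3, 1, 0))
assign_to_cluster(clusters, "/login", (0, 2, 1, 0, 1, 2, 0, 0))

print(pages_to_test(clusters))  # ['/item?id=1', '/login']
```

In this made-up run the two item pages share a fingerprint and fall into one cluster, so only one of them is selected for testing, while the login page differs in four of eight positions (similarity 0.5) and forms its own cluster.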

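The predictor behavior of claim 14 and predictor module 618 can be sketched under one possible reading: the maximal distance of a cluster is the lowest pairwise similarity score among the requests that produced its pages, and a new request is sent only when its similarity to a cluster's known requests falls below that score for every cluster. The request fingerprints, the hamming-based similarity, and the value used for single-request clusters are assumptions of this sketch, not details taken from the specification.

```python
from itertools import combinations


def similarity(a, b):
    """Hamming-based similarity in [0, 1] for equal-length, non-empty fingerprints."""
    return 1.0 - sum(x != y for x, y in zip(a, b)) / len(a)


def maximal_distance(request_fingerprints):
    """Lowest pairwise similarity among the requests of one cluster.

    A cluster with a single request has no pairs; 1.0 is an assumed default.
    """
    scores = [similarity(a, b) for a, b in combinations(request_fingerprints, 2)]
    return min(scores) if scores else 1.0


def should_send(request_fp, clusters):
    """Return True when the request is predicted to fall outside every cluster.

    Each cluster is a non-empty list of request fingerprints.
    """
    for cluster_requests in clusters:
        lowest = min(similarity(request_fp, known) for known in cluster_requests)
        if lowest >= maximal_distance(cluster_requests):
            return False  # predicted to land in this cluster, so skip the request
    return True


# Made-up request fingerprints grouped by the cluster their pages belong to.
clusters = [
    [(2, 0, 1, 0), (2, 0, 1, 1), (2, 0, 0, 1)],  # maximal distance 0.5
    [(0, 2, 1, 0)],                              # single request, assumed 1.0
]

print(should_send((2, 0, 1, 1), clusters))  # False
print(should_send((0, 0, 3, 5), clusters))  # True
```

Skipping requests that are predicted to land in an existing cluster is what lets the crawler avoid sending unnecessary requests, as the description of predictor module 618 notes.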