
Detection and Localization of HTML Presentation Failures Using Computer Vision-Based Techniques

Sonal Mahajan and William G. J. Halfond
Department of Computer Science, University of Southern California

Presentation of a Website

• What do we mean by presentation? – The "look and feel" of the website in a browser
• What is a presentation failure? – Web page rendering ≠ expected appearance
• Why is it important? – It takes users only 50 ms to form an opinion about your website (Google research, 2012)
  – Affects impressions of trustworthiness, usability, company branding, and perceived quality
  – End user: no penalty to move to another website
  – Business: loses out on valuable customers

Motivation

• Manual detection is difficult
  – Complex interactions between HTML, CSS, and JavaScript
  – Hundreds of HTML elements + CSS properties
  – Labor intensive and error-prone
• Our approach: automate the debugging of presentation failures

Two Key Insights

1. Detect presentation failures
   – Oracle image + test web page → visual comparison → presentation failures
   – Use computer vision techniques

2. Localize to faulty HTML elements
   – Test web page → layout tree → faulty HTML elements
   – Use rendering maps

Limitations of Existing Techniques

• Regression debugging – the current version of the web app is modified
  • Correct a bug
  • Refactor HTML (e.g., convert a <table> layout to a <div> layout)
  – DOM comparison techniques (XBT) are not useful if the DOM has changed significantly
• Mockup-driven development – front-end developers convert high-fidelity mockups to HTML pages
  – DOM comparison techniques cannot be used, since there is no existing DOM
  – Invariant specification techniques (Selenium, Cucumber, Sikuli) are not practical, since all correctness properties need to be specified
  – Fighting Layout Bugs is an application-independent correctness checker

Running Example

Web page rendering ≠ expected appearance (oracle)

Our Approach

Goal – Automatically detect and localize presentation failures in web pages.

Pipeline: oracle image + test web page → P1. Detection (visual differences) → P2. Localization (pixel-HTML mapping) → report

P1. Detection

• Find visual differences (presentation failures)
• Compare the oracle image and the test page screenshot
• Simple approach: strict pixel-to-pixel equivalence comparison (a minimal sketch follows below)
  – Drawbacks:
    • Spurious differences due to platform differences
    • Small differences may be "OK"
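For reference, the strict pixel-to-pixel baseline can be written in a few lines. This is a minimal sketch using NumPy and Pillow; the file names are placeholders, and it is not part of the authors' tool.

```python
import numpy as np
from PIL import Image

def pixel_to_pixel_diff(oracle_path, screenshot_path):
    """Return a boolean mask marking every pixel that differs exactly.

    Assumes both images already have the same dimensions
    (see the normalization pre-processing step)."""
    oracle = np.asarray(Image.open(oracle_path).convert("RGB"), dtype=np.int16)
    screenshot = np.asarray(Image.open(screenshot_path).convert("RGB"), dtype=np.int16)

    # Strict equivalence: any channel mismatch makes the pixel a difference.
    return np.any(oracle != screenshot, axis=-1)

# Example usage (placeholder file names):
# mask = pixel_to_pixel_diff("oracle.png", "test_page.png")
# print(f"{mask.mean():.1%} of pixels differ")
```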

Perceptual Image Differencing (PID)

• Uses models of the human visual system
  – Spatial sensitivity
  – Luminance sensitivity
  – Color sensitivity
• Configurable parameters
  – Δ: threshold value for a perceptible difference
  – F: field of view of the observer
  – L: brightness of the display
  – C: sensitivity to colors
• Shows only human-perceptible differences
• Filters out differences belonging to dynamic areas
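The approach relies on an existing perceptual image differencing implementation; the sketch below is only a rough, simplified stand-in that mimics its spirit (a blur for spatial sensitivity, separate luminance and color thresholds). The threshold and sigma values are illustrative assumptions, not the real tool's parameters Δ, F, L, C.

```python
import numpy as np
from PIL import Image
from scipy.ndimage import gaussian_filter

def perceptual_diff(oracle_path, screenshot_path,
                    luminance_threshold=10.0,   # illustrative brightness threshold
                    color_threshold=25.0,       # illustrative color threshold
                    blur_sigma=1.0):            # crude spatial-sensitivity model
    """Very rough approximation of perceptual image differencing:
    tiny, low-contrast differences are suppressed so that only
    differences a human is likely to notice survive."""
    oracle = np.asarray(Image.open(oracle_path).convert("RGB"), dtype=np.float32)
    test = np.asarray(Image.open(screenshot_path).convert("RGB"), dtype=np.float32)

    # Spatial sensitivity: blur both images so single-pixel noise is ignored.
    oracle = gaussian_filter(oracle, sigma=(blur_sigma, blur_sigma, 0))
    test = gaussian_filter(test, sigma=(blur_sigma, blur_sigma, 0))

    # Luminance sensitivity: compare per-pixel brightness.
    weights = np.array([0.299, 0.587, 0.114])
    lum_diff = np.abs(oracle @ weights - test @ weights)

    # Color sensitivity: largest per-channel deviation at each pixel.
    color_diff = np.abs(oracle - test).max(axis=-1)

    return (lum_diff > luminance_threshold) | (color_diff > color_threshold)
```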

P1. Detection – Example

Test web page screenshot + oracle → visual comparison using PID → clustering (DBSCAN) → difference clusters A, B, C
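One way to cluster the surviving difference pixels in Python is scikit-learn's DBSCAN. A small sketch follows; the eps and min_samples values are illustrative, not the settings used in the work.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_difference_pixels(diff_mask, eps=5, min_samples=10):
    """Group difference pixels into clusters; each cluster is one
    candidate presentation failure (DBSCAN label -1 = noise)."""
    coords = np.argwhere(diff_mask)            # (row, col) of each difference pixel
    if len(coords) == 0:
        return {}

    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(coords).labels_

    clusters = {}
    for label in set(labels) - {-1}:           # drop noise points
        clusters[label] = coords[labels == label]
    return clusters

# clusters = cluster_difference_pixels(perceptual_diff("oracle.png", "test_page.png"))
```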

P2. Localization

• Identify the faulty HTML elements
• Use an R-tree to map pixel-level visual differences to HTML elements
• R-tree ("rectangle" tree): a height-balanced tree, popular for storing multidimensional data
• Use rendering maps to find the faulty HTML elements corresponding to the visual differences (sketch below)
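A rendering map can be approximated with the rtree package: insert each element's bounding box keyed by its XPath, then query with a difference pixel. This is a minimal sketch assuming the element bounding boxes from the rendered layout are already available; it is not WebSee's actual implementation.

```python
from rtree import index

def build_rendering_map(element_boxes):
    """element_boxes: list of (xpath, (x1, y1, x2, y2)) bounding boxes
    taken from the rendered layout of the test page."""
    rmap = index.Index()
    for i, (xpath, box) in enumerate(element_boxes):
        # rtree expects (left, bottom, right, top); store the XPath as payload.
        rmap.insert(i, box, obj=xpath)
    return rmap

def elements_at_pixel(rmap, x, y):
    """Return the XPaths of all elements whose rendered box contains (x, y)."""
    return [hit.object for hit in rmap.intersection((x, y, x, y), objects=True)]

# Example: a difference pixel at (100, 400) maps to its containing elements.
# rmap = build_rendering_map(boxes_from_layout_tree)
# print(elements_at_pixel(rmap, 100, 400))
```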

P2. Localization - Example

(Figure: a sub-tree of the R-tree with bounding rectangles R1–R5; the difference pixel at (100, 400) is mapped to the HTML elements whose rectangles contain it.)

(Figure: the matching rectangles R1–R3, labeled with their HTML elements tr[2], td, table, tr, and td.)

Result Set:

/html/body/…/tr[2]

/html/body/…/tr[2]/td[1]

/html/body/…/tr[2]/td[1]/table[1]

/html/body/…/tr[2]/td[1]/table[1]/tr[1]

/html/body/…/tr[2]/td[1]/table[1]/td[1]

Special Regions Handling

• Special regions = dynamic portions of the page whose actual content is not known
  1. Exclusion regions
  2. Dynamic text regions

1. Exclusion Regions

• Only the size-bounding property is applied
• Example (advertisement box): difference pixels inside the region are filtered during detection, and the advertisement box's <element> is reported as faulty if its size bounds are violated (sketch below)
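A sketch of how an exclusion region could be handled during detection: difference pixels inside the declared region are simply dropped. The region format (a pixel-coordinate box) is an assumption made for illustration, not WebSee's specification syntax, and the size-bounding check is not shown.

```python
import numpy as np

def filter_exclusion_regions(diff_mask, exclusion_regions):
    """Drop difference pixels that fall inside declared exclusion regions
    (e.g., an advertisement box whose actual content is not known).

    exclusion_regions: list of (x1, y1, x2, y2) boxes in screenshot
    coordinates -- an assumed format for this sketch."""
    filtered = diff_mask.copy()
    for (x1, y1, x2, y2) in exclusion_regions:
        # Zero out the region; only its size-bounding property would still
        # be checked for such regions (not shown here).
        filtered[y1:y2, x1:x2] = False
    return filtered

# ad_box = [(600, 50, 760, 250)]            # hypothetical advertisement region
# mask = filter_exclusion_regions(mask, ad_box)
```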

2. Dynamic Text Regions

• The style properties of the text are known (e.g., text color: red, font-size: 12px, font-weight: bold)
• Test web page vs. modified test web page (oracle) → run P1 and P2 → the news box <element> is reported as faulty (sketch below)
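For a dynamic text region the content is unknown but its style is specified. One way to approximate the check is to read the rendered text's computed style with Selenium and compare it against the expected properties. This is an illustrative sketch, not WebSee's mechanism for building the modified test page; the URL, XPath, and expected values are placeholders.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Expected style of the dynamic text region (the text content itself is unknown).
# Note: browsers normalize CSS values differently (e.g., rgb vs. rgba colors).
EXPECTED_STYLE = {
    "color": "rgb(255, 0, 0)",    # red
    "font-size": "12px",
    "font-weight": "700",         # bold
}

def check_dynamic_text_region(url, xpath):
    """Report which style properties of the dynamic text element deviate
    from their expected values."""
    driver = webdriver.Firefox()
    try:
        driver.get(url)
        element = driver.find_element(By.XPATH, xpath)
        return {
            prop: element.value_of_css_property(prop)
            for prop, expected in EXPECTED_STYLE.items()
            if element.value_of_css_property(prop) != expected
        }
    finally:
        driver.quit()

# check_dynamic_text_region("http://example.com/news", "//div[@id='news']")
```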

P3. Result Set Processing

• Rank the HTML elements in order of their likelihood of being faulty
• Weighted prioritization score: the lower the score, the higher the likelihood of being faulty
• Uses heuristics based on element relationships (a sketch of the scoring follows the heuristics below)

3.1 Contained Elements (C)

(Figure: a parent element containing child1 and child2 – expected vs. actual appearance; two elements are marked ✖.)

3.2 Overlapped Elements (O)

(Figure: a parent element with child1 and child2 – expected vs. actual appearance, showing overlapped elements.)

3.3 Cascading (D)

(Figure: element 1, element 2, and element 3 – expected vs. actual appearance, showing how a shift in one element cascades to the elements that follow it.)

3.4 Pixels Ratio (P)

(Figure: a parent element containing a child; child pixels ratio = 100%, parent pixels ratio = 20%.)

Unranked result set:
/html
/html/body
/html/body/table
…
/html/body/table/…/img

Ranked result set:
1. /html/body/table/…/img
…
5. /html/body/table
6. /html/body
7. /html
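A sketch of how a weighted prioritization score over the four heuristics might look. The weights and the way each heuristic is encoded here are illustrative assumptions, not the values used by WebSee; lower score means more likely faulty.

```python
def prioritization_score(element, weights=None):
    """Combine the four heuristics into a single score (lower = more likely faulty).

    element is assumed to be a dict with pre-computed heuristic values:
      'contained'   (C): penalty if the element merely contains faulty children
      'overlapped'  (O): penalty if it is only overlapped by a faulty element
      'cascading'   (D): penalty if its difference is a cascaded side effect
      'pixel_ratio' (P): fraction of the element's own pixels that differ (0..1)
    """
    w = weights or {"C": 1.0, "O": 1.0, "D": 1.0, "P": 2.0}  # illustrative weights
    return (w["C"] * element["contained"]
            + w["O"] * element["overlapped"]
            + w["D"] * element["cascading"]
            + w["P"] * (1.0 - element["pixel_ratio"]))   # high ratio -> low score

def rank_result_set(result_set):
    """Sort candidate elements from most to least likely faulty."""
    return sorted(result_set, key=prioritization_score)

# Example: a child fully covered by difference pixels outranks its parent.
# ranked = rank_result_set([
#     {"xpath": "/html/body/table/.../img", "contained": 0, "overlapped": 0,
#      "cascading": 0, "pixel_ratio": 1.0},
#     {"xpath": "/html/body/table", "contained": 1, "overlapped": 0,
#      "cascading": 0, "pixel_ratio": 0.2},
# ])
```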

P3. Result Set Processing - Example

(Figure: the report lists a ranked result set for each difference cluster A–E; for Cluster A: /html, /html/body, /html/body/table, …, /html/body/table/…/img.)

Empirical Evaluation

• RQ1: What is the accuracy of our approach for detecting and localizing presentation failures?

• RQ2: What is the quality of the localization results?

• RQ3: How long does it take to detect and localize presentation failures with our approach?


Experimental Protocol

• Approach implemented in the tool "WebSee"
• Five real-world subject applications
• For each subject application:
  – Download the page and take a screenshot to use as the oracle
  – Seed a unique presentation failure to create a variant (a sketch of this seeding step follows below)
  – Run WebSee on the oracle and the variant
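A sketch of the kind of seeding used to generate a test case: take the downloaded oracle page and perturb one visual property of one element to create a variant. The file names and the specific CSS mutations are illustrative, not the mutation operators from the study.

```python
import random
from pathlib import Path
from bs4 import BeautifulSoup

# Illustrative visual mutations; the study seeded one unique failure per variant.
MUTATIONS = [
    "visibility: hidden;",
    "margin-left: 50px;",
    "font-size: 30px;",
]

def seed_presentation_failure(oracle_html, variant_html):
    """Create a variant page containing a single seeded presentation failure."""
    soup = BeautifulSoup(Path(oracle_html).read_text(encoding="utf-8"), "html.parser")
    target = random.choice(soup.find_all(True))        # pick any HTML element
    mutation = random.choice(MUTATIONS)
    # Append the faulty declaration to the element's inline style.
    target["style"] = (target.get("style", "") + " " + mutation).strip()
    Path(variant_html).write_text(str(soup), encoding="utf-8")
    return target.name                                   # tag of the seeded element

# seed_presentation_failure("oracle.html", "variant.html")
```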

Subject Applications

Subject Application | Size (Total HTML Elements) | # Generated Test Cases
Gmail               | 72                         | 52
USC CS Research     | 322                        | 59
Craigslist          | 1,100                      | 53
Virgin America      | 998                        | 39
Java Tutorial       | 159                        | 50

RQ1: What is the accuracy?

(Chart: localization accuracy for Gmail, USC CS Research, Craigslist, Virgin America, and Java Tutorial – values of 93%, 92%, 92%, 90%, 97%, and 94%.)

• Detection accuracy: a sanity check for PID
• Localization accuracy: the percentage of test cases in which the expected faulty element was reported in the result set

RQ2: What is the quality of localization?

(Chart: result set size per subject application, as a count and percentage of total HTML elements – Gmail 12 (16%), USC CS Research 17 (5%), Craigslist 32 (3%), Virgin America 49 (5%), Java Tutorial 8 (5%); 23 (10%) overall.)

• Average rank of the expected faulty element in the result set = 4.8 (2%) ✔
• When the faulty element is not present in the result set ✖, distance = 6

RQ3: What is the running time?

(Chart: share of running time – P1 Detection 21%, P2 Localization 25%, P3 Result Set Processing 54%; observed times of 7 sec, 87 sec, and 3 min.)

• Much of the result set processing time goes to the sub-image search for the cascading heuristic

Comparison with User Study

(Chart: accuracy – students 76% detection and 36% localization vs. WebSee 100% detection and 93% localization.)

• Graduate-level students
• Manual detection and localization using Firebug
• Time: students 7 min vs. WebSee 87 sec

Case Study with Real Mockups

• Three subject applications
• 45% of the faulty elements reported in the top five
• 70% reported in the top ten
• Analysis time was similar

Summary

• A technique for automatically detecting and localizing presentation failures
• Uses computer vision techniques for detection
• Uses rendering maps for localization
• Empirical evaluation shows positive results

Thank you

Detection and Localization of HTML Presentation Failures Using Computer Vision-Based Techniques
Sonal Mahajan and William G. J. Halfond
spmahaja@usc.edu
halfond@usc.edu

Normalization Process

• Pre-processing step before detection (sketch below)
  1. The browser window size is adjusted based on the oracle
  2. The zoom level is adjusted
  3. Scrolling is handled
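A sketch of what this normalization step could look like with Selenium: size the browser window to the oracle image, reset the zoom, and scroll to the top before taking the screenshot. The zoom reset via CSS and the placeholder URL/file names are illustrative assumptions, not WebSee's exact procedure.

```python
from PIL import Image
from selenium import webdriver

def capture_normalized_screenshot(url, oracle_png, out_png):
    """Take a screenshot of the test page under viewport conditions
    matched to the oracle image."""
    oracle_width, oracle_height = Image.open(oracle_png).size

    driver = webdriver.Firefox()
    try:
        # 1. Adjust the browser window size based on the oracle.
        driver.set_window_size(oracle_width, oracle_height)
        driver.get(url)
        # 2. Reset the zoom level (illustrative: CSS zoom on the body).
        driver.execute_script("document.body.style.zoom = '100%';")
        # 3. Make sure the page is scrolled to the top-left corner.
        driver.execute_script("window.scrollTo(0, 0);")
        driver.save_screenshot(out_png)
    finally:
        driver.quit()

# capture_normalized_screenshot("http://example.com", "oracle.png", "test_page.png")
```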

Difference with XBT

• XBT uses DOM comparison – find matched nodes and compare them
• Regression debugging – correct a bug, refactor HTML (e.g., <table> to <div> layout)
  – If the DOM has changed significantly, XBT cannot find matching DOM nodes, so the comparison is not accurate
• Mockup-driven development – no "golden" version of the page (DOM) exists
  – XBT techniques cannot be used
• Our approach – uses computer vision techniques for detection
  – Applies to both scenarios

Pixel-to-pixel Comparison

(Figure: oracle and test webpage screenshot compared pixel-to-pixel; in the difference image, 98% of the entire image is marked as different. Legend: difference pixel vs. matched pixel.)

P1. Detection

• Find visual differences (presentation failures)
• Simple approach: strict pixel-to-pixel equivalence comparison
• Our approach: analyze using computer vision techniques – perceptual image differencing (PID)

Perceptual Image Differencing

(Figure: PID difference image. Legend: difference pixel vs. matched pixel.)

High fidelity mockups… reasonable?

(Series of example high-fidelity mockups.)