Supporting End Users In The Creation Of Dependable Web Clips

Post on 28-Jan-2015

1

Supporting End Users in the Creation of Dependable Web Clips

Sandeep Lingam, Sebastian Elbaum

Proceedings of the 16th international conference on World Wide Web (WWW2007)

Reporter: Shih-Feng Yang

2007/7/2

2

Outline

Introduction

Web Clipper

Evaluation

Conclusion

3

Introduction

Web authoring environments have enabled end-users who are non-programmers to design and quickly construct web pages.

Web clip: a component within the end-user’s website that dynamically extracts information from other web sources.

4

Introduction

Web Clip

5

Introduction

Goal

Web clipper: an approach to support end-users through the entire process of creating a dependable web clip.

Three fundamental aspects:

1. The tool is embedded in the web authoring tool interface.

2. Training: increase the robustness of the web clip.

3. Deploy multiple filters to increase the confidence in the correctness of the retrieved information.

6

Introduction

Challenges

We cannot expect end-users to have any programming experience with web clips.

The content within the target site of a web clip will change.

7

Web Clipper

Approach Overview

8

Web Clipper-Clipping

Target Clip Selection

A custom browser is provided for controlling the web clip.

Every extractable document element is highlighted as the user moves the mouse over it, and the user makes a selection by clicking on it.

Extraction Pattern

Once a selection is made, an extraction pattern is generated.

During the clipping process, the user’s selection is uniquely identified by its HTML-Path.

HTML-Path: a specialized XPath expression.
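The idea of identifying a selection by a path can be sketched as follows. This is only an illustration using plain positional XPath and Python's standard `xml.etree.ElementTree`; it is not the paper's HTML-Path format, and the helper name `positional_path` and the sample page are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def positional_path(root, target):
    """Build a positional XPath (ElementTree subset) from root to target."""
    if target is root:
        return "."
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    steps = []
    node = target
    while node is not root:
        parent = parent_of[node]
        same_tag = [c for c in parent if c.tag == node.tag]
        # XPath positions are 1-based and count siblings with the same tag.
        index = next(i for i, c in enumerate(same_tag, 1) if c is node)
        steps.append(f"{node.tag}[{index}]")
        node = parent
    return "./" + "/".join(reversed(steps))


# Hypothetical, well-formed sample page standing in for the clipped site.
PAGE = """<html><body>
<div><p>Ad banner</p></div>
<div><p>Headline</p><p>Temperature: 21 C</p></div>
</body></html>"""

root = ET.fromstring(PAGE)
selected = root.findall(".//p")[2]      # the element the user "clicked"
path = positional_path(root, selected)  # stored as the extraction pattern
extracted = root.find(path).text        # re-extraction on a later visit
```

Because the path depends on element positions, it is sensitive to layout changes on the target site, which is the motivation for the training step on the following slides.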

9

Web Clipper-Clipping

10

Web Clipper-Training

To increase the robustness of the web clip, the authors construct extraction patterns that uniquely characterize the end-user selection.

Several clips will be created using different extraction patterns.

Every time the user marks a clipping as valid, the system generates a filter corresponding to the clipping.

Filter: JavaScript code embedded within the user’s web page.

11

Web Clipper-Training

Validation of the extraction patterns presented by the system.

12

Web Clipper-Training

Extraction Patterns

13

Web Clipper-Training

14

Web Clipper-Deployment

The URL and extraction patterns of the clipped content are used to generate an AJAX script.

HTML documents -> XHTML.

Relative URLs -> absolute URLs.

Generate filters from pre-defined templates for each of the extraction patterns produced during training.

The user can move, resize or annotate the web clip to suit her preference.
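The "relative URLs -> absolute URLs" step can be illustrated with a minimal sketch. It assumes the clipped content has already been converted to well-formed XHTML; the function name `absolutize`, the attribute list, and the example URLs are hypothetical and are not taken from the prototype.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Attributes assumed to carry URLs in this sketch; a real page has more.
URL_ATTRIBUTES = ("href", "src")


def absolutize(fragment, base_url):
    """Rewrite relative URLs in an XHTML fragment against the source page URL."""
    for element in fragment.iter():
        for name in URL_ATTRIBUTES:
            if name in element.attrib:
                element.set(name, urljoin(base_url, element.get(name)))
    return fragment


# Hypothetical clipped fragment and source page address.
clip = ET.fromstring(
    '<div><a href="/news/today.html">News</a><img src="img/sun.png" /></div>'
)
absolutize(clip, "http://example.com/weather/index.html")
```

Without this step, links and images inside the clip would be resolved against the end-user's own site rather than the clipped site, and would break.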

15

Web Clipper-Filtering and Assessment

The content that the user wants to see in the web clip

16

Web Clipper-Filtering and Assessment

17

Web Clipper-Filtering and Assessment

18

Web Clipper-Filtering and Assessment

The paper then defines Confidence: the ratio of the maximum filter score to the number of valid extraction patterns generated during the training session.

The prototype will alert the user when the content within the target site changes.

The user can also configure the web clips to provide alerts when the confidence scores fall below a particular threshold.

19

Web Clipper-Filtering and Assessment

The Label filter has the highest score, so the system will use this pattern to extract content, and the confidence score = 2/3 = 67%
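The arithmetic behind a score such as 2/3 can be sketched as below. The slides do not spell out how a filter score is computed, so this sketch assumes that a pattern's score is the number of trained patterns whose extracted content agrees with it, and that confidence divides the best score by the number of trained patterns. The pattern names, the sample texts, and the 0.5 alert threshold are all hypothetical.

```python
from collections import Counter


def confidence(extractions):
    """Return (chosen text, confidence) for a pattern-name -> text mapping.

    A value of None means the pattern failed to extract anything.
    """
    if not extractions:
        return None, 0.0
    found = [text for text in extractions.values() if text is not None]
    if not found:
        return None, 0.0
    # Assumed scoring: content backed by the most patterns wins.
    best_text, max_score = Counter(found).most_common(1)[0]
    return best_text, max_score / len(extractions)


ALERT_THRESHOLD = 0.5  # hypothetical user-configured threshold

clip_text, score = confidence({
    "html_path": "Ad banner",        # positional pattern drifted
    "label": "Temperature: 21 C",
    "context": "Temperature: 21 C",
})
needs_alert = score < ALERT_THRESHOLD
```

With three trained patterns of which two agree, the sketch yields a confidence of 2/3, and no alert is raised at the assumed threshold.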

20

Web Clipper-Filtering and Assessment

Alert the user when the content within the target site changes

21

Evaluation

Effectiveness of the extraction patterns used in generating web clips.

Dependability of web clips in providing sufficiently correct information over time.

Robustness of web clips to changes in the clipped web site.

22

Evaluation Effectiveness of extraction patterns

23

Evaluation Dependability of web clips

confidence scores

24

Evaluation Robustness

This experiment tests how robust the web clips are to the following changes in the clipped web site:

1. Block Insertion

2. Block Movement

3. Block Deletion

4. Enclosing Element Changes

5. Target Clipping Removed
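Some of the change types above can be simulated on a toy page to see why a single positional pattern is fragile. This is a sketch only: the page, the recorded `PATH`, and the text-prefix `LABEL` pattern are hypothetical stand-ins and do not reproduce the paper's experiment or its extraction patterns.

```python
import xml.etree.ElementTree as ET

PAGE = "<body><div>Headline</div><div>Temperature: 21 C</div></body>"
PATH = "./div[2]"        # positional pattern recorded at clipping time
LABEL = "Temperature:"   # text-prefix pattern recorded at clipping time


def by_path(root):
    """Extract by position; returns None if the path no longer resolves."""
    node = root.find(PATH)
    return None if node is None else node.text


def by_label(root):
    """Extract the first element whose text starts with the recorded label."""
    for node in root.iter():
        if node.text and node.text.startswith(LABEL):
            return node.text
    return None


original = ET.fromstring(PAGE)

# Block insertion: a new block appears before the target.
inserted = ET.fromstring(PAGE)
banner = ET.Element("div")
banner.text = "Ad banner"
inserted.insert(0, banner)

# Block deletion: a block before the target disappears.
deleted = ET.fromstring(PAGE)
deleted.remove(deleted[0])

# Target clipping removed: the clipped block itself disappears.
removed = ET.fromstring(PAGE)
removed.remove(removed[1])
```

In this toy setup the positional pattern returns the wrong block after an insertion and nothing after a deletion, while the label pattern still finds the target; when the target itself is removed, neither pattern can recover it.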

25

Evaluation Robustness

26

Conclusion

This paper presented an approach to support end-users through the entire process of creating a dependable web clip.

Web clipper addresses the shortcomings of existing tools by introducing the notion of training and of dynamic confidence evaluation.

27

Finish

Thanks for your patience!