Beyond Hard-to-Reach Pages: Interactive, Parametric Web Macros
7th Conference on Human Factors and the Web
Universal Access: More People. More Situations
Beyond Hard-to-Reach Pages: Interactive, Parametric Web Macros
Alex Safonov
University of Minnesota
Department of Computer Science and Engineering
Talk Outline
• Problem: user interaction with the WWW is becoming more tedious
• Solution: Personal Web Automation
• Challenges of Web Automation
• Lessons learned from the WebMacros system
Take-away Messages
• General Web users
  – There is an effort to simplify repetitive tasks by automation
• Web usability specialists
  – Personal Web Automation as a means to improve site usability
• Content providers
  – Awareness of Web Automation scripts vs. data-scraping bots
Hard-to-Reach Pages: Examples
• Airline/hotel/car reservations
• Searches over library and citation databases
• Populating e-commerce shopping carts
• Map and weather queries
Patterns of a Web Task
• Identify oneself (optional)
  – Implicitly (cookies) or explicitly (login)
• Select the appropriate service
• Specify query parameters and execute query
  – HTML forms on one or several pages
• Review/iterate over returned items
  – E.g., save or print each paper matched in ACM DL
  – Returned items may span multiple pages
• Repeat query with different parameters (optional)
Sharing Web Interactions
• Scenario 1
  – Instructor populates an online bookstore shopping cart with course textbooks; she would like students to instantly access the cart
• Scenario 2
  – System administrator performs a Web-based administration task; wants to make the task available to colleagues
Existing Tools
• Bookmarks/favorites and histories
  – Links only – not procedures
• Server-based mechanisms
  – Comparison shopping services; auction proxies; special URLs for bookmarking
  – Limited flexibility: user is not in control
Motivation: Automate Repetitive Tasks
• Tasks such as checking airline pricing or changing system configuration are often performed many times
  – With the same or different parameters
• Goal: relieve the user of repetitive tasks through automation
• Approach: capture and reuse user interactions with the Web
Related Work
• Macro scripting and Programming By Demonstration (PBD)
• Web Automation (LiveAgent, WebVCR, WebMacros)
• Hypermedia trails and tours
• Web Semantics and Web Services
Requirements for a Personal Web Automation system
• Users create Web automation scripts by demonstration, in a familiar environment
• Web Automation systems handle dynamic data and semantic-free markup
• Running scripts have reasonable side effects
• Privacy is maintained when sharing scripts
• Scripts support parameters
WebMacros – a Personal Web Automation system
• First prototype - HFWeb 99
  – Records and replays a linear sequence of navigation steps (opened URLs, followed links, and form submissions)
• Users create Web Automation scripts (Web macros) through normal Web navigation and form filling
Recording Web macros
• Click on – Browser opens a new window
• Open the Avis home page
• Fill out the forms, navigate pages
• Supply macro name and description and click on “Finish Recording”
Running Web macros
Macro playback control panel
A directory of user’s macros
Macros with Parameters
• During recording, user can mark form inputs as parameters
• During playback, user specifies current values
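A minimal sketch of how parameter binding at playback could work, assuming a recorded form step keeps the values entered during recording plus the names of the fields the user marked as parameters. The `FormStep` structure, the `bind_parameters` helper, the URL, and the field names are hypothetical illustrations, not the WebMacros implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FormStep:
    """One recorded form submission (hypothetical structure)."""
    url: str
    fields: Dict[str, str]  # values captured during recording
    parameters: List[str] = field(default_factory=list)  # fields marked as parameters


def bind_parameters(step: FormStep, current: Dict[str, str]) -> Dict[str, str]:
    """Return the field values to submit at playback.

    Fields marked as parameters take the user's current value when one is
    supplied; every other field keeps its recorded value.
    """
    bound = dict(step.fields)
    for name in step.parameters:
        if name in current:
            bound[name] = current[name]
    return bound


# Example: a car reservation form with two fields marked as parameters.
step = FormStep(
    url="http://example.com/reserve",
    fields={"city": "Minneapolis", "pickup_date": "2001-06-04", "car_class": "compact"},
    parameters=["city", "pickup_date"],
)
playback_values = bind_parameters(step, {"city": "Austin"})
```

In this sketch only `city` changes at playback; `pickup_date` falls back to its recorded value, and `car_class` cannot be overridden because it was not marked as a parameter.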
Batch and Interactive Playback
• Batch playback
  – Browser loads the final page of a macro
• Why support interactive playback?
  – First use of a macro
  – Easier to substitute parameters
  – Can “skip to end”
• WebMacros substitutes recorded parameter values (except private ones) into the page
Dealing with Dynamic Content
• A Web automation system works in an unreliable, dynamic environment
• Page retrieved at macro playback may not be what the user expects
  – Services may be unavailable
  – Verbatim replay of recorded steps may be inappropriate
    • Session ids
    • Expired or missing cookies
Dealing with Dynamic Content
• WebMacros uses rules to match recorded steps against retrieved pages during replay
• How can the system determine that an incorrect page was retrieved?
Same Template, Different Content
• Dynamically generated pages may have different content but similar HTML markup
Structure-based Page Verification
• WebMacros efficiently compares HTML parse trees of recorded and retrieved pages
• An HTML parse tree of a page is compactly represented as a set of path expressions
• Similarity measure suited to template-generated pages with different numbers of items
• If the structural similarity between the recorded and the retrieved page falls below a threshold, WebMacros alerts the user
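The idea of comparing pages by their sets of path expressions can be sketched as follows. This is an illustration under stated assumptions: the slides do not define the similarity measure, so Jaccard similarity over root-to-element tag paths is used as a stand-in, and the `THRESHOLD` value and all function names are hypothetical. Because paths are kept in a set, a result list with three items yields the same paths as a list with one item, which is one simple way to tolerate template pages with different numbers of items.

```python
from html.parser import HTMLParser
from typing import List, Set

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

THRESHOLD = 0.5  # hypothetical cut-off, not a value from the slides


class PathCollector(HTMLParser):
    """Collects the set of root-to-element tag paths of an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.paths: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing element: record its path, nothing to push.
        self.paths.add("/".join(self.stack + [tag]))

    def handle_endtag(self, tag):
        # Tolerate sloppy markup: pop back to the nearest matching open tag.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def path_set(html: str) -> Set[str]:
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def similarity(recorded: str, retrieved: str) -> float:
    """Jaccard similarity of the two pages' path sets (assumed measure)."""
    a, b = path_set(recorded), path_set(retrieved)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def needs_alert(recorded: str, retrieved: str) -> bool:
    """True when the retrieved page looks structurally unlike the recorded one."""
    return similarity(recorded, retrieved) < THRESHOLD
```

For example, a recorded results page with one list item and a retrieved page with three items built from the same template compare as identical, while an error page with different markup falls below the threshold.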
Sharing of WebMacros Scripts
• Cookie context can be encapsulated with macros
  – Allows macros to be played from any computer
  – Allows macros to be shared among users
• Course textbook shopping cart
  – No instructor’s or student’s cookies
WebMacros Architecture
• Advantages of a pure proxy architecture
  – Any HTTP client works; does not need a built-in JVM
  – Proxy design enables remote use and sharing of macros
  – Proxy does not depend on the browser for page retrieval
  – Proxy does not need “security clearance” to read/write local files and modify incoming pages
• Drawbacks
  – User must trust the macro server if macros are stored on it
  – No access to browser-generated HTML
  – Non-local proxy generates extra HTTP traffic
System Implementation
• Approach: HTML rewriting
  – During recording, WebMacros modifies URLs of links, images, forms, and frames on the retrieved page
  – These are rewritten to special URLs intercepted by the proxy
  – Form fields are annotated with parameter selection radio buttons
• Macro Representation
  – Macros are stored in a relational database
  – Originally, macro steps were stored as WebL scripts, which were difficult to manipulate
  – WebL scripts are generated and executed for each step
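The HTML rewriting approach can be illustrated with a short sketch. It assumes a hypothetical proxy endpoint (`PROXY_PREFIX`) that receives the original URL as a query parameter; the actual format of the special URLs used by WebMacros is not given in the slides. A regular expression over double-quoted `href`, `src`, and `action` attributes stands in for a real HTML rewriter and would miss unquoted or single-quoted values.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical proxy endpoint; the real WebMacros URL scheme is not specified.
PROXY_PREFIX = "http://macros.example.org/record?target="

# Matches href/src/action attributes with a double-quoted value.
URL_ATTR = re.compile(r'\b(href|src|action)="([^"]*)"', re.IGNORECASE)


def to_proxy_url(page_url: str, link: str) -> str:
    """Resolve `link` against the page URL and wrap it in a proxy URL."""
    absolute = urljoin(page_url, link)
    return PROXY_PREFIX + quote(absolute, safe="")


def rewrite_page(page_url: str, html: str) -> str:
    """Rewrite link, image, form, and frame URLs so requests go through the proxy."""
    def replace(match):
        attr, value = match.group(1), match.group(2)
        return f'{attr}="{to_proxy_url(page_url, value)}"'
    return URL_ATTR.sub(replace, html)


page = '<a href="rates.html">Rates</a><form action="/search"></form>'
rewritten = rewrite_page("http://www.example.com/cars/index.html", page)
```

When the browser follows a rewritten link or submits a rewritten form, the request reaches the proxy, which can log the step and then fetch the original target.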
Lessons Learned
• Hybrid architecture for recording and playback
  – Difficult to detect user actions from a proxy
  – Optimal: client-based (applet) recorder, proxy playback component
• XML representation for macros
  – Lightweight, fast parsers now available
  – Not tied to a relational DBMS
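To make the XML representation concrete, here is one possible shape for a serialized macro, sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names (`macro`, `step`, `field`, `parameter`) and the example URL are assumptions made for illustration; the slides do not define a schema.

```python
import xml.etree.ElementTree as ET


def macro_to_xml(name, description, steps):
    """Serialize a macro to an XML string (hypothetical element names)."""
    macro = ET.Element("macro", {"name": name})
    ET.SubElement(macro, "description").text = description
    for step in steps:
        step_el = ET.SubElement(macro, "step", {"type": step["type"], "url": step["url"]})
        for field_name, info in step.get("fields", {}).items():
            field_el = ET.SubElement(step_el, "field", {
                "name": field_name,
                # Marks whether the user flagged this input as a parameter.
                "parameter": "true" if info.get("parameter") else "false",
            })
            field_el.text = info["value"]
    return ET.tostring(macro, encoding="unicode")


xml_text = macro_to_xml(
    "car-rental",
    "Check rental rates",
    [
        {"type": "open", "url": "http://example.com/"},
        {"type": "submit", "url": "http://example.com/reserve",
         "fields": {"city": {"value": "Minneapolis", "parameter": True}}},
    ],
)
```

A document like this can be read back with any XML parser, which is what keeps the macro store independent of a particular relational DBMS.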
Further Work
• Pilot user study is under way
  – Improved recording and playback controls; added the Undo feature for recording
• Detect iteration during macro demonstration
  – Approach: user demonstrates some examples (e.g., links), WebMacros generalizes to similar links and merges results
• Propose HTML extensions/XML DTD to make Web Automation more reliable