Fast, Accurate Creation of Data Validation Formats by End-User Developers
Christopher Scaffidi, Brad Myers, Mary Shaw
Carnegie Mellon University
Contextual inquiry: What challenges do end users face?
Observed 3 administrative assistants, 4 managers, and 3 webmasters/graphic designers (1-3 hrs, each)
Background Toped Evaluation New Opportunities
One person’s task: validate web forms, but he didn’t know JavaScript / regexps
Is the input valid? “EDSH 225”
Is the input questionable? “GATE 225”
Or is it obviously invalid? “412-555-5444”
Hurricane Katrina “Person Locator” site: Many inputs unvalidated
Spreadsheets contain lots of typos: inconsistent formatting & invalid strings
• Above: part of an actual spreadsheet on our university’s web site
• Plenty of invalid strings in users’ spreadsheets during contextual inquiry
• For thousands of other examples: EUSES Spreadsheet Corpus
Needed: a usable mechanism for implementing validation
Coming Up…
• Background
  – Formative pilot study
  – Related work
• Toped
• Evaluations
  – Usability
  – Expressiveness
• New opportunities
Formative pilot study
• Motivation: Exploring the “gulf of execution” for data
  – User has to figure out how to map intentions to the features provided by a computer system
  – Poor “closeness of mapping” impedes system use
• Before designing a system, probe the concepts and terminology familiar to users
• Asked 4 administrative assistants to verbally describe two kinds of data
  – American mailing addresses
  – University project numbers
Formative pilot study
• Participants identified and named the parts of data
  – E.g.: street address, city, state, zip code
  – They hierarchically refined parts until sub-parts became small enough that they lacked names
• At that point, they described parts with constraints
  – Constraints were sometimes “soft”: not always true
  – They used adverbs of frequency to indicate softness
    • E.g.: “usually” or “sometimes”
• Implications
  – Users describe data in terms of constrained parts
  – Valid data sometimes violate certain constraints
Alternate approaches: limited support for expressing constraints on structured strings
• Grammars based on sequences of characters
  – Context-free grammars (CFGs)
    • Grammex
    • Apple data detectors (CFGs + regexps)
  – Regular expressions (regexps)
    • SWYN regexp editor
• Lapis patterns: constrained structured strings
  – Intentionally designed to support outlier finding
@PhoneNumber is Number equal to /\d\d\d/ then "-" then Number equal to /\d\d\d\d/ ignoring nothing
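For comparison, the phone-number pattern above can be approximated with a plain regular expression. This is a minimal sketch using Python's `re` module; the function name `is_phone_number` is ours, and we assume the pattern is meant to cover the whole string (three digits, a hyphen, four digits).

```python
import re

# Three digits, a hyphen, then four digits, mirroring the Lapis pattern above.
PHONE_NUMBER = re.compile(r"\d\d\d-\d\d\d\d")

def is_phone_number(text: str) -> bool:
    """Return True if the whole string has the form ddd-dddd."""
    return PHONE_NUMBER.fullmatch(text) is not None

print(is_phone_number("555-5444"))      # True
print(is_phone_number("412-555-5444"))  # False: has an extra area-code part
```

The answer here is strictly yes or no; a bare regexp has no way to mark an input as merely questionable.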
Toped: A form fill-in UI to mediate between users and grammars
1. Name
2. Describe
3. Test
4. Save
The system generates an augmented CFG from the format description
A part that almost always has 1-8 lowercase letters:
#WORD : #CHLIST : COUNT(#CH)>=1 && COUNT(#CH)<=8 {90}
#CHLIST : #CH | #CH #CHLIST
#CH : a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
• More compact than a pure CFG
• More expressive than a pure CFG
  – Some constraints are impossible to represent as CFG
  – Some constraints need to be soft
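To make the split between productions and constraints concrete, here is a minimal Python sketch of the #WORD rule. It is not Toped's generated parser: the function names are ours, and we assume the constraint is checked only after the plain CFG part has matched.

```python
import string
from typing import List, Tuple

LOWERCASE = set(string.ascii_lowercase)  # the #CH production: a|b|...|z

def parses_as_chlist(text: str) -> bool:
    """#CHLIST : #CH | #CH #CHLIST, i.e. one or more lowercase letters."""
    if not text or text[0] not in LOWERCASE:
        return False
    return len(text) == 1 or parses_as_chlist(text[1:])

def check_word(text: str) -> Tuple[bool, List[str]]:
    """Return (parsed, violated soft constraints) for the #WORD rule."""
    if not parses_as_chlist(text):
        return False, []
    violated = []
    count = len(text)  # COUNT(#CH): every character of a #CHLIST is a #CH
    if not (1 <= count <= 8):
        violated.append("COUNT(#CH)>=1 && COUNT(#CH)<=8 {90}")
    return True, violated

print(check_word("toped"))       # (True, [])
print(check_word("expressive"))  # parses, but violates the soft length constraint
print(check_word("Toped"))       # (False, []): uppercase letter is not a #CH
```

A ten-letter word still parses; it is only reported as violating the soft constraint, which is what lets a tool treat it as questionable rather than invalid.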
Testing strings against grammars
• Downgrade a parse if it violates constraints
  – Penalty = 1 – (strength of constraint)/100
  – Multiply penalties
  – Propagate penalties up parse tree
  – Choose best parse (i.e., the least-penalized parse)
• Show error messages
  – Track violated constraints, concatenate into message
    • If parse fails completely, show portions of format description that were used to generate unsatisfied CFG productions
  – End-user development tools may offer the user the option of overriding some errors, depending on penalties
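The scoring steps above can be sketched in Python as follows. The slide does not spell out the bookkeeping, so we assume that a parse starts with a score of 1, that each violated constraint multiplies the score by its penalty factor, and that the best parse is the one with the highest resulting score. The `Parse` class is hypothetical, the strengths other than 90 are made up for illustration, and propagation up the parse tree is flattened into a single list of violations.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List

def penalty(strength: int) -> float:
    """Penalty factor for violating a constraint of the given strength (0-100)."""
    return 1 - strength / 100

@dataclass
class Parse:
    """A candidate parse with the strengths of the constraints it violates."""
    name: str
    violated_strengths: List[int] = field(default_factory=list)

    def score(self) -> float:
        # Assumption: start from 1.0 and multiply in one penalty per violation.
        return prod(penalty(s) for s in self.violated_strengths)

def best_parse(parses: List[Parse]) -> Parse:
    """Pick the least-penalized parse, i.e. the one with the highest score."""
    return max(parses, key=lambda p: p.score())

candidates = [
    Parse("A", [90]),      # violates one "almost always" constraint
    Parse("B", [50, 50]),  # violates two weaker constraints (hypothetical)
    Parse("C", [100]),     # violates a constraint that must always hold
]
for p in candidates:
    print(p.name, round(p.score(), 2))
print("best:", best_parse(candidates).name)
```

Under these assumptions a parse that violates a strength-100 constraint scores 0, while one that violates nothing keeps a score of 1.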
Showing error messages after testing strings against the generated CFGs
Usability: Does Toped help users to implement string validation?
• Between-subjects lab experiment
  – Direct comparison system: Lapis
  – (We also compare results to those of the SWYN study; see paper)
• Recruited 17 participants (9 Toped, 8 Lapis)
  – Approx. half were administrative assistants, approx. half were master’s students (mostly information systems), distributed roughly equally across tools
  – 1 participant misinterpreted instructions (=> 8 & 8)
Usability: Does Toped help users to implement string validation?
• Study structure
  – Background questionnaire
  – Tutorial (30 min)
  – 3 tasks (20 min)
  – User satisfaction questionnaire
• Detail of a task:
  – Validate 1 kind of data
    • phone numbers, mailing addresses, company names
  – User goal: For each kind, find typos in 25 strings
    • Randomly drawn from EUSES spreadsheet corpus
    • And we also retained 25 strings for further accuracy tests
Usability: Users were nearly 2 times as fast and found 3 times as many typos
                             Toped    Lapis    Relative improvement    Significant? (Mann-Whitney)
Tasks completed               2.79     1.75      60%                   p<0.01
Typos identified
  On 75 visible strings      16.50     5.75     187%                   p<0.01
  On all 150 strings         31.25     9.50     229%                   p<0.01
F1 accuracy measure
  On 75 visible strings       0.74     0.51      45%                   No
  On all 150 strings          0.68     0.46      48%                   No
User satisfaction             3.78     3.06      24%                   p=0.02
Toped also compares favorably to SWYN regexp editor – see paper
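The "relative improvement" column can be re-derived from the two means. The slide does not state the formula, so this sketch assumes it is (Toped − Lapis) / Lapis; the function name is ours, and the numbers are copied from the table.

```python
# (Toped mean, Lapis mean) pairs copied from the results table.
results = {
    "Tasks completed": (2.79, 1.75),
    "Typos identified, 75 visible strings": (16.50, 5.75),
    "Typos identified, all 150 strings": (31.25, 9.50),
    "F1, 75 visible strings": (0.74, 0.51),
    "F1, all 150 strings": (0.68, 0.46),
    "User satisfaction": (3.78, 3.06),
}

def relative_improvement(toped: float, lapis: float) -> float:
    """Percentage by which the Toped mean exceeds the Lapis mean (assumed formula)."""
    return (toped - lapis) / lapis * 100

for measure, (toped, lapis) in results.items():
    print(f"{measure}: {relative_improvement(toped, lapis):.0f}%")
```

Because the published means are already rounded, a recomputed value can differ from the table by a point: tasks completed comes out at 59% rather than 60%, while the other rows match.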
Expressiveness: Does Toped provide adequate primitives for validating real data?
• Logged data typed by 4 users into browser (3 weeks)
  – For each text string, we recorded:
    • A label for the text field (e.g.: “Phone”)
    • A regexp summarizing the string (e.g.: \d\d\d-\d\d\d-\d\d\d\d)
• Examined data, wrote scripts to cluster strings
  – 94% of the 5897 strings were in 19 clusters
  – Each cluster had 1-2 formats
• Used Toped to create formats
  – Omitted 5 clusters that were for “general text”, usernames or passwords (so we could post format descriptions online)
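One way to picture the logging and clustering steps is the Python sketch below. The slides do not describe how the summary regexps or clusters were actually computed, so everything here is an assumption: digits map to `\d`, letters to `[a-z]` or `[A-Z]`, other characters are kept literally, and strings are grouped by (field label, summary). The log entries are invented, not data from the study.

```python
import string
from collections import defaultdict
from typing import Dict, List, Tuple

REGEX_METACHARS = set(".^$*+?{}[]\\|()")

def summarize(text: str) -> str:
    """Build a character-by-character regexp summary of a string."""
    out = []
    for ch in text:
        if ch in string.digits:
            out.append(r"\d")
        elif ch in string.ascii_lowercase:
            out.append("[a-z]")
        elif ch in string.ascii_uppercase:
            out.append("[A-Z]")
        elif ch in REGEX_METACHARS:
            out.append("\\" + ch)  # escape so the summary stays a valid regexp
        else:
            out.append(ch)
    return "".join(out)

def cluster(log: List[Tuple[str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Group logged (label, value) pairs by field label and summary regexp."""
    clusters = defaultdict(list)
    for label, value in log:
        clusters[(label, summarize(value))].append(value)
    return dict(clusters)

# Hypothetical log entries for illustration only.
log = [("Phone", "412-555-5444"), ("Phone", "800-555-0199"), ("Phone", "555-5444")]
for (label, pattern), values in cluster(log).items():
    print(label, pattern, len(values))
```

With this grouping, the two ten-digit numbers fall into one cluster and the seven-digit number into another, which is the kind of "1-2 formats per cluster" situation the slide describes.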
Expressiveness: Does Toped provide adequate primitives for validating real data?
• Overall, successful
  – We were able to create formats for each kind of data
  – The formats identified many probable typos
• Ideas for improvements
  – Ways to reuse constraints from format to format
  – Primitives for kinds of parts: Numeric, word-like, …
Toped++: an improved editor
Data Description Editor
Contributions and New Opportunities
• Toped: UI to mediate between users & grammars
  – Enables users to work faster & more effectively
  – Adequately expressive for validating many kinds of data
  – Provided a start for a new line of similar editor tools
• New Opportunities (aka “Future Work”)
  – Extending Toped++ to automatically reformat data [IUI’09]
  – Providing a repository for sharing formats (in progress)
  – Developing new ways to make use of the ability to identify strings that violate soft constraints
Thank You…
• To Margaret Burnett, Brad Myers, Valentina Grigoreanu, Mary Beth Rosson, Mary Shaw and others in the EUSES Consortium for feedback over the years
• To NSF for funding
• To ISEUD 2009 for this opportunity to present
Toped++: key improvements vs Toped in terms of Cognitive Dimensions
• Better closeness of mapping
  – Constraints “belong” to parts in all formats
• Higher juxtaposability
  – Easy to view & compare multiple formats
• Lower error-proneness
  – Helps prevent senseless combinations of constraints
• Lower viscosity
  – Drag-and-drop / copy-and-paste speeds up edits
• Improved progressive evaluation
  – User can test each part individually