Transcript
Page 1: Dynamic Detection of Novice vs. Skilled Use Without a Task Model

CHI 2007 Proceedings

Amy Hurst, Scott E. Hudson, Jennifer Mankoff

Carnegie Mellon University

Page 2: Motivation

Create intelligent user interfaces.

Main idea: if applications could detect a user's expertise, software could automatically adapt to better match that expertise.

Page 3: Uses

Support an adaptive interface with one more useful piece of information: novice or skilled user.

Provide tailored intelligent help: descriptive vs. brief, depending on user skill.

Automatically generate data only when a user is likely to need it.

Page 4: Challenges in Skill Detection

Detection has to be application independent.

It must be done dynamically, continuously, and unobtrusively.

Page 5: Approach to Dynamic Detection

Given the following:

A set of features that quantify interaction, such as mouse motion.

A set of training data containing examples of novice and skilled use.

It is possible to:

Determine which features are most predictive of skilled use.

Train a classifier that, given unlabeled test data, returns an indication of novice or skilled behavior.

Page 6: Difference in Expertise - Qualitative

Knowledge, speed, and comfort.

Experts tend to use domain knowledge in the head, or recall, to achieve their goal.

Novices rely on knowledge in the world, or recognition.

These differences manifest themselves in measurable differences in user action.

Page 7: Difference in Expertise - Quantitative

Use of menus:

Selecting the correct menu item: experts typically recall the location, or use keyboard shortcuts.

Size and organization of the menu affect selection time.

Skilled users memorize menu item locations. Novices typically do not know which menu item to search for or where it may be located.

Page 8: Difference in Expertise - Quantitative

Skilled performance modeling (standard formulations follow below):

Fitts' Law and the Steering Law both indicate that the distance to and size of a target affect selection speed.

These were not used directly, but the underlying properties of mouse motion, velocity and acceleration, are useful.

Keystroke-Level Modeling (KLM)
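
For reference, these are the standard published forms of the three models named on this slide; they are not reproduced on the slide itself:

    % Fitts' law (Shannon formulation): time MT to acquire a target of
    % width W at distance D; a and b are empirically fitted constants.
    MT = a + b \log_2\!\left(\frac{D}{W} + 1\right)

    % Steering law: time T to steer along a path C whose tunnel width
    % is W(s) at arc length s; a and b are again fitted constants.
    T = a + b \int_C \frac{ds}{W(s)}

    % Keystroke-Level Model (KLM): the predicted time for a method is
    % the sum of its primitive operator times (keystrokes, pointing,
    % button presses, mental preparation, ...).
    T_{KLM} = \sum_{\text{operators } o} t_o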

Page 9: Difference in Expertise - Quantitative

Mouse vs. keyboard and other data:

Monitoring on-screen dialogs, help browsers, and keyboard logs.

Possible keyboard actions: detecting actions followed by an immediate undo; detecting use of keyboard shortcuts.

These were not used, as they occur less frequently than mouse-based features.

Page 10: Creating a Classifier - Predictive Model

Direct engineering of the best set of features a priori vs. using a large set of plausible features and machine learning techniques to determine which features are most predictive in a statistical model.

This allows speculatively trying a range of features: many may turn out to be useless, while some may prove useful.

To use this approach, a large set of labeled training data is needed.

Page 11: Classifier - Data Collection

Modified GIMP to streamline tasks:

Removed menu bars, toolboxes, 'close' and 'quit'. All tasks accomplished through popup menus.

Mouse events logged via XNEE, which received data directly from the X11 windowing system.

GTK used to log menu interactions (mouse enter/exit of menu items).

Carefully avoided any information that could not be gathered in an application-independent fashion, such as the specific height and width of menu items. (A sketch of a generic log record follows.)
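
A minimal sketch, in Python, of what one record in such an application-independent log might look like; the field names and the combined mouse/menu event stream are illustrative assumptions, not the authors' actual log format:

    from dataclasses import dataclass

    @dataclass
    class MouseEvent:
        t: float          # timestamp in seconds
        kind: str         # "move", "press", "release", "menu_enter", "menu_exit"
        x: int            # pointer x position in screen pixels
        y: int            # pointer y position in screen pixels
        button: str = ""  # "left" or "right" for press/release events
        item: str = ""    # menu item name for menu_enter/menu_exit events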

Page 12: Classifier - Data Collection

Detecting informative moments: actions that are readily isolated and indicative of a phenomenon that can be easily and accurately labeled.

Menu selection: starting with a right click to open a popup menu, and ending with the left click that selects a menu item or dismisses the menu without a selection. (A segmentation sketch follows.)
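
A minimal sketch of this segmentation, assuming the MouseEvent records above; each yielded episode runs from the right-button press that opens the popup menu through the left-button press that closes it:

    def menu_episodes(events):
        """Split an event stream into menu-selection episodes."""
        episode, in_menu = [], False
        for ev in events:
            if not in_menu and ev.kind == "press" and ev.button == "right":
                in_menu, episode = True, [ev]   # popup menu opened
            elif in_menu:
                episode.append(ev)
                if ev.kind == "press" and ev.button == "left":
                    yield episode               # item selected or menu dismissed
                    in_menu, episode = False, []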

Page 13: Classifier - Data Collection

Participants (paid):

Short questionnaire to verify novice status and to determine experience with image editing and drawing applications.

Reading test: a 153-word passage.

Participant status: participants did not know the location of most menu items. All but four knew to select 'undo' from the 'edit' menu, so this item was removed from analysis. Reading speeds were above the adult average: 230-612 wpm. Mostly Windows users.

Page 14: Classifier - Data Collection

Method:

Tasks designed to be repetitive and to progress from novice to skilled behavior.

Clear, specific sequential instructions on paper. Two separate tasks in fixed order:

1. Draw transparent shapes and change the background pattern.

2. Draw letters and shapes and color them with solid colors or gradients.

Each task: seven identical trials. Each trial had ten menu selections.

Extreme outliers removed from the training data set: difficulty staying on task, skipping sections of trials, technical failures.

Page 15: Classifier - Data Collection

Labeling and validating novice vs. skilled behavior:

The first trial of the first task was labeled as novice. The final trial in both tasks was labeled as skilled.

Menu search samples labeled as novice: 600. Menu search samples labeled as skilled: 700.

Subjective: users' impressions of their own performance after each trial.

Objective: compared performance times of menu selections within trials with times predicted by KLM. (A labeling sketch follows.)
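
A minimal sketch of this labeling rule; the function shape and the treatment of intermediate trials as unlabeled are my reading of the slide, not code from the paper:

    def label_trial(task, trial, n_trials=7):
        """Label episodes per the scheme above: the first trial of the
        first task is novice, the final trial of either task is skilled,
        and the trials in between are left unlabeled."""
        if task == 1 and trial == 1:
            return "novice"
        if trial == n_trials:
            return "skilled"
        return None  # intermediate trials: not used as training labels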

Page 16: Classifier - Data Collection

Subjective impressions: plot of the average of participants' responses to questions asked after each trial. Task #1: "I had no problem locating the menu items in this trial." Task #2: "It was easy for me to complete this trial without external help."

Page 17: Classifier - Data Collection

Objective analysis of the data:

Compared performance times of menu selections within trials with times predicted by KLM.

Analysis divided into groups defined by submenu depth (expertise develops more quickly in higher-level menus).

The analysis showed that users progressed through a learning curve:

When visiting second-level submenus, users were performing better than KLM-predicted times by the fourth trial of the first task.

Users reached KLM predictions for third-level submenus by the end of the second task.

Also analyzed variation across trials of the most promising feature: the ratio of the time to make a menu selection to the depth of the selection.

Page 18: Classifier - Data Collection

Plot of a promising menu feature's mean for each trial number. Note the rise in the learning curve between the first and second tasks.

Page 19: Classifier - Data Collection

Candidate features: features derived from low-level motion (a computation sketch follows this list).

Total Time (seconds): elapsed time within the action, starting when the menu opened and ending when it closed. (Range: 0.504 – 143)

X and Y Mouse Velocity (pixels/second): average velocity of the mouse during a menu operation in the X and Y directions. (Range: X: 24756 – 35745; Y: 30116 – 37789)

X and Y Mouse Acceleration (change in velocity/second): average unsigned acceleration of the mouse during a menu operation in the X and Y directions. (Range: X: 0 – 242041107; Y: 0 – 1770018051.8)

Dwell Time (seconds): time spent dwelling (not moving) during the interaction sequence. (Range: 0 – 112)
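
A minimal sketch of how these motion features might be computed from one episode's time-ordered (t, x, y) pointer samples; the dwell threshold dwell_eps and the use of velocity magnitudes are assumptions, not values from the paper:

    def motion_features(samples, dwell_eps=1.0):
        """Total Time, average velocity magnitude, average unsigned
        acceleration (per axis), and Dwell Time for one menu episode."""
        total_time = samples[-1][0] - samples[0][0]
        vx, vy, dwell = [], [], 0.0
        for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
            dt = t1 - t0
            if dt <= 0:
                continue
            vx.append((t1, (x1 - x0) / dt))   # (time, velocity) per axis
            vy.append((t1, (y1 - y0) / dt))
            if max(abs(x1 - x0), abs(y1 - y0)) <= dwell_eps:
                dwell += dt                    # pointer effectively still

        def mean_abs(series):
            return sum(abs(v) for _, v in series) / len(series) if series else 0.0

        def mean_unsigned_accel(series):
            acc = [abs(v1 - v0) / (t1 - t0)
                   for (t0, v0), (t1, v1) in zip(series, series[1:]) if t1 > t0]
            return sum(acc) / len(acc) if acc else 0.0

        return {"total_time": total_time,
                "x_velocity": mean_abs(vx), "y_velocity": mean_abs(vy),
                "x_acceleration": mean_unsigned_accel(vx),
                "y_acceleration": mean_unsigned_accel(vy),
                "dwell_time": dwell}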

Page 20: Classifier - Data Collection

Candidate features: features related to the interaction technique (a counting sketch follows this list).

Average Dwell Time (seconds/count): time spent dwelling divided by the number of menu items visited. (Range: 0 – 3.581)

Number of Opened Submenus (count): total number of submenus that the user opened while searching. (Range: 0 – 59)

Selection Depth (count): depth of the selection; used in combination, conditionally, with other features. (Range: 0 – 3)

Menu Item Visits (count): total number of menu items that were visited or passed through during the menu action. (Range: 0 – 160)

Unique Item Visits (count): number of unique menu items visited. (Range: 1 – 57)

Selected Item Dwell Time (seconds): time spent dwelling within the menu item that was ultimately selected, summed over all visits to that item. (Range: 0 – 22)
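
A minimal sketch of the visit-counting features, assuming the MouseEvent records with menu_enter/menu_exit events introduced earlier; an illustration, not the authors' code:

    def menu_visit_features(episode, selected_item):
        """Visit counts and selected-item dwell for one menu episode."""
        visits, unique_items, selected_dwell = 0, set(), 0.0
        entered = None
        for ev in episode:
            if ev.kind == "menu_enter":
                visits += 1
                unique_items.add(ev.item)
                entered = ev
            elif ev.kind == "menu_exit" and entered is not None:
                if ev.item == selected_item:
                    selected_dwell += ev.t - entered.t  # sums over repeat visits
                entered = None
        return {"menu_item_visits": visits,
                "unique_item_visits": len(unique_items),
                "selected_item_dwell_time": selected_dwell}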

Page 21: Classifier - Data Collection

Candidate features: features related to performance models (a small sketch follows this list).

KLM Diff (seconds): difference between the KLM-predicted time and the actual time for the action. (Range: 0.54 – 143.196)

KLM Ratio (dimensionless): KLM-predicted time divided by the actual time for the action. (Range: 0.003 – 3.488)

Time Depth Ratio (seconds/depth): time to make a menu selection divided by the depth of that selection. (Range: 0 – 1.368)
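
A minimal sketch of these three derived features; taking the absolute difference in klm_diff is my assumption, inferred from the non-negative range shown above:

    def performance_model_features(actual_time, klm_time, depth):
        """Derived features comparing observed time to the KLM prediction."""
        return {
            "klm_diff": abs(klm_time - actual_time),
            "klm_ratio": klm_time / actual_time if actual_time else 0.0,
            "time_depth_ratio": actual_time / depth if depth else 0.0,
        }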

Page 22: Classifier - Data Collection

Feature selection: used an analysis of information gain to rank the information content of each feature in isolation (a sketch of the computation follows).

Top 10 features: Average Y Acceleration, KLM Diff, Time Depth Ratio, KLM Ratio, Total Time, Dwell Time, Average Dwell Time, Selected Item Dwell Time, Menu Item Visits, Number of Opened Submenus.
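
A minimal sketch of information gain for one continuous feature, discretized into equal-width bins; the slide does not say how the authors computed it (plausibly via WEKA), so this reimplementation and its 10-bin discretization are illustrative assumptions:

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(values, labels, bins=10):
        """H(label) - H(label | feature), feature binned equal-width."""
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0
        groups = {}
        for v, y in zip(values, labels):
            b = min(int((v - lo) / width), bins - 1)
            groups.setdefault(b, []).append(y)
        conditional = sum(len(g) / len(labels) * entropy(g)
                          for g in groups.values())
        return entropy(labels) - conditional

Ranking the candidate features then amounts to sorting them by this score, highest gain first.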

Page 23: Classifier - Data Collection

Building and validating the classifier:

C4.5 decision tree learning algorithm, implemented in the WEKA machine learning environment.

Other learning algorithms considered: Bayesian networks, Naïve Bayes, support vector machines, and linear discriminant analysis.

Testing the classifier: used a traditional 10-fold cross-validation test. Hold out 10% of the data, build the classifier with the remaining 90%, and test its accuracy in predicting the 10% held-out set. Ten trials performed with 10 disjoint hold-out sets (a sketch follows).

The classifier achieved 91% accuracy.
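
A minimal sketch of the same evaluation protocol in scikit-learn; note the paper used C4.5 (WEKA's J48), while DecisionTreeClassifier implements the related CART algorithm, and the random arrays below are placeholders for the real 1300 labeled feature vectors:

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Placeholder data: one row of candidate features per menu episode,
    # label 1 = skilled, 0 = novice. Substitute the real feature vectors.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1300, 15))
    y = rng.integers(0, 2, size=1300)

    clf = DecisionTreeClassifier()               # CART; the paper used C4.5
    scores = cross_val_score(clf, X, y, cv=10)   # 10-fold cross-validation
    print(f"mean held-out accuracy: {scores.mean():.2f}")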

Page 24: Classifier - Data Collection

Trends the classifier looked for when making a classification:

Novice behavior: low average Y acceleration (mouse moved slowly, stopped, changed direction); longer time to make a selection at a given submenu depth; large total number of menu item visits and unique item visits.

Skilled behavior: high average Y acceleration; faster navigation to deeper menu items; low total number of menu item visits.

Page 25: Implementation - Closing the Loop

Prototype application that adapts to expertise:

Functions across GTK applications.

Displays names and expertise-tailored descriptions.

Page 26: Implementation - Closing the Loop

Validating the ability to detect expertise: used 4 paid participants and the modified GIMP application. The study consisted of two tasks:

1. A scripted task to familiarize participants with the application.

2. A free-form task to draw a scene.

Each participant used a different strategy in the free-form task: participants 1 and 2 mostly used menu items from the scripted task; participants 3 and 4 explored the menus first; participant 4 had difficulties with the scripted task and stayed novice.

Page 27: Implementation - Closing the Loop

Moving average of live classifier predictions for the repetitive and free-form tasks. The vertical bars indicate the transition between tasks. (A smoothing sketch follows.)
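
A minimal sketch of such a moving average over a stream of per-episode classifier outputs; the window size is an assumed parameter, not a value from the paper:

    from collections import deque

    def smoothed_predictions(predictions, window=10):
        """Moving average over a stream of classifier outputs
        (0 = novice, 1 = skilled), one value per menu episode."""
        recent = deque(maxlen=window)
        for p in predictions:
            recent.append(p)
            yield sum(recent) / len(recent)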

Page 28: Conclusion and Future Work

Skill differences are often ignored by applications.

They can be detected easily and accurately, and used to better adapt to user needs.

Future work will include validating the technique across multiple applications and operating systems, and exploring performance in a wider range of real-world situations.

Page 29: Questions and Comments

?

