The Universal Speech Interface (USI) PDG Progress Report

Thomas Harris, Stefanie Tomko, Arthur Toth, James Sanders,

Alex Rudnicky, Roni Rosenfeld

School of Computer Science

Carnegie Mellon University

4 June 2003

Outline

• USI Project Summary
• USI Device Control
• USI User Studies
• Tech Transfer Initiative
  – USI Application Generator

Program Goals and Plan

• Overall program goal:
  – Design a universal (i.e., device-independent) interface for speech-based interaction with wearable and home devices
• Program plan & milestones:
  – Q1: analysis, interaction principles
  – Q2: build device-simulation environment
  – Q3: build first device prototype
  – Q4: initial user studies; development tools

Program Deliverables

• A novel universal design for speech-based interaction with wearable and home devices

• At least one demonstration system exemplifying the new interface

• A set of tools for rapid prototyping of compliant applications

The Universal Speech Interface (USI) in a Nutshell

• Unifying approach to human-machine speech communication
• Unified “look and feel” across all applications
  – analogous to the Xerox/Macintosh/Windows GUI look-and-feel
• Stylized, semi-natural interaction
  – analogous to the “Graffiti” alphabet for the Palm PDA

Existing Speech Paradigm 1: Command-and-control Systems

• Specialized language, optimized for a given application
  – each application has its own interface
• Intensive training of each user
• Daily use helps retain knowledge

Existing Speech Paradigm 2: Unconstrained Dialog Systems

• “Off-the-street” users, no training required
• System models existing human behavior
• But this comes at a cost:
  – each application requires a great deal of data, labor, human expertise
  – Speech Recognition technology is pushed to the limit
  – user does not easily grasp the application’s functional limits
    • Out-Of-Vocabulary words (OOV)
    • Out-Of-Domain concepts, requests

Is a Third Paradigm Needed?

• In practice, people are likely to use:
  – a handful of apps daily:
    • scheduler, contact manager, email, ...
  – many apps occasionally:
    • weather, restaurants, ...
• To exploit this, we need:
  – flexible, powerful interface for familiar applications
  – immediate engagement with occasional or new applications

Our Approach

• Identify application-independent universals:
  – user-side
  – machine-side
• Find suitable, general solutions
  – Human and machine meeting halfway
• Design a stylized, universal “look and feel”
• Teach it in 5 minutes

Universal Semantic primitives

• Help primitives
  – what can the machine do? how do I do X? what can I say?
• Speech channel primitives
  – detect & correct ASR errors; finished talking?
• Interaction primitives
  – turn taking; question answering; session management; undo
• Application primitives
  – environment variables: query, set
  – objects (e.g. lists): describe, navigate, create, modify, delete

USI Systems Developed

• Information Access
  – MovieLine
  – FlightLine
  – ApartmentLine
• Device Control
  – Stereo system
  – X-10 control (e.g., lights)
  – Alarm Clock applet
  – Digital Video Camera
  – Windows Media Player

USI Demonstration

• MovieLine
  – Experimental subject

USI Device Control

Device Interaction Analysis

• Analysis was done on multiple devices
  – alarm clock / radio
  – VCR
  – cell phone
  – MP3 player
  – memo pad / email / vmail
  – copier / fax

USI/Device Design Issues

• Confirmation strategy
• Error handling strategy
• Exploration
• Navigation
• Disambiguation / context mgmt
• Orientation
• Querying state variables

USI/Device Design Issues

• Confirmation strategy: restate-&-execute

• Error handling strategy: ignore

• Exploration: “OPTIONS”

• Navigation: use concept of ‘focus’

• Disambiguation / context mgmt: implicit

• Orientation: “STATUS”

• Querying state variables: “WHAT IS THE...?”
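The design choices above can be pictured with a small, purely illustrative sketch. It is not the USI implementation: the `ToyStereo` class, its state variables, and the exact wording of the responses are assumptions made up for this example. It only shows how restate-and-execute confirmation, ignoring unrecognized input, "OPTIONS", "STATUS", "WHAT IS THE ...?" and an implicit focus might fit together.

```python
class ToyStereo:
    """Toy speech-controlled device; all names and values here are hypothetical."""

    def __init__(self):
        self.state = {"power": "off", "mode": "tuner", "volume": 5}
        # Values each settable state variable accepts (illustrative only).
        self.choices = {"power": ["on", "off"], "mode": ["tuner", "cd", "auxiliary"]}
        self.focus = None  # state variable currently in focus

    def handle(self, utterance):
        """Return the spoken response, or None when the input is ignored."""
        words = utterance.lower().strip().rstrip("?").split()
        if words == ["options"]:
            # Exploration: list what can be said at the current focus.
            if self.focus in self.choices:
                return ", ".join(self.choices[self.focus])
            return ", ".join(sorted(self.state))
        if words == ["status"]:
            # Orientation: report the whole device state.
            return ", ".join(f"{k} {v}" for k, v in sorted(self.state.items()))
        if len(words) == 4 and words[:3] == ["what", "is", "the"]:
            # Query a single state variable and move the focus to it.
            name = words[3]
            if name in self.state:
                self.focus = name
                return f"{name} is {self.state[name]}"
            return None
        if len(words) == 1 and words[0] in self.state:
            # Navigation: naming a variable moves the focus to it.
            self.focus = words[0]
            return words[0]
        if len(words) == 1 and self.focus in self.choices \
                and words[0] in self.choices[self.focus]:
            # Implicit context: a bare value is read relative to the focus.
            self.state[self.focus] = words[0]
            return f"{self.focus} {words[0]}"  # restate and execute
        if len(words) == 2 and words[0] in self.choices \
                and words[1] in self.choices[words[0]]:
            self.focus = words[0]
            self.state[words[0]] = words[1]
            return f"{words[0]} {words[1]}"  # restate and execute
        return None  # error handling: ignore anything not understood


stereo = ToyStereo()
for said in ["mode cd", "tuner", "what is the volume?", "please make it louder"]:
    print(repr(said), "->", stereo.handle(said))
```

In this sketch the second utterance, a bare "tuner", is resolved against the focus left behind by "mode cd", which mirrors the shortcut idea of saying “CD” instead of “MODE: CD”.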

Hooking up with the PUC project

• Fits within the PUC project’s vision of automatically generated interfaces with different modalities and form factors

• But, can also be used as a standalone speech interface

• Compatibility with visual design is desirable, but not always natural:
  – nameless states (speech interface must have a name for everything!)
  – speech interface can have shortcuts (“MODE: CD” vs. “CD”)

Meshing with the PUC project

• Device capabilities specified by XML doc
• State vs. action dichotomy of the visual interface does not always conform to speech interface intuition

• For now, creating our own interface specification document

• Ultimately, will augment XML DTD, so both interfaces can co-exist

USI Device Control (a.k.a. James the Butler)

[Figure: command tree for James. Devices: Stereo, digital camera... Stereo commands: on, off, x-bass, volume up, volume down, volume off. (mode) <turns stereo on>: tuner, auxiliary, cd. Tuner: (radioband) am / fm, each with frequency... and station...; seek forward / backward. CD: (status) play, pause, stop; disc #; next track; last track; random...; repeat...]

Hardware hacking courtesy of the PUC project

USI Demonstration

• Device Control
  – Alarm Clock Example

User Studies

User study

• Compared Speech Graffiti (SG) & natural language MovieLines

• How does Speech Graffiti compare to a natural language interface?
  – Subjective user satisfaction
  – Task completion rates
  – Word error rates
• How well do users "get" Speech Graffiti?
  – How often do they speak within the grammar?
  – In what ways do they deviate from the grammar?

Subjective user satisfaction

• 17 of 23 preferred Speech Graffiti (SG)

[Figure: mean user satisfaction rating (scale 1 to 7) for NL-ML and SG-ML by category: system response accuracy, likeability, cognitive demand, annoyance, habitability, speed, overall]

• SG user satisfaction ratings higher than NL in all categories

• SG ratings positive except in annoyance & habitability

Computer experience & training

• Computer Science / Engineering backgrounds and/or programming experience
  – Higher user satisfaction ratings
  – Better task completion rates
• Training in-domain vs. out-of-domain
  – No differences in user satisfaction or task completion rates

Task completion

• Overall
  – 67.9% SG tasks
  – 67.4% NL tasks
• Individual means
  – 5.43 of 8 SG tasks
  – 5.30 of 8 NL tasks

[Figure: mean task completion rate (axis 0 to 8), SG-ML vs. NL-ML]

Time-to-completion

• Completed tasks
  – 67.9 seconds SG
  – 73.4 seconds NL

• Incomplete tasks:

[Figure: time in seconds (axis 0 to 600) for SG-ML and NL-ML on items 1 to 4, “best case” vs. “real world”; some entries marked (inc); 59 incompletes for each system]

Turns-to-completion

• Completed tasks
  – 8.2 turns SG
  – 3.9 turns NL

• Incomplete tasks:

[Figure: # of turns for SG-ML and NL-ML on items 1 to 4, “best case” vs. “real world”; some entries marked (inc); 59 incompletes for each system]

Word error rates

• Very high for both systems
  – On "cleaned" set (on-task, non-noisy utts)
• Concept error is lower for USI
  – SG: –29.2% from WER
  – NL: +0.8% from WER
• Low error rate is key to acceptance
  – 6 who preferred NL-ML had highest SG WER

             WER     # of utts   subj mean   subj median
  SG Movie   35.1%   3626        35.0%       30.0%
  NL Movie   51.2%   1854        50.3%       48.9%
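For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions + deletions + insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below shows that standard computation; it is not the scoring tool used in the study, and the function name `word_error_rate` and the sample utterances are made up for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of four reference words.
print(word_error_rate("what is the address", "what is address"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.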

WER & user satisfaction

• Good correlation for SG

[Figure: scatter plots of user satisfaction rating (axis 1 to 6) against % word-error rate (axis 0 to 80), one panel for SG-ML and one for NL-ML]

How often do users speak within the Speech Graffiti grammar?

• Actually, pretty often!
  – mean 80.5%, median 87.4%
• … and grammaticality leads to user satisfaction

[Figure: user satisfaction rating (axis 1 to 7) against % grammatical (axis 0% to 100%)]

How do users deviate from the grammar?

• general syntax: 20.6%
• slot only: 14.6%
• out-of-vocabulary word: 14.0%
• missing is/are: 11%
• keyword problem: 8.1%
• value only: 6.7%
• subject-verb agreement: 5.7%
• out-of-vocabulary concept: 5.1%
• disfluency: 4.3%
• more syntax: 4%
• plural+options: 2%
• endpoint: 1.6%
• time syntax: 1.3%
• value+options: 1%
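Three of the categories above ("slot only", "value only", "missing is/are") suggest that the basic in-grammar phrase has the shape "slot is value". Under that assumption, a rough sketch of how an utterance might be sorted into these categories follows. The slot names, values, and the `classify_utterance` function are hypothetical and far simpler than whatever procedure was used in the study.

```python
# Hypothetical slots and accepted values, for illustration only.
SLOTS = {
    "genre": {"comedy", "drama"},
    "rating": {"g", "pg"},
}


def classify_utterance(utterance):
    """Label an utterance relative to an assumed 'slot is value' phrase shape."""
    text = " ".join(utterance.lower().split())
    for slot, values in SLOTS.items():
        if text == slot:
            return "slot only"
        for value in values:
            if text == value:
                return "value only"
            if text == f"{slot} {value}":
                return "missing is/are"
            if text == f"{slot} is {value}":
                return "grammatical"
    return "other"


for said in ["genre is comedy", "genre comedy", "genre", "comedy"]:
    print(said, "->", classify_utterance(said))
```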

Future Interface Design Work

• Redesign Help facility
  – SG works best for those who "get it"
  – Current system provides no assistance to "clueless user"
• Error analysis
  – Compare failure cases in SG and NL interfaces
  – Compare user recovery attempts in SG and NL
• Address issues of generalizability
  – Promoting transparency of slot set and response sets
  – Accessing information sets rather than single items

• Adjust grammar components

Future Architecture Work

• Integrate current USI environments
  – Information Access
  – Device Control

• Improve interface between PUC and USI components

• Identify USI-specific techniques to achieve lower WER

• Improved documentation and distribution packaging

Tech Transfer Initiative

Tech Transfer Initiative

• Tools for creating new USI apps
  – 3 days to create a new application
  – prior exposure to speech technology highly beneficial
  – decided to further reduce the barrier: create an application generator

From 3 Days to a Few Hours

• A USI Application Generator
• New USI applications w/out programming!
• XML document fully specifies the application
  – slot names
  – accepted inputs
  – data types
  – slot properties
  – ...
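To make the idea of an XML-specified application concrete, here is a hedged sketch of what such a document and its loader could look like. The slides do not show the actual USI schema, so every element and attribute name below (`application`, `slot`, `input`, `datatype`, `queryable`) is an assumption invented for this example, as is the `load_slots` helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical application spec; the real USI format is not shown in the slides.
SPEC = """\
<application name="ToyMovieLine">
  <slot name="genre" datatype="enum" queryable="true">
    <input>comedy</input>
    <input>drama</input>
  </slot>
  <slot name="title" datatype="string" queryable="true"/>
</application>
"""


def load_slots(xml_text):
    """Parse the spec into a dict keyed by slot name."""
    root = ET.fromstring(xml_text)
    slots = {}
    for slot in root.findall("slot"):
        slots[slot.get("name")] = {
            "datatype": slot.get("datatype"),                 # data type
            "queryable": slot.get("queryable") == "true",     # a slot property
            "inputs": [el.text.strip() for el in slot.findall("input")],
        }
    return slots


slots = load_slots(SPEC)
for name, info in slots.items():
    print(name, info)
```

The point of the sketch is only that slot names, accepted inputs, data types, and slot properties can all live in data, so a new application needs a new document rather than new code.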

From a Few Hours to 15 minutes?

• Created a Web interface for generating the XML document
• Form filling, pulldown menus
• Strong effort to further simplify the process, minimize complexity of form
  – many defaults
  – for less common choices, edit the XML doc

• More importantly, no computer savvy needed

Web Application Generator

• Repository and tool for creating USI database applications

• Abundant online help to guide users through process

• Accessible to anyone with an Internet connection

Web Application Generator

• Two-step process:
  – General specification
  – Slot-by-slot specification
    • choose datatype from built-in list, or create own
• Fully featured system with save, copy, delete functionality
• Hides intricacies of XML document writing
• Advanced users have ability to further alter the final XML document

General Specification screen with help box displayed.

Web Application Generator

• Built-in generic voice; can record own voice
• DB backend
  – Postgres
  – Oracle
  – ODBC (including ASCII files)
  – Ultimately: web tables
• Platform:
  – originally: mixed Unix/Windows, telephone based
  – converted to: pure Windows, telephone or laptop

Transferring USI to PDG members

• We do house calls!
  – Carnegie Mellon will install USI developer environment for each interested member and will train member staff in the use of the developer environment
  – Provide a short tutorial on USI principles and interface design

Thank you!

Pittsburgh Digital Greenhouse

