
CONVEDIA MEDIA SERVER™ VOICEXML INTERFACE REFERENCE GUIDE

007-02542-0025 • August 2010

RELEASE 4.21

Proprietary Information

Copyright © 2001–2010 RadiSys Corporation. All rights reserved.

RadiSys and Convedia are registered trademarks of RadiSys Corporation. CMS-3000, CMS-6000, CMS-9000, eXMP, and eXtended Media Processing are trademarks of RadiSys Corporation. Red Hat and Red Hat Linux are registered trademarks of Red Hat, Inc. Linux is a registered trademark of Linus Torvalds. All other trademarks, registered trademarks, service marks, and trade names are the property of their respective owners.

No part of this publication may be reproduced, modified, transmitted, transcribed, stored in any retrieval system, or translated into any language in any form, in whole or in part, by any means without the express prior written permission of RadiSys Corporation.

RadiSys Corporation reserves the right to make changes to software, hardware, and documentation without notice. For the most recent version of documentation, visit the RadiSys web site at:

www.radisys.com/service_support/convedia_support.cfm

This product may include the third-party software detailed in the installation manual for your media server.

Contact Information

RadiSys Corporation
4190 Still Creek Drive, Suite 300
Vancouver, BC V5C 6C6
Canada

RadiSys Technical Assistance Center (TAC)
Phone: +1-800-622-2235 (North America only, toll free)
Phone: +1-604-918-6415
E-mail: [email protected]

To access support for Convedia Media Servers from the RadiSys web site, go to:

www.radisys.com/service_support/convedia_support.cfm

Release History

Part Number       Date          Description
007-02542-0025    August 2010   Version 01. Released with R4.21.0.


TABLE OF CONTENTS

List of Tables . . . . . ix
List of Examples . . . . . xi
List of Shadow Variables . . . . . xiii

Preface . . . . . xv
  Intended Audience . . . . . xvi
  Guide Organization . . . . . xvi
  Document Conventions . . . . . xvii
  RadiSys Publications . . . . . xviii
  Technical Support . . . . . xix

What’s New in Release 4.21 . . . . . xxi
  New Features in R4.21.0 . . . . . xxi
    New Features for SIP . . . . . xxi
    Behavior Changes . . . . . xxii
    Documentation Changes . . . . . xxii
    Release Limitations . . . . . xxii

Chapter 1: VoiceXML Overview . . . . . 1
  Introduction . . . . . 2
  VoiceXML Structure . . . . . 2
    VoiceXML Documents . . . . . 3
    Dialogs . . . . . 3
    Forms . . . . . 4
    Menus . . . . . 4
    Mixed-Initiatives . . . . . 5
    Elements . . . . . 5
    Subdialogs . . . . . 5
    Scope . . . . . 6
  Protocol Support . . . . . 7
    VoiceXML 2.0 Elements . . . . . 7
    VoiceXML 2.1 Elements . . . . . 9
    SRGS Elements . . . . . 9

iv Radisys Convedia Media Server Reference Guide (v.01)

    SSML Elements . . . . . 10
    General XML Handling . . . . . 12
  SIP Transport of VoiceXML . . . . . 12
    Request-URIs for the “dialog” Service Context . . . . . 13
    Passing Variables to the VoiceXML Interpreter . . . . . 14
      Standard Session Variables . . . . . 14
      Application Session Variables . . . . . 15
    Terminating VoiceXML Dialogs . . . . . 16
    Sample VoiceXML Call Flow . . . . . 16
  VoiceXML Interaction with HTTP Servers . . . . . 18
    HTTP Server-Side Logic . . . . . 18
    HTTP Cookies . . . . . 19
      Set-Cookie Response Header . . . . . 19
      Cookie Request Header . . . . . 21
  ASR and TTS . . . . . 21
  User Input . . . . . 22
    DTMF . . . . . 22
    Speech . . . . . 22
  System Output . . . . . 22
  Control Flow . . . . . 23
  Session Termination . . . . . 23
  Shadow Variables . . . . . 24
  Events . . . . . 24
  Errors . . . . . 26
  ECMAScript Support . . . . . 27
  Escape Characters . . . . . 27
  Working with Media Files and TTS Strings . . . . . 28
    Media Clip Support . . . . . 28
    Clip Delineation in Prompts . . . . . 29
    Referring to Media Files . . . . . 30
    HTTP Queries . . . . . 33
    Relative URIs . . . . . 33
    Sets and Variables . . . . . 35

Chapter 2: VoiceXML Properties . . . . . 37
  Properties Overview . . . . . 38
  Generic Speech Recognizer Properties . . . . . 38
  Generic DTMF Recognizer Properties . . . . . 41
    Timeout vs. Interdigit Timeout Properties . . . . . 42
  Prompt Properties . . . . . 43
  Fetching Properties . . . . . 43
    fetchhint Properties . . . . . 43
    maxage Properties . . . . . 44
    maxstale Properties . . . . . 44
    Other Fetch Properties . . . . . 45


Radisys Confidential v

    Object Fetch Properties . . . . . 45
  Fax Detection Property . . . . . 46

Chapter 3: DTMF and Voice Grammars . . . . . 49
  Overview . . . . . 50
    DTMF Grammars . . . . . 50
    Speech Grammars . . . . . 50
  Input Mode . . . . . 51
  Menu-Choice Grammars . . . . . 52
  Option Grammars . . . . . 52
  SRGS Grammars . . . . . 53
    Inline SRGS Grammars . . . . . 53
    External SRGS Grammars . . . . . 54
  Arbitrary Grammars . . . . . 55
  Built-In Grammars . . . . . 55
    Boolean . . . . . 56
    Currency . . . . . 57
    Date . . . . . 57
    Digits . . . . . 58
    Number . . . . . 58
    Phone . . . . . 58
    Time . . . . . 59
  Maximum Length of Grammars . . . . . 59
  Grammar Evaluation . . . . . 60

Chapter 4: VoiceXML 2.0 Elements . . . . . 61
  <assign> . . . . . 62
  <audio> . . . . . 63
    Audio, Video, and Multimedia Clips . . . . . 64
    Text to Speech Strings . . . . . 64
    Audio Clip Errors . . . . . 65
    Alternate Audio . . . . . 65
    Audio Clip Name Length . . . . . 66
    Encoding . . . . . 66
  <block> . . . . . 68
  <break> . . . . . 69
  <catch> . . . . . 71
  <choice> . . . . . 73
  <clear> . . . . . 76
  <controlcmd> . . . . . 77
  <desc> . . . . . 81
  <disconnect> . . . . . 82
  <else> . . . . . 83
  <elseif> . . . . . 84
  <emphasis> . . . . . 85
  <error> . . . . . 86


  <example> . . . . . 87
  <exit> . . . . . 88
  <field> . . . . . 89
  <filled> . . . . . 92
  <form> . . . . . 93
  <goto> . . . . . 94
  <grammar> . . . . . 96
  <help> . . . . . 100
  <if> . . . . . 101
  <initial> . . . . . 102
  <item> . . . . . 103
  <link> . . . . . 105
  <log> . . . . . 108
  <mark> . . . . . 109
  <menu> . . . . . 110
  <meta> . . . . . 112
  <metadata> . . . . . 113
  <noinput> . . . . . 114
  <nomatch> . . . . . 115
  <one-of> . . . . . 116
  <option> . . . . . 117
  <p> . . . . . 119
  <param> . . . . . 120
  <phoneme> . . . . . 121
  <prompt> . . . . . 122
    Prompt Controls . . . . . 128
    Barging and Prompts . . . . . 128
  <promptcontrol> . . . . . 130
  <property> . . . . . 131
  <prosody> . . . . . 132
  <record> . . . . . 133
    Storage of Recorded Files . . . . . 136
    Size of Streamed Files . . . . . 137
    Encoding of Recordings . . . . . 137
    Stopping Recordings with DTMF . . . . . 138
    Setting a Pre-Speech Timer . . . . . 138
    Trimming Post-Speech Silence . . . . . 139
    Appending to a Recording . . . . . 139
  <reprompt> . . . . . 141
  <return> . . . . . 142
  <rule> . . . . . 144
  <ruleref> . . . . . 145
  <s> . . . . . 146
  <say-as> . . . . . 147
  <script> . . . . . 148
  <speak> . . . . . 150


  <sub> . . . . . 151
  <subdialog> . . . . . 152
  <submit> . . . . . 155
  <throw> . . . . . 157
  <value> . . . . . 158
  <var> . . . . . 159
  <voice> . . . . . 160
  <vxml> . . . . . 162

Chapter 5: VoiceXML 2.1 Elements . . . . . 169
  <data> . . . . . 170
  <foreach> . . . . . 174

Chapter 6: ECMAScript Language Binding for the DOM . . . . . 177
  Attr Object . . . . . 178
  CDATASection Object . . . . . 179
  CharacterData Object . . . . . 180
  Comment Object . . . . . 182
  Document Object . . . . . 183
  DOMException Prototype Object . . . . . 184
  Element Object . . . . . 185
  EntityReference Object . . . . . 187
  NamedNodeMap Object . . . . . 189
  Node Prototype Object . . . . . 190
  NodeList Object . . . . . 192
  ProcessingInstruction Object . . . . . 193
  Text Object . . . . . 195

Appendix A: Best Practices for VoiceXML Development . . . . . 197

References . . . . . 199

Glossary of Acronyms . . . . . 203

Table of Contents

viii Radisys Convedia Media Server Reference Guide (v.01)

LIST OF TABLES

Table 1-1 VoiceXML 2.0 Supported Elements
Table 1-2 VoiceXML 2.1 Supported Elements
Table 1-3 SRGS Supported Elements
Table 1-4 SSML Supported Elements
Table 1-5 Event Support
Table 1-6 Error Support
Table 1-7 VoiceXML: Supported Media Clips
Table 1-8 VoiceXML: Supported Media Clips
Table 1-9 Referencing Named Media Files in VoiceXML
Table 1-10 Referencing Indexed Audio Files in VoiceXML
Table 2-1 Property Support Summary
Table 2-2 MRCP Speech Recognizer Properties Support
Table 2-3 General Speech Property Elements
Table 2-4 Generic DTMF Recognizer Property Support
Table 2-5 Prompt Property Support
Table 2-6 fetchhint Property Support
Table 2-7 maxage Property Support
Table 2-8 maxstale Property Support
Table 2-9 Support for Other Fetch Properties
Table 2-10 Support for Object Fetch Properties
Table 2-11 Fax Detection Property Support
Table 2-12 Interaction of “bargein” and Fax Tone Detection
Table 2-13 Interaction of “dtmfterm” and Fax Tone Detection
Table 3-1 Default Input Modes for VoiceXML Grammars
Table 3-2 Mechanisms for Setting Input Mode Scope
Table 3-3 Interaction of Input Mode and Grammar Mode
Table 3-4 Conversion of Built-In Speech Grammars to XML-SRGS Grammars
Table 4-1 Conversion of <field> “type” Attribute to <grammar>
Table 4-2 Prompt Completion Shadow Variables
Table 4-3 DTMF Collection Variables
Table 4-4 Effect of Barging Announcements on the Digit Buffer
Table 4-5 Recording Shadow Variables
Table 4-6 Supported Encoding Formats for Recordings
Table 4-7 Summary of “append” Behavior

LIST OF EXAMPLES

Example 1-1 Request-URI for “dialog”
Example 1-2 Request-URI for “dialog” with Query
Example 1-3 SIP INVITE with Query in URI
Example 1-4 SIP Dialog with Query in URI
Example 1-5 Example Server-Side Perl Script
Example 1-6 Relative URI
Example 1-7 Absolute URI
Example 3-1 Inline SRGS DTMF Grammar
Example 3-2 Inline SRGS Voice Grammar
Example 3-3 Boolean Built-In Grammar
Example 3-4 Currency Built-In DTMF Grammar
Example 3-5 Date Built-In DTMF Grammar
Example 3-6 Digits Built-In Grammar
Example 3-7 Number Built-In Grammar
Example 3-8 Phone Built-In Grammar
Example 3-9 Time Built-In Grammar
Example 3-10 Variable Maximum Digit Length in a DTMF Grammar
Example 4-1 Alternate Audio Examples
Example 4-2 <option> Grammar Example
Example 4-3 XML-SRGS Grammar

LIST OF SHADOW VARIABLES

application.cvd_lastprompt$.bargein
application.cvd_lastprompt$.duration
application.cvd_lastprompt$.lasturl
application.cvd_lastprompt$.lasturl_offset
application.cvd_lastresult$.faxtyp
application.cvd_lastresult$.termcond
application.lastresult$.confidence
application.lastresult$.inputmode
application.lastresult$.interpretation
application.lastresult$.utterance
name$.duration
name$.maxtime
name$.size
name$.termchar

PREFACE

This guide describes the Voice Extensible Markup Language (VoiceXML) interface to the Convedia Media Server. It provides a brief overview of VoiceXML, highlighting core concepts. It also documents Convedia Media Server compliance with the VoiceXML specifications [13] and [14], describing extensions to, deviations from, and omissions from those specifications.

The VoiceXML 2.0 language is defined by the W3C Recommendation [13], and VoiceXML 2.1 is defined by [14]. For a full description of VoiceXML, refer to those Recommendations, which remain the normative implementation reference. Any features of VoiceXML specified in the Recommendations but not in this guide are not supported by the Convedia Media Server in this release. Any features of VoiceXML specified in this guide but not in the Recommendations are extensions to the specification.

This preface describes this guide, laying out its organization, the assumptions made about the reader, and the conventions used in the guide. It also explains how to get technical support, and describes the features that are new in this release.

The following information is presented:

• Intended Audience

• Guide Organization

• Document Conventions

• RadiSys Publications

• Technical Support

• What’s New in Release 4.21

Intended Audience

This guide is intended for application developers and other technical personnel who want to communicate with a Convedia Media Server from a control agent (that is, from a softswitch or an application server) using SIP and VoiceXML.

Readers should be thoroughly conversant with application programming using Session Initiation Protocol (SIP).

Guide Organization

This guide is organized as follows:

Chapter 1: VoiceXML Overview

This chapter provides an overview of the core concepts of the Voice Extensible Markup Language (VoiceXML).

Chapter 2: VoiceXML Properties

This chapter describes the media server’s support for VoiceXML properties.

Chapter 3: DTMF and Voice Grammars

This chapter describes the media server’s support for DTMF and voice grammars in VoiceXML.

Chapter 4: VoiceXML 2.0 Elements

This chapter describes the VoiceXML 2.0 elements currently supported by the Convedia Media Server, including SRGS and SSML elements.

Chapter 5: VoiceXML 2.1 Elements

This chapter describes the VoiceXML 2.1 elements currently supported by the Convedia Media Server.

Chapter 6: ECMAScript Language Binding for the DOM

This chapter describes the ECMAScript binding for the subset of Level 2 of the DOM.

Appendix A: Best Practices for VoiceXML Development

This appendix describes some development practices that can help you maximize performance and capacity of your VoiceXML applications.

References

Glossary of Acronyms


Document Conventions

This guide uses the following advisory paragraphs:

Note: Notes provide information you might need to avoid problems or configuration errors.

Warning: Warnings alert you to situations that may pose a threat to personal safety.

Caution: Cautions alert you to situations that might cause harm to your system or damage to equipment, or that may affect service.

In addition to advisory paragraphs, the following typographic conventions are used in RadiSys guides:

Monospace: Monospace font is used in special example paragraphs to indicate code samples and console output.

<Monospace>: Angle brackets surrounding Monospace font are used to indicate elements in a markup language, such as VoiceXML and MSML.

boldface Monospace: Boldface Monospace font is used in examples where you must interact with the system. The text in boldface Monospace represents information you must enter.

boldface: Boldface font is used to indicate file names, commands, and any term in a formal language (for example, a signal or parameter in MGCP, an attribute in MSML, a property in VoiceXML, and other methods, classes, and headers).

italics: Italic font is used in command or element syntax, and inline, to indicate arguments and variables, that is, values that you must supply.

CAPS: Upper case is used to indicate protocol requests and messages, for example, a PUT request in HTTP, a SYN packet in TCP, or an INVITE or BYE message in SIP.

<key>: Angle brackets are used to indicate a key on your keyboard. Combinations of keys are joined by plus signs (“+”), for example <Ctrl>+<Alt>+<Del>.

[ ]: Square brackets enclose elements that are optional in a syntax.

{}: Curly brackets enclose a set of syntax elements where exactly one element must be chosen.

arg | arg: Vertical bars are used to separate elements that are strict alternatives (“exclusive OR”). When vertical bars are used, only one alternative can be chosen.

arg [arg...]: This convention indicates a value that can optionally represent a space-separated list of the same kind of element (for example, a space-separated list of IP addresses).

arg[, arg...]: This convention indicates a value that can optionally represent a comma-separated list of the same kind of element (for example, a comma-separated list of IP addresses).

arg[-arg...]: This convention indicates a value that can optionally represent a hyphen-separated range of values (for example, a range of IP addresses).

RadiSys Publications

The following product documentation is available for RadiSys products. Download the correct version of the documents you need from the RadiSys web site at www.radisys.com.

Convedia Media Server System Description

Provides a high-level overview of RadiSys Convedia Media Servers.

IMMS 3G-324M-Integrated Media Server Solutions Guide

Provides an overview of the IMMS 3G-324M-Integrated Media Server and its place in the network.

CMS-9000 Media Server User Guide

Describes the CMS-9000 Media Server, and explains how to perform operations, administration, and management on the CMS-9000 Media Server using the web GUI.

CMS-9000 Media Server Hardware Installation Manual

Provides hardware installation and maintenance procedures for the CMS-9000, up to and including RS-232 console configuration.

Convedia Software Media Server User Guide

Describes the Convedia Software Media Server, and explains how to perform operations, administration, and management on the Convedia Software Media Server using the web GUI.

Convedia Software Media Server User Guide (Co-Resident Mode)

Describes the Convedia Software Media Server, and explains how to perform operations, administration, and management using the web GUI on the Convedia Software Media Server when the operational mode is configured to co-resident mode.

Convedia Software Media Server Installation Manual

Provides software installation and maintenance procedures for the Convedia Software Media Server, up to and including initial network configuration.

Convedia Media Server SIP Interface Reference Guide

Describes the media server’s support for SIP, and how to use the SIP interface.

Convedia Media Server VoiceXML Interface Reference Guide

Describes the media server’s support for VoiceXML 2.0 and 2.1, and how to use the VoiceXML interface.

Convedia Media Server MSML 1.1 Interface Reference Guide

Describes the media server’s support for MSML 1.1, and how to use the MSML interface.

Convedia Media Server MGCP Interface Reference Guide

Describes the media server’s support for MGCP, and how to use the MGCP interface.

Convedia Media Server H.248 Interface Reference Guide

Describes the media server’s support for H.248/MEGACO, and how to use the H.248 interface.

Convedia Media Server SNMP Interface Reference Guide

Describes the media server’s support for SNMP, and how to use the SNMP interface.

Convedia Media Server Sets and Variables Interface Reference Guide

Describes the media server’s sets and variables feature, and support for each language.

Convedia Media Server Special Interfaces Reference Guide

Explains how to configure and use the media server to interoperate with external devices such as NFS servers, HTTP servers, speech servers, and video terminals.

Convedia Media Server Capacity and Performance Reference Guide

Provides general guidelines for expected performance and capacity for RadiSys Convedia media servers.

Technical Support

Technical support is available from the RadiSys Technical Assistance Center (TAC). Support is governed by the terms of your agreement with RadiSys Corporation.

TAC can be reached using the following contact information:

RadiSys Corporation
4190 Still Creek Drive, Suite 300
Vancouver, BC V5C 6C6
Canada

RadiSys Technical Assistance Center (TAC)
Phone: +1-800-622-2235 (North America only, toll free)
Phone: +1-604-918-6415
E-mail: [email protected]

To access support for Convedia Media Servers from the RadiSys web site, go to:

www.radisys.com/service_support/convedia_support.cfm

What’s New in Release 4.21



This section focuses on differences between this release of the product and the previous release. As each new point release is issued, starting with R4.21.0, this section will record the changes so that the history can be reviewed from R4.21.0 to R4.21.n.

This section is included as a quick reference for users who are upgrading to this release from the previous release. Customers new to the product are advised to read the entire document.

New Features in R4.21.0

• R4.21.0, the first field release of R4.21, is supported on the CMS-9000 and CMS-3000.

New Features for SIP

• Media server integrates 3G-324M multimedia gateway

To provide real-time multimedia services to mobile phones over circuit switched networks, the 3rd Generation Partnership Project (3GPP) adopted the 3G-324M protocol. It comprises several ITU-T standards: H.223 Multiplexing Protocol for Low Bit Rate Multimedia Communication, H.245 Control Protocol for Multimedia Communication, and H.324 Terminal for Low Bit-Rate Multimedia Communication.

This release integrates 3G-324M gateway functionality into the media server. Previously, to interface with a 3G-324M network the media server required a 3G-324M video gateway. The RadiSys 3G-324M–Integrated Media Server connects to the 3G-324M network through voice gateways that support Clearmode (a pseudo-codec defined in RFC 4040 for transparent transportation of 64 kbit/s channel data in RTP packets).

3G-324M calls are initiated by a call agent specifying Clearmode in a SIP INVITE. Clearmode call scenarios are supported for both call-agent offers and media-server offers.

The resultant Clearmode port multiplexes and demultiplexes the 3G-324M RTP, receiving requests as H.245 control messages and streaming audio and video. (Data components in received 3G-324M streams are not processed.) The media server supports H.223 transport Levels 0 - 2 (baseline H.223, Annex A, and Annex B) with Adaptation Layer AL1 for H.245 control and AL2 for media.

To establish 3G-324M calls, the media server negotiates H.245 master/slave determination, terminal capabilities, opening audio and video channels, and H.223 multiplexing. For reliable communications the media server supports Numbered Simple Retransmission Protocol (NSRP) and Windowed NSRP (WNSRP), and uses Control Channel Segmentation and Reassembly Layer (CCSRL). Request Mode requests are not processed; Round-trip Delay (RtD) requests are accepted and an RtD response returned.

The media server’s support for 3G-324M includes Annex A, Annex C (levels 0, 1, and 2), and Annex K (MONA) Class II. Sessions can be audio-only, video-only, or multimedia.

Media server MSML and VoiceXML features are supported for 3G-324M sessions with the following exceptions: DTMF inband detection, generation (inband and out-of-band), long digits, and pass through; fax detection and functionality; video text overlay and continuous presence video conferencing; and MPC redundancy for ongoing sessions. The AMR narrow-band audio codec and H.263 video codec are supported for these sessions.


When enabled, the media server’s MSML interface reports significant state changes in the 3G-324M session, such as the establishment of logical channels, to the control agent as events.

For an overview of the RadiSys 3G-324M–Integrated Media Server, please see the Integrated Mobile Media Server (IMMS) 3G-324M-Integrated Media Server Solutions Guide. Complete details of the media server’s support for the 3G-324M protocol are given in the Convedia Media Server Special Interfaces Reference Guide. The User Guide for your media server describes how to configure the integrated 3G-324M gateway. Additional usage information is provided in the protocol guides.

• 3G-324M session statistics

This release introduces new statistics for 3G-324M sessions. For every statistics interval, the media server reports the number of sessions created, the maximum number of concurrent sessions, and the numbers of successful and failed session setups.

The media server’s existing per-port statistics are supported for 3G-324M sessions (with the exception of those related to the jitter buffer).

For more information about new statistics, please see the Convedia Media Server SNMP Interface Reference Guide and the User Guide for your platform.

Behavior Changes

There are no behavior changes in this release.

Documentation Changes

• New Integrated Mobile Media Server (IMMS) Solutions Guide

This release introduces a new book, the Integrated Mobile Media Server (IMMS) 3G-324M–Integrated Media Server Solutions Guide, which provides an overview of the first IMMS product, a media server with integrated 3G-324M video gateway functionality.

• New 3G-324M Gateway chapter in Convedia Media Server Special Interfaces Reference Guide

This release adds a new chapter, 3G-324M Gateway, to describe the media server’s support for 3G-324M sessions.

• Changes to the Convedia Media Server MSML 1.1 Interface Reference Guide

The Convedia Media Server MSML 1.1 Interface Reference Guide has been restructured to better reflect the organization in the MSML specification, RFC 5707.

Release Limitations

This release does NOT support the following media server features, available in the previous release (R4.20) of the CMS-9000 and CMS-3000 media servers:

• RFC 4117 transcoding

Audio transcoding services as an RFC 4117 Transcoding Server (T), providing transcoding services between two SIP User Agents (UAs) through the use of Third Party Call Control (3pcc).

• New hardware: TPC-I

A new Transcoding Processor Card (TPC-I) dedicated to providing RFC 4117 audio transcoding services on the CMS-9000.

• EVRC codec

3G2 C.S0014-0 Enhanced Variable Rate Codec (EVRC-A) codec for EVRC0 media type specified in RFC 3558.

• Automatic noise reduction

Automatically activating and inactivating noise reduction based on a configured threshold.

• MSML support for CRBT random ring

MSML <play> element’s start and end attributes, used to select part of an announcement.

• NLD reports change to noise type

Events for changes in the type of noise (background, impulsive, continuous-signal noise, or a low SNR) exceeding configured limits.

• MSML configuration of per-port statistics

Configuring the per-port statistics through the MSML interface. Per-port statistics can be configured through the SIP interface.

• T.38 fax data is replicated on G.711 ports

Replicating T.38 fax data when the call is negotiated as G.711 in the SIP “group” context.

• Enhancements to SIP Custom Profile 2 for facsimile services

Enhancements to SDP for fax support and changes to case sensitivity.

• 3G2 file format

Multimedia, audio-only, and video-only announcements in the 3G2 file format as defined in 3GPP2 C.S0050.

• SIP message serialization

SIP message serialization prevents out-of-order delivery of SIP messages.

R4.20 also introduced a number of new VQE (voice quality enhancement) statistics and improvements to the echo cancellation algorithms that are not implemented in this release.

Additionally, the following CMS-9000 behavior changes of R4.20 are not implemented in this release:

• Default network topology change from Internal Control Subnet to External Control Subnet.

• Binding the Apache HTTP daemon service to the management interface on the SCC.

For detailed descriptions of these features, please see the documentation for R4.20.

Chapter 1: VOICEXML OVERVIEW

This chapter provides an overview of the core concepts of the Voice Extensible Markup Language (VoiceXML).

This chapter presents the following information:

• Introduction

• VoiceXML Structure

• Protocol Support

• SIP Transport of VoiceXML

• VoiceXML Interaction with HTTP Servers

• ASR and TTS

• User Input

• System Output

• Control Flow

• Session Termination

• Shadow Variables

• Events

• Errors

• ECMAScript Support

• Escape Characters

• Working with Media Files and TTS Strings

Introduction

VoiceXML is an XML-based markup language for creating user dialogs or Interactive Voice Response (IVR) interactions.

VoiceXML provides an extensive mechanism for developing simple or complex IVR applications. The ability to create modular applications from many reusable subcomponents enables VoiceXML developers to create complex IVR applications in a short period of time. The widespread adoption of VoiceXML, together with its inherent similarities to data-centric user dialogs, make it a powerful language for IVR application development.

The media server supports a rich set of VoiceXML mechanisms for creating simple or elaborate IVR applications:

• Playing of streamed audio files, stored inside the media server or on external NFS and HTTP servers

• Inband and RFC 2833 DTMF detection, collection, and interpretation

• Detection of user speech input

• Support for built-in, SRGS, Menu-Choice, and Option grammars for both DTMF and speech

• Support for playing Text to Speech media clips

• Recording of audio and video to internal memory or external NFS and HTTP servers

• Playback of user-recorded audio and video from internal memory or external NFS and HTTP servers

• Support for VCR-like controls (skip forward, skip back, pause, resume, append)

• CNG and CED fax detection and notification capabilities

• Embedding of complex functions (ECMAScript/JavaScript)

• Dialog control flow

• The ability to transfer the caller to another destination, such as another telephone line or voice application

All VoiceXML dialogs are built on sending audio prompts to the user and collecting user input in the form of DTMF digits. An example application is a user dialing a service center and being prompted to select from several spoken options by pressing the corresponding telephone key. Upon receiving the DTMF information, the VoiceXML application determines what action to take.
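The service-center scenario described above might be sketched in VoiceXML 2.0 as follows. This is a minimal illustration rather than an example from the media server itself: the prompt wording and the target document names (sales.vxml and support.vxml) are hypothetical, the spoken prompt assumes a TTS resource is available, and each element should be checked against the supported-elements tables in this guide.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical main menu: the caller selects an option with a DTMF key. -->
  <menu id="main">
    <prompt>
      For sales, press 1. For support, press 2.
    </prompt>
    <!-- Each choice maps one DTMF key to the next document to run. -->
    <choice dtmf="1" next="sales.vxml"/>
    <choice dtmf="2" next="support.vxml"/>
    <!-- Replay the prompt if the caller presses nothing or an unexpected key. -->
    <noinput><reprompt/></noinput>
    <nomatch><reprompt/></nomatch>
  </menu>
</vxml>
```

If TTS is not available, the prompt text could be replaced by an <audio> element that references a recorded announcement.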

VoiceXML Structure

This section presents the following topics:

• VoiceXML Documents

• Dialogs

• Forms

• Mixed-Initiatives



• Menus

• Elements

• Subdialogs

• Scope

VoiceXML Documents

A VoiceXML application consists of one or more VoiceXML documents, or scripts. The IVR session with a user begins at the invocation of the first VoiceXML document associated with the application. This document is called the root document of the application. During the IVR session any number of additional documents (leaf documents) may be fetched and loaded, and then unloaded, until the user ends the IVR session according to the application dialog flow.

During the IVR session the root document may reference, or call, other “supporting” VoiceXML documents, as in the illustration below. During a given session, any number of documents may be loaded and unloaded. While a subdocument is loaded, information from higher-level documents remains available to the session.

Although applications always begin by loading the root document, they can terminate from any document, including subdocuments. This ends the user’s IVR session. Alternatively, an external control agent (for example, a SIP User Agent) can forcibly terminate the IVR session at any point.
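As a hedged sketch of the root and supporting document relationship, the two documents below rely on the standard VoiceXML 2.0 “application” attribute and application variable scope. The file names (root.vxml and d1.vxml) and the variable name (serviceName) are hypothetical and chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- root.vxml (hypothetical name): the first document loaded for the session. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Declared at document level in the root, so it has application scope. -->
  <var name="serviceName" expr="'Example Service'"/>
  <form id="start">
    <block>
      <!-- Transition to a supporting document. -->
      <goto next="d1.vxml"/>
    </block>
  </form>
</vxml>
```

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- d1.vxml (hypothetical name): a supporting document that names its root. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      application="root.vxml">
  <form id="welcome">
    <block>
      <!-- The root document's variable remains visible in application scope. -->
      <prompt>Welcome to <value expr="application.serviceName"/>.</prompt>
      <!-- The session can end from a supporting document. -->
      <exit/>
    </block>
  </form>
</vxml>
```

Because d1.vxml names root.vxml as its application root, the root document stays loaded while d1.vxml runs, which is what keeps the variable available.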

Dialogs

The foundation of the VoiceXML application is the dialog, which takes place between the application and the user. VoiceXML dialogs define interactions between a user and the network through an IVR session. Once the application has launched, the user interacts with it through VoiceXML dialogs and subdialogs.

Dialogs are composed of VoiceXML elements. The dialog is a series of audio or video prompts to the user, streamed over RTP, and subsequent collection of user input in the form of DTMF key presses or speech inputs, which are detected and reported to the VoiceXML session. The control logic defined in the VoiceXML application (that is, the document or script) defines when media is played to the user and when user input is collected, to create a dynamic media-based user dialog or IVR session, similar in nature to a web or HTML-based data-centric user dialog session.

[Illustration: a root document with supporting documents D1 and D2]

The following types of dialogs can be created using VoiceXML:

• System-directed. In system-directed dialogs, the system leads the user by asking questions and waiting for user input.

• User-directed. In user-directed dialogs, the user input controls the dialog flow.

• Mixed-initiative. In mixed-initiative dialogs, either the system or the user can direct dialog flow. These types of dialogs are more complex and are difficult to implement in DTMF-based systems. With speech-based grammars, this type of dialog is more practical to implement.

DTMF input is obtained from users through either forms or menus.

Forms

The form item is the primary mechanism of prompting the user. A form-based dialog plays an audio or multimedia prompt to the user. In response, the user presses some sequence of DTMF digits or responds with speech input, and the input is expected to match the field format or grammar. If the collected choices match the expected grammar, they are said to fill the field.

Collected input matching the expected grammar is assigned to the field variable. The field variable can then be used as a standard variable (as in a standard programming language) within further logic and control flow. Additionally, the collected input contained in the variable can be submitted to an external application using the HTTP protocol.

Each form item is one of two types: a user input item or a form control item. The media server supports any of the following elements as user input items:

• <field>: This element allows the user to enter DTMF according to a pre-determined format or grammar.

• <record>: This element records audio spoken by the user.

• <subdialog>: This element moves the user to another location (a subdialog) in the application. When the subdialog is complete, control returns to the calling dialog.

The media server supports any of the following elements as form control items:

• <block>: This element does not collect input, but rather defines a set of executable statements for prompting.

• <initial>: Defines the initial control for the form when using mixed-initiative dialogs (where either the system or the user can direct the dialog flow).
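The way collected input fills a field can be sketched in a few lines of Python. This is an illustration only, not media server code: the function name collect_field is hypothetical, and a regular expression stands in for the DTMF grammar of the field.

```python
import re


def collect_field(grammar_pattern, user_input):
    """Return (event, value) for one collection attempt on a field."""
    if user_input is None:
        # Nothing was entered before the timeout.
        return ("noinput", None)
    if re.fullmatch(grammar_pattern, user_input):
        # The input matches the expected format, so it fills the field variable.
        return ("filled", user_input)
    # Input was collected but does not match the grammar.
    return ("nomatch", None)


# A four-digit PIN field (hypothetical example grammar).
print(collect_field(r"[0-9]{4}", "1234"))
```

The three outcomes correspond to the <filled>, <noinput>, and <nomatch> elements listed in Table 1-1.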

Menus

A menu-based dialog presents the user with a number of choices. The menu item is a simplified version of a form item, designed to present the user with a fixed set of choices.

The choices are presented as a series of audio or multimedia prompts played to the user. In response, the user presses some sequence of DTMF digits or speaks, and the inputs are collected and interpreted by the application. For example, a simple menu item may ask the user to press DTMF digit 1 to hear a weather report, to press DTMF digit 2 to hear a sports report, and to press DTMF digit 3 to hear a traffic report. If the user input matches one of the choices, then the application transitions control to another location within the document, or to another document in the application, as specified for the given choice.



The basic structure of a menu is as in the following example:

Menu
    Play prompt to user requesting choice (1, 2 or 3) from user
    Wait for user input
    User Choice 1: action 1 (jump to location 1)
    User Choice 2: action 2 (jump to location 2)
    User Choice 3: action 3 (jump to location 3)
    No User Input: No-input action
End Menu
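The menu structure above can be mirrored by a small dispatch table. The following Python sketch is illustrative only: the names MENU_CHOICES and menu_transition and the location names are hypothetical, chosen to match the weather, sports, and traffic example. The "nomatch" result for an unlisted key is an addition based on the <nomatch> element, not part of the structure shown above.

```python
# Maps each DTMF choice to the location that receives control.
MENU_CHOICES = {
    "1": "weather_report",
    "2": "sports_report",
    "3": "traffic_report",
}


def menu_transition(dtmf_input):
    """Return the location to jump to for one round of menu input."""
    if dtmf_input is None:
        # No user input: take the no-input action.
        return "noinput"
    # A key outside the fixed set of choices does not match the menu.
    return MENU_CHOICES.get(dtmf_input, "nomatch")


print(menu_transition("2"))
```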

Mixed-Initiatives

A mixed initiative is a <form> element containing one or more <form>-level grammars, where both the user and the application can define the direction the dialog will take. A common mechanism for implementing this is to use an <initial> element that prompts the user for general information. The result of the user’s input then directs the user to specific fields with specific prompts and possibly other grammars defined. This mechanism is most commonly used in voice-based applications.

Elements

A VoiceXML element invokes an action. For example, the <prompt> element defines the output to be played to a user.

The scope of an element is from its opening tag to its closing tag, as in the following example:

<prompt>....</prompt>

An element can have child elements nested within its scope or can itself be a child nested within the scope of a parent element. Elements can have attributes associated with them with values that can be set.

The media server’s support for elements is summarized in the section “Protocol Support” on page 7. Each supported element is described in detail in “Chapter 4: VoiceXML 2.0 Elements” and “Chapter 5: VoiceXML 2.1 Elements”.

Subdialogs

A subdialog allows a user to enter into another dialog. Upon returning from the subdialog, the original dialog continues from the place where it left off. Parameters can be passed into the subdialog and the subdialog can return values to the calling control logic. The subdialog mechanism is much like a subroutine in a standard programming language.

Subdialogs are useful in creating and organizing commonly used dialog functions as libraries, which can be reused by many applications.

Scope

Whenever a supporting document is loaded by the VoiceXML interpreter, the root document is also loaded. This provides the interpreter with all the global information necessary to properly apply values to variables, links, and events. However, a value may be redefined within a different scope.

The concept of scope applies to grammars, variables, links, and event handling. Scope determines the order of precedence for VoiceXML tags. Scope allows developers to:

• Control the global behavior of an application

• Group logically related tasks into documents

• Break down large applications into more manageable, faster-loading modules

VoiceXML has a number of scopes, listed here in order of decreasing scope and increasing precedence.

• Session. Session variables are declared by the platform on which the voice application is deployed. Session variables apply to an entire user session. They are read-only, which means they cannot be modified within any VoiceXML document, either the root document or a supporting document.

• Application. Application values are declared within the <vxml> tag of the root document. Values assigned at the application level are initialized when the root document is loaded, and apply as long as it remains loaded. These values are available to any element within the root document or any supporting document referenced by the root document.

• Document. Values within documents are assigned within the <vxml> tag of a supporting document. Document values are initialized when the supporting document is loaded, and remain available as long as the document is loaded. Document values are available to any dialog within the document. Document values are not available across documents.

• Dialog. Values for dialogs are declared within the <form> or <menu> tags in a document. Values for dialogs are available only to the elements within the dialog for which they are declared.

• For executable content, values are initialized when the content is executed and are released when execution terminates.

• For form/field items, values are initialized when the form item is collected.

• Elements. Values for an element apply to any of its child elements.

Precedence of values increases as the scope becomes more local. That is, the session scope has the least precedence, and values within a dialog have the greatest precedence. Another way to say this is that global scoping behavior can be overridden by declaring parameters at a lower level; locally defined values always override values defined at a higher level.

For example, the scope of variables from broadest to narrowest is as follows:

Session > Application > Document > Dialog > Anonymous

On the other hand, the precedence of variables from highest to lowest is as follows:

Anonymous > Dialog > Document > Application > Session
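The precedence order can be pictured as a chain of lookups that starts at the narrowest scope. The following Python sketch is an analogy only, not the interpreter's implementation: it uses the standard library ChainMap class, and all variable names and values are hypothetical.

```python
from collections import ChainMap

# One mapping per scope; contents are made up for illustration.
session = {"caller": "session value"}
application = {"greeting": "application value", "retries": 3}
document = {"greeting": "document value"}
dialog = {"retries": 1}
anonymous = {}

# ChainMap searches its mappings from left to right, so the narrowest
# scope is listed first and therefore has the highest precedence.
scope_chain = ChainMap(anonymous, dialog, document, application, session)


def resolve(name):
    """Return the value visible for a name from the innermost scope."""
    return scope_chain[name]


print(resolve("greeting"))  # the document value overrides the application value
```

In this sketch a name that is defined only at session level is still visible from the innermost scope, while a locally defined name hides any value defined at a higher level.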

Protocol Support

This section describes the media server’s support for the following protocols:

• VoiceXML 2.0 Elements

• VoiceXML 2.1 Elements

• SRGS Elements

• SSML Elements

• General XML Handling

Use of an unsupported VoiceXML element results in an error.unsupported event. Use of an unsupported SRGS element results in an error.badfetch or an error.grammar, depending on when it is encountered. An unsupported SSML element (and its content) is ignored in order to maximize compatibility with documents that include SSML elements as alternatives to prerecorded audio files. SSML elements are not supported within SRGS grammars.

VoiceXML 2.0 Elements

This release of the VoiceXML interface is based on the 2.0 specification of VoiceXML, as given in [13]. Table 1-1 shows which elements from that Recommendation are supported.

Table 1-1 VoiceXML 2.0 Supported Elements

Element Description Supported

<assign> Assigns a value to a variable. Yes
<audio> Plays an audio clip or multimedia file or renders a text-to-speech clip. Yes
<block> Allows execution of code within a form. Yes
<catch> Handles (catches) events. Yes
<choice> Provides menu choices. Yes
<clear> Clears or resets form items (form fields). Yes
<controlcmd> RadiSys extension. Specifies the actions associated with DTMF key presses for prompt controls. Yes
<disconnect> Terminates the VoiceXML application, sending a SIP BYE. Yes
<else> Provides alternative logic for an <if> condition. Yes
<elseif> Provides alternative logic for an <if> condition. Yes
<enumerate> Not supported. Rejected
<error> Handles (catches) all error events. Yes
<exit> Terminates the VoiceXML application, while keeping the port open. Yes
<field> Collects user input. Yes
<filled> Defines the code to be executed when user input is complete. Yes
<form> Defines a dialog for collecting user input. Yes
<goto> Transfers control to another dialog, abandoning the current dialog. Yes
<grammar> Defines user input rules for DTMF or voice. Yes
<help> Handles (catches) help events. Yes
<if> Defines conditional logic. Yes
<initial> Provides the initial prompt in a form. Yes
<link> Specifies a destination URL when a grammar activates a match. Yes
<log> Generates messages for logging and troubleshooting. Yes
<menu> Provides a fixed set of menu selections. Yes
<meta> Defines page information. Yes
<metadata> Defines information about a document using a metadata schema. Ignored
<noinput> Handles (catches) a user input timeout event. Yes
<nomatch> Handles (catches) an invalid user input event. Yes
<object> Not supported. Rejected
<option> Provides a simple method for specifying grammars. Yes
<param> Defines a parameter to a subdialog. Yes
<prompt> Specifies media output to be played to a user. Yes
<promptcontrol> RadiSys extension. Specifies media controls for user prompt manipulation. Yes
<property> Sets the value of a property. Yes
<record> Records user audio, video, or multimedia to a file. Yes
<reprompt> Repeats a prompt for user input. Yes
<return> Return from a subdialog to the calling dialog. Yes
<script> Executes ECMAScript (JavaScript) code. Yes





<subdialog> Invokes another dialog, from which control will eventually return. Yes
<submit> Submit application values and fetch a new document, transitioning to a new dialog. Yes
<throw> Generates an event to be handled by <catch>. Yes
<transfer> Not supported. Rejected
<value> Inserts the value of an expression into a log message or prompt. Yes
<var> Declares a variable and assigns it a value. Yes
<vxml> The root element for VoiceXML. Defines the set of actions that form a VoiceXML dialog. Yes

VoiceXML 2.1 Elements

This release of the VoiceXML interface is based on the 2.1 specification of VoiceXML, as given in [14]. Table 1-2 shows which elements from that Recommendation are supported.

Table 1-2 VoiceXML 2.1 Supported Elements

Element Description Supported

<data> Fetches XML data from a document server without transitioning to a new VoiceXML document. Yes
<foreach> Allows a VoiceXML application to iterate through an ECMAScript array, executing the content of each array item. Yes. The media server does not support and rejects the following child elements of <foreach> in this release: <aws> and <enumerate>. The media server ignores the following child elements of <foreach> in this release: <emphasis>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <value>, and <voice>.

SRGS Elements

The SRGS specification is given in [11]. Table 1-3 shows which elements from that Recommendation are supported.

• Note that, while the media server supports all SRGS elements for voice grammars, the actual support for voice is a function of the specific support provided by the external speech server deployed. Whether the external server supports all the elements supported by the media server depends on the server deployed.





Note also that even if supported or ignored, if used illegally, an element will be rejected with an error. For example, an SRGS element that would be ignored if used correctly will be rejected with an error if enclosed directly within a VoiceXML element.

Table 1-3 SRGS Supported Elements

Element Description Supported

<example> [SRGS] Provides an example phrase that matches the input specification. DTMF: Ignored; Voice: Yes
<grammar> Defines user input rules for DTMF or voice. Yes
<item> [SRGS] Defines valid user input, as part of a DTMF or voice grammar rule. Yes
<lexicon> [SRGS] Defines valid user input, as part of a DTMF or voice grammar rule.
<meta> Defines page information. DTMF: Ignored; Voice: Yes
<metadata> [SRGS] Defines information about a document using a metadata schema.
<one-of> [SRGS] Allows one selection from a list of alternatives. Yes
<rule> [SRGS] Defines a grammar rule for an inline DTMF or voice grammar. Yes
<ruleref> [SRGS] Allows another voice grammar rule to be included. DTMF: Rejected; Voice: Yes
<token> DTMF: Rejected; Voice: Yes

SSML Elements

The SSML specification is given in [5]. Table 1-4 shows supported elements from that Working Draft plus supported VoiceXML extensions as per [13].

Please note that, for SSML elements, “supported” means that the media server passes the request to the external speech server. The behavior for the element depends on the behavior of the speech server and this can vary. That is, from the point of view of the media server, all SSML elements except <speak> may be included in a VoiceXML document; whether the external server supports them is an independent matter.



Note also that even if an element is supported or ignored, if used illegally, it is rejected with an error. For example, an SSML element that would be ignored if used correctly will be rejected with an error if used illegally within an SRGS grammar.

Table 1-4 SSML Supported Elements

Element Description Supported

<break> Inserts a pause or silence into audio. Yes
<desc> [SSML] Provides a textual description of audio content. Yes
<emphasis> [SSML] Directs the speech server to add emphasis to surrounded text. Yes
<enumerate> [VoiceXML extension] This element is defined in [13]. Its behavior is determined by the external speech server, and is not described in this guide. Yes
<lexicon> [SSML] This element is defined in [5]. Its behavior is determined by the external speech server, and is not described in this guide. Yes
<mark> [SSML] Places a marker into a text or tag sequence. Yes
<meta> [SSML] This element is defined in [5]. Its behavior is determined by the external speech server, and is not described in this guide. Yes
<metadata> [SSML] This element is defined in [5]. Its behavior is determined by the external speech server, and is not described in this guide. Yes
<p> [SSML] Represents a paragraph. Yes
<phoneme> [SSML] Provides a phonemic/phonetic pronunciation for the contained text. Yes
<prosody> [SSML] Permits control of the pitch, speaking rate and volume of the speech output. Yes
<s> [SSML] Represents a sentence. Yes
<say-as> [SSML] Defines a text string to be rendered as an audio clip. Yes
<speak> [SSML] The root element of SSML. Yes (see note a)
<sub> [SSML] Replaces the contained text with a substitute. Yes
<value> [VoiceXML extension] Yes
<voice> [SSML] Requests a change in speaking voice. Yes

a. The <speak> element is not supported directly in VoiceXML scripts. All TTS scripts are rendered into <speak> SSML XML scripts which are then passed to an external server for playing if an external server is active. A parse error results if a <speak> element with TTS text is included in a VoiceXML file.



General XML Handling

The following features for general XML handling are supported:

• The media server ignores XML namespace declarations if present.

• The media server replaces XML predefined entity references (for example, &lt;, &amp;, &gt;, &quot;, and &apos;) in received data with the appropriate literal characters.

• The media server replaces XML special characters, such as the less-than sign (“<”), the ampersand (“&”), the greater-than sign (“>”), the quotation mark (“"”), and the apostrophe (“'”), with predefined entity references in sent data.

• XML identifiers—including element names, attribute names, and literals such as attribute values—are case-sensitive.

The following media server behavior is an exception to standard XML handling:

• The media server does not support using an XML numeric character reference (for example &#nnnn; and &#xhhhh;) to refer to a character by its Unicode code point, where nnnn is the code point in decimal form and hhhh is the code point in hexadecimal form. The media server does not replace XML numeric character references with the appropriate literal characters; instead, the XML parser simply passes these characters as-is to the interpreter.
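The entity handling described above can be approximated with the Python standard library. This sketch is an analogy, not the media server's parser: decode_received and encode_sent are hypothetical names, and xml.sax.saxutils is used because, like the media server, it replaces only the predefined entity references and leaves numeric character references untouched.

```python
from xml.sax.saxutils import escape, unescape


def decode_received(text):
    """Replace the five predefined entity references with literal characters."""
    # unescape() handles &lt;, &gt;, and &amp; itself; the quote entities
    # have to be supplied explicitly.
    return unescape(text, {"&quot;": '"', "&apos;": "'"})


def encode_sent(text):
    """Replace XML special characters with predefined entity references."""
    return escape(text, {'"': "&quot;", "'": "&apos;"})


print(decode_received("1 &lt; 2 &amp;&amp; &quot;ok&quot;"))
print(decode_received("&#65;"))  # numeric reference is passed through as-is
print(encode_sent("a<b & 'c'"))
```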

SIP Transport of VoiceXML

The SIP dialog service context allows a VoiceXML document to be accessed by the media server. A VoiceXML dialog is initiated whenever a SIP INVITE is received for a dialog service context. The behavior for initiating VoiceXML dialogs depends on the setting configured for the SIP Standards Profile through the media server’s management interface.

When the SIP Standards Profile has been configured to be Default (using the media server’s management interface), the behavior is as follows:

• If the media server does not have the resources to initiate the VoiceXML interpreter, it will return a 486 (Busy here) response. Otherwise, the media server will continue with SIP signalling according to the SDP Offer/Answer model, sending a 200 (OK) response to the INVITE and waiting for the control agent’s ACK.

• Once media negotiation has completed and the media server is able to send and receive media, it will retrieve the VoiceXML document from the server and begin document execution. If the media server is unable to retrieve the document, it will issue a BYE and include a Reason header that identifies the problem with a 404 (Not found) response.

When the SIP Standards Profile has been configured to be Profile 1 (using the media server’s management interface), the behavior is as follows:

• If the media server does not have the resources to initiate the VoiceXML interpreter, it returns a 503 (Service unavailable) response. Otherwise, the media server retrieves the VoiceXML document from the server and parses the document for correctness, before proceeding with media negotiation and returning a final response.


• If the media server is unable to retrieve or successfully parse the document, it returns a 404 (Not found) response.

• The SIP Request-URI delay parameter is measured in units of milliseconds instead of in 100-millisecond increments and can be set to up to 99999 msec.
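The difference between the two profiles can be summarized as a small decision function. The Python sketch below is a reading aid only: the function name, the step strings, and the two boolean inputs are hypothetical, and the 200 (OK) final response in the Profile 1 success case is an assumption, since the text above says only that a final response is returned.

```python
def invite_outcome(profile, has_resources, document_ok):
    """Return the ordered steps the media server takes for a dialog INVITE."""
    if profile == "Default":
        if not has_resources:
            return ["486 Busy here"]
        # The INVITE is answered first; the document is fetched afterwards.
        steps = ["200 OK", "wait for ACK", "fetch document"]
        if document_ok:
            steps.append("execute document")
        else:
            steps.append("BYE with Reason header (404 Not found)")
        return steps
    if profile == "Profile 1":
        if not has_resources:
            return ["503 Service unavailable"]
        # The document is fetched and parsed before any final response.
        if not document_ok:
            return ["fetch and parse document", "404 Not found"]
        # Assumption: the final response on success is 200 (OK).
        return ["fetch and parse document", "negotiate media", "200 OK"]
    raise ValueError("unknown SIP Standards Profile: " + profile)


print(invite_outcome("Default", True, False))
print(invite_outcome("Profile 1", True, False))
```

The practical consequence is that under Default a missing document is reported after the call has been answered, while under Profile 1 it is reported as the response to the INVITE itself.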

Request-URIs for the “dialog” Service Context

The root VoiceXML document is identified using the voicexml parameter of the Request-URI. The value of the parameter must be a valid HTTP URI.

Example 1-1 shows an example of a dialog Request-URI.

Example 1-1 Request-URI for “dialog”

sip:[email protected];voicexml=http://host.company.com/scripts/ivr.vxml

The URL must not exceed 1024 characters. The HTTP URI can include a query component, to allow the document to be dynamically generated by the server.

Note: The query delimiter character (“?”) must be escaped as “%3f”, since “?” is a reserved character within a SIP URI. Similarly, when it is not used to assign a value to a URI parameter, the equals sign (“=”) must be escaped as “%3d”. In general, to escape a character, look up its ASCII value (for example, on Linux or Solaris) and replace the character with that value in hexadecimal, preceded by “%”.

Example 1-2 shows an example of a VoiceXML dialog Request-URI containing a query string passing multiple parameters.

Example 1-2 Request-URI for “dialog” with Query

Original Query:

sip:[email protected];voicexml=http://host.company.com/scripts/ivr?caller=usera&callee=userb

Send to Media Server as:

sip:[email protected];voicexml=http://host.company.com/scripts/ivr%3fcaller%3dusera&callee%3duserb
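The escaping shown in Example 1-2 can be reproduced with a short helper. This Python sketch is a minimal illustration under stated assumptions: the function names and the SIP address sip:dialog@ms.example.com are hypothetical, only the two characters named in the note are escaped, and the 1024-character limit is checked against the escaped form, which is an interpretation rather than documented behavior.

```python
MAX_URL_LENGTH = 1024  # limit stated for the URL


def escape_voicexml_uri(http_uri):
    """Escape "?" and "=" so the HTTP URI can be carried in a SIP Request-URI."""
    escaped = http_uri.replace("?", "%3f").replace("=", "%3d")
    if len(escaped) > MAX_URL_LENGTH:
        raise ValueError("URL exceeds 1024 characters")
    return escaped


def dialog_request_uri(sip_address, http_uri):
    """Build a dialog Request-URI with the document URI in the voicexml parameter."""
    return sip_address + ";voicexml=" + escape_voicexml_uri(http_uri)


print(dialog_request_uri(
    "sip:dialog@ms.example.com",  # hypothetical media server address
    "http://host.company.com/scripts/ivr?caller=usera&callee=userb",
))
```

Note that the "=" that follows the voicexml keyword is added after escaping, so it remains a literal equals sign.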

When the document is expressed as a stand-alone URI, the voicexml keyword should be omitted. The SIP Request-URI should remain otherwise unchanged from that shown in Example 3-7.



The media server passes only one URI parameter from the Request-URI to the VoiceXML interpreter. If additional URI request parameters are included in the Request-URI, the media server treats the Request-URI as a bad request.

The HTTP URI can include a query component instead of a direct request for a specific VoiceXML document. This allows the server to select a VoiceXML script dynamically, for example on the basis of the called number.

Example 1-3 shows a SIP INVITE that dynamically selects a script according to the number that was dialled. In this example, the DialledNumber parameter is sent to the HTTP server as a URL-encoded request to fetch the VoiceXML document. This request dynamically generates the associated script and fetches it for the media server to execute.

Example 1-3 SIP INVITE with Query in URI

INVITE sip:[email protected];voicexml=http://10.10.10.53/scripts/cgi-bin/vmail?DialledNumber=6048081234

Example 1-4 shows a SIP dialog that dynamically selects a script based on the callers in the session.

Example 1-4 SIP Dialog with Query in URI

sip:[email protected];voicexml=http://host.company.com/scripts/ivr?caller=usera&callee=userb

Passing Variables to the VoiceXML Interpreter

VoiceXML defines a standard set of read-only variables (session variables) that the media server initializes whenever a VoiceXML session is invoked from SIP. These session variables are passed through the SIP INVITE request to the VoiceXML script and are automatically declared for use within the VoiceXML script. Some of these variables get their values from SIP headers and others get their values from parameters of the Request-URI. A VoiceXML application can use this information to control or customize its dialog flow.

VoiceXML scripts can be passed standard session variables or application session variables.

Standard Session Variables

Script variables declared from standard SIP session information are prefixed with session.connection.

For example, consider the VoiceXML script, ivr.vxml. When invoked by a SIP INVITE request, this script results in the following VoiceXML session:

sip:[email protected];voicexml=http://host.company.com/scripts/ivr.vxml



This VoiceXML script has access to the following session variables. The values are all derived from header fields within the SIP INVITE request:

session.connection.local.uri Derived from the “To:” header field
session.connection.remote.uri Derived from the “From:” header field
session.connection.callid Derived from the “Call-ID:” header field

Application Session Variables

In addition to the standard session variables, which are derived automatically from the SIP session information, application-specific session variables can be passed explicitly in the URI parameters of the initial SIP INVITE. The media server supports two methods of creating VoiceXML session variables from the Request-URI.

The first method defines new variables under the session.user tree. In this method, script variables declared from Request-URI parameters are prefixed with session.user. For example, the following Request-URI:

sip:[email protected];voicexml=http://host.company.com/script.vxml;x=y

creates a session variable named session.user.x with a value of “y”.

As a further example, the following SIP INVITE request invokes the ivr.vxml VoiceXML script:

sip:[email protected];voicexml=http://host.company.com/scripts/ivr.vxml;appvara=786;appvarmsg=hi there;appvarnumber=604-555-1234

In this request, values for the application-specific variables appvara, appvarmsg, and appvarnumber are explicitly passed to the ivr.vxml script. Within the context of the script, these variables are defined as follows:

session.user.appvara Value: “786”
session.user.appvarmsg Value: “hi there”
session.user.appvarnumber Value: “604-555-1234”

The second method uses two session variable arrays to hold all URI parameters (including those defined for SIP in RFC 3261) and values.

The first session variable array:

session.connection.protocol.sip.parameter[N].name

contains the names of all URI parameters. The first array element [0] always contains the string “voicexml” (regardless of where it appears in the SIP Request-URI) since that is the first and only required URI parameter for the dialog service context. The second array element contains the second URI parameter (if present), and so on.

The second session variable array:

session.connection.protocol.sip.parameter[N].value

contains the corresponding values for the URI parameters. For example, a Request-URI of:

sip:[email protected];voicexml=http://server.example.com/script.vxml;x=y

populates the [0] and [1] array elements as follows:

session.connection.protocol.sip.parameter[0].name=voicexml
session.connection.protocol.sip.parameter[1].name=x
session.connection.protocol.sip.parameter[0].value=http://server.example.com/script.vxml
session.connection.protocol.sip.parameter[1].value=y

Any escaped characters in the SIP Request-URI that are used as the name or value of VoiceXML session variables will be replaced with their unescaped representation. For example, a Request-URI of:

sip:[email protected];voicexml=http://server.example.com/script.vxml;%78=%79

creates and populates session variables exactly the same as in the preceding examples.
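The two methods can be sketched together in Python. This is an illustration of the mapping described above, not the media server's parser: the function name session_variables and the address sip:dialog@ms.example.com are hypothetical, the URI is split naively on “;”, and it is assumed that every parameter other than voicexml also appears under session.user.

```python
from urllib.parse import unquote


def session_variables(request_uri):
    """Derive session variables from the parameters of a dialog Request-URI."""
    _address, *raw_params = request_uri.split(";")
    params = []
    for raw in raw_params:
        name, _, value = raw.partition("=")
        # Escaped characters are replaced with their unescaped representation.
        params.append((unquote(name), unquote(value)))
    # The voicexml parameter always occupies element [0]; the sort is stable,
    # so the remaining parameters keep their original order.
    params.sort(key=lambda item: item[0] != "voicexml")

    variables = {}
    for index, (name, value) in enumerate(params):
        prefix = "session.connection.protocol.sip.parameter[%d]" % index
        variables[prefix + ".name"] = name
        variables[prefix + ".value"] = value
        if name != "voicexml":
            # Assumption: only application parameters are mirrored here.
            variables["session.user." + name] = value
    return variables


uri = "sip:dialog@ms.example.com;voicexml=http://server.example.com/script.vxml;%78=%79"
for key, value in session_variables(uri).items():
    print(key, "=", value)
```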

Terminating VoiceXML Dialogs

A dialog can complete in either of two ways:

• The VoiceXML interpreter encounters a <disconnect> element or an error. In this case, the media server issues a BYE to end the connection and deletes the call state information when the response to the BYE is received (or times out).

• Processing by the VoiceXML interpreter finishes for any other reason, for example because the interpreter encounters a <return> element in the root document or an <exit> element. In this case, the media server waits for the control agent to issue a BYE.

Sample VoiceXML Call Flow

Figure 1-1 shows the call flow for a VoiceXML session with one user. The call flow is abridged, showing only interactions which impact the media server, not the previous or subsequent interactions between the endpoint and the control agent. Optional flows are shown in grey with dashed lines.



Figure 1-1 VoiceXML Call Flow

In this call flow, the following occurs:

1 The control agent initiates the call to the media server with an INVITE request on the first leg. The Request-URI contains a voicexml= value which specifies the URI for the root document on an external server.

Note: The media server does not support VoiceXML 2.0 scripts in INVITE message bodies.

2 The media server sends a GET request with the HTTP URI to the server to retrieve the initial VoiceXML document.

3 The external server responds with a 200 message to the media server and sends the document.

4 The media server responds with a 200 message to the control agent.

5 The control agent then sends an ACK to the media server indicating that the RTP connection is ready.

6 Upon receipt of the ACK, the media server sends the appropriate audio prompts to the user.

7 The user should reply with a DTMF input. This may trigger other dialogs to be acquired and sent.

8 When the IVR dialog ends or the user terminates his or her connection, the session is terminated by either the media server or the control agent, which sends a BYE to the other.

[Figure 1-1 labels. Participants: User A, Control Agent, Media Server, Server. Messages: INVITE, HTTP GET, 200, 200, ACK, RTP, optional additional server interactions (HTTP GET, 200) to submit results and/or fetch other documents, grammars, or audio clips, and BYE, which may originate from either the control agent or the media server.]



VoiceXML Interaction with HTTP Servers

The media server communicates with external HTTP servers using HTTP/1.1.

During a typical IVR dialog with a user, the collected information from the user is sent to an HTTP server using the <submit> element.

HTTP Server-Side Logic

When a <submit> request is sent to an HTTP server, the HTTP server-side application is required to generate and return a well-formed VoiceXML document. (This may be as simple as invoking <disconnect> or <exit> if the dialog with the user is to be terminated.) Once the media server receives the fetched VoiceXML document, control transitions to the new document.

Example 1-5 shows a sample <submit> request, and the associated invoked server-side logic that returns a new VoiceXML document, to which control will be transitioned. This logic would be enclosed in a Perl script (.pl file). This example does the following:

1 Collects the data submitted by the media server

2 Creates a file called POSTDATA.DAT in the same directory from which the script is executed

3 Sends a VoiceXML document (script) back

4 Sends a <disconnect> element, which “hangs up” the call, freeing port resources and sending a BYE to the SIP control agent

Example 1-5 Example Server-Side Perl Script

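#!/usr/bin/perl
# vxml request handler perl script
use strict;
use warnings;

# Collect the submitted data: the request body for POST, the query string otherwise.
my $input_st = "";
my $method = $ENV{'REQUEST_METHOD'} || "";
if ($method eq "POST")
{
    $input_st = <STDIN>;
}
else
{
    $input_st = $ENV{'QUERY_STRING'};
}
$input_st = "" unless defined $input_st;

# Return a VoiceXML document that disconnects the call.
print "Content-type: text/html\n\n";
print '<?xml version="1.0"?>';
print "\n";
print '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">';
print "\n\n";
print '<form>';
print "\n";
print '<block>';
print "\n";
print '<disconnect/>';
print "\n";
print '</block>';
print "\n";
print '</form>';
print "\n";
print '</vxml>';
print "\n";

# Save the submitted data in the directory from which the script is executed.
if ($input_st ne "")
{
    open(my $outfile, '>', 'POSTDATA.DAT') or die "Cannot open POSTDATA.DAT: $!";
    print $outfile $input_st;
    close($outfile);
}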

HTTP Cookies

The media server supports HTTP cookies in VoiceXML HTTP transport, as defined in the Netscape Persistent Client State HTTP Cookies specification [13]. A server returning a document or other HTTP object to a client can include a cookie, which contains state information plus the range of URIs to which that state information applies. The client stores the information in the cookie, and for any future HTTP requests to the server falling within that URI range, the client will transmit the state information along with the request. This allows the HTTP server to maintain state for a VoiceXML session.

The media server permits or denies the use of VoiceXML cookies using a configuration parameter in the VoiceXML configuration file vxml.cfg. The media server generates a VoiceXML log message when it receives a request to set a cookie and the OAMP configuration parameter has been set to deny cookies. By default, cookies are enabled on the media server.

Cookies are deleted when the associated SIP session expires, regardless of any expiration time specified. The media server supports a maximum cookie size (that is, NAME=VALUE combination) of 4096 bytes. The media server silently discards any cookies over the maximum.

The media server will support up to 10 cookies per session. After the system maximum is reached, the media server deletes the least recently used cookie when a new cookie is created.
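The size and count limits above can be illustrated with a small sketch. This is not the media server's implementation; the class name SessionCookieStore and its methods are hypothetical, and cookies are keyed by name only to keep the example short.

```python
from collections import OrderedDict

MAX_COOKIE_BYTES = 4096        # maximum NAME=VALUE size, per the text above
MAX_COOKIES_PER_SESSION = 10   # per-session cookie limit, per the text above


class SessionCookieStore:
    """Hypothetical per-session cookie store with least-recently-used eviction."""

    def __init__(self):
        # Ordered from least recently used to most recently used.
        self._cookies = OrderedDict()

    def set(self, name, value):
        """Store a cookie. Returns False if it is discarded as oversized."""
        if len(f"{name}={value}".encode("utf-8")) > MAX_COOKIE_BYTES:
            return False  # silently discarded
        if name in self._cookies:
            self._cookies.move_to_end(name)
        elif len(self._cookies) >= MAX_COOKIES_PER_SESSION:
            self._cookies.popitem(last=False)  # evict the least recently used
        self._cookies[name] = value
        return True

    def get(self, name):
        """Return a cookie value and mark the cookie as recently used."""
        if name not in self._cookies:
            return None
        self._cookies.move_to_end(name)
        return self._cookies[name]

    def names(self):
        return list(self._cookies)

    def clear(self):
        """Drop all cookies, as happens when the SIP session ends."""
        self._cookies.clear()
```

In this sketch, both setting and reading a cookie count as "use"; the text does not say which operations the media server counts, so that choice is an assumption.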

Set-Cookie Response Header

The server introduces the cookie to the client using the Set-Cookie header as part of an HTTP response. The Set-Cookie response header has the following syntax:

Set-Cookie: NAME=VALUE; expires=DATE; path=PATH; domain=DOMAIN_NAME

and the following applies:

Attributes are separated by semi-colons. The media server does not support any other cookie attributes.

NAME=VALUE Mandatory. The cookie itself: a simple text string excluding semi-colon, comma, and white space. (If these characters are required, they should be encoded in URL style, for example %20 for a space.) The maximum size of the NAME=VALUE combination is 4096 bytes. Cookies larger than the 4 KB maximum are silently discarded.

expires=DATE Optional. A date string that defines the valid lifetime of the cookie. The format is Wdy, DD-Mon-YYYY HH:MM:SS GMT, for example, Friday, 16-Apr-2004 13:00:00 GMT. By default, the cookie expires when the user session expires. Note that the media server always deletes cookies when the SIP session that received the cookie terminates, regardless of the value of the expires attribute.

path=PATH Optional. Specifies the subset of URLs to which the cookie applies. The media server considers the path attribute to “path match” the request-URI if the path attribute matches a prefix of the request-URI. If the path attribute is not a prefix of the request-URI, the media server does not return cookies. The default is the path of the request-URL that generated the Set-Cookie response.

domain=DOMAIN_NAME Optional. Specifies the domain for which the cookie is valid. The media server considers the domain attribute to “domain match” the request-host if it matches the tail of the fully qualified domain name of the host. For example, mycorp.com matches both shipping.mycorp.com and service.mycorp.com. If the tail matches, the cookie proceeds to “path match” (see path=PATH). Domain names must have at least one embedded dot (that is, domains such as “.com” are rejected), and they must “domain match” the request host. In addition, the media server will reject cookies with domain attributes where the request host is a fully qualified domain name of the form HD, where D is the value of the domain attribute, and H contains one or more dots. For example, the media server will reject a cookie from request host x.y.mycorp.com where domain=mycorp.com. The default is the fully qualified domain name of the server that generated the cookie.
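The path and domain rules above can be sketched in a few lines of Python. The function names are hypothetical, and two readings are assumptions rather than documented behavior: the tail match is applied on a label boundary, and H is taken to be the host prefix without the dot that separates it from the domain.

```python
def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Tail match: the cookie domain must be the end of the host name."""
    host = request_host.lower()
    bare = cookie_domain.lower().lstrip(".")
    # Assumption: match on a label boundary, so "notmycorp.com"
    # does not match "mycorp.com".
    return host == bare or host.endswith("." + bare)


def accept_cookie_domain(request_host: str, cookie_domain: str) -> bool:
    """Apply the rejection rules to a domain attribute received in Set-Cookie."""
    host = request_host.lower()
    bare = cookie_domain.lower().lstrip(".")
    if "." not in bare:
        return False  # no embedded dot, for example ".com"
    if not domain_matches(host, bare):
        return False  # the domain must match the request host
    # H is the part of the host in front of the domain D (assumed reading).
    prefix = host[: len(host) - len(bare)].rstrip(".")
    if "." in prefix:
        return False  # for example x.y.mycorp.com with domain=mycorp.com
    return True


def path_matches(request_path: str, cookie_path: str) -> bool:
    """Path match: the cookie path must be a prefix of the request path."""
    return request_path.startswith(cookie_path)
```

For example, accept_cookie_domain("shipping.mycorp.com", "mycorp.com") accepts the cookie, while the same call with request host x.y.mycorp.com rejects it.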


The Netscape specification includes the secure attribute, as follows:

secure If a cookie is marked as secure, it is sent only if the communication channel with the host is secure (that is, the channel is over HTTPS). If secure is not specified, the cookie is sent over the network in clear text. The default is to allow communication over an insecure channel.

However, the media server does not support HTTPS at this time. Any cookies specifying the secure attribute are ignored.

The following is an example of a Set-Cookie header in an HTTP response:

Set-Cookie: SESSION_ID=10725; path=/; expires=Friday, 16-Apr-2004 13:00:00 GMT

The media server supports lists of cookies in the Set-Cookie header. Cookies are separated in a list by commas (“,”). In addition, the media server accepts multiple Set-Cookie headers within a single HTTP response.

Cookies are uniquely identified by the combination of domain-path-name. So long as cookies have different path or domain attributes, they can have the same name. If the media server receives a cookie with the same domain-path-name as an existing cookie, it overwrites the old cookie. If the media server receives a cookie with the same domain-path-name as an expired cookie, it deletes the cookie.

Cookie Request Header

The media server returns cookies to a server when requesting a Request-URI from a request-host. The cookies are sent by including the Cookie header in the HTTP request. If multiple cookies were introduced by the server, the media server returns all cookies where:

• The domain attribute of the cookie “domain matches” the fully qualified domain name of the host, AND

• The path attribute of the cookie “path matches” the Request-URI.

Multiple cookies returned in the Cookie header are separated by semi-colons. When multiple cookies are returned, the media server orders cookies with more-specific path mappings before cookies with less-specific path mappings. Cookie headers sent by the media server do not include any cookie attributes.

ASR and TTS

The media server supports Automatic Speech Recognition (ASR) and Text to Speech (TTS) if there are external servers defined for processing voice input (ASR) or synthesizing speech (TTS). If no ASR servers are defined at the start of a session, all ASR grammars defined in scripts within the session are ignored, regardless of the input mode defined. Similarly, if no TTS servers are defined, TTS strings found within VoiceXML scripts are ignored throughout the entire session. If one or more servers are enabled and brought online, subsequent new sessions will be able to use these servers; however, existing sessions will not.

User Input

In the VoiceXML applications supported by the media server, user input comes in the form of DTMF key presses or speech utterances. The way in which user input is collected, buffered and validated varies based on whether it is DTMF or speech input.

• For DTMF, all processing of digits is handled within the media server: digits are matched against active grammars according to the Form Interpretation Algorithm (FIA) defined by [13].

• For speech, voice processing is performed by external speech servers. These servers validate the speech against the grammar and determine whether it matches. The results of the collection are returned to the interpreter as NLSML [8] scripts, which are then processed by the media server.

DTMF

Within the media server, DTMF digits are detected on received RTP streams (inband DTMF). The media server also recognizes out-of-band (RFC 2833) DTMF digits.

Speech

Speech detection processing is performed by the external speech server. Once the media server determines that a grammar defined in a VoiceXML script is a speech grammar, the media server sends the grammar to the speech server and performs no other processing on the grammar: the speech server assumes responsibility for detecting and processing the user input.

Voice input is received from the user’s RTP stream and routed to the speech server. The speech server makes all determinations of whether more input is required or whether the current input produces a match or no-match event. The result of the collection is returned to the media server as an NLSML script. The media server then interprets the results of the collection and determines the next action based on the FIA.

System Output

The output of VoiceXML applications is either the playback of recorded audio or video files or synthesized text-to-speech (TTS) played to the user by the external speech server.

Audio playback to the user is invoked using the <audio> element. Audio files may be stored internally on the media server or on an external HTTP or NFS server. In either case, the source of the audio file is specified as a URI. The URI can be explicitly specified, or specified as the evaluation of an ECMAScript expression. The latter mechanism allows playing of audio files based on application-defined logic.

Different methods of specifying audio files are described in detail in the section “Working with Media Files and TTS Strings” on page 28.

If desired, the VoiceXML application can allow the user to interrupt (“barge”) audio playback with a DTMF key press, by enabling the bargein attribute of the <audio> element.

TTS clips are specified by embedding strings into VoiceXML scripts. The media server supports plain text strings, Speech Synthesis Markup Language (SSML) strings, or a combination of the two. All strings, however specified, are converted to SSML strings, which are subsequently passed to an external speech server.

Control Flow

Control flow within a VoiceXML application can be manipulated using any of the following mechanisms:

• Application-defined variables—for example, using variables defined using <var>, <assign>, or <clear>

• Predefined system variables—for example, using variables defined using <var>, <assign>, or <clear>

• Event generation and handling—for example, using <throw>, <catch>, <error>, <help>, <noinput>, <nomatch>, or user-defined events

• Conditional execution—that is, using <if>, <else>, and <elseif>

• Control transfer and jumps—for example, using <goto>, <subdialog>, <submit>, <exit>, <return>, and <disconnect>

• Prompts—for example, using <prompt> and <reprompt>

• Scripts—that is, using ECMAScript, either embedded inline or externally fetched

Session Termination

A session terminates when a script executes a <disconnect> element, when a user hangs up, or when a fatal error occurs during the execution of a script. Regardless of the cause, when a session terminates, the script enters a state in which the set of operations that can be executed is restricted, so that the script can clean up resources (for example, post the current state of collections, recordings, and so on to the HTTP server) before the session terminates.

A terminating script cannot queue or play prompts, make recordings, or collect DTMF. It is limited to two HTTP access operations and a maximum of six iterations; that is, the script can move between forms and other scripts a maximum of six times before the session is terminated. These restrictions are intended to prevent unnecessary processing while in this “clean-up” state.
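The clean-up limits can be pictured as a simple budget that is consumed as the terminating script runs. The sketch below is only an illustration of the two counters described above; the CleanupBudget class and its method names are hypothetical and do not correspond to any media server interface.

```python
class CleanupBudget:
    """Hypothetical tracker for the limits applied to a terminating script."""

    MAX_HTTP_ACCESSES = 2  # HTTP access operations allowed during clean-up
    MAX_ITERATIONS = 6     # moves between forms and other scripts

    def __init__(self):
        self.http_accesses = 0
        self.iterations = 0

    def allow_http_access(self) -> bool:
        """Consume one HTTP access if any remain."""
        if self.http_accesses >= self.MAX_HTTP_ACCESSES:
            return False
        self.http_accesses += 1
        return True

    def allow_iteration(self) -> bool:
        """Consume one iteration if any remain."""
        if self.iterations >= self.MAX_ITERATIONS:
            return False
        self.iterations += 1
        return True

    @staticmethod
    def allow_media_operation() -> bool:
        """Prompts, recordings, and DTMF collection are never allowed."""
        return False
```

An application that needs to post both a recording and a set of variables during clean-up would therefore use its full HTTP allowance with those two requests.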



Shadow Variables

Some VoiceXML elements have associated shadow variables. Shadow variables are variables that are automatically assigned values when the elements are used. The media server supports shadow variables for the following:

• Announcements. Shadow variables for announcements are provided through the <prompt> element. These provide information about prompt completion and information resulting from DTMF collection. For information about shadow variables for announcements, please see the “Shadow Variables” section of the <prompt> element.

Note that if the session terminates as the result of a SIP BYE, the shadow variables are not updated with information about the prompt, so they will not contain correct values.

• Recordings. Shadow variables for recordings are provided through the <record> element. These provide information about the duration of the recording and the reason for its termination. For information about shadow variables for recordings, please see the “Shadow Variables” section of the <record> element.

Shadow variables cannot be modified by a user or an application. They are returned from a VoiceXML document.

Supported shadow variables are summarized in the “List of Shadow Variables” on page xiii.

Events

Some events and errors are automatically generated by the media server; others are generated under direct control of the VoiceXML application. For each event type or error type, the VoiceXML application can specify specific handling. Some event and error handling has a predetermined default implementation provided by the media server. In most cases, the system default event or error handlers can be overridden by the VoiceXML application to provide a more tailored mechanism.



Table 1-5 shows media server support for VoiceXML events.

Table 1-5 Event Support

Event Description

connection.disconnect Supported. Thrown whenever a disconnect distinct from hangup occurs. This event, if caught within an application, allows the application the opportunity to perform final processing before terminating the session. This may include posting of data (such as a recording), or submitting variables to an HTTP server using the <submit>, <goto>, or <link> element. Processing allowed before session termination is restricted to posting information to an HTTP server, setting variable values, and executing simple if-else-elseif statements. Requests to play audio, perform recording, or define grammars are not honored. In addition, a maximum of two HTTP post events are allowed in the catch handler and subsequent VXML documents. Note that this limitation applies equally to events executed within a particular <catch> handler and to subsequent VXML documents returned to the application. All attempts to perform restricted operations are terminated without incident; that is, even though a request to play an audio clip after a disconnect will fail, a second error will not be generated.

connection.disconnect.hangup Supported. Thrown whenever a disconnect occurs. This event, if caught within an application, allows the application the opportunity to perform final processing before terminating the session. This may include posting of data (such as a recording), or submitting variables to an HTTP server using the <submit>, <goto>, or <link> element. Processing allowed before session termination is restricted to posting information to an HTTP server, setting variable values, and executing simple if-else-elseif statements. Requests to play audio, perform recording, or define grammars are not honored. In addition, a maximum of two HTTP post events are allowed in the catch handler and subsequent VXML documents. Note that this limitation applies equally to events executed within a particular <catch> handler and to subsequent VXML documents returned to the application. All attempts to perform restricted operations are terminated without incident; that is, even though a request to play an audio clip after a disconnect will fail, a second error will not be generated. This event is thrown regardless of how the disconnect is initiated, that is, whether the <disconnect> element was executed or encountered, or whether the disconnect was on account of a user hang-up.

com.cvd.event.faxdetect Supported. Thrown if fax tone detection is enabled (by setting the com.cvd.faxdetect property to true) and a fax tone is detected.

exit Supported. This event will be processed either by the appropriate catch handler, or by the <exit> element. Please see page 88 for details on the <exit> element.

filled Supported. Indicates a match event for DTMF collection. Allows application to define specific behavior relative to DTMF match events. This event is not actually thrown in the sense that it can be caught by a catch handler such as the <catch> element.

help Supported. This event will be processed either by the appropriate catch handler, or by the <help> element. Please see page 100 for details on the <help> element. If an application-specific <help> handler is not defined in the document, the default <help> handler executes 5 times before exiting the session.


noinput Supported. Used to catch no-input events relative to DTMF collection and recording. This event allows an application to override the default <noinput> handler. Please see page 114 for details on the <noinput> element. If an application-specific <noinput> handler is not defined in the document, the default <noinput> handler executes 5 times before exiting the session.

nomatch Supported. Used to catch no-match events relative to DTMF collection. Allows an application to override the default <nomatch> handler. Please see page 115 for details on the <nomatch> element. If an application-specific <nomatch> handler is not defined in the document, the default <nomatch> handler executes 5 times before exiting the session.

cancel Ignored.

connection.disconnect.transfer Ignored.

maxspeechtimeout Ignored.

Errors

All VoiceXML errors are fatal to the current session, and the session terminates in all cases.

Table 1-6 shows VoiceXML errors supported by the media server. Note that unsupported errors are not listed.

Table 1-6 Error Support

Error Description

error.badfetch Thrown when specified resource could not be fetched, or the resource was specified incorrectly.

error.grammar Thrown for incorrectly formatted grammars, or unsupported attributes used within a grammar.

error.max_loop_count_exceeded Thrown if:
1. The maximum number of document fetches set for this session has been exceeded. Note that this includes VXML document fetches for submit, subdialog, goto, link, and the initial application document. Root documents and external SRGS grammars are not counted. The default is 100 fetches.
2. The number of iterations (loops) exceeds 400 for a session. This includes all document fetches and transitions between forms within a document.

error.noresource Thrown when a request (for example to play a clip or to enable fax detection) is rejected because available resources have been exceeded and overload protection is in effect. Like all VoiceXML errors, this error is fatal and the session will be terminated.

error.semantic Thrown when incorrect or invalid values are assigned to properties. For information about supported properties, please see Chapter 2: Properties Overview. Also thrown for unsupported or undefined ECMAScript objects; for example when an undefined variable is evaluated. For information about ECMAScript support, please see “ECMAScript Support” on page 27.

error.unsupported.language Thrown when an unsupported language has been specified for sets and variables.

error.unsupported.element Thrown when an unsupported element is specified.

ECMAScript Support

The media server’s ECMAScript support is fully compliant with ECMA-262, Edition 3 based on JavaScript 1.5.

The length of any variable in VoiceXML, however specified, is limited to 256 characters. In addition to the 256-character maximum enforced by the media server, ECMAScript may apply additional constraints in its own handling of variables. Any string longer than 256 characters, or longer than ECMAScript supports, results in session termination with an error.semantic being thrown, except in the following cases:

• The variable is a URI, Remote or Local Address session variable. These are default session variables available to all applications. The maximum length of a VoiceXML URI that starts a session is 1024. Rather than rejecting the call with an exception, such values are truncated to 256 characters and stored in the shortened form.

• A user-defined session variable is longer than 256 characters. In this case, the session terminates, but an exception is not thrown.

All other cases result in an error.semantic and session termination.

Note that the media server does not throw an error.semantic for division-by-0 errors. Instead, the media server returns a value Inf, INF, or inf, representing infinity.

Escape Characters

There are essentially four classifications of data received or sent by the media server in which checking for escape characters (in the form %HEXHEX) may or may not be required, or in which the media server may need to escape characters deemed to be special by the protocol. These four types of data are the following:

1 The Request-URI representing the initial URI required to start a session.

The media server assumes that this URI has been successfully extracted by the SIP layer. The URI may or may not include escape characters, so the media server processes the URI to remove any escape characters from the string.

2 Session variables appended to the end of the Request-URI.

The session variables are removed by the SIP layer based on the rules as defined in SIP RFC 3261. The session variables are presented in a list and are individually processed, with each escape character being converted into its ASCII equivalent.

3 URIs received within a VoiceXML document.

URIs received within a VoiceXML document are processed by the XML parser, which unescapes all characters based on the rules of XML. Subsequent to this operation, no other escape checking is required or performed.

4 URIs including namelist data sent from the media server to an HTTP server.

These are URIs compiled by the media server and sent to the HTTP server. All characters in these URIs must be escaped, according to the HTTP protocol, and the media server processes them accordingly.
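The unescaping and escaping described in items 1, 2, and 4 can be illustrated with Python's standard urllib.parse module. This is a sketch of the %HEXHEX conversions only, not of the media server's internal processing; the function names and the sample host and variable names are made up for the example.

```python
from urllib.parse import quote, unquote, urlencode


def unescape_request_uri(uri: str) -> str:
    """Item 1: convert %HEXHEX sequences in the initial Request-URI."""
    return unquote(uri)


def unescape_session_variables(pairs):
    """Item 2: unescape each session variable name and value individually."""
    return [(unquote(name), unquote(value)) for name, value in pairs]


def build_submit_uri(base: str, namelist: dict) -> str:
    """Item 4: escape namelist data before appending it to an outgoing URI."""
    # quote_via=quote encodes a space as %20 rather than "+".
    return base + "?" + urlencode(namelist, quote_via=quote)
```

For example, build_submit_uri("http://10.0.0.132/handler", {"caller": "John Smith"}) escapes the space in the value before the request is sent, and unescape_request_uri reverses the same encoding on a received URI.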

Working with Media Files and TTS Strings


• Media Clip Support

• Clip Delineation in Prompts

• Referring to Media Files

• HTTP Queries

• Relative URIs

• Sets and Variables

Media Clip Support

Table 1-7 shows the clip formats and encodings (in the format file-format:codec) supported for VoiceXML.

Table 1-7 VoiceXML: Supported Media Clips

Storage | Announcement Audio | Announcement Video | Recording Audio | Recording Video | Sets & Vars Audio
Internal Indexed | WAV: G.711, G.729 | - | WAV: G.711, G.729 | - | WAV: G.711, G.729
Internal Named | WAV: G.711, G.729; QT: G.711; 3GP [a]: AMR | QT: H.263; 3GP: H.263 | WAV: G.711, G.729; QT: G.711; 3GP: AMR | QT: H.263; 3GP: H.263 | WAV: G.711, G.729
NFS | WAV: G.711, G.729; QT: G.711; 3GP: AMR | QT: H.263; 3GP: H.263 | WAV: G.711, G.729; QT: G.711; 3GP: AMR | QT: H.263; 3GP: H.263 | -
HTTP | WAV: G.711, G.729 | - | WAV: G.711, G.729 | - | -

a. The software media server, which does not support AMR, does not use 3GP for audio, only for video-only clips.

Audio clips must have the following characteristics:

• 8 kHz

• Mono (number of channels is 1)

• 8-bit

Clip Delineation in Prompts

Clips, whether audio, multimedia, video-only, or TTS renderings, are grouped together to form a prompt. The way that clips are queued for a prompt depends on the following:

• A bargein attribute associated with the prompt

• A vcrprompt attribute

• The number of clips being queued (a maximum of 50 clips is supported)

• Whether alternate clips are specified

• Whether TTS clips are specified

The media server places the prompt clips into a queue. When the media server reaches a point where the prompt is to be played, it issues the request to play all clips in the group. Only when all clips in the group have been played is the request to play the next group (prompt) issued.

The order of clips played always matches the order specified in the VoiceXML document. All the clips within the same request group necessarily have the same bargein and vcrprompt attributes. Note that barging one clip in a set barges all clips, even those that have been queued but have not yet been requested to be played. This is true even if a subsequent set of clips has the bargein attribute set to false.

Alternate clips are sent as separate requests and are not included with primary clips. TTS clips are not included with audio or multimedia clips and are handled separately by the media server.

Audio and TTS clips can be grouped together. Clips containing video must be grouped separately: either all the clips must contain video or none of them may. In addition, VCR controls are not supported for video clips. If a clip containing video is discovered within a set of clips of another type, or if VCR controls are applied to video clips, the media server reports an error and terminates the playing of any other queued and requested clips. Clips that have been queued but are not yet requested are not affected.

Referring to Media Files

Audio-only, video-only, and multimedia clips are supported in VoiceXML. The supported clip formats and encodings (in the format file-format:codec), and the characteristics required of audio clips, are listed in Table 1-7 under “Media Clip Support”.

The following table shows how to specify named media files in a VoiceXML document.

Table 1-9 Referencing Named Media Files in VoiceXML

Identifier Type | Syntax

Internal Announcement
Syntax is [file:/]/provisioned/path/filename. Provisioned clips with alphanumeric names can be structured in up to nine levels of hierarchical directories or paths (with the level “/provisioned” forming a tenth level where applicable). Levels are delimited with the slash character (“/”). If the file:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Syntax restrictions are as follows:
• Up to 128 characters can be used in total for path/filename.
• File names are case-sensitive. File extensions are not case-sensitive.
• Numbers, letters, and the underscore character are supported. Slash (“/”) is supported only to delimit levels of hierarchy. One period (“.”) is supported to delimit the file name from the file extension.
Examples:
file://provisioned/audioclips/hello.wav
/provisioned/audioclips/hello.wav

Internal Recording
Syntax is [file:/]/transient/filename. Transient recordings do not support hierarchical paths. If the file:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Syntax restrictions are as follows:
• Up to 128 characters can be used in total for filename.
• Numbers, letters, and the underscore character are supported. One period (“.”) is supported to delimit the file name from the file extension.
Examples:
file://transient/user1_name.wav
/transient/user1_name.wav
file://transient/intro.mov
/transient/intro.QT

NFS server
An absolute URL consisting of the file://mnt header (representing the mount point or exported directory) plus a valid NFS URI as per RFC 2224. The syntax is as follows:
[file://]mnt/nfs_server_ip/path/filename
where nfs_server_ip is the IP address of the external NFS server, path is the path fragment to be appended to the exported directory, and filename is the media file.
If the file:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Syntax restrictions are as follows:
• Up to 255 characters can be used in total.
• Numbers, letters, and the underscore character are supported. Slash (“/”) is
Examples:
Suppose that:
• The IP address of the NFS server is 10.10.4.102
• The server is known by the DNS server as gecko
• The exported directory is /annc/myclips.
To play an audio clip welcome.wav located in /annc/myclips/audioclips, you can specify any of the following URIs:
file://mnt/10.10.4.102/audioclips/welcome.wav
mnt/10.10.4.102/audioclips/welcome.wav
file://mnt/gecko/audioclips/welcome.wav
mnt/gecko/audioclips/welcome.wav

HTTP server
A valid HTTP URL or URI. The syntax is as follows:
[http://]path/filename
If the http:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Syntax restrictions are as follows:
• Up to 255 characters can be used in total.
• Numbers, letters, and the underscore character are supported. Slash (“/”) is
• File names can include a valid HTTP query.
Examples:
http://10.10.6.213/annc/myclips/audioclips/welcome.wav
10.10.6.213/annc/myclips/audioclips/welcome.wav
http://10.0.0.132/wavs/audio_handler?id=1234&sub=999

The following table shows how to specify indexed audio files in a VoiceXML document.

Table 1-10 Referencing Indexed Audio Files in VoiceXML

Internal Announcement
Syntax is [file://]index, where index is the numeric index of the clip. The range for indexes is 1–50000. Indexed clips do not support hierarchical paths. If the file:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Examples:
file://729
729

Internal Recording (Index Assigned by CA)
Syntax is [file://]index, where index is the clip index. The range for indexes is 2000001–2025000. Transient recordings do not support hierarchical paths. If the file:// scheme is not included, the file specification is treated as a relative URI, and the current base URI is prepended to it to form an absolute URI.
Examples:
file://2000992
2000992

Internal Recording (Index Assigned by MS)
Syntax is [file://]index, where index is the clip index. Transient recordings do not support hierarchical paths.
CMS-9000 or CMS-6000: For an MPC in slot n the range is 5000001+(n*100000) to 5025000+(n*100000).
Examples:
For an MPC in slot 2 the range is 5200001 to 5225000:
file://5200992
5200992
For an MPC in slot 5 the range is 5500001 to 5525000:
file:///5500729
file://5500729
5500729
CMS-3000 or CMS-1000: The range is 5200001 to 5225000.
Examples:
file://5200992
5200992

Note: When using the VoiceXML interface, avoid using spaces in media file names; instead, encode the space as the escape character “%20”. The media server does accept file names that include spaces, but replaces them with the escape character “%20” before passing the file along for further processing.

HTTP Queries

An HTTP URI can include a query component, instead of a straight request for a specific audio or multimedia resource, as in the following example:

<audio src="http://10.0.0.132/wavs/audio_handler?id=1234&amp;sub=999">

This example allows the server to dynamically select an audio file to play.

Relative URIs

VoiceXML documents are always stored on HTTP servers. References to VoiceXML documents are similar to those for clips stored on external HTTP servers. VoiceXML documents can be referenced by either an absolute URI or a relative URI. A relative URI is recognized by the absence of the protocol (http:// or file://) scheme before the path fragment and file specification.

The media server converts a relative URI to an absolute URI by concatenating with a “base” URI, which is either the URI of the fetching document or the value declared by using the xml:base attribute. A declared value takes precedence over the URI of the fetching document. The declaration can be made in multiple documents; the innermost declaration takes precedence.

For example, suppose a VoiceXML document has a base URI of http://server2/path1/path2. Then consider the document reference in Example 1-6:

Example 1-6 Relative URI

<goto next="record.vxml">

This reference is a URI fragment. Accordingly, the VoiceXML interpreter considers it to be a relative URI. Thus, the base URI is concatenated to record.vxml, resulting in an HTTP GET to http://server2/path1/path2/ record.vxml.

In accordance with the precedence rules for determining the base URL, within record.vxml (that is, while record.vxml is executing), the following applies:

• If record.xml itself has xml:base specified, the value of xml:base is used as the base URI while record.xml is executing. In that case, the base URI of the calling document is ignored.

• If xml:base is not specified, the base URI for record.xml is the URI that was used to fetch record.xml. In this case, that is http://server2/path1/path2.

In Example 1-7, suppose a VoiceXML document again has a base URI of http://server2/path1/path2. Then consider the following document reference:

Example 1-7 Absolute URI

<goto next="http://newserver/path1/path2/path3/record.vxml">

This reference is to an absolute URI. The base URI is bypassed, resulting in an HTTP GET to http://newserver/path1/path2/path3/record.vxml.

As in Example 1-6, within record.vxml, the following applies:

• If record.vxml itself has xml:base specified, the value of xml:base is used as the base URI while record.vxml is executing. In that case, the base URI of the calling document is ignored.

• If xml:base is not specified, the base URI for record.vxml is the URI that was used to fetch record.vxml. In this case, that is http://newserver/path1/path2/path3.



Sets and Variables

The media server supports dynamically rendering announcements from sets of clips, or by using clips referenced by variables. A set is a provisioned collection of audio clips together with an associated selector type and value. The selector is used to identify a specific physical audio clip within the set. A variable represents a semantic concept (such as date or number) which the media server uses to dynamically construct the appropriate audio segment.

The media server’s sets and variables feature is described in detail in the Convedia Media Server Sets and Variables Interface Reference Guide. That document describes how to create an audio segment configuration file for using sets and variables, and how to install it. It also describes the media server’s support for implemented languages.

In SIP-controlled media servers, sets and variables are available for MSML and VoiceXML. For more information about using sets and variables on the media server, please see the Convedia Media Server Sets and Variables Interface Reference Guide.

The VoiceXML interface supports a subset of the media server’s full sets and variables capability. Variables can be included in audio prompts using the <prompt> element. A variable is included by embedding the <say-as> element within the <prompt> element. The variable value itself is specified using a <value> element within the <say-as> element, or by a plain text string within the <say-as> element.

The language in which the variable is to be rendered is indicated using the xml:lang attribute, either at the document level (that is, within the <vxml> element) or within the <prompt> element. Currently, only English (en) is supported.

For more information on VoiceXML support for sets and variables, please see the <vxml> element, the <prompt> element, the <say-as> element, and the <value> element. For full details on the media server’s sets and variables feature, please see the Convedia Media Server Sets and Variables Interface Reference Guide.
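The following is a minimal sketch of how a variable might be embedded in a prompt, based on the element nesting described above (<value> within <say-as> within <prompt>). The form and variable names are hypothetical, and the interpret-as attribute and its value are assumptions taken from the general VoiceXML and SSML conventions; the <say-as> element reference remains the authority on the attributes and types the media server accepts.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: names and the interpret-as value are assumptions. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en">
  <form id="announce_balance">
    <!-- Hypothetical application variable holding the value to render -->
    <var name="balance" expr="'1250'"/>
    <block>
      <prompt>
        <!-- The variable value is supplied by a nested value element -->
        <say-as interpret-as="number">
          <value expr="balance"/>
        </say-as>
      </prompt>
    </block>
  </form>
</vxml>
```

The language is declared once at the document level here; it could equally be declared with xml:lang on the <prompt> element.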

Chapter 2: VOICEXML PROPERTIES

This chapter describes the media server’s support for VoiceXML properties.

This chapter presents the following information:

• Properties Overview

• Generic Speech Recognizer Properties

• Generic DTMF Recognizer Properties

• Prompt Properties

• Fetching Properties

• Fax Detection Property

Properties Overview

Properties are variable settings that can be used to affect the behavior of the VoiceXML interpreter, such as DTMF recognition, timeout intervals, caching policy, and so on. VoiceXML properties are set using the <property> element.

In some cases, global properties can be overridden using an attribute. For example, the bargein property can be overridden by setting the bargein attribute in the <prompt> element. When values are not specifically assigned, properties inherit the platform defaults defined in this chapter.

Any malformed property will result in an error.semantic exception being thrown and session termination.
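As a minimal sketch of the override mechanism described above, the document below sets bargein as a document-level property and then overrides it for a single prompt with the bargein attribute. The audio URLs and the form and field names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: URLs and names are hypothetical. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document-level setting: prompts are bargeable unless overridden -->
  <property name="bargein" value="true"/>
  <form id="welcome">
    <field name="pin" type="digits">
      <!-- The bargein attribute overrides the property for this prompt only -->
      <prompt bargein="false">
        <audio src="http://appserver.example.com/prompts/legal_notice.wav"/>
      </prompt>
      <!-- No attribute here, so the property value (true) applies -->
      <prompt>
        <audio src="http://appserver.example.com/prompts/enter_pin.wav"/>
      </prompt>
    </field>
  </form>
</vxml>
```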

Table 2-1 summarizes the media server’s support for VoiceXML properties.

Table 2-1 Property Support Summary

| Property Class | Example | Reference |
|---|---|---|
| Generic Speech Recognizer Properties | inputmodes="voice" | Page 38 |
| Generic DTMF Recognizer Properties | Interdigit timeout values | Page 41 |
| Prompt Properties | Prompt barge-in (interrupt) | Page 43 |
| Fetching Properties | Fetch timeout value for retrieving documents, grammars, and scripts | Page 43 |
| Fax Detection Property | CED fax tone | Page 46 |
| Object Fetching Properties | N/A | Ignored |

Generic Speech Recognizer Properties

There are a number of properties that are activated as part of support for voice-based grammars. These property values are passed to the specified ASR speech recognizer when a grammar is activated through the equivalent MRCP header field. If a value is not specified, default values are used for these properties.

These values can be configured on the external speech server. Alternatively, the media server can be configured to set these values through the control protocol by configuring the media server through the management interface. If VoiceXML is the control protocol, these values are set through the properties described in the following table. If MSML 1.1 is the control protocol, the default value listed for the property is set. Table 2-2 shows the mappings between VoiceXML properties and their equivalent MRCP header fields.

Table 2-2 MRCP Speech Recognizer Properties Support

| Property | Description | MRCP Equivalent |
|---|---|---|
| confidencelevel | The required confidence level of the speech recognition. If the confidence level returned by the speech server falls below this value, a nomatch event is returned. The range is 0.0 to 1.0, where 0.0 means minimum confidence is needed and 1.0 means maximum confidence is needed. The default is 0.5. | confidence-threshold |
| sensitivity | The sensitivity level of the speech recognizer to voice input. The range is 0.0 to 1.0, where 0.0 is least sensitive and 1.0 is most sensitive. The default is 0.5. | sensitivity-level |
| speedvsaccuracy | A hint specifying the desired balance between speed and accuracy in recognition. The range is 0.0 to 1.0, where 0.0 is fastest and 1.0 is best accuracy. The default is 0.5. | speed-vs-accuracy |
| completetimeout | The amount of time the speech recognizer should wait after a speech input has been recognized and is considered a match before returning a match. Reasonable values are in the range of 0.3s to 1.0s. The default is 1.0s. | speech-complete-timeout |
| incompletetimeout | The amount of silence after receiving speech for the speech recognizer to wait before it finalizes the result. This timer applies only where the speech received does not so far match any current active grammar. This parameter also applies when the received speech matches an active grammar but where it is possible to speak further and match another grammar. The default is 1.0s. | speech-incomplete-timeout |
| timeout | The amount of time, in seconds or milliseconds, the speech recognizer should wait for speech to start. The default is 10s. | no-input-timeout |
| (No VoiceXML equivalent) | An MRCP property that has no equivalent in VoiceXML or MSML. Specifies the total time this recognition may take to complete. This is, in effect, from the start of the speech event through to the completion of recognition, as long as there is continuous voice being recognized. If the timer expires before the accumulated speech produces a match, a nomatch event is returned. | recognition-timeout |
| fetchtimeout | The amount of time a speech server should wait to retrieve an ASR grammar, as well as the time the media server waits to retrieve a VoiceXML script. The application should be aware that this property value applies to two entities: general document fetches and MRCP document fetches. The value is an interval in seconds in the format <number>s. The default value is 50s. | fetch-timeout |
| (No VoiceXML equivalent) | A Boolean value passed in the MRCP Recognize command that specifies whether the no-input-timeout timer should be started or not. For cases where a clip is being played prior to a collection, this value should be set to false. This value is set to false if there are concurrent announcements being played AND the announcement is bargeable; otherwise, it is set to true. For this MRCP header field it is assumed that any time a prompt is to be played prior to a voice collection this value will be set to false. If the prompt completes without being barged, the media server sends a RECOGNITION-START-TIMERS request to the speech server to start the timers running. | Start-Input-Timers (MRCP v2), Recognizer-Start-Timers (MRCP v1) |
| maxnbest | In VoiceXML, this property controls the size of the application.lastresult$ array. This array holds the various possible values for a specified collection. This field allows more than one alternative for a speech collection to be returned. The values returned are stored in the associated indices of the application.lastresult$ array. | n-best-list-length |

In addition to properties that have specific mappings to MRCP header fields, the media server implements the following properties to support voice:

Table 2-3 General Speech Property Elements

| Property | Description |
|---|---|
| inputmodes | Specifies the input mode that is currently active within the defined scope. Supported values are as follows: dtmf: DTMF only is accepted as input. voice: Voice only is accepted as input. dtmf voice: Both DTMF and voice are accepted as input. The default value for this property is configured in the media server’s management interface. Note that these values are case-sensitive. |
| externalserver | A RadiSys extension. An application-defined property that can be used to specify the external ASR server for the current call. Limitations on this value are outlined below. The value is intended to match the External Server name as defined in the management interface and applies only to ASR servers. |

Note that the externalserver property is external to the VoiceXML specification. It is defined to support applications that may want to access a specific server or set of servers using a load balancer based on server capabilities or ownership of servers. Limitations on this value are as follows:

• The current value of the property is included in all set-up requests. This includes ASR servers only; there is no equivalent for accessing TTS servers. The property applies to only one type of server. If the property is set after the ASR connection has been established, it has no effect.

• The media server does not validate the value specified in the property.

• If the external server is specified and does not map to an existing server, the service request fails.
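The following sketch shows how several of the speech recognizer properties above might be set for a single voice collection. The property names come from Table 2-2 and Table 2-3; the values, the external server name asr-pool-1, the grammar and prompt URLs, and the form and field names are all hypothetical. The externalserver property is set at the start of the form on the assumption that no ASR connection has yet been established for the call.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: all values, names, and URLs are hypothetical. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="ask_city">
    <!-- Accept voice input only within this form -->
    <property name="inputmodes" value="voice"/>
    <!-- Intended to match an External Server name in the management interface -->
    <property name="externalserver" value="asr-pool-1"/>
    <!-- Passed to the recognizer as confidence-threshold -->
    <property name="confidencelevel" value="0.6"/>
    <!-- Passed to the recognizer as no-input-timeout -->
    <property name="timeout" value="8s"/>
    <!-- Passed to the recognizer as speech-complete-timeout -->
    <property name="completetimeout" value="0.5s"/>
    <!-- Return up to three alternatives in application.lastresult$ -->
    <property name="maxnbest" value="3"/>
    <field name="city">
      <grammar mode="voice"
               src="http://appserver.example.com/grammars/cities.grxml"/>
      <prompt>
        <audio src="http://appserver.example.com/prompts/say_city.wav"/>
      </prompt>
      <nomatch>
        <reprompt/>
      </nomatch>
    </field>
  </form>
</vxml>
```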

Generic DTMF Recognizer Properties

Table 2-4 shows media server support for generic DTMF recognizer properties.

Table 2-4 Generic DTMF Recognizer Property Support

| Property | Description |
|---|---|
| timeout | The interval after which, if DTMF or speech user input is not received, a noinput event is thrown. The value of this property can be overridden locally by setting the timeout attribute of the <prompt> element. The timer for this property starts whenever the media server transitions into a digit or speech collection state and no DTMF digits have yet been received (that is, there are no digits in the digit buffer) or no speech has been received. The timer stops on receipt of the first DTMF event or speech utterance, after which the inter-digit timeout timer takes effect. When used in a play-collect operation (within a <prompt> element), this value applies to the interval beginning when the prompt audio clip is queued, and not when the clip is played. The default is 10s (10 seconds). |
| interdigittimeout | The interval between DTMF digits after which, if exceeded, a nomatch event is thrown. The timer for this property starts after receipt of the first DTMF digit, and is reset each time a DTMF digit is received, until a match is achieved or the media server determines that a match is impossible. If a match is achieved, then the behavior depends on the value set for the termtimeout property. If the media server determines that a match is impossible, then a nomatch event is thrown. The default is 4s (4 seconds). |
| termchar | Specifies a single DTMF digit which, if detected prior to an inter-digit timeout, terminates DTMF collection. The termination key is not included in the resulting user input. A value of null indicates that no DTMF termination key is defined. By default, the pound sign (“#”) terminates DTMF input. |
| termtimeout | The interval the media server will wait when all digits expected according to the active grammar have been collected. For fixed-length grammars, this is when exactly the specified number of digits has been collected. For variable-length grammars, this is when the maximum number of digits for the grammar has been collected. Until the expected number of digits has been received, the media server waits for the next expected event. This may be either: (1) the next digit, (2) the defined termchar, or (3) an inter-digit timeout. Once the expected number of digits has been received, the behavior depends on whether the termtimeout property has been set. If termtimeout is NOT set (that is, when it is set to 0s, which is the default), collection immediately terminates. If termtimeout is set, the media server waits for the specified timeout interval before terminating collection. Setting a value for termtimeout allows (for instance) the possibility of matches to more than one grammar, of increasing specificity, so that the most specific grammar possible can be matched. The default is 0s (0 seconds). |
| longdigitduration | Ignored. |

Timeout vs. Interdigit Timeout Properties

The timeout and interdigittimeout values are both associated with DTMF collections and can be set as properties anywhere within a VXML document where properties are allowed to be set. (Note that the timeout property is also associated with pre-speech timeouts for recordings. However, the discussion in this section considers only its use within DTMF collections, including play-collect sequences; its use and mappings for speech collection are not addressed here.)

Both the timeout and interdigittimeout values are associated with a digit collection and are not explicitly associated with the playing of audio clips. The timeout timer starts whenever the media server transitions into a digit collection state AND there are currently no digits in the digit buffer. It stops when the first DTMF event is received. If it expires, a noinput event is thrown.

The interdigittimeout timer is reset prior to all DTMF collections. Note that resetting the timer both updates its value and ensures that it is started. When the media server transitions into a digit collection state, it resets this timer only after receiving the first digit, since if there are no digits the timeout timer is run. The inter-digit timer (IDT) is started and stopped as follows:

• Requests to play announcements stop the IDT. It is restarted only if an explicit request to start it is made or a digit is received. For the latter case, the timer value is whatever was previously set. Thus, after an announcement completes, the IDT is not running.

• The IDT is stopped (if running) and restarted whenever a digit is received.

• The IDT is stopped and restarted with the new value whenever a reset timer event is issued by the media server.

• Once the IDT expires, it does not restart until either a digit is received or it is explicitly requested to run, through a reset timer event.
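The DTMF timing properties described in this section can be combined in a single play-collect sequence, as in the sketch below. The property names and events are those of Table 2-4; the chosen values, the prompt URL, the submit target, and the form and field names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: values, names, and URLs are hypothetical. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="collect_pin">
    <!-- noinput is thrown if no first digit arrives within 10 seconds -->
    <property name="timeout" value="10s"/>
    <!-- After the first digit, allow at most 4 seconds between digits -->
    <property name="interdigittimeout" value="4s"/>
    <!-- The pound key ends collection and is not included in the result -->
    <property name="termchar" value="#"/>
    <!-- Wait 2 seconds after the expected digits before terminating -->
    <property name="termtimeout" value="2s"/>
    <field name="pin" type="digits">
      <prompt>
        <audio src="http://appserver.example.com/prompts/enter_pin.wav"/>
      </prompt>
      <noinput>
        <reprompt/>
      </noinput>
      <nomatch>
        <reprompt/>
      </nomatch>
      <filled>
        <submit next="http://appserver.example.com/vxml/check_pin" namelist="pin"/>
      </filled>
    </field>
  </form>
</vxml>
```

Because the timeout interval in a play-collect operation begins when the prompt clip is queued rather than when it is played, a value shorter than the prompt itself may expire before the caller has heard the whole prompt.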

Prompt Properties

Table 2-5 shows media server support for prompt properties.

Table 2-5 Prompt Property Support

| Property | Description |
|---|---|
| bargein | Specifies whether an audio prompt or TTS rendering can be interrupted (“barged”) by DTMF input. The value of this property can be overridden locally by setting the bargein attribute of the <prompt> element. Supported values are as follows: true: The prompt is bargeable, and DTMF input will interrupt play or TTS rendering. If any digits remain in the digit buffer at the time this element is executed, the clip is barged immediately and will not play. false: The prompt is not bargeable. Any digits currently in the digit buffer are cleared, and any digits received while clip(s) are playing are discarded. The bargein property can interact with the cvd:cleardb attribute set in the <prompt> element, and in that case the behavior will vary depending on the contents of the digit buffer. For more information, please see the “Usage Guidelines” for the <prompt> element. The <prompt> element is documented beginning on page 122. |
| bargeintype | Ignored. |

Fetching Properties

fetchhint Properties

In the VoiceXML specification, the fetchhint property defines when the interpreter context should retrieve the corresponding content from the server. A value of prefetch indicates that a file is to be downloaded when the page is loaded. A value of safe indicates that a file is only to be downloaded when actually needed.

The VoiceXML default value for fetchhint properties is prefetch. However, prefetching is not supported on the media server, and it always behaves as if the value is safe.

Table 2-6 fetchhint Property Support

| Property | Description |
|---|---|
| fetchhint | Not supported. The media server always downloads a document only when actually needed (equivalent to a value of safe). |
| audiofetchhint | Ignored. |
| documentfetchhint | Ignored. |
| grammarfetchhint | Ignored. |
| scriptfetchhint | Ignored. |

maxage Properties

In the VoiceXML specification, maxage properties ensure that the type of document the property governs does not use content whose age is greater than specified. These maxage properties are used in conjunction with corresponding maxstale properties to determine document fetching behavior.

The maxage property is not supported. In general, the media server checks the date of the content on the server and fetches the content if it is newer than that on the media server.

Table 2-7 maxage Property Support

| Property | Description |
|---|---|
| maxage | Not supported. |
| audiomaxage | Ignored. |
| documentmaxage | Ignored. |
| grammarmaxage | Ignored. |
| scriptmaxage | Ignored. |

maxstale Properties

In the VoiceXML specification, maxstale properties indicate that the document is willing to use content that has exceeded its expiration time. These maxstale properties are used in conjunction with corresponding maxage properties to determine document fetching behavior.

The maxstale property is not supported, as shown in Table 2-8.

Table 2-8 maxstale Property Support

| Property | Description |
|---|---|
| maxstale | Not supported. |
| audiomaxstale | Ignored. |
| documentmaxstale | Ignored. |
| grammarmaxstale | Ignored. |

Other Fetch Properties

The media server does not support other fetch properties, as shown in Table 2-9.

Table 2-9 Support for Other Fetch Properties

| Property | Description |
|---|---|
| fetchaudio | Ignored. |
| fetchaudiodelay | Ignored. |
| fetchaudiominimum | Ignored. |
| fetchtimeout | Supported for VoiceXML scripts but not for audio. |

Object Fetch Properties

The media server does not support object fetch properties, as shown in Table 2-10.

Table 2-10 Support for Object Fetch Properties

| Property | Description |
|---|---|
| objectfetch | Ignored. |
| objectfetchhint | Ignored. |
| objectmaxage | Ignored. |
| objectmaxstale | Ignored. |
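Of the fetching properties, fetchtimeout is the one the tables above list as supported (for VoiceXML scripts). The sketch below shows one way it might be set before a document transition. The 20s value and the target URL are hypothetical, and the use of the standard VoiceXML error.badfetch event to handle a failed fetch is an assumption about how an application might choose to respond, not a statement of media server behavior.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: the value, URL, and error handling are assumptions. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Interval in seconds, in the format <number>s -->
  <property name="fetchtimeout" value="20s"/>
  <!-- Assumed handling: end the session if the next document cannot be fetched -->
  <catch event="error.badfetch">
    <exit/>
  </catch>
  <form id="transfer_control">
    <block>
      <goto next="http://appserver.example.com/vxml/next_step.vxml"/>
    </block>
  </form>
</vxml>
```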



Fax Detection Property

Table 2-11 shows media server support for the fax detection property. The fax detection property is a RadiSys extension, and is specific to the Convedia Media Server.

Table 2-12 shows the interaction between the bargein property (or attribute) and the fax detection setting with respect to audio announcements.

Table 2-11 Fax Detection Property Support

| Property | Description |
|---|---|
| com.cvd.faxdetect | Enables or disables fax tone detection. Supported values are as follows: true: Fax tones are detected if they occur, and a com.cvd.event.faxdetect event is thrown. false: Fax tones are ignored. If enabled within the scope of an element playing an announcement, collecting digits, or recording audio, the fax tone will interrupt the operation. Any associated shadow variables will be updated prior to the com.cvd.event.faxdetect event being thrown, as follows: If the fax tone interrupts a digit collection, the application.cvd_lastresult$.termcond shadow variable is set to FAX. If the fax tone interrupts a recording, the name$.termchar shadow variable is set to F. The value of this property depends on the value set within the element currently being executed; that is, this property adheres to basic VoiceXML scoping rules. In addition, the setting is applied when the particular request is executed. For announcements, this means that fax detection is enabled or disabled when the associated audio clip is played, not when it is queued. By default, fax tone detection is disabled when a session is initiated. |

Table 2-12 Interaction of “bargein” and Fax Tone Detection

| bargein | Fax Tone Detection | Behavior |
|---|---|---|
| True | Disabled | Any DTMF digit will interrupt the announcement, but a fax tone will not. Fax tones are ignored. |
| True | Enabled | Any DTMF digit will interrupt the announcement, and a fax tone will also interrupt the announcement. |
| False | Disabled | The announcement is not interruptible: neither DTMF digits nor a fax tone will interrupt the announcement. |
| False | Enabled | No DTMF digit will interrupt the announcement; DTMF digits are ignored. However, a fax tone will interrupt the announcement. |



Table 2-13 shows the interaction between the dtmfterm attribute of the <record> element and the fax detection setting with respect to recordings.

Table 2-13 Interaction of “dtmfterm” and Fax Tone Detection

| dtmfterm | Fax Tone Detection | Behavior |
|---|---|---|
| True | Disabled | Any DTMF digit will interrupt the recording, but a fax tone will not. Fax tones are ignored. |
| True | Enabled | Any DTMF digit will interrupt the recording, and a fax tone will also interrupt the recording. |
| False | Disabled | The recording is not interruptible: neither DTMF digits nor a fax tone will interrupt the recording. |
| False | Enabled | No DTMF digit will interrupt the recording; DTMF digits are ignored. However, a fax tone will interrupt the recording. |
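The sketch below shows how an application might enable fax detection for one form and react to the resulting event. The property name com.cvd.faxdetect and the event name com.cvd.event.faxdetect are taken from Table 2-11; the prompt URL, the handler document fax_handler.vxml, and the form and field names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: URLs and names other than the com.cvd identifiers are hypothetical. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <!-- Enable fax tone detection within the scope of this form -->
    <property name="com.cvd.faxdetect" value="true"/>
    <!-- Thrown when a fax tone interrupts the announcement or collection -->
    <catch event="com.cvd.event.faxdetect">
      <goto next="fax_handler.vxml"/>
    </catch>
    <field name="choice" type="digits">
      <!-- Not bargeable by DTMF, but a fax tone can still interrupt it -->
      <prompt bargein="false">
        <audio src="http://appserver.example.com/prompts/greeting.wav"/>
      </prompt>
    </field>
  </form>
</vxml>
```

This corresponds to the last row of Table 2-12 (bargein false, fax tone detection enabled).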




Chapter 3: DTMF AND VOICE GRAMMARS

This chapter describes the media server’s support for DTMF and voice grammars in VoiceXML.

The following information is presented:

• Overview

  • DTMF Grammars

  • Speech Grammars

• Input Mode

• Menu-Choice Grammars

• Option Grammars

• SRGS Grammars

• Arbitrary Grammars

• Built-In Grammars

• Maximum Length of Grammars

Overview

The grammar definitions of a VoiceXML application provide a standard mechanism of validating user input. The grammar defines a set of rules, which are applied to user input to validate it. User input can take the form of either DTMF key presses or of speech (voice) utterances. Each form of input has its own grammars.

Grammars can be defined according to the XML-based W3C Speech Recognition Grammar Specification (SRGS) language [11]. SRGS supports both grammars for speech recognition and grammars for DTMF user input validation. In addition to the above, the media server has a set of general purpose grammars built into the VoiceXML interpreter. These built-in grammars allow ease of application development, in that these grammars do not have to be defined using the SRGS.

As with VoiceXML dialogs and subdialogs, a set of commonly used grammar rules can be maintained as a library. A grammar definition can be embedded within the application or it can be referenced from an externally located file.

The concept of scope also applies to grammars. Multiple grammars may be active at the same time. For instance, when a grammar is defined with scope that applies to the entire VoiceXML document, then the grammar is active during all input collection phases. This mechanism is useful when defining common or global user input action items, such as “Press *9 at any time to receive help.”

DTMF Grammars

DTMF grammars define rules for collecting and validating user input supplied as DTMF key presses. DTMF grammars can be specified in either of the following ways:

• As a built-in grammar (see page 55)

• As an XML-based SRGS grammar (see page 53)

Speech Grammars

Speech grammars define rules for collecting and validating user input supplied as speech utterances. The processing of speech grammars is performed by the external speech server, not by the media server. Speech grammars can be specified in either of the following ways:

• As a built-in grammar (see page 55)

• As an XML-based SRGS grammar (see page 53)

The actual support for speech grammars depends on the external speech server deployed. Provided the input mode (however defined) is voice, all grammars are passed directly to the speech server for evaluation. The determination of support for these grammars is then made by the external speech server.



Input Mode

The media server supports three modes of input:

• DTMF

• Voice

• DTMF and voice

The VoiceXML Specification [13] defines different input defaults for different grammar types. These are shown in Table 3-1.

Table 3-2 shows how the input mode is determined on the media server.

Table 3-3 shows how the input mode interacts with the mode of the grammar, as defined by the mode attribute of the <grammar> element.

Table 3-1 Default Input Modes for VoiceXML Grammars

| Grammar Type | Default Input Mode |
|---|---|
| XML-SRGS | Voice |
| Built-In | DTMF and Voice |
| Menu-Choice | No explicit default. The way in which the grammar is defined determines the input mode for the grammar. |
| Option | DTMF and Voice |

Table 3-2 Mechanisms for Setting Input Mode Scope

| Mechanism | Description | Scope/Precedence |
|---|---|---|
| Configured input mode | The default input mode as configured using the media server’s management interface. The default is DTMF. | Scope: Session. Precedence: Lowest. |
| inputmodes attribute | An attribute of the <property> element that defines the input mode. If this attribute is not set, the value configured through the management interface is used. The starting (default) value is that set through the management interface. | Scope: Depends on scope of the <property> element. May be as high as Application or as low as Dialog. Precedence: Higher than configured input mode. |

Table 3-3 Interaction of Input Mode and Grammar Mode

| Input Mode | Grammar Mode | Behavior |
|---|---|---|
| DTMF | DTMF | The media server detects, collects, and parses DTMF input. |
| DTMF | Voice | No grammars are active. The media server behaves as if no grammars were present in the script. Digits cannot barge clips and are not buffered. NOINPUT is reported for all collections. |
| Voice | DTMF | No grammars are active. The media server behaves as if no grammars were present in the script. Digits cannot barge clips and are not buffered. NOINPUT is reported for all collections. |
| Voice | Voice | The voice grammar is passed to the external speech server. The external speech server detects, collects, and parses voice input and passes the results to the media server as an NLSML script. DTMF digits are ignored. |
| DTMF and Voice | DTMF | No voice grammar is activated. Only DTMF collection is valid. |
| DTMF and Voice | Voice | Only the voice grammar is activated. DTMF digits are ignored. |
| DTMF and Voice | DTMF and Voice | (In some cases two grammars would be required.) Both DTMF and voice grammars are active. DTMF input cancels the voice grammar and any voice input received until that point. |

Menu-Choice Grammars

Menu-choice grammars are a simple mechanism for allowing the user to make a choice and transitioning application control to another location based on that choice. Using audio prompts, the menu offers the user a set of choices, after which it waits for user input. The dialog transitions based on the user input.

A menu-choice grammar can concurrently define both a DTMF and a speech grammar.

Menu-choice grammars are implemented using the <menu> element and the <choice> element; please see those elements for details.

Option Grammars

Option grammars are a relatively simple way to specify grammars for collecting and processing user input. Simple DTMF sequences or speech sequences are specified within the <option> element. The value attribute is assigned to the result of the collection, based on the option that was matched.

An option grammar can concurrently define both a DTMF and a speech grammar.

Option grammars are implemented using the <option> element; please see that element for details.
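The sketch below illustrates a menu-choice grammar and an option grammar in one document. Each <choice> and <option> carries a dtmf attribute (the DTMF grammar) and text content (the speech grammar), which is how a single definition can serve both input modes. All URLs, dialog names, and option values are hypothetical; the <menu>, <choice>, and <option> element references remain the authority on which attributes the media server supports.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: URLs, names, and values are hypothetical. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Menu-choice grammar: control transitions based on the choice matched -->
  <menu id="main">
    <prompt>
      <audio src="http://appserver.example.com/prompts/main_menu.wav"/>
    </prompt>
    <choice dtmf="1" next="#language">language</choice>
    <choice dtmf="2" next="http://appserver.example.com/vxml/billing.vxml">billing</choice>
  </menu>

  <!-- Option grammar: the field takes the value of the option matched -->
  <form id="language">
    <field name="lang">
      <prompt>
        <audio src="http://appserver.example.com/prompts/choose_language.wav"/>
      </prompt>
      <option dtmf="1" value="en">english</option>
      <option dtmf="2" value="fr">french</option>
      <filled>
        <submit next="http://appserver.example.com/vxml/set_language" namelist="lang"/>
      </filled>
    </field>
  </form>
</vxml>
```

Whether the DTMF part, the speech part, or both are active at run time depends on the input mode, as described in Table 3-3.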



SRGS Grammars

SRGS grammars are grammars defined according to the XML-based W3C Speech Recognition Grammar Specification (SRGS) language [11]. The SRGS standard supports grammars both for DTMF user input and for speech recognition.

SRGS grammars consist of SRGS elements. The section “SRGS Elements” on page 9 shows the SRGS elements supported by the media server.

The scope of a grammar rule can be either private or public. If the rule’s scope is private, then the rule can be referenced only from other rules in the local grammar. If the rule’s scope is public, and if the rule is activated for recognition, then the rule can also be referenced from other grammars.

XML-SRGS grammars can be defined either inline (that is, internal to the VoiceXML document) or external.

Inline SRGS Grammars

Inline grammars are defined within the VoiceXML document. Inline grammars are defined using the supported set of XML-based SRGS elements described in Table 1-3 on page 10. Inline grammars follow the grammar scoping rules as defined by the W3C VoiceXML 2.0 Specification [13].

Example 3-1 and Example 3-2 each show an inline SRGS grammar. In Example 3-1, the grammar produces a match if the user enters exactly one of 0, 1, 2, 3, 4, *9, or #9. Any other form of user input generates a nomatch event.

Example 3-1 Inline SRGS DTMF Grammar

<grammar mode="dtmf">
  <one-of>
    <item>
      <one-of>
        <item> 0 </item>
        <item> 1 </item>
        <item> 2 </item>
        <item> 3 </item>
        <item> 4 </item>
      </one-of>
    </item>
    <item>
      <one-of>
        <item> * 9 </item>
        <item> # 9 </item>
      </one-of>
    </item>
  </one-of>
</grammar>



In Example 3-2, the grammar produces a match if the user utters exactly one of “zero,” “one,” “two,” “three,” “four,” “star nine,” or “pound nine.” Any other form of user input generates a nomatch event.

Example 3-2 Inline SRGS Voice Grammar

<grammar mode="voice">
  <one-of>
    <item>
      <one-of>
        <item> zero </item>
        <item> one </item>
        <item> two </item>
        <item> three </item>
        <item> four </item>
      </one-of>
    </item>
    <item>
      <one-of>
        <item> star nine </item>
        <item> pound nine </item>
      </one-of>
    </item>
  </one-of>
</grammar>

External SRGS Grammars

External SRGS DTMF grammars are specified in exactly the same way as inline SRGS DTMF grammars, but in a separate VoiceXML document. The source of the external DTMF grammar is specified as a URI, and the document may be fetched using HTTP when the grammar is required by the VoiceXML interpreter, depending on the media server’s VoiceXML input mode and the mode of the grammar:

• If the mode attribute is set to dtmf, the grammar is fetched and parsed by the media server using HTTP.

• If the mode attribute is set to voice, the external URL is passed unchanged to the external speech server, which assumes responsibility for the grammar.

• If the mode attribute is unspecified, then the behavior of the media server depends on the input mode configured through the media server’s management interface:

  • If the configured input mode is dtmf, the grammar is fetched and parsed by the media server unless it determines that the mode is voice. As soon as the media server determines that the mode is voice, it stops parsing and sends the grammar to the speech server for processing.

  • If the configured input mode is voice, the external URL is passed unchanged to the external speech server, which assumes responsibility for the grammar.

  • If the configured input mode is dtmf and voice, the document is fetched by the media server and the decision of how to parse is made by determining whether or not a <grammar> element is present. If it is, the input mode is assumed to be voice; otherwise, the input mode is assumed to be DTMF.
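As a sketch of the first two cases above, the field below references two external grammars and sets the mode attribute explicitly on each, so that the handling does not depend on the configured input mode. The grammar and prompt URLs and the form and field names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: URLs and names are hypothetical. -->
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="main_selection">
    <!-- Both grammar modes are used, so both input modes are enabled -->
    <property name="inputmodes" value="dtmf voice"/>
    <field name="selection">
      <!-- mode="dtmf": fetched over HTTP and parsed by the media server -->
      <grammar mode="dtmf"
               src="http://appserver.example.com/grammars/selection_dtmf.grxml"/>
      <!-- mode="voice": the URL is passed unchanged to the external speech server -->
      <grammar mode="voice"
               src="http://appserver.example.com/grammars/selection_voice.grxml"/>
      <prompt>
        <audio src="http://appserver.example.com/prompts/make_selection.wav"/>
      </prompt>
    </field>
  </form>
</vxml>
```

Setting the mode explicitly avoids the inference rules that apply when the attribute is unspecified.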

Arbitrary Grammars

The media server has internal support for menu-choice, option, XML-SRGS, and built-in grammars. Some speech servers also support arbitrary grammars, such as ABNF grammars, if specified within the <grammar> element. Provided that the input mode is voice, all grammars are passed directly to the speech server for evaluation. The determination of support for the arbitrary grammar is then made by the speech server.

Built-In Grammars

In addition to SRGS grammars, there is a set of grammars built into the media server’s VoiceXML interpreter. These are designed to facilitate development by eliminating the need to use SRGS for simple, general-purpose grammars. No XML definition is required to use these grammars. For speech grammars, the built-in grammars are converted to XML-SRGS grammars before being passed to the speech server. Built-in grammars are specified by using either the type attribute of the <field> element, or the src attribute of the <grammar> element.

Built-in grammars are implicitly active for both DTMF and speech user input; however, some built-in grammar types (for example, Date or Currency grammars) are designed specifically for DTMF. For these built-in grammar types, collection and interpretation by the speech server may yield unpredictable results. If a built-in grammar type does not explicitly specify valid input values for voice, you should assume that the built-in grammar is valid for DTMF only.

For any built-in DTMF grammar, the media server can accumulate at most 30 DTMF digits. If not otherwise constrained, all grammars terminate upon receipt of the 30th digit. The received digits are then evaluated based on the specific grammar associated with the collection.

Limitations on the length of speech depend on the external speech servers deployed. All grammars defined by a <field> element are converted into their <grammar> element equivalents before being passed to a speech server.

The following built-in grammar types are defined:

• Boolean

• Date

• Digits

• Currency

• Number

• Phone



• Time

All speech grammars defined within a <field> element are converted to the equivalent <grammar> element grammar (that is, all built-in grammars are converted to an SRGS grammar) before the grammar is passed to the speech server. Table 3-4 shows examples of this conversion.

Boolean

Boolean grammars accept a string of one or more DTMF digits, and assign a string value of true or false based on the digits entered.

By default, the key 1 corresponds to true and 2 corresponds to false for DTMF grammars. For voice grammars, “Yes” corresponds to true and “No” corresponds to false. DTMF bindings may be changed by appending an HTTP URI–style keyword=value query syntax to the grammar type. The keywords y and n are accepted as alternatives to true and false, respectively. The use of variable length digits is supported.

Example 3-3 Boolean Built-In Grammar

boolean?y=4;n=5, boolean?n=31;y=32
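For context, a minimal sketch of a field that uses the first binding from Example 3-3 might look like the following. The form identifier, field name, and prompt wording are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirm_order">
    <!-- DTMF 4 maps to true and DTMF 5 maps to false,
         replacing the default bindings of 1 and 2 -->
    <field name="confirmed" type="boolean?y=4;n=5">
      <prompt>To confirm, press 4. To cancel, press 5.</prompt>
      <filled>
        <prompt>You entered <value expr="confirmed"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```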

Table 3-4 Conversion of Built-In Speech Grammars to XML-SRGS Grammars

Grammar Type | Built-In Representation | Grammar String Sent to Speech Server
Boolean | <field type="boolean"> or <grammar src="builtin:dtmf/boolean"/> | <grammar mode="voice" src="builtin:grammar/boolean"/>
Currency | <field type="currency"> or <grammar src="builtin:dtmf/currency"/> | <grammar mode="voice" src="builtin:grammar/currency"/>
Date | <field type="date"> or <grammar src="builtin:dtmf/date"/> | <grammar mode="voice" src="builtin:grammar/date"/>
Digits | <field type="digits?length=1"> or <grammar src="builtin:dtmf/digits?length=1"/> | <grammar mode="voice" src="builtin:grammar/digits?length=1"/>
Number | <field type="number"> or <grammar src="builtin:dtmf/number"/> | <grammar mode="voice" src="builtin:grammar/number"/>
Phone | <field type="phone"> or <grammar src="builtin:dtmf/phone"/> | <grammar mode="voice" src="builtin:grammar/phone"/>
Time | <field type="time"> or <grammar src="builtin:dtmf/time"/> | <grammar mode="voice" src="builtin:grammar/time"/>



Currency

Currency grammars accept entry of a variable number of DTMF or voice digits and the asterisk (“*”) key. Entries are assigned to a string in the format mm.nn, where mm corresponds to zero or more digits in the major currency unit, and nn corresponds to zero or more digits in the minor currency unit. The asterisk key is used as the decimal point to separate the major and minor currency units.

With the exception of leading zeros, which are removed from the string, all entered digits are included in the resulting string.

Example 3-4 Currency Built-In DTMF Grammar

builtin:dtmf/currency

Date

Date grammars accept 2-, 4-, 6-, and 8-character DTMF or voice collections. These represent, respectively, day (dd), month and day (mmdd), year and month (yyyymm), and year, month, and day (yyyymmdd).

The digits entered must form a valid date. The year component can be any four digits; that is, no validation is performed. The month must be between 01 and 12. The day must be between 01 and 31. Days are not checked for validity against the specified month. An error.nomatch event is thrown for invalid dates.

Note that, unlike the specification [13], no question mark characters (“?”) are used to pad the input. Only the digits received are returned.

Example 3-5 Date Built-In DTMF Grammar

builtin:dtmf/date



Digits

Digits grammars accept entry of a variable number of DTMF or speech digits. The number of digits accepted may be constrained by appending an HTTP URI–style keyword=value query syntax to the grammar type.

Keywords accepted are minlength, maxlength, and length, where length specifies the exact number of digits accepted. An error.badfetch is thrown if there is a conflict between the keyword values. All digits are included in the resulting string.

Example 3-6 Digits Built-In Grammar

digits?minlength=2;maxlength=8
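The sketch below shows the two ways of selecting a constrained digits grammar: through the type attribute of <field>, and through the src attribute of <grammar>. The form identifier, field names, and prompts are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="account_lookup">
    <!-- Between 2 and 8 digits, selected with the type attribute -->
    <field name="account" type="digits?minlength=2;maxlength=8">
      <prompt>Enter your account number.</prompt>
    </field>
    <!-- Exactly 4 digits, selected with a built-in grammar URI -->
    <field name="pin">
      <grammar src="builtin:dtmf/digits?length=4"/>
      <prompt>Enter your four digit PIN.</prompt>
    </field>
  </form>
</vxml>
```

Combining length with minlength or maxlength in a way that conflicts would be expected to raise error.badfetch, as described above.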

Number

Number grammars are identical to currency grammars, except that:

• The asterisk (“*”) key is interpreted as a decimal point, rather than a currency separator, and

• None of minlength, maxlength and length are specified.

Leading zeros are removed from the resulting string. This allows the result to be used in an ECMAScript expression, because ECMAScript would interpret a leading 0 as representing an octal value.

Example 3-7 Number Built-In Grammar

builtin:dtmf/number

Phone

Phone grammars behave identically to number grammars, except that:

• All digits entered are included in the resulting string

• The asterisk key (“*”) is interpreted as representing an extension. For example, 8005551212*123 results in a returned string of 8005551212x123.

Example 3-8 Phone Built-In Grammar

builtin:dtmf/phone



Time

Time grammars accept entry of three or four DTMF digits representing a time, and return a five-character string in the format hhmmx, where hh is the hour between 00 and 24, mm is the minutes between 00 and 59, and x is either h (for a 24-hour clock) or ? if the entry is ambiguous between a 12- and 24-hour clock. Because morning (AM) cannot be unambiguously expressed in DTMF, ? will be a common termination.

If only three digits are entered, the media server adds a leading zero to the string.

Example 3-9 Time Built-In Grammar

builtin:dtmf/time

Maximum Length of Grammars

All grammars, whether inline or external SRGS grammars or built-in, have a maximum number of digits that can be collected. Regardless of how the grammar is defined, the grammar has at most one maximum length, which is the maximum number of digits that can be collected before the grammar has been satisfied. For simple built-in grammars (for example, digits?length=4) this is explicitly stated. For other grammars, the maximum length is implicitly determined by evaluating the grammar.

Example 3-10 shows an SRGS grammar.

Example 3-10 Variable Maximum Digit Length in a DTMF Grammar

<grammar version="1.0" type="application/srgs+xml" mode="dtmf" root="root">
  <rule id="root" scope="public">
    <one-of>
      <item> 1 <item> 3 </item> </item>
      <item repeat="0"> 3 </item>
      <item repeat="3"> 4 </item>
      <item repeat="3-5"> 5 </item>
      <item repeat="4-"> 6 </item>
      <item repeat="0-1"> 8 </item>
      <item> 9 </item>
    </one-of>
  </rule>
</grammar>

In this example, the possible maximum lengths are 1, 2, 3, and 30 digits. Since only one maximum length can be associated with a grammar at any time, the longest maximum length (in this case 30) is used.



This means that, for a given grammar, digit collection may not end immediately at the first input that satisfies one possible maximum length. In Example 3-10, the third item has a length of 3; as such, an input string of 444 might be expected to end collection immediately. Instead, the maximum length is the longest possible maximum length, which is 30. In this case, the inter-digit timer is started and the system waits to see whether additional input is forthcoming. If no additional input is received within the inter-digit timeout interval, collection ends and the input string 444 is accepted as satisfying that grammar item. Thus, for a grammar with variable-length items, collection is terminated only by reaching the longest possible maximum length or by an inter-digit timeout.
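Because collection for a variable-length grammar often ends on the inter-digit timer, an application may want to set that timer explicitly. The following is a sketch only: it assumes the interdigittimeout property listed among the generic DTMF recognizer properties, and the 3s value, form identifier, and field name are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="variable_length">
    <!-- Wait up to 3 seconds for another digit before evaluating -->
    <property name="interdigittimeout" value="3s"/>
    <field name="code">
      <grammar version="1.0" type="application/srgs+xml" mode="dtmf" root="root">
        <rule id="root" scope="public">
          <one-of>
            <!-- Exactly three 4s: satisfied by 444 -->
            <item repeat="3"> 4 </item>
            <!-- Four or more 6s: unbounded, so the maximum length is 30 -->
            <item repeat="4-"> 6 </item>
          </one-of>
        </rule>
      </grammar>
      <prompt>Enter your code.</prompt>
    </field>
  </form>
</vxml>
```

With this grammar, an entry of 444 would be accepted only after the inter-digit timer expires, because the open-ended item raises the single maximum length to 30.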

Grammar Evaluation

All DTMF collections are evaluated in real time as digits are received. All matches and no-matches (for example, if the current digit results in a match or an impossible match) are recognized and reported as soon as the current digit is evaluated.

For voice grammars the evaluation of incoming speech against the currently defined grammar is performed by the external speech server. The specific behavior will depend on the speech server employed.


Chapter 4: VOICEXML 2.0 ELEMENTS

This chapter describes the VoiceXML 2.0 elements currently supported by the Convedia Media Server, including SRGS and SSML elements.

The VoiceXML 2.0 language is defined by the W3C Candidate Recommendation specifying the language [13]. Any features of VoiceXML specified in the Recommendation but not in this guide are not supported in this release of the Convedia Media Server. Any features of VoiceXML specified in this guide but not in the Recommendation are extensions to the specification.

<assign>

Assigns a value to a variable.

Parent element: <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, <nomatch>

Child elements: None.

Attributes

name Mandatory. The name of the variable being updated.

expr Mandatory. An ECMAScript expression representing the new value of the variable.

Usage Guidelines

Use this element to assign a value to a variable.

Note that the maximum size of a variable name (whether assigned, or newly created using the <var> element) is 256 characters. If the variable name exceeds this length, an error.semantic is thrown.
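A minimal sketch of <assign> in use follows. The variable name attempts and the form identifier are hypothetical; the variable is declared with <var> before it is updated.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="count_attempts">
    <var name="attempts" expr="0"/>
    <block>
      <!-- expr is an ECMAScript expression, evaluated before assignment -->
      <assign name="attempts" expr="attempts + 1"/>
      <prompt>This is attempt <value expr="attempts"/>.</prompt>
    </block>
  </form>
</vxml>
```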



<audio>

Plays an audio clip or multimedia file, or renders a text-to-speech clip.

Note that the SSML elements <break>, <desc>, <emphasis>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, and <voice> do not appear as children or parents of the <audio> element within the XML schema.

Parent element: <audio>, <block>, <catch>, <desc>, <emphasis>, <error>, <field>, <filled>, <help>, <if>, <initial>, <mark>, <menu>, <noinput>, <nomatch>, <p>, <phoneme>, <prompt>, <prosody>, <record>, <s>, <say-as>, <sub>, <subdialog>, <voice>

Child elements: <audio>, <break>, <emphasis>, <mark>, <p>, <prosody>, <s>, <say-as>, <value>, <voice>

Attributes

src A URI or numeric index representing the media clip or TTS string to be played. A URI must comply with the XML anyURI format. In addition, the URI or numeric index must comply with the constraints described in the section “Working with Media Files and TTS Strings” on page 28.

Exactly one of src and expr must be specified; otherwise, an error.badfetch is thrown.

expr An ECMAScript expression evaluating to the URI or numeric index of the media clip or TTS string to be played. A URI resulting from the expression must comply with the XML anyURI format. In addition, the URI or numeric index resulting from the expression must comply with the constraints described in the section “Working with Media Files and TTS Strings” on page 28.

Exactly one of src and expr must be specified; otherwise, an error.badfetch is thrown.

fetchhint Optional. Ignored.

fetchtimeout Optional. Ignored.

maxage Optional. Ignored.

maxstale Optional. Ignored.



Usage Guidelines

The <audio> element requests the media server to play an audio clip or multimedia clip, or to render text-to-speech strings.

Audio, Video, and Multimedia Clips

Clips are played to completion, unless user input interrupts (“barges”) the clip. Barging can be allowed or disallowed by setting the bargein property. For a description of this property, please see the section “Prompt Properties” on page 43. The type of input that can interrupt a multimedia clip is specified using the inputmodes attribute of the <property> element.

The audio to be played is identified by a URI specified in either the src or expr attribute. The referenced audio file can be stored internally (a “provisioned” clip) or externally on an NFS or HTTP server. Internal clips are specified using the file: scheme. External clips are specified using either the file: scheme (for NFS) or the http: URI scheme (for HTTP) and are fetched using the HTTP protocol or NFS.

The use of a base URI is supported for both the src and expr attributes. The base URI can be defined using attributes in the <vxml> or <prompt> elements, or may be defined relative to the actual document. The base URI is applied if:

• The URI specified in the src attribute does not start with either the file: or http: schemes OR

• The URI resulting from evaluation of the expression specified in the expr attribute does not start with either the file: or http: schemes AND it is not a numeric value greater than 50,000. (Numeric values greater than 50,000 represent indexes of internally recorded clips and, as such, do not have a base URI prepended to them.)

For complete information on referring to media clips, please see the section “Working with Media Files and TTS Strings” on page 28.
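The base URI rules above can be sketched as follows. The sketch assumes that the base URI is set with the standard VoiceXML 2.0 xml:base attribute on <vxml>; the host name and clip names are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
      xml:base="http://example.com/prompts/">
  <form id="greeting">
    <block>
      <!-- No file: or http: scheme, so the base URI is applied -->
      <audio src="welcome.wav"/>
      <!-- Starts with file:, so the base URI is not applied -->
      <audio src="file://goodbye"/>
    </block>
  </form>
</vxml>
```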

The element may have optional audio, such as alternate audio, silence, or text, with the restrictions described in the section “Audio Clip Errors” on page 65.

Audio, video, or multimedia clips can be played individually or specified as a sequence of multiple clips. All clips must be of the same media type (for example, all audio or all multimedia). If media types are mixed, the media server plays up to the first clip containing different media and then fails. Within a single media type, file extensions can vary. For example, you can play a series consisting of WAVE files mixed with audio-only QuickTime files.

Each clip in a sequence is considered individually: if a clip cannot be played, it is skipped and the next clip in the sequence is played, if possible. Also, the failure to play a clip does not cause a session to fail.

Text to Speech Strings

Text to Speech (TTS) strings can be specified within the body of an <audio> or <prompt> element. If the TTS string is in the body of an <audio> element, it is assumed to be an alternate audio string. A TTS string may be any of the following:

• Plain text

• Encoded Speech Synthesis Markup Language (SSML)

• Text with embedded SSML elements

• A URI to an external SSML file, expressed using either the src or expr attribute. For external URIs, the file extensions SSML, CSSML, and TXT are supported. Any other file extension is assumed to be a media clip.

An <audio> element defined (that is, embedded) within a TTS string is only valid if there are active external TTS servers. For systems that do not deploy TTS servers, the entire string, including any embedded <audio> elements, is ignored.

Audio Clip Errors

An audio prompt may consist of a single clip or a sequence of clips. As long as the specification of the audio clip is syntactically valid, the media server treats the clip as if it has been successfully played, regardless of any errors that might occur subsequently. For example, the request to play a non-existent clip is considered successful so long as the clip specification is syntactically valid. From the media server’s perspective, the clip completes and the session transitions to the next element in the document.

In sequences, this means that audio clips that fail to play are skipped, and failure to play a clip does not affect other clips in the sequence. For example, if the second clip of a four-clip sequence fails to play, the third and fourth clips will be played (if possible). Note that failure to play a clip does not result in failure of the session.

The only exception to this behavior is if the request to play an audio clip fails because of an overload condition, such as service overload. In this case, the session terminates.

Alternate Audio

The media server supports the use of alternate audio or silence. Alternate audio gives an application the means to specify an audio clip, multimedia clip, or TTS string to be played in case the primary clip fails. The primary clip can be either an audio or multimedia clip; it cannot be a TTS clip embedded within the VoiceXML document for the purpose of playing alternate audio. However, a TTS clip can be specified as an external URI. In that case, if the external speech server fails to play the clip (for example, because the file was not found or a parse error occurred), and alternate audio is defined, the alternate audio is queued and requested to be played.

Alternate audio is specified by including a second <audio> element nested within the first. Silence is played by including a <break> element nested within the <audio> element.

Alternate audio is played, when specified, if:

• The requested primary clip(s) are not found. However, if the clip started but failed prematurely, the alternate audio is not played. If a series of clips is specified as the primary clips and at least one of them plays, the alternate audio is not played.

• The primary clip is specified as an ECMAScript expression using the expr attribute and the ECMAScript variable does not exist. In this case, an ECMAScript error is thrown after evaluating the expr attribute. If this occurs, and an alternate <audio> or <break> element has been defined, then the <audio> or <break> element is queued and played (assuming that it is validly specified). Otherwise the error is treated as non-fatal and the session transitions to the next element defined in the script.

The <audio> element src and expr attributes support only prerecorded audio clips and not TTS strings. Alternate audio however does not have this restriction. Alternate audio can be:



• A <break> element, which will play the specified silence

• An internal pre-recorded audio clip

• An external pre-recorded audio clip

• A TTS string.

The <audio> element supports only two levels of nesting. Thus, there is at most one level of alternate audio clip(s) that can be specified. Anything below the second level is ignored.

Example 4-1 Alternate Audio Examples

<audio src="file://ClipstoPlay"> <audio src="file://AlternateAudioClip"/> </audio>

<audio src="file://Welcome"> Welcome to your life </audio>

<prompt> This is a TTS string. <audio src="file://nextClip"/> This is another string to play </prompt>

With respect to alternate audio, whether <audio> or <break> elements are used, only one alternate element can be defined. All others are ignored.
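As a further sketch, the fragment below uses silence as the alternate for a clip named through an expression, and an external clip as the alternate for an internal clip. The variable name, clip names, and host name are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="alternate_audio">
    <var name="greetingClip" expr="'file://Greeting'"/>
    <block>
      <!-- If the clip named by the expression cannot be found,
           500 ms of silence is played instead -->
      <audio expr="greetingClip">
        <break time="500ms"/>
      </audio>
      <!-- If the internal clip is not found, the external clip is tried -->
      <audio src="file://Menu">
        <audio src="http://example.com/prompts/menu.wav"/>
      </audio>
    </block>
  </form>
</vxml>
```

Each primary clip carries exactly one alternate element, in line with the restriction stated above.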

Note that the <break> element will not appear as a child of <audio> in the XML schema. The <break> element is not defined as a standard element in the schema and as such does not appear in the normal child-parent relationships.

Audio Clip Name Length

The name of any audio clip, whether internal or external, has a maximum length of 256 characters. Any clip whose name exceeds 256 characters is ignored. Indexed clip identifiers are limited to 50,000.

Encoding

The media server ignores the length specified in WAV file headers.

The media server first uses the HTTP Content-Type header to determine the codec. The Content-Type header is analyzed in this order:



1 If the content type is audio/basic, audio/x-alaw-basic, or audio/x-g729-basic, the media server assumes that the file is a raw file and rejects the request.

2 If the value is either audio/wav or audio/x-wav, the media server interprets the audio as a WAV file, and relies on the WAV header to determine the codec type. If a WAV header is not found, the media server fails the announcement.

3 If the value is audio/vnd.wave; codec=xxx, where xxx is any number, the media server interprets the file as a WAV file but uses the codec=xxx encoding in preference to any defined within the file.

4 If the media server cannot determine the file type from the header (for example, only “audio” is specified), the media server examines the file extension.

• If the extension is “.wav,” the media server uses the encoding specified in the file.

• If the extension is not “.wav,” the media server still assumes that the file is a WAV file, interprets the contents of the file as containing a WAV header, and accepts or rejects the request accordingly.

Interoperability Notes

For some speech servers:

• No audio output is heard if PCMA is the configured codec, and no error is reported to indicate the problem. If the configured codec is PCMU, the speech is heard.

• The speech server fails to play a mixture of SSML, plain text, and CSSML text when all scripts are requested as external URIs. Each of the three scripts can be heard when played separately, but not when played as a group.

• The generated TTS speech is choppy and garbled.

• On some speech servers, a clicking sound occurs between the playing of TTS clips and local audio clips. This does not occur with other external servers.

• The speech server generates speech in English when Mandarin is specified in some scripts.

• A date speech grammar generates a mixture of Chinese and English when the xml:lang attribute is set to en-US.



<block>

Allows execution of code within a form.

Parent element: <form>

Child elements: <assign>, <audio>, <clear>, <disconnect>, <exit>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>

Attributes

name Optional. The name of the form item variable used to track whether this block is eligible to be executed.

The default is an inaccessible internal variable.

expr Optional. An ECMAScript expression representing the initial value of the form item variable. If initialized to a value, the form item will not be visited unless the form item variable is cleared.

The default is the ECMAScript value undefined.

cond Optional. A Boolean ECMAScript expression. The form item is visited if and only if this expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.

Usage Guidelines

The <block> element is a form item. It contains executable content that is executed if:

• The block’s form item variable has a value of undefined AND

• The block’s cond attribute (if any) evaluates to true. If cond is not specified, the behavior is as if cond is set to true.
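The two execution conditions can be sketched as follows. The form identifier, block names, and the firstVisit variable are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="welcome">
    <var name="firstVisit" expr="true"/>
    <!-- Visited: the form item variable is undefined and cond is true -->
    <block name="intro" cond="firstVisit">
      <prompt>Welcome.</prompt>
      <assign name="firstVisit" expr="false"/>
    </block>
    <!-- Not visited: expr initializes the form item variable,
         so it is no longer undefined -->
    <block name="skipped" expr="true">
      <prompt>This prompt is not played.</prompt>
    </block>
  </form>
</vxml>
```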



<break>

Inserts a pause or silence into audio.

Parent element: <audio>, <prompt>

Accepted but ignored as a child of <choice>.

Attributes

time Optional. The length of the interval of silence to be inserted, in seconds or milliseconds.

The format is <number><unit>, where <number> can be zero or more digits optionally followed by a period (“.”) and then by one or more digits. <number> may not be empty, and may optionally be preceded by a plus sign (“+”). <unit> may be one of ms (for milliseconds) or s (for seconds). Note that the right-hand side of the decimal point is calculated only if the units are in seconds; for milliseconds, the right-hand side of the decimal point is ignored. Spaces between the numeric value and the unit are not permitted.

For time values, the media server supports a range from 0 milliseconds to 2^31–1 milliseconds (a little less than 25 days), with a precision of 10 milliseconds. All values that exceed this range will be reset to 2^31–1. Examples of time are: 100ms, 50s, 20.5s, and +600ms.

The time attribute takes precedence over both size and strength. If nothing is specified, the default interval is 200 milliseconds.

strength Optional. The length of the interval of silence to be inserted, in predefined intervals. Supported values are as follows:

x-weak: 50 milliseconds

weak: 100 milliseconds

medium: 200 milliseconds

strong: 500 milliseconds

x-strong: 2000 milliseconds

none: 0 milliseconds

The time attribute takes precedence over both size and strength. The size attribute takes precedence over strength. If nothing is specified, the default interval is 200 milliseconds.

The strength attribute is always present in all <break> elements, whether specified or not. However, because of its low precedence, it is only used if it is specified and neither time nor size is specified.

size Deprecated in favor of strength, which is compliant with [4]; however, this attribute is still accepted for backwards compatibility.

Usage Guidelines

The <break> element allows silence intervals to be played within a VoiceXML script.

The element is essentially treated like an <audio> element, where the “clip” played is silence. Instead of specifying an actual audio clip, the <break> element specifies the interval of silence.

Up to one <break> element is supported within an <audio> element; others are ignored.

Note that the <break> element will not appear as a child of <audio> in the XML schema. The <break> element is not defined as a standard element in the schema and as such does not appear in the normal child-parent relationships.
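The following sketch shows the time and strength attributes inside a <prompt>, including the precedence rule when both are given. The prompt wording and form identifier are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="pauses">
    <block>
      <prompt>
        Please hold.
        <!-- 1.5 seconds; the fraction is honored because the unit is s -->
        <break time="1.5s"/>
        Thank you for waiting.
        <!-- Predefined interval: strong is 500 milliseconds -->
        <break strength="strong"/>
        Your call is important to us.
        <!-- time takes precedence over strength: 100 ms, not 2000 ms -->
        <break time="100ms" strength="x-strong"/>
        Goodbye.
      </prompt>
    </block>
  </form>
</vxml>
```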



<catch>

Handles (catches) events.

Parent element: <field>, <form>, <menu>, <record>, <subdialog>, <vxml>

Attributes


event Optional. The event or events to be caught by this event handler.

The format is a space-separated list of event names, where an event name is one of the supported events listed in the section “Events” on page 24.

If more than one event is specified, a separate event counter (that is, a separate count attribute) is maintained for each event.

count Optional. The number of occurrences of the event.

The count attribute allows an application to handle different occurrences of the same event in different ways. Each <form>, <menu>, and form item maintains a counter for each event that occurs while it is being visited. These counters are reset each time the <menu> or the form item’s <form> is re-entered. The form-level counters are used in the selection of an event handler for events thrown in a form-level <filled>.

Counters are incremented against the full event name and every prefix-matching event name; for example, the occurrence of the event “event.foo.1” increments the counters associated with handlers for “event.foo.1,” “event.foo,” and “event.”

The count may not exceed a 32-bit unsigned integer. The default is 1.

cond Optional. A Boolean ECMAScript expression. The catch handling routine is invoked if and only if this expression evaluates to true. The default is true.



Usage Guidelines

The <catch> element allows executable content to be defined for a number of events that the interpreter can generate.

The cond attribute is used to test for event conditions. The special variable _event is supported to store the name of the event that is thrown.

The special variable _message is also supported. This variable holds an optional message string, which may be set within the <throw> element. If a message has not been specified, then the variable will be set to the ECMAScript value undefined.
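A sketch of two handlers that treat early and later occurrences of the same events differently is shown below. It assumes that nomatch and noinput are among the supported events; the form identifier, field name, and prompt wording are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="collect_pin">
    <field name="pin" type="digits?length=4">
      <prompt>Enter your four digit PIN.</prompt>
      <!-- count defaults to 1: used for the first occurrences -->
      <catch event="nomatch noinput">
        <prompt>Sorry, please try again.</prompt>
        <reprompt/>
      </catch>
      <!-- Used from the third occurrence of either event onward;
           each event keeps its own counter -->
      <catch event="nomatch noinput" count="3">
        <prompt>Ending the call after <value expr="_event"/>.</prompt>
        <exit/>
      </catch>
    </field>
  </form>
</vxml>
```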



<choice>

Provides menu choices.

Parent element: <menu>

Child elements: <emphasis>, <grammar>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <voice>.

<break> is accepted but ignored.

Attributes

dtmf Optional. Specifies a simple DTMF sequence which, when matched, will result in this choice. White space is permitted in the DTMF sequence specification; for example “1234#” and “1 2 3 4 #” are treated as equivalent. There is no default.

Generic DTMF recognition properties (that is, interdigittimeout, termtimeout, and termchar) apply. For more information about DTMF properties, please see the section “Generic DTMF Recognizer Properties” on page 41.

accept Optional in speech grammars; ignored for DTMF grammars. The only valid value is exact.

An accept value specified in a <menu> element overrides the value set here.

next Fetches the document at the specified URI. The URI must comply with the XML anyURI format.

Exactly one of next, expr, event, and eventexpr must be specified. Otherwise, an error.badfetch is thrown.

expr Fetches the document at the URI resulting from evaluation of the specified ECMAScript expression. The URI must comply with the XML anyURI format.


event Throws the specified event when this choice is made. For a list of supported events, please see the section “Events” on page 24.




eventexpr Throws the event resulting from evaluation of the specified ECMAScript expression when this choice is made. For a list of supported events, please see the section “Events” on page 24.


message Optional. Returns the specified message string to the event handler, along with the event name. There is no default.

Only one of message and messageexpr may be specified. Otherwise, an error.badfetch is thrown.

messageexpr Optional. Returns the message string resulting from evaluation of the specified ECMAScript expression to the event handler, along with the event name. There is no default.


fetchaudio Ignored.

fetchhint Ignored.

fetchtimeout Optional. The interval after which, if the document cannot be fetched from the destination URI, the fetch times out.



The applicable property for this attribute is the fetchtimeout property. If the attribute is not set, the value set for the property will be applied. If the fetchtimeout property is not explicitly set (using the <property> element) the property default is applied. For the default value of supported properties, please see Chapter 2: VoiceXML Properties.

maxage Ignored.

maxstale Ignored.



Usage Guidelines

The <choice> element defines a menu item and allows the application to define a simple DTMF sequence or voice specification to indicate this menu choice. It also allows specification of a destination URI for fetching the next document when the menu choice has been made. Optionally, the element can be set to throw an event when the choice is made.

All <choice> elements defined for voice are converted into an XML-SRGS format, which is then passed to an external speech server for processing.

Note that although <break> is a valid child of <choice> in the VoiceXML schema, it is accepted but ignored in this implementation, and no action is taken if it is specified.


Saying a subphrase of the <choice> element in a menu grammar results in a match being returned, even in some cases that should be a nomatch.
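The sketch below shows a DTMF menu whose choices use next, expr, and event respectively, so that each <choice> specifies exactly one of the four mutually exclusive attributes. The URIs and the supportUri variable are hypothetical, and the sketch assumes that help is among the supported events.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <var name="supportUri" expr="'http://example.com/support.vxml'"/>
  <menu id="main">
    <prompt>For sales, press 1. For support, press 2. For help, press 0.</prompt>
    <!-- Fixed destination URI -->
    <choice dtmf="1" next="http://example.com/sales.vxml">sales</choice>
    <!-- Destination URI computed from an ECMAScript expression -->
    <choice dtmf="2" expr="supportUri">support</choice>
    <!-- Throws an event instead of fetching a document -->
    <choice dtmf="0" event="help" message="main menu">help</choice>
  </menu>
</vxml>
```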



<clear>

Clears or resets form items (form fields).

Attributes

namelist Optional. Resets the specified variable(s), including any form item variables. The format is a space-separated list of variable names.

By default, all form items for the current form are reset.

Usage Guidelines

The <clear> element resets the specified variable(s), including form item variables.

When form items are cleared, the prompt and event counters are reinitialized and the form item variable is set to the ECMAScript value undefined.
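As a sketch, the form below clears a field so that it is collected again when the caller enters a disallowed value. The form identifier, field name, and the rejected value 0000 are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="login">
    <field name="pin" type="digits?length=4">
      <prompt>Enter your four digit PIN.</prompt>
      <filled>
        <if cond="pin == '0000'">
          <prompt>That PIN is not allowed.</prompt>
          <!-- Resets pin to undefined, so the field is visited again -->
          <clear namelist="pin"/>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```

Omitting namelist would reset every form item in the current form rather than only the pin field.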



<controlcmd>

Specifies the actions associated with DTMF key presses for prompt controls.

Parent element: <promptcontrol>

Ignored if specified as a child of <block>, <catch>, <error>, <filled>, <help>, <if>, <noinput>, or <nomatch>.

Attributes


dtmf Mandatory. Specifies a single DTMF key with the associated audio control action. Supported digits are 0–9, *, #, A, B, C, and D. (Note that “a” through “d” are not supported.) Whitespace is not permitted. Any other value, or use of whitespace, will cause an error.badfetch to be thrown.

You may specify the same DTMF key for both the pause and resume actions, in order to achieve a “toggle” action. Also, you may specify the same DTMF key for a single action defined multiple times. Otherwise, you cannot specify the same DTMF key for different actions, and doing so will cause session termination with an error.semantic.

action Mandatory. The audio control action to be performed when the specified DTMF digit is pressed. Supported values are as follows:

pause: Pause the stream for an indefinite period of time.

resume: Resume the paused stream.

seek: Stream audio beginning at the location specified by the combination of the from and to attributes.

volume: Adjust the volume by the amount specified by the combination of the from and to attributes. The default volume is 0dB.



from Optional with seek and volume actions; ignored otherwise. The starting value for the seek and volume actions. Supported values are as follows:

begin: When used with seek, measure the change of location specified by the to attribute relative to the beginning of the file. When used with volume, interpret the volume specified by the to attribute as an absolute volume.

current: When used with seek, measure the change in location specified by the to attribute relative to the current position. When used with volume, interpret the volume specified by the to attribute as a change relative to the current volume.

The default is current.



Usage Guidelines

The <controlcmd> element specifies DTMF keys, and associates an action for audio prompt controls.

This element is valid only for pre-recorded audio clips. TTS clips specified within a <controlcmd> element are ignored.

Audio controls are limited to single DTMF keys, which are specified by the dtmf attribute. DTMF grammars (inline or external, and built-in or SRGS) are currently not supported in specifying audio controls.

to Mandatory with seek and volume actions; ignored otherwise.

When used with seek, this attribute represents the offset interval in seconds or milliseconds from the starting point specified by the from attribute. The format is <number><unit>, where <number> is an integer, and may optionally be preceded by a plus sign (“+”) or a minus sign (“-”), and where a plus sign moves the location forward (“fast-forward”) and a minus sign moves the location backward (“rewind”). <unit> may be one of ms (for milliseconds) or s (for seconds). Spaces between the numeric value and the unit are not permitted. The range is –(2^31–1) milliseconds to +(2^31–1) milliseconds, with a precision of 10 milliseconds. If the specified value exceeds the range in either direction, then the media server automatically applies the offset limit (either positive or negative). Specifying a forward location past the end of the audio file results in audio stream completion. Specifying a “rewind” amount past the beginning of the file results in play starting at the beginning of the file. Examples of to values are: 100ms, –50s, and +600ms.

When used with volume, this attribute represents a volume change.

• As an absolute volume specification (from=begin), the range is –96dB to +96dB, where the plus sign (“+”) is optional. Exceeding the range will cause an error.semantic to be thrown.

• As a change in volume relative to the current volume (from=current), the range is –192dB to +192dB, where the plus sign (“+”) is optional. Exceeding the range will cause an error.semantic to be thrown. If you specify a change of volume that is within the valid range, but which results in an absolute volume lower than the negative limit of –96dB or greater than the positive limit of +96dB, then the media server automatically applies the volume limit (either positive or negative).

Note that all units are required. Omitting units will cause an error.semantic to be thrown.
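As an illustration of the dtmf, action, from, and to attributes described above, the following fragment sketches a set of VCR-style controls. The key assignments and step sizes are hypothetical choices. The fragment shows only the <promptcontrol> element, and the controls apply only to prompts whose cvd:vcrprompt attribute is set to true.

```xml
<promptcontrol>
  <!-- Same key for pause and resume gives a toggle -->
  <controlcmd dtmf="5" action="pause"/>
  <controlcmd dtmf="5" action="resume"/>

  <!-- Rewind and fast-forward 10 seconds relative to the current position -->
  <controlcmd dtmf="4" action="seek" from="current" to="-10s"/>
  <controlcmd dtmf="6" action="seek" from="current" to="+10s"/>

  <!-- Restart from the beginning of the file -->
  <controlcmd dtmf="1" action="seek" from="begin" to="0s"/>

  <!-- Lower or raise the volume by 3 dB relative to the current volume -->
  <controlcmd dtmf="7" action="volume" from="current" to="-3dB"/>
  <controlcmd dtmf="9" action="volume" from="current" to="+3dB"/>

  <!-- Return to the default volume, expressed as an absolute value -->
  <controlcmd dtmf="0" action="volume" from="begin" to="0dB"/>
</promptcontrol>
```

Every to value carries its unit with no intervening space, and each key is bound to a single action except for the pause/resume pair.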



While a prompt control is active, the DTMF keys and associated control actions override any currently active grammars or prompt barge-in. DTMF digits not consumed by <controlcmd> action keys are used by currently active grammars or prompt barge-in.

Actions specified in the <controlcmd> element are active during the play of a prompt only if the <prompt> element’s cvd:vcrprompt attribute is set to true.

The same DTMF key can be defined for pause and resume actions, so that the user can toggle between pausing and resuming a clip. These are the only actions that may use the same DTMF key. Also, the same action can be defined multiple times using the same key. In this case, the most recent definition overrides the previous ones.

Any other combination of actions that uses the same key results in an error.semantic being thrown and session termination.

Control actions specified in <controlcmd> apply or span a single <prompt> element. If several <prompt> elements are played back to back, with control commands enabled, then each is treated independently.

All errors in specifying media controls result in the session terminating. An error.badfetch is thrown for any errors detected by the parser. These are generally cases where the value assigned to the attribute does not conform to the regular expression for that attribute (for example, a value for the dtmf attribute that is not a valid DTMF digit). For all other errors, an error.semantic is thrown. Possible error cases include the following:

• Omitting the to attribute for volume or seek actions.

• Specifying a value for from that is neither begin nor current.

• Specifying a time value for the to attribute when the action is volume.

• Specifying a volume-based value for the to attribute when the action is seek.

• Failing to include units (s, ms, or dB) for the to attribute.

• Including a space between the value and the unit for the to attribute—for example, 3 s.

• Specifying a value that is out of range for the to attribute for an absolute volume specification; that is, specifying a value that is less than –96 dB or greater than +96 dB when from=begin.

• Using the same DTMF key for two different actions that are not pause and resume. Pause and resume are the only actions that may share a key, for the toggle function. Otherwise, the same DTMF key cannot be used for different actions (although the same action can be defined multiple times using the same digit).
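The error cases above can be sketched as follows. The declarations are shown together only for compactness; according to the text, any one of them would terminate the session. The key assignments are hypothetical, and the specific event named in a comment is given only where this section states it explicitly.

```xml
<promptcontrol>
  <!-- error.badfetch: lowercase "a" is not a supported DTMF digit -->
  <controlcmd dtmf="a" action="pause"/>

  <!-- Error case: "to" omitted for a seek action -->
  <controlcmd dtmf="1" action="seek" from="begin"/>

  <!-- Error case: time value supplied for a volume action -->
  <controlcmd dtmf="2" action="volume" to="5s"/>

  <!-- error.semantic: units omitted -->
  <controlcmd dtmf="3" action="seek" to="10"/>

  <!-- Error case: space between the value and the unit -->
  <controlcmd dtmf="4" action="seek" to="3 s"/>

  <!-- error.semantic: absolute volume outside the -96dB to +96dB range -->
  <controlcmd dtmf="8" action="volume" from="begin" to="+100dB"/>

  <!-- error.semantic: one key bound to two different actions other than pause/resume -->
  <controlcmd dtmf="6" action="pause"/>
  <controlcmd dtmf="6" action="seek" to="+5s"/>
</promptcontrol>
```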



<desc>

[SSML] Provides a textual description of audio content.

Attributes

Usage Guidelines

The <desc> element provides a textual description of an audio source (for example, “door slamming”).

The <desc> element can only occur within the content of the <audio> element. If text-only output is being produced by the synthesis processor, the content of the <desc> element(s) should be rendered instead of other alternative content in audio. The optional xml:lang attribute can be used to indicate that the content of the element is in a different language from that of the content surrounding the element. Unlike all other uses of xml:lang in this document, the presence or absence of this attribute will have no effect on the output in the normal case of audio (rather than text) output.


The <desc> element is only supported as content of the <audio> element. The expected behavior of the VoiceXML script and the subsequent SSML TTS body is that the request be rejected; however, the speech is generated and played.

Parent element: <audio>


xml:lang Optional. Indicates that content of this element is in a different language from that surrounding the element.



<disconnect>

Terminates the VoiceXML application, sending a SIP BYE.

Attributes

This element has no attributes.

Usage Guidelines

The <disconnect> element allows the VoiceXML interpreter context to disconnect the user.

Execution of the disconnect element causes the connection.disconnect.hangup event to be thrown, which may optionally specify some clean-up actions. The current session is terminated, a SIP BYE is sent to the control agent, and all associated media port resources are released by the platform.

See also the related elements <exit> and <return>.
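A minimal sketch of <disconnect> with a clean-up handler is shown below. The form identifier, the prompt, and the log text are hypothetical; the handler catches the connection.disconnect.hangup event named above.

```xml
<form id="goodbye">
  <!-- Optional clean-up when the disconnect event is thrown -->
  <catch event="connection.disconnect.hangup">
    <log>Session ending; performing clean-up.</log>
  </catch>

  <block>
    <prompt>Thank you for calling. Goodbye.</prompt>
    <!-- Terminates the session; a SIP BYE is sent to the control agent -->
    <disconnect/>
  </block>
</form>
```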





<else>

Provides alternative logic for an <if> condition.

Usage Guidelines

The <else> element is an optional element. It defines the beginning of an else clause specifying the code to be executed if the conditions specified in the associated <if> element are not satisfied.

Parent element: <if>




<elseif>

Provides alternative logic for an <if> condition.

Attributes

Usage Guidelines

The <elseif> element is an optional element. It defines a new conditional clause specifying the code to be executed if the conditions specified in the associated <if> element are not satisfied. The new clause is entered only if the conditions specified by the cond attribute are satisfied.

Parent element: <if>


cond Mandatory. A Boolean ECMAScript expression. The associated clause is executed if and only if the expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.



<emphasis>

[SSML] Directs the speech server to add emphasis to surrounded text.

Attributes

Usage Guidelines

The <emphasis> element requests that the contained text be spoken with emphasis (also referred to as prominence or stress). The synthesis processor determines how to render emphasis since the nature of emphasis differs between languages, dialects or even voices.

The emphasis element can only contain text to be rendered.


The <emphasis> element with the level attribute has no effect.

Parent element: <speak>

Child elements: <audio>, <break>, <emphasis>, <mark>, <phoneme>, <prosody>, <say-as>, <sub>, <voice>

level Optional. Indicates the strength of emphasis to be applied. Defined values are as follows:

strong

moderate

none

reduced

The default level is moderate.

The meaning of strong and moderate emphasis is interpreted according to the language being spoken (languages indicate emphasis using a possible combination of pitch change, timing changes, loudness and other acoustic differences). The reduced level is effectively the opposite of emphasizing a word. For example, when the phrase “going to” is reduced it may be spoken as “gonna”. The none level is used to prevent the synthesis processor from emphasizing words that it might typically emphasize. The values "none", "moderate", and "strong" are monotonically non-decreasing in strength.



<error>

Handles (catches) all error events.

Attributes

Usage Guidelines

The <error> element catches all events of type error.

If multiple error handlers are installed or inherited, the handler is selected according to the procedure described for event handling in [13].

This element is equivalent to <catch event=“error”>.

For a list of supported events, please see the section “Events” on page 24.



count Optional. The number of times an error event may be thrown within its scope (form or menu), after which error handling is invoked. The count may not exceed a 32-bit unsigned integer. The default is 1.

cond Optional. A Boolean ECMAScript expression. The error handling routine is executed if and only if the expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.
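The following sketch shows how two <error> handlers with different count values might be attached to a field. The field name, prompts, and count thresholds are hypothetical.

```xml
<field name="account" type="digits">
  <prompt>Please enter your account number.</prompt>

  <!-- Handler for the first error event in this field: apologize and retry -->
  <error count="1">
    <prompt>Sorry, something went wrong. Please try again.</prompt>
    <reprompt/>
  </error>

  <!-- Handler selected once the count reaches 3: give up -->
  <error count="3">
    <prompt>We are unable to continue.</prompt>
    <disconnect/>
  </error>
</field>
```

Each handler could equally be written as <catch event="error"> with the same count attribute.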



<example>

[SRGS] Provides an example phrase that matches the input specification.

Attributes

None.

Usage Guidelines

This SRGS element can be used within a grammar rule definition to illustrate an example of user input complying with the specification. No associated action for this element is performed within the interpreter context or the grammar engine; it is ignored by these components.

Parent element: None.




<exit>

Terminates the VoiceXML application, while keeping the port open.

Attributes

Usage Guidelines

The <exit> element allows control to be returned to the interpreter context.

Unlike session termination as a result of a <disconnect>, <exit> allows the media server to retain media port resources. Other resources (documents, variables, and so on) associated with the session are released; however, the media port resources are not released by the platform.

A SIP BYE is not sent to the control agent. The port resources are kept “on hold” pending further direction from the control agent.



expr Optional. An ECMAScript expression (such as “field1” or “Finished”) to be returned to the interpreter context. By default, no expression is returned.

Only one of expr and namelist may be specified; if both are specified, an error.badfetch is thrown. No error is generated if neither is specified.

namelist Optional. A space-separated list of variables to be returned to the interpreter context. By default, no variables are returned.

Only one of expr and namelist may be specified; if both are specified, an error.badfetch is thrown. No error is generated if neither is specified.
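As a short illustration of the expr and namelist attributes, the fragments below show the two valid forms and, in a comment, the invalid combination. The variable names field1 and field2 and the string value are taken as placeholders.

```xml
<block>
  <!-- Return a single ECMAScript value to the interpreter context -->
  <exit expr="'Finished'"/>
</block>

<block>
  <!-- Alternatively, return several variables by name -->
  <exit namelist="field1 field2"/>
</block>

<!-- Invalid: specifying both expr and namelist causes error.badfetch -->
<!-- <exit expr="'Finished'" namelist="field1"/> -->
```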



<field>

Collects user input.

Attributes


Child elements: <audio>, <catch>, <error>, <filled>, <grammar>, <help>, <link>, <noinput>, <nomatch>, <option>, <prompt>, <promptcontrol>, <property>

name Optional. Defines a variable with the specified name, which will hold the result of the user collection defined by the <field> element. The variable name must be unique among all form items defined within the form; otherwise, an error.badfetch is thrown.

The format is an XML restrictedVariableName token, which is composed of alphabetic characters, digits, colon, and hyphen. The name may not begin with underscore (“_”) or contain a period (“.”). In addition, the name must follow ECMAScript variable naming conventions and may not include ECMAScript reserved words.

There is no default.

expr Optional. An ECMAScript expression assigning the initial value of the form item variable defined by name. If the initial value is set using this attribute, the form item will not be executed until the variable is cleared (for example, by using the <clear> element). The default is the ECMAScript value undefined.

cond Optional. A Boolean ECMAScript expression. The field is executed if and only if the expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.

type Optional. Provides the definition of a built-in grammar.

Instead of using this attribute, a grammar can be specified using the <grammar> element.

slot Not supported. If specified, an error.unsupported event is thrown.



Usage Guidelines

The <field> element prompts the user to provide input based on the specified grammar. The grammar can be DTMF and/or voice.

The type attribute takes one of the defined built-in grammars as an argument. Built-in grammars implicitly support DTMF and voice inputs unless the input mode is explicitly specified using the inputmodes attribute of the <property> element.

As an alternative to specifying the grammar in the type attribute, the grammar for a <field> element can be specified using the <grammar> element.

All voice grammars defined using the type attribute are converted into their <grammar> equivalent before being passed to the external speech server. Table 4-1 shows the conversion that takes place between a type-specified grammar and its <grammar> equivalent, and shows whether that representation is supported for DTMF and voice.

Table 4-1 Conversion of <field> “type” Attribute to <grammar>

For more information about DTMF and voice grammars, please see Chapter 3: DTMF and Voice Grammars.
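The two ways of attaching a built-in grammar to a field can be sketched as follows. The field names and prompts are hypothetical; the type value and the builtin: URI follow the digits row of Table 4-1.

```xml
<!-- Built-in grammar selected through the type attribute -->
<field name="pin" type="digits?minlength=3;maxlength=5">
  <prompt>Please enter your PIN.</prompt>
  <filled>
    <prompt>Thank you.</prompt>
  </filled>
</field>

<!-- The same built-in grammar referenced through a <grammar> child element -->
<field name="code">
  <grammar src="builtin:grammar/digits?minlength=3;maxlength=5"/>
  <prompt>Please enter your access code.</prompt>
</field>
```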

modal Optional. Allows you to disable all other grammars while the field is being executed, so that only the grammar associated with field is active. Supported values are as follows:

true: Disable all other grammars, leaving only this one active.

false: Keep all grammars enabled.

The default is false.

<field> Representation | <grammar> Equivalent | Supported Mode

<field type="boolean"> | <grammar src="builtin:grammar/boolean"/> | DTMF and voice

<field type="boolean?y=5;n=6"> | <grammar src="builtin:dtmf/boolean?y=5;n=6"/> | DTMF only

<field type="digits"> | <grammar src="builtin:grammar/digits"/> | DTMF and voice

<field type="digits?minlength=3;maxlength=5"> | <grammar src="builtin:grammar/digits?minlength=3;maxlength=5"/> | DTMF and voice

<field type="date"/> | <grammar src="builtin:grammar/date"/> | DTMF and voice




The match rate for voice inputs is very low.

There is an inconsistent match rate across match tests.

A match is returned when a no-match is expected in some test cases. This occurs with different grammar types.

A special SRGS rule (which is matched without the user speaking any word) does not work. The expected behavior is that the grammar can be used to match “zero” or silence. However, currently the rule is not matched.

A special SRGS rule (which matches any speech up until the next rule match, the next token, or the end of spoken input) does not work. Currently the rule is not matched as expected.

“0229” is recognized as “0529” for date grammars. Entering the values zero, two, two, nine causes the speech server to return “0529”.

Entering an invalid leap date returns a date. The expected behavior is to return an error or a nomatch.

Some digits are dropped or mismatched for ASR digit grammar. “13456” was entered but “12345” was returned.

Currency input of 100.798 drops the final digit and returns 100.79.

The speech server accepts only up to 2 decimal places for number grammar. Entered 98.765 and 98.76 was returned.

The speech server returns nomatch for a number grammar if the leading digits are zeros.

The point character (“.”) is recognized as “1” instead of dot for number grammars.

Phone grammars are incorrectly recognized. An input of “6044202978” returned “6004123457”.


Noinput was returned when a match was expected for the input: “zero six zero six zero six zero six zero six”.

Speech input in a date grammar is incorrectly interpreted. Entering “june, nineteen seventy eight” results in a returned string of “780619”.

For MRCP v1, saying or generating the speech “twelve o’clock” results in a no match being returned in a time grammar. For MRCP v2 this test succeeded.

A grammar completion failure occurs setting up an ABNF grammar.

Speech server running MRCP v1 does not return PCMA as the lead codec when only PCMA is offered. As a result, the external server actually uses the PCMU codec while the media server is streaming PCMU. When running MRCP v2 the speech server works as expected.



<filled>

Defines the code to be executed when user input is complete.

Attributes

Usage Guidelines

The <filled> element specifies actions to be executed when the associated <field> has been completed by the user.

Parent element: <field>, <form>, <record>, <subdialog>


mode Optional. Specifies when execution of this element should take place. Supported values are as follows:

any: Execute when any of the input items has been filled by the user.

all: Execute only when all of the input items have been filled by the user.

The default is all.

namelist Optional. A space-separated list of variable names representing the input items that must be filled in order for this element to be executed. When this element occurs within a form, this list defaults to the names (both implicit and explicit) of the form’s input items; otherwise, there is no default.
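A form-level <filled> element using mode and namelist might look like the following sketch. The form, the field names, and the validation rule are hypothetical.

```xml
<form id="transfer">
  <field name="source" type="digits">
    <prompt>Enter the source account.</prompt>
  </field>

  <field name="target" type="digits">
    <prompt>Enter the destination account.</prompt>
  </field>

  <!-- Executed only after both listed input items have been filled -->
  <filled mode="all" namelist="source target">
    <if cond="source == target">
      <prompt>The two accounts must be different.</prompt>
      <!-- Clear one field so that it is collected again -->
      <clear namelist="target"/>
    </if>
  </filled>
</form>
```

With mode="any", the same block would instead run as soon as either field was filled.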



<form>

Defines a dialog for collecting user input.

Attributes

Usage Guidelines

The <form> element is a key mechanism in VoiceXML for presenting information to the user and collecting user input.

A form consists of form items, which can be visited during the execution of the form. Form items can either be input items (which are visited as a result of user input) or control items (which are independent of user input).

A form allows variable declarations and an event handler to be associated with the form. Additionally, the child element <filled> allows you to specify procedural logic that can be executed when user input is completed and a particular field item (or field) is filled.

Parent element: <vxml>

Child elements: <block>, <catch>, <error>, <filled>, <grammar>, <link>, <noinput>, <nomatch>, <promptcontrol>, <property>, <record>, <script>, <subdialog>, <var>

id Optional. A unique identifier for the document.

The format is an XML name token without colons (“:”). The name token may be composed of alphabetic letters, digits, period (“.”), underscore (“_”), and hyphen (“-”). The name must begin with a letter or underscore.

This identifier is optional. If specified, it can be used within the current document or within another document to pass control to the form. For example, this-form is the identifier referenced in <goto next=“#this-form”>.

scope Optional. The default scope of this form’s grammar. Supported values are as follows:

dialog: This grammar applies only to the current form.

document: This grammar is active over the entire document. If the document is the root document, then the grammar scope applies to all documents referenced from the root document.

The default is dialog.



<goto>

Transfers control to another dialog, abandoning the current dialog.

Attributes



next The URI of the document to which to transition. The URI must comply with the XML anyURI format.

Exactly one of next, expr, nextitem, and expritem must be specified; otherwise, an error.badfetch is thrown.

expr An ECMAScript expression evaluating to the URI of the document to which to transition. The URI resulting from the expression must comply with the XML anyURI format.


nextitem The name of the next item to transition to within the form.


expritem An ECMAScript expression evaluating to the name of the next item to transition to within the form.


fetchaudio Ignored.

fetchhint Ignored.



Usage Guidelines

The <goto> element provides the ability to transition control to another dialog, either within the current document, or within another document.
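The four mutually exclusive ways of naming a <goto> target can be sketched as follows. Each fragment is independent; the document URI, the variable names, and the form item name are hypothetical.

```xml
<!-- Transition to a form in the current document -->
<goto next="#this-form"/>

<!-- Transition to a form in another document (hypothetical URI) -->
<goto next="billing.vxml#this-form"/>

<!-- URI computed from an ECMAScript expression (target is a hypothetical variable) -->
<goto expr="'billing.vxml#' + target"/>

<!-- Transition to another form item in the current form (hypothetical item name) -->
<goto nextitem="confirm"/>

<!-- Form item name computed from an expression (nextField is a hypothetical variable) -->
<goto expritem="nextField"/>
```

Exactly one of these four attributes appears on each element; any other combination results in error.badfetch.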





maxage Ignored.

maxstale Ignored.



<grammar>

Defines user input rules for DTMF or voice.

Attributes

This element has the following VoiceXML attributes.

Parent element: <choice>, <field>, <form>, <link>, <record>

Child elements: <rule>

src The URI of the grammar, if the grammar is to be fetched externally. The URI must comply with the XML anyURI format.

This attribute can also be used to directly specify a built-in grammar, using the notation builtin:grammar/type?parameters (where grammar=dtmf).

Either way, this attribute is mandatory if an inline grammar is not specified, and forbidden if an inline grammar is specified; that is, exactly one of src or an inline grammar must be specified. If both or neither are specified, an error.badfetch is thrown.

scope Optional. The default scope of this grammar. Supported values are as follows:

dialog: This grammar applies only to the current form.


If not specified, the grammar scope is inherited from the parent element.

type Optional. Identifies the MIME type of the grammar. If specified, this value takes precedence over file types or the HTTP Content-type header.

If not specified and the grammar is fetched externally, then the file extension type or the media Content-type is used to determine the grammar type. If not specified and the grammar is inline, the type is assumed to be XML; that is, “application/SRGS+xml”.

weight Ignored.

fetchhint Ignored.



This element inherits the following SRGS attributes for inline grammars.





maxage Ignored.

maxstale Ignored.

version Mandatory for an inline XML grammar; forbidden otherwise. Identifies the W3C specification version of the grammar. The only supported value is 1.0.

xml:lang Optional for voice grammars; ignored for DTMF grammars. The language to be used for the entire grammar. The interpretation of the value associated with xml:lang is managed and verified by the speech server.

A value for xml:lang specified at the <item> level overrides a value specified here.



Usage Guidelines

The <grammar> element specifies the rules for a valid set of user inputs or utterances.

The grammar definition can be inline, external, or built-in, and can be specified for DTMF, voice, or both. The grammar specification must be in the XML form of the notation specified by [11].

Exactly one of src or an inline grammar must be specified. If both or neither are specified, an error.badfetch is thrown.

External grammars that are voice grammars are fetched, parsed and processed by the external speech server. For this case the URI will be passed directly (as-is) to the speech server. For this reason, the media server must determine the grammar type (that is, the input mode) before it can pass the URI. The input mode can be defined in any of the following ways:

• By specifying the default input mode as a VoiceXML parameter using the media server’s management interface

• Using the inputmodes attribute of the <property> element

• Using the mode attribute of the <grammar> element

To be valid, a grammar must evaluate to at least one digit sequence. Grammars that evaluate to be empty (that is, no valid collection sequence is specified) are rejected with an error.grammar event.
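As a sketch of the inline and external cases described above, the first field below carries an inline DTMF grammar and the second references an external voice grammar. The field names, rule identifier, and grammar URI are hypothetical; the id attribute on <rule> is the standard SRGS rule identifier.

```xml
<field name="department">
  <prompt>Press 1 for sales or 2 for support.</prompt>

  <!-- Inline grammar: version is mandatory and src must not be present -->
  <grammar mode="dtmf" version="1.0" root="dept">
    <rule id="dept">
      <one-of>
        <item>1</item>
        <item>2</item>
      </one-of>
    </rule>
  </grammar>
</field>

<field name="city">
  <prompt>Which city?</prompt>

  <!-- External voice grammar: the URI is passed as-is to the speech server.
       mode is set explicitly because the media server default is dtmf. -->
  <grammar src="http://grammars.example.com/city.grxml"
           mode="voice"
           type="application/srgs+xml"/>
</field>
```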

mode Optional. The type of the current grammar. Supported values are as follows:

dtmf: The grammar is a DTMF-based grammar.

voice: The grammar is a voice-based grammar.

This attribute differs from the inputmodes property which represents the type of input that will be accepted. For a valid grammar (that is, a grammar that will be activated and can receive input), this attribute must align with the value of the inputmodes property. Grammars that mismatch between the mode attribute and the inputmodes property are ignored.

The default for this attribute in the specification is voice. For backwards compatibility, the default for the media server is dtmf.

root Optional for an inline grammar; forbidden otherwise. Identifies the grammar’s root rule. If not specified, the grammar’s default rule is used.

tag-format Optional for an inline grammar; forbidden otherwise. A URI identifying the content type and version of the semantic processor to use. Defines the tag content format for all tags within the grammar.

xml:base Optional. Allows a base URI to be defined. If set, any relative URIs within the inline grammar are resolved using this base URI. Otherwise, any relative URIs are resolved using the base URI specified within the <vxml> element.




The match rate for voice inputs is very low.

There is an inconsistent match rate across match tests.

A match is returned when a no-match is expected in some test cases. This occurs with different grammar types.

A special SRGS rule (which is matched without the user speaking any word) does not work. The expected behavior is that the grammar can be used to match “zero” or silence. However, currently the rule is not matched.

A special SRGS rule (which matches any speech up until the next rule match, the next token, or the end of spoken input) does not work. Currently the rule is not matched as expected.

“0229” is recognized as “0529” for date grammars. Entering the values zero, two, two, nine causes the speech server to return “0529”.

Entering an invalid leap date returns a date. The expected behavior is to return an error or a nomatch.

Some digits are dropped or mismatched for ASR digit grammar. “13456” was entered but “12345” was returned.

Currency input of 100.798 drops the final digit and returns 100.79.

The speech server accepts only up to 2 decimal places for number grammar. Entered 98.765 and 98.76 was returned.

The speech server returns nomatch for a number grammar if the leading digits are zeros.

The point character (“.”) is recognized as “1” instead of dot for number grammars.

Phone grammars are incorrectly recognized. An input of “6044202978” returned “6004123457”.


Noinput was returned when a match was expected for the input: “zero six zero six zero six zero six zero six”.

Speech input in a date grammar is incorrectly interpreted. Entering “june, nineteen seventy eight” results in a returned string of “780619”.

For MRCP v1, saying or generating the speech “twelve o’clock” results in a no match being returned in a time grammar. For MRCP v2 this test succeeded.

A grammar completion failure occurs setting up an ABNF grammar.

Speech server running MRCP v1 does not return PCMA as the lead codec when only PCMA is offered. As a result, the external server actually uses the PCMU codec while the media server is streaming PCMU. When running MRCP v2 the speech server works as expected.



<help>

Handles (catches) help events.

Attributes

Usage Guidelines

The <help> element catches all events of type help.

If multiple help handlers are installed or inherited, the handler is selected according to the procedure described for event handling in [13].

This element is equivalent to <catch event=“help”>.


Parent element: <field>, <form>, <menu>, <record>, <subdialog>


count Optional. The number of times a help event may be thrown, after which the help handling routine is invoked. Regardless of the value set for count, after 5 occurrences the session terminates. The default is 5.

cond Optional. A Boolean ECMAScript expression. The help handling routine is executed if and only if the expression evaluates to true. The default is true.
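A field with escalating <help> handlers might be sketched as follows. The field name, the prompts, the count values, and the #operator dialog are hypothetical.

```xml
<field name="pin" type="digits">
  <prompt>Please enter your PIN.</prompt>

  <!-- First help request: explain and collect again -->
  <help count="1">
    <prompt>Your PIN is the number you chose when you signed up.</prompt>
    <reprompt/>
  </help>

  <!-- Second help request: hand off to another dialog (hypothetical id) -->
  <help count="2">
    <prompt>Please hold while we connect you to an operator.</prompt>
    <goto next="#operator"/>
  </help>
</field>
```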



<if>

Defines conditional logic.

Attributes

Usage Guidelines

The <if> element defines procedural logic that is to be executed on satisfaction of a condition.

The <if> element may have associated <else> and/or <elseif> clauses, which define alternate logical flows.


Child elements: <assign>, <audio>, <clear>, <disconnect>, <else>, <elseif>, <exit>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>

cond Mandatory. A Boolean ECMAScript expression. The associated clause is executed if and only if the expression evaluates to true.
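The following sketch combines <if>, <elseif>, and <else>. The attempts variable and the prompts are hypothetical. Because cond is an XML attribute, the less-than operator is written as the entity &lt;.

```xml
<if cond="attempts == 0">
  <prompt>Welcome.</prompt>
<elseif cond="attempts &lt; 3"/>
  <!-- Entered only if the <if> condition was false and this condition is true -->
  <prompt>Please try again.</prompt>
<else/>
  <!-- Entered only if no earlier condition was satisfied -->
  <prompt>Too many attempts.</prompt>
  <disconnect/>
</if>
```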



<initial>

Provides the initial prompt in a form.

Attributes

Usage Guidelines

The <initial> element defines procedural logic that is to be executed on satisfaction of a condition.

In a typical mixed initiative form, the <initial> element is visited when the user is initially being prompted for form-wide information, and has not yet entered into the directed mode where each field is visited individually.

Like input items, the <initial> element has prompts, catches, and event counters. Unlike input items, the <initial> element has no grammars, and no <filled> action.


Child elements: <audio>, <catch>, <link>, <noinput>, <nomatch>, <prompt>, <property>

name Optional. The name of the form item variable used to track whether the <initial> element is eligible for execution.

The default is an inaccessible internal variable.

expr Optional. An ECMAScript expression representing the initial value of the form item variable. If initialized to a value, the form item will not be visited unless the form item variable is cleared.


cond Optional. A Boolean ECMAScript expression. The form item is visited if and only if this expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.



<item>

[SRGS] Defines valid user input, as part of a DTMF or voice grammar rule.

Attributes

Usage Guidelines

The <item> element is used in XML grammar specification rules to define valid user inputs.

For DTMF items, grammars as defined in Appendix E of [11] may be used. These are the digits 0–9, “#”, “*”, and the digits A–D. For voice-based grammars, any input acceptable by the external speech server may be used.

Tokens not enclosed in <item> elements are ignored. A grammar that has no valid <item> elements defined is rejected with an error.grammar event. (Note that this deviates slightly from [13], which states that empty grammars should be allowed.) For information on how this differs for voice-based grammars, please see Chapter 3: DTMF and Voice Grammars.

The <item> element can be nested at most three levels deep.

Parent element: <item>, <one-of>, <rule>

Child elements: <item>, <one-of>

repeat Optional. Specifies additional user detection repeat rules for a match to be declared. Supported formats are as follows:

repeat=n. Repeat n times.

repeat=m-n. Repeat between m and n times, where m is less than or equal to n, and m and n are both greater than or equal to 0.

repeat=m-. Repeat m or more times, where m is greater than or equal to 0.

repeat=0-1. Indicates that expansion is optional.

repeat-prob Optional for voice grammars; ignored for DTMF grammars. Sets the probability that the repeat attribute will succeed. Valid only for speech grammars and only if the repeat attribute is defined. The range is 0.0 to 1.0.

weight Ignored.

xml:lang Optional for voice grammars; ignored for DTMF grammars. The language to be used for the entire grammar. The interpretation of the value associated with xml:lang is managed and verified by the speech server.

A value for xml:lang set here overrides a value specified at the <grammar> level.




The repeat attribute used in a nested <item> element returns nomatch for input that should generate a match.
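The following sketch, which is not taken from a tested application, shows the repeat=m-n form in a DTMF grammar that accepts between four and six digits. It follows the layout of the XML-SRGS grammar in Example 4-3, and the rule name pin is hypothetical. The repeat attribute is placed on the outermost <item> only, in view of the nested <item> limitation noted above.

```xml
<grammar mode="dtmf" version="1.0" root="pin">
  <rule id="pin" scope="public">
    <!-- repeat="4-6": the enclosed expansion must match 4 to 6 times -->
    <item repeat="4-6">
      <one-of>
        <item>0</item>
        <item>1</item>
        <item>2</item>
        <item>3</item>
        <item>4</item>
        <item>5</item>
        <item>6</item>
        <item>7</item>
        <item>8</item>
        <item>9</item>
      </one-of>
    </item>
  </rule>
</grammar>
```

Replacing repeat="4-6" with repeat="4" would require exactly four digits, and repeat="4-" would accept four or more.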



<link>

Specifies a destination URL when a grammar activates a match.

Attributes

Parent element: <field>, <form>, <vxml>

Child elements: <grammar>

next Goes to the specified URI. The URI must comply with the XML anyURI format.


expr Goes to the URI resulting from evaluation of the specified ECMAScript expression. The URI must comply with the XML anyURI format.


event Throws the specified event when one of the link grammars is matched. For a list of supported events, please see the section “Events” on page 24.


eventexpr Throws the event resulting from evaluation of the specified ECMAScript expression when one of the link grammars is matched. For a list of supported events, please see the section “Events” on page 24.








Usage Guidelines

The <link> element provides a mechanism for transitioning to a new document or dialog. Alternatively, it can be used to throw an event instead of transitioning to a new document.

The <link> element is activated when the grammar contained or specified within the element is matched. For this reason, grammars specified within the <link> element are not able to have a scope specified.

dtmf Optional. Specifies a simple DTMF sequence which, when matched, activates the specified link. White space is permitted in the DTMF sequence specification; for example “1234#” and “1 2 3 4 #” are treated as equivalent. There is no default.

Generic DTMF recognition properties (that is, interdigittimeout, termtimeout, and termchar) apply. For more information about DTMF properties, please see Chapter 2: VoiceXML Properties.

fetchaudio Ignored.

fetchhint Ignored.





maxage Ignored.

maxstale Ignored.



Grammars active for a link at the root document level are active throughout all documents referenced from the root document. Grammars active for a link at the <vxml> level are active throughout the document. Grammars active for a link at the <form> level are active while the user is in the form.
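The scoping rules above can be sketched as follows. This is a minimal illustration rather than a tested script: the target document operator.vxml is hypothetical, and the sketch assumes that help is among the events supported by the media server.

```xml
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- <vxml> level: active throughout this document.
       Pressing 0 transitions to another (hypothetical) document. -->
  <link next="operator.vxml" dtmf="0"/>

  <form id="main">
    <!-- <form> level: active only while the caller is in this form.
         Pressing 9 throws an event instead of transitioning. -->
    <link event="help" dtmf="9"/>

    <field name="city">
      <prompt>For Vancouver, press 1. For Paris, press 2.</prompt>
      <option dtmf="1" value="vancouver">Vancouver</option>
      <option dtmf="2" value="paris">Paris</option>
      <help>
        <prompt>Press 1 or 2 to choose a city, or 0 for an operator.</prompt>
      </help>
    </field>
  </form>
</vxml>
```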



<log>

Generates messages for logging and troubleshooting.

Attributes

Usage Guidelines

The <log> element allows an application to generate messages for the purpose of logging and debugging.

The messages can include events, text information, and/or results from a VoiceXML script. This facility aids application developers in debugging an application by examining its flow control and variable contents.

The element may contain any combination of text and <value> elements. The <value> element is used to de-reference ECMAScript expressions and include them as a string in the message. The generated message consists of the concatenation of the text message and the string form of the value of the expr attribute in the <value> element.

All log messages generated by the <log> element are written to syslog at a severity level of INFO.

Parent element: <block>, <catch>, <filled>, <form>, <help>, <if>, <noinput>, <nomatch>


label Optional. A string that can be used to label the log—for example, to indicate the purpose of the log.

expr Optional. An ECMAScript expression evaluating to a string that can be used to label the log—for example, to indicate the purpose of the log.
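A minimal sketch of a labelled log message that mixes text with a <value> element is shown here. The field reuses the <option> style of Example 4-2, and the label text city-choice is an arbitrary, hypothetical string.

```xml
<form id="collect">
  <field name="city">
    <prompt>For Vancouver, press 1. For Paris, press 2.</prompt>
    <option dtmf="1" value="vancouver">Vancouver</option>
    <option dtmf="2" value="paris">Paris</option>
    <filled>
      <!-- The message is the text followed by the string form of the
           expression; per this section it is written to syslog at INFO -->
      <log label="city-choice">Caller selected <value expr="city"/></log>
    </filled>
  </field>
</form>
```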



<mark>

[SSML] Places a marker into a text or tag sequence.

Attributes

Usage Guidelines

Use the <mark> element to reference a specific location in the text/tag sequence, or to insert a marker into an output stream for asynchronous notification. When processing a <mark> element, a synthesis processor does one or both of the following:

• Informs the hosting environment of the value of the name attribute and provides information allowing the platform to retrieve the corresponding position in the rendered output.

• When audio output of the SSML document reaches the mark, issues an event that includes the required name attribute of the element. The hosting environment defines the destination of the event.

The <mark> element does not affect the speech output process.

The TTS server does not send a MARK event to the SPM when it reaches a <mark> element in the spoken text.



name Mandatory. A token providing a unique name for the marked location; for example “here”.



<menu>

Provides a fixed set of menu selections.

Attributes


Child elements: <audio>, <catch>, <choice>, <error>, <help>, <noinput>, <nomatch>, <prompt>, <promptcontrol>, <property>, <script>

id Optional. A unique identifier for the menu.


If specified, the identifier can be used within the application to pass control to the menu, for example from a <goto> or a <submit>.

scope Optional. The default scope of this menu’s grammar. Supported values are as follows:

dialog: This grammar applies only to the current menu.


The default is dialog.

dtmf Optional. Defines whether <choice> elements that have not explicitly assigned DTMF key press attribute values are automatically assigned a corresponding DTMF key press. Supported values are as follows:

true: <choice> elements not explicitly set are automatically assigned a DTMF key press.

false: <choice> elements not explicitly set are not assigned a DTMF key press.


accept Ignored for DTMF and speech grammars; optional for speech recognition. For speech recognition, specifies whether user input must be exact or may be approximate. Menu grammars that specify speech are converted to XML-SRGS grammars. The supported value is exact; there is currently no mapping for approximate in XML-SRGS grammars. The default is exact.



Usage Guidelines

The <menu> element provides a relatively simple mechanism (as compared to, say, a form) for allowing the user to make a choice and then transitioning to another location based on that choice.

Using audio prompts, the menu offers the user a set of choices, after which it waits for user input. The dialog transitions based on the user input.
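A minimal sketch of such a menu follows. The target documents sales.vxml and support.vxml are hypothetical. The sketch assumes that, with dtmf set to true, key presses are assigned to <choice> elements in document order starting at 1, which is the VoiceXML 2.0 convention; this section states only that a key press is assigned automatically.

```xml
<menu id="main" dtmf="true">
  <prompt>For sales, press 1. For support, press 2.</prompt>
  <!-- No dtmf attribute on either choice, so each is assigned a key
       automatically (assumed here to be 1 and 2, in document order) -->
  <choice next="sales.vxml">Sales</choice>
  <choice next="support.vxml">Support</choice>
</menu>
```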



<meta>

Defines page information.

Attributes

Usage Guidelines

The <meta> element allows specification of information about a grammar document. This element is allowed but ignored by the media server.


Providing both the name and http-equiv attributes within the <meta> element is illegal and an error is expected; however, the speech server accepted the grammar, although it eventually returned a noinput event.

In a test to verify that the <meta> element is accepted in an ABNF grammar, the grammar fails when being activated (that is, in the define grammar request). The expected behavior is for the grammar to be accepted and processed.



name A name for the metadata property describing page information. Exactly one of name or http-equiv must be specified; otherwise an error.badfetch is thrown.

content Mandatory. A value for the metadata; that is, the page information to be recorded. This value can supply the value for an HTTP response header. This value can be accessed later through the session variable session.meta.name. If this attribute is omitted, an error.badfetch is thrown.

http-equiv Ignored. The name of an HTTP header for which the content attribute is supplying the response value. Exactly one of name or http-equiv must be specified; otherwise an error.badfetch is thrown.



<metadata>

[SRGS] Defines information about a document using a metadata schema.

Attributes

None.

Usage Guidelines

Use the <metadata> element as a container in which information about the document can be placed using a metadata schema. Although any metadata schema can be used with <metadata>, it is recommended that the XML syntax of the Resource Description Framework (RDF) [RDF-XMLSYNTAX] be used in conjunction with the general metadata properties defined in the Dublin Core Metadata Initiative [DC].

Document properties declared with the <metadata> element can use any metadata schema.

Child elements: Depends on the metadata schema used.



<noinput>

Handles (catches) a user input timeout event.

Attributes

Usage Guidelines

The <noinput> element catches all events of type noinput.

If multiple no-input handlers are installed or inherited, the handler is selected according to the procedure described for event handling in [13].

This element is equivalent to <catch event="noinput">.




count Optional. The number of times a noinput event may be thrown, after which the no-input handling routine is invoked. Regardless of the value set for count, after 5 occurrences the session terminates. The default is 5.

cond Optional. A Boolean ECMAScript expression. The no-input handling routine is executed if and only if the expression evaluates to true. The default is true.



<nomatch>

Handles (catches) an invalid user input event.

Attributes

Usage Guidelines

The <nomatch> element catches all events of type nomatch.

If multiple no-match handlers are installed or inherited, the handler is selected according to the procedure described for event handling in [13].

This element is equivalent to <catch event="nomatch">.




count Optional. The number of times a nomatch event may be thrown, after which the no-match handling routine is invoked. Regardless of the value set for count, after 5 occurrences the session terminates. The default is 5.

(Note that [13] sets the termination value to 4. This was changed to match the value for <noinput>, and to provide backward compatibility with a previous release of the software.)

cond Optional. A Boolean ECMAScript expression. The no-match handling routine is executed if and only if the expression evaluates to true. The default is true.
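The following sketch shows <noinput> and <nomatch> handlers with explicit count values inside a field. It is illustrative only: the document operator.vxml is hypothetical, and the exact occurrence at which each handler runs follows the description of the count attribute above, including the limit of 5 occurrences after which the session terminates.

```xml
<field name="city">
  <prompt>For Vancouver, press 1. For Paris, press 2.</prompt>
  <option dtmf="1" value="vancouver">Vancouver</option>
  <option dtmf="2" value="paris">Paris</option>

  <!-- Short form of <catch event="noinput" count="1"> -->
  <noinput count="1">
    <prompt>Sorry, I did not receive any input.</prompt>
  </noinput>

  <!-- Short form of <catch event="nomatch" count="3"> -->
  <nomatch count="3">
    <prompt>Transferring you to an operator.</prompt>
    <goto next="operator.vxml"/>
  </nomatch>
</field>
```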



<one-of>

[SRGS] Allows one selection from a list of alternatives.

Attributes

Usage Guidelines

The <one-of> element identifies a set of alternative options that are mutually exclusive.

The media server supports at most two levels of nested <one-of> elements. Deeper nesting results in the grammar being rejected, in which case an error.badfetch is thrown and the session terminated.

Parent element: <item>, <rule>

Child elements: <item>

xml:lang Optional for voice grammars; ignored for DTMF grammars. The language to be used for the entire grammar. The interpretation of the value associated with xml:lang is managed and verified by the speech server.

A value for xml:lang specified here overrides any value for xml:lang that may have been specified at a higher level and applies to all elements below this element.



<option>

Provides a simple method for specifying grammars.

Attributes

Usage Guidelines

The <option> element provides a relatively simple way to specify grammars for collecting and processing user input. Simple DTMF or speech sequences can be specified within this element, rather than specifying a complex grammar.

An <option> grammar can concurrently define both a DTMF and a speech grammar in much the same way a <choice> element does. The value attribute is assigned to the result of the collection, based on the option that was matched.

Example 4-2 shows a VoiceXML script defining an <option> grammar enabled for both DTMF and speech.

• For DTMF, the values “1”, “2” and “3” will result in the <filled> element being executed.

• For speech, the words “Vancouver”, “New York”, or “Paris” will result in the <filled> element being executed.

Example 4-2 <option> Grammar Example

<form>
  <field name="city">
    <prompt>
      Please select a city you would like to visit.
      <enumerate/>
    </prompt>
    <option dtmf="1" value="vancouver"> Vancouver </option>
    <option dtmf="2" value="newyork"> New York </option>
    <option dtmf="3" value="paris"> Paris </option>
    <filled>
      <submit next="/cgi-bin/flyto.cgi" method="post" namelist="city"/>
    </filled>
  </field>
</form>

Example 4-3 shows an XML-SRGS grammar that is equivalent to the one shown in Example 4-2. The grammar shown in Example 4-3 would be passed to the external speech server for evaluation, while the grammar shown in Example 4-2 would be parsed and processed within the media server.

Example 4-3 XML-SRGS Grammar

<grammar mode="voice" version="1.0" root="optionRoot">
  <rule id="optionRoot" scope="public">
    <one-of>
      <item> Vancouver </item>
      <item> New York </item>
      <item> Paris </item>
    </one-of>
  </rule>
</grammar>

Parent element: <field>

accept Ignored.

dtmf Optional. Specifies a simple DTMF sequence for user input collection and handling. White space is permitted in the DTMF sequence specification; for example “1234#” and “1 2 3 4 #” are treated as equivalent. There is no default.

Generic DTMF recognition properties (that is, interdigittimeout, termtimeout, and termchar) apply. For more information about DTMF properties, please see Chapter 2: VoiceXML Properties.

value Optional. Specifies a string to be assigned to the <field> name variable when this option is selected. By default, the value of the dtmf attribute is used.



<p>

[SSML] Represents a paragraph.

Attributes

Usage Guidelines

The use of the <p> element is optional. Where text occurs without an enclosing <p> or <s> element, the synthesis processor attempts to determine the structure using language-specific knowledge of the format of plain text.

Some TTS servers running MRCP v1 ignore the xml:lang attribute. Such a server always speaks English regardless of the value of the xml:lang attribute in <speak>, <p>, <s>, and <voice> elements.

Child elements: <audio>, <break>, <emphasis>, <mark>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <voice>

xml:lang Mandatory. Specifies the language of the paragraph.



<param>

Defines a parameter to a subdialog.

Attributes

Usage Guidelines

The <param> element allows parameters to be passed to subdialogs.

Nesting of <param> elements is not supported.

Parent element: <subdialog>

Child elements: <param>

name Mandatory. Specifies the name of the parameter to be used in the <subdialog> element.

value The value to be assigned to the parameter within the <subdialog> element.

Exactly one of value and expr must be specified.

expr An ECMAScript expression resulting in the value to be assigned to the parameter within the <subdialog> element.

Exactly one of value and expr must be specified.

valuetype Optional. Specifies, only to an <object> within a <subdialog> element, whether the value is of type data or type ref. Since the media server only supports type data, any other value is ignored.

type Optional. Specifies the media type, if the valuetype is ref. Since the media server only supports a valuetype of data, the only supported value for type is data; any other value is ignored.
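The two ways of supplying a parameter, a literal value and an ECMAScript expr, can be sketched as follows. The subdialog document verify.vxml, the variable acct, and the parameter names are all hypothetical; the called dialog is assumed to declare variables with the same names in order to receive the values.

```xml
<form id="caller">
  <var name="acct" expr="'12345'"/>

  <subdialog name="result" src="verify.vxml">
    <!-- Literal string passed with the value attribute -->
    <param name="language" value="en"/>
    <!-- Evaluated ECMAScript expression passed with the expr attribute -->
    <param name="account" expr="acct"/>
    <filled>
      <log label="verify">Subdialog returned</log>
    </filled>
  </subdialog>
</form>
```

Each <param> carries exactly one of value or expr, as required above.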



<phoneme>

[SSML] Provides a phonemic/phonetic pronunciation for the contained text.

Attributes

Usage Guidelines

The <phoneme> element provides a phonemic/phonetic pronunciation for the contained text.

The <phoneme> element may be empty. However, it is recommended that the element contain human-readable text that can be used for non-spoken rendering of the document. For example, the content may be displayed visually for users with hearing impairments.

The ph attribute is specified as a mandatory parameter for the <phoneme> element. However, the speech server accepts and processes the element within an SSML string without the ph attribute.



ph Mandatory. Specifies the phoneme/phone string.

alphabet Optional. Specifies the phonemic/phonetic alphabet, which in this context refers to a collection of symbols to represent the sounds of one or more human languages. Supported values are vendor-specific.



<prompt>

Specifies media output to be played to a user.

Attributes

Parent element: <block>, <catch>, <error>, <field>, <filled>, <help>, <if>, <menu>, <noinput>, <nomatch>, <record>, <subdialog>

Child elements: <audio>, <break>, <say-as>

bargein Optional. Specifies whether the audio prompt can be interrupted (“barged”) by DTMF or speech input. Supported values are as follows:

true: The prompt is bargeable, and DTMF or speech input will interrupt play. If any digits remain in the digit buffer at the time this element is executed, the clip is barged immediately and will not play.

false: The prompt is not bargeable. Any digits currently in the digit buffer are cleared, and any digits received while clip(s) are playing are discarded.

If not set, the value set for the bargein property applies. For information on the bargein property, please see Chapter 2: VoiceXML Properties.

The setting of the bargein attribute can interact with the setting of the fax detection property com.cvd.faxdetect. For that information, please see Chapter 2: VoiceXML Properties.

bargeintype Ignored.

cond Optional. A Boolean ECMAScript expression. The prompt is played if and only if this expression evaluates to true. The default is true.

count Optional. The number of times the form item can be visited for the prompt to be played. The default is 1.



timeout Optional. An interval after which, if initial DTMF user input has not been received, a noinput event is thrown.



The default is 10s.

cvd:vcrprompt RadiSys extension. Optional for audio clips only; forbidden for TTS or multimedia clips. Specifies whether <promptcontrol> actions are active for this prompt. Supported values are as follows:

true: Prompt controls are active for this prompt.

false: Prompt controls are not active for this prompt.

Any specified TTS prompts are ignored; they are neither queued nor played. Multimedia clips are played but return an error on play.

Any other value results in the session terminating with an error.semantic event. The default is false.

cvd:cleardb RadiSys extension. Optional. Flushes digits from the digit buffer. Supported values are as follows:

true: All digits currently in the digit buffer will be cleared prior to playing the requested prompt. Digits are cleared independent of any value set for the bargein attribute. For details about the interaction between the cvd:cleardb attribute and the bargein attribute, please see Table 4-4 in the “Usage Guidelines.”

false: The digit buffer is not cleared before playing the requested prompt.

Any other value results in the session terminating with an error.semantic event. The default is false.

This parameter does not apply to speech; only DTMF input is buffered.



cvd:varprompt RadiSys extension. Mandatory if the prompt contains a <say-as> element; ignored otherwise. Specifies whether the variable prompt specified in the <say-as> element is to be played by the external TTS server or using the media server’s built-in sets and variables processor. Supported values are as follows:

tts: An external TTS server plays the prompt. If this value is specified but no external TTS server is configured, the varprompt attribute is ignored.

sv: The media server plays the prompt using its internal sets and variables processor.

xml:lang Optional if the prompt contains a <say-as> child element; ignored otherwise. Specifies the language to be used in rendering the prompt. If not specified, the value specified in the xml:lang attribute of the VoiceXML root document is used.

If the media server’s sets and variables processor is to be used to render the variable, the only supported value is en (English). In this case, specifying an unsupported language (that is, any language other than en) causes an error.unsupported.language event.

If an external TTS server is to be used to render the variable, the language value is not inspected by the media server, but is passed directly to the external TTS server.

xml:base Optional. Allows a base URI to be defined. If set, any relative URIs within the prompt specification are resolved using this base URI. Otherwise, any relative URIs are resolved using the base URI specified within the <vxml> element.

Note that a base URI can only be applied to the relative URI specified within a src attribute. It cannot be applied to a URI resulting from evaluation of an ECMAScript expression (that is, an expr attribute).

Shadow Variables

Whenever a prompt completes (with the exception of a user hang-up), a number of application-level scoped shadow variables are populated. These shadow variables provide the VoiceXML application with information about the last prompt played.

Note that if the session terminates as the result of a SIP BYE, the shadow variables are not updated with information about the prompt. In this case, numeric variables report 0 and the lasturl variable reports undefined.

For TTS clips, only the bargein variable is populated. All numeric variables report 0, and string variables report undefined.



Table 4-2 shows the shadow variables defined to provide information about prompt completion.

Table 4-2 Prompt Completion Shadow Variables

Shadow Variable Description

application.cvd_lastprompt$.bargein

RadiSys extension. Indicates whether the prompt was barged or not. Supported values are as follows:

true: The prompt was barged.

false: The prompt was not barged.

application.cvd_lastprompt$.duration

RadiSys extension. The amount of time, in milliseconds, consumed by the last prompt played. This is the total amount of time for the last clip, or set of clips, played. If the prompt was barged, this represents the time up to the point of being barged.

Although the duration includes all clips specified for the prompt, it does not include pauses that are a result of user-defined pause/resume sequences. It does, however, include any silence included as a result of using the <break> element.

For multimedia clips containing both audio and video components, the duration represents the larger of the video or audio components. For example, if audio played for 5400 milliseconds and video played for 4800 milliseconds, the duration parameter reports 5400 milliseconds.

If the clip fails to start playing for any reason, the value of this variable is 0.

If the prompt terminates because the user hangs up, the value of this variable is 0.

application.cvd_lastprompt$.lasturl

RadiSys extension. A string identifying the URL of the last audio or multimedia file played. If the prompt consisted of a set of multiple clips, and the sequence was interrupted as a result of a DTMF digit, then the value of this variable will be the URL of the file that was playing when the digit was received.

Note that the value of this variable will be undefined if no clips have been played. This includes the case where a clip is barged before it starts, and the case where a clip is stopped immediately after starting because of type-ahead digits remaining in the digit buffer.

If the prompt terminates because the user hung up, the value of this variable will be undefined.

application.cvd_lastprompt$.lasturl_offset

RadiSys extension. The position in the last clip being played when clip playing terminated. Unlike the application.cvd_lastprompt$.duration shadow variable, this is not the amount of time for which the last clip played, but rather the position in the file when clip playing completed. For clips that have no associated VCR controls, this value will likely be the same as the duration of play. However, for clips that have used the media control seek action, the actual duration of clip play and the position in the file may not be the same.

If the prompt terminates because the user hung up, the value of this variable will be 0.

This shadow variable is defined only for audio clips. For multimedia clips containing both audio and video components, the offset represents the larger of the video or audio components. For example, if audio played for 5400 milliseconds and video played for 4800 milliseconds, the offset parameter reports 5400 milliseconds.

There is also a family of shadow variables defined in [13], the application.lastresult$.value shadow variables, which can be used to reference the information resulting from DTMF collection. These are shown in Table 4-3.

Table 4-3 DTMF Collection Variables

application.lastresult$.interpretation

Contains the last set of collected input with the following exceptions:

• For Boolean type, the variable contains true or false if there was a match. Contains the digits otherwise.

• For Currency type and DTMF, the asterisk (“*”) is converted to a period (“.”) in match cases. For example, input 1*23 is converted to 1.23. For no-match cases the literal digits are assigned.

• For speech, contains the value returned from the speech server interpreting the input.

application.lastresult$.utterance

Contains the raw input that was received. In the example given above for Currency, the variable would report 1*23 and not 1.23. For Boolean variables, the variable reports what was entered and not true or false. For most other cases utterance and interpretation will be the same.

application.lastresult$.inputmode

Contains the input mode. This is either dtmf or voice, depending on which grammars were active and which one produced the event. This shadow variable is initialized with a value of undefined and updated later with the actual value (dtmf or voice) only when the <grammar> element is executed.

application.lastresult$.confidence

A value in the range of 0.0 to 1.0 representing the confidence that the input was correctly interpreted. For DTMF this is always 1.0. For speech grammars this value is returned by the speech server for a voice collection.

application.cvd_lastresult$.termcond

RadiSys extension. Indicates why the last collection terminated. Prior to collection, the value of this variable is undefined. For DTMF collections, there are six termination cases:

• Termchar. The user pressed the defined termination character. In this case, the variable contains the defined termchar (pound sign “#” by default).

• Timeout. Collection ended as a result of either an interdigit timeout, a prompt timeout, or a term timeout. In this case, the variable contains the value T.

• Fixed-length match. Collection ended because n digits were expected and n digits were received. In this case, the variable contains FLM. Note that FLM does not necessarily indicate a match occurred, just that the expected number of digits were received. For information about maximum length and how it applies to grammars, please see “Working with Media Files and TTS Strings” on page 28.

• Impossible match. Collection ended because the expected input was all digits and a non-digit, non-termchar character was received. In this case, collection ends immediately and the variable contains IM.

• User hang-up. Collection ended because the caller hung up. In this case, the variable contains Hangup. As with all values for collection shadow variables, this value is set only if the hang-up actually stops collection. If the hang-up occurs during the prompt prior to collection or at some other time, then the value will remain undefined.

• Fax. Collection ended because a fax tone was detected. In this case, the variable contains FAX. As with all values for collection shadow variables, this value is set only if the fax tone actually stops collection. If the fax tone occurs during the prompt prior to collection or at some other time, then the value will remain undefined.

For voice collections the terminating condition is always timeout (T) for matches, or impossible match (IM) for all no-match conditions.

application.cvd_lastresult$.faxtype

RadiSys extension. Contains the type of fax event that occurred. This is either CED or CNG. The variable reports undefined in the absence of a fax event.
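As a rough illustration of how an application might read the shadow variables listed in Table 4-2 and Table 4-3, the following fragment logs several of them once a field has been filled. The field, its options, and the label text are hypothetical; the comparison against 'T' assumes the timeout value is reported as that literal string, as described for termcond.

```xml
<field name="city">
  <prompt>For Vancouver, press 1. For Paris, press 2.</prompt>
  <option dtmf="1" value="vancouver">Vancouver</option>
  <option dtmf="2" value="paris">Paris</option>
  <filled>
    <log label="collect">
      utterance=<value expr="application.lastresult$.utterance"/>
      inputmode=<value expr="application.lastresult$.inputmode"/>
      termcond=<value expr="application.cvd_lastresult$.termcond"/>
      prompt-ms=<value expr="application.cvd_lastprompt$.duration"/>
      barged=<value expr="application.cvd_lastprompt$.bargein"/>
    </log>
    <if cond="application.cvd_lastresult$.termcond == 'T'">
      <log label="collect">Collection ended on a timeout</log>
    </if>
  </filled>
</field>
```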




Usage Guidelines

The <prompt> element queues recorded audio, multimedia, or Text to Speech (TTS) clips as prompts to be played to the user.

Recorded media prompts are played by embedding the <audio> element within the <prompt> element.

TTS clips can be specified as SSML, or as plain text strings with embedded SSML elements in the string. The variable prompt specified in the <say-as> element is treated as a TTS string. All TTS strings (except those variable prompts to be played using the media server’s built-in sets and variables subsystem) are compiled within the media server into SSML scripts and passed to the TTS speech synthesizer to be played, provided an active speech synthesizer server is configured. If no server is configured, the string is simply ignored. All attributes of the <prompt> element, with the exception of the vcrprompt attribute, apply to TTS clips in the same way that they do to prerecorded audio clips.

Prompt Controls

The vcrprompt attribute and the associated prompt controls are not supported for TTS or multimedia clips. If a TTS clip (including variable prompts to be played using the media server’s built-in sets and variables subsystem) is specified within a <prompt> element that has prompt controls enabled (that is, vcrprompt is true), the clip is ignored and will be neither queued nor played. If a multimedia clip is specified within a <prompt> element that has prompt controls enabled, an error is returned.

Barging and Prompts

When there are multiple prompts in a sequence, the bargein attribute is honored for each prompt as that prompt is playing. However, if a prompt is barged, no subsequent prompts from the sequence will be played, regardless of their individual bargein settings.

Voice collections are set up and active prior to a prompt being played regardless of whether or not the prompt is bargeable. Since a prompt sequence can contain a mix of bargeable and non-bargeable prompts (which requires the media server to turn voice grammar recognition on and off), it is possible for spoken input received during a non-bargeable prompt to be returned and used after the prompt completes, which is not the expected behavior.

Table 4-4 outlines the effect of the bargein and cvd:cleardb attributes on the DTMF digit buffer at the moment an audio or TTS announcement is requested, assuming that DTMF collection is active. If the inputmodes attribute of the <property> element is set such that DTMF collection is not active, all digits in the digit buffer are cleared when the <prompt> clips are played.

Table 4-4 Effect of Barging Announcements on the Digit Buffer

bargein cvd:cleardb Digit Buffer Behavior

True True Empty The announcement is started. All digits received before the announcement completes are stored in the digit buffer.



True True Contains digits The digit buffer is cleared and the announcement is played. Received digits are stored in the digit buffer until the announcement complete notification is received. Note that in this case cvd:cleardb=true overrides bargein=true.

True False Empty The announcement is started. All digits received before the announcement complete notification are stored in the digit buffer.

True False Contains digits The announcement is immediately barged. The media server transitions to a digit collection state to evaluate any digits remaining in the digit buffer.

False True Empty The announcement is started. Any digits received before the announcement completes are discarded.

False True Contains digits The digit buffer is cleared and the announcement is started. Any digits received before the announcement completes are discarded.

False False Empty The announcement is started. Any digits received before the announcement completes are discarded.

False False Contains digits Digits are cleared from the buffer and the announcement is started. Any digits received before the announcement completes are discarded.

Table 4-4 Effect of Barging Announcements on the Digit Buffer

bargein cvd:cleardb Digit Buffer Behavior



<promptcontrol>

Specifies media controls for user prompt manipulation.

Usage Guidelines

The <promptcontrol> element allows you to define VCR-like controls for the playing of audio files. Prompt controls are not supported for TTS clips.

The <promptcontrol> element encloses the <controlcmd> element, which specifies a set of DTMF inputs and associated actions controlling the play of the specified audio. Voice inputs for prompt controls are not supported.

The scope of the <promptcontrol> element and the setting of the vcrprompt attribute of the <prompt> element determine when prompt control actions are in effect.

The media server supports the following prompt controls:

• Pause/resume

• Skip forward/skip backward

• Volume up/volume down

Parent element: <field>, <form>, <menu>, <vxml>

Child elements: <controlcmd>



<property>

Sets the value of a property.

Attributes

name Mandatory. The name of the property being updated. Unrecognized properties are ignored. There is no default.

value Mandatory. The new value for the property. The range of values depends on the property. Specifying an invalid value for the property will result in an error.semantic.

For information about the valid values for supported VoiceXML properties, please see Chapter 2: VoiceXML Properties.

Usage Guidelines

The <property> element allows an application to modify the value associated with a property. For a description of supported properties, please see Chapter 2: VoiceXML Properties.

The value of a property is inherited from the parent element and applies to all child elements. The lowest-level assignment of the property value overrides all higher-level assignments. If no value is explicitly assigned, the default property value is used whenever required.
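The inheritance rule for property values can be illustrated with a minimal sketch. It uses the timeout property and the <record> element, both described in this guide; the form and variable names (survey, answer1, answer2) and the time values are illustrative assumptions only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Document scope: inherited by every dialog and form item below -->
  <property name="timeout" value="5s"/>

  <form id="survey">
    <record name="answer1" maxtime="20s">
      <!-- No local assignment: the inherited value of 5s applies -->
      <prompt>Please state your name.</prompt>
    </record>

    <record name="answer2" maxtime="20s">
      <!-- Lowest-level assignment wins: 10s applies to this record only -->
      <property name="timeout" value="10s"/>
      <prompt>Please describe the problem.</prompt>
    </record>
  </form>
</vxml>
```

In this sketch the second <record> is the only form item that sees the 10s value; every other element continues to use the document-level 5s.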



<prosody>

[SSML] Permits control of the pitch, speaking rate, and volume of the speech output.

Attributes

pitch Optional. The baseline pitch for the contained text.

contour Optional. Sets the actual pitch contour for the contained text.

range Optional. The pitch range (variability) for the contained text.

rate Optional. The change in the speaking rate for the contained text.

duration Optional. The desired time to take to read the element contents.

volume Optional. The volume for the contained text in the range 0.0 to 100.0.

Usage Guidelines

The <prosody> element permits control of the pitch, speaking rate, and volume of the speech output. Although each attribute individually is optional, it is an error if no attributes are specified when the <prosody> element is used.

The contour, duration, pitch, and range attributes of the <prosody> element are ignored.

Parent element: <speak>,
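As a minimal sketch of the attributes that are not ignored, the fragment below slows down and sets the volume of part of a TTS prompt. It assumes a TTS server is provisioned and that <prosody> is accepted inside a <prompt> on your release; the rate value slow is one of the labels defined by SSML 1.0, and the prompt text is illustrative.

```xml
<prompt>
  Your confirmation number is
  <!-- Only rate and volume take effect; pitch, contour, range, and duration are ignored -->
  <prosody rate="slow" volume="80.0">4 7 1 1</prosody>
</prompt>
```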



<record>

Records user audio, video, or multimedia to a file.

Attributes


Child elements: <audio>, <catch>, <error>, <filled>, <grammar>, <help>, <noinput>, <nomatch>, <prompt>, <property>

name Mandatory. Specifies the name of a variable that will hold the recording. This name is used as an internal reference to the file after the recording is complete. To play the recorded file, reference this variable name.


The name must be unique across all <record> elements within the same scope.

Note that recordings stored internally are transient, and are deleted at the end of the session. To store recorded audio persistently, you must specify an external NFS or HTTP server. Unless you specify otherwise (using the cvd:dest or cvd:destexpr attribute) all recordings are internal and transient.

expr Optional. An ECMAScript expression representing the initial value of the name variable. If initialized to a value, the recording will not start unless the name variable is cleared.


cond Optional. A Boolean ECMAScript expression. The recording is started if and only if this expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.

modal Ignored.



beep Optional. Specifies whether to play a short fixed beep tone just prior to beginning the recording. The location of this beep tone is configurable using the media server’s management interface. Supported values are as follows:

true: The beep tone will be played just prior to recordings.

false: No beep tone is played before recordings.


maxtime Optional. Specifies a maximum recording time. If reached, the recording is terminated. In this case, the shadow variable name$.maxtime is set to true.

finalsilence Optional. Specifies the duration of post-speech silence time which, if exceeded, will terminate the recording.



The default is 5s. Note that a finalsilence value of 0 specifies that no post-speech trimming should be performed on the recording. This applies to both externally and internally recorded files.

dtmfterm Optional. Specifies whether a DTMF key press can terminate the recording. Supported values are as follows:

true: The recording will be terminated if any DTMF key is pressed, provided that the inputmodes property is set to dtmf or dtmf voice. If the inputmodes property is set to voice, DTMF key presses are ignored and the recording is not stopped.

false: DTMF key presses will not terminate the recording.

The default is true, so that by default, any DTMF keypress will terminate the recording.

The setting of the dtmfterm attribute can interact with the setting of the fax detection property com.cvd.faxdetect. For that information, please see Chapter 2: VoiceXML Properties.



format Mandatory. Specifies the file type and encoding scheme for the recording. Supported formats are shown in Table 4-6 on page 137.

The default is audio/wav.

cvd:dest Optional. RadiSys extension. Specifies the destination for a recording for either of two cases:

1 External recording. Specifies the URI of an external NFS or HTTP server where the audio recording is to be stored persistently. The URI must conform to the guidelines given for specifying external recordings described in “Working with Media Files and TTS Strings” on page 28.

The recording is made in real time to the specified URI.

2 Appending to an existing recording. If cvd:append is set to true, the media server appends the recording to an existing recording referenced by cvd:dest. The existing recording may be either internal or external. For details on appending recordings, please see the “Usage Guidelines,” below.

Only one of cvd:dest and cvd:destexpr may be specified.

cvd:destexpr Optional. Specifies an ECMAScript expression that evaluates to the URI of an external NFS or HTTP server where the audio recording is to be stored persistently. The URI must conform to the guidelines given for specifying external recordings described in “Working with Media Files and TTS Strings” on page 28. If the expression evaluates to the ECMAScript value undefined, an error.semantic is thrown and the session is terminated.

Only one of cvd:dest and cvd:destexpr may be specified.

cvd:append Optional. Directs the media server to append this recording to the existing recording specified by cvd:dest. For details on appending recordings, please see the “Usage Guidelines,” below.

Valid only for audio files; append is not supported for files containing video. If this attribute is specified for a file containing multimedia, an error.badfetch is thrown.



Shadow Variables

A shadow ECMAScript variable is created for each recording. The shadow variable is name$, where name is the name specified by the name attribute. At the end of the recording, information about the recording, such as its total length, is available to the VoiceXML application.

Table 4-5 shows the shadow variables defined to provide information about recordings.

Table 4-5 Recording Shadow Variables

| Shadow Variable | Description |
|---|---|
| name$.duration | Contains the length of the recording in milliseconds. The length reported includes the length of all announcements played plus any silence played between them. When appending to an existing audio recording, the duration amount indicates the length of just the appended portion. |
| name$.size | Contains the length of the recording in bytes. When appending to an existing recording, the size amount indicates the size of just the appended portion. |
| name$.termchar | Contains the DTMF termination key, if a DTMF termination key was specified at the start of the recording and the recording was terminated as a result of detecting the termination key. If a fax tone terminates the recording, the termchar shadow variable is set to F. |
| name$.maxtime | Indicates whether the recording was terminated as a result of reaching the maximum recording time. Supported values are as follows: true: The recording terminated as a result of reaching the maximum allowed time. false: Reaching the maximum allowed time was not the reason for termination. |

Usage Guidelines

The <record> element allows a user audio, video, or multimedia recording to be made. Recorded audio is assigned a variable name using the name attribute. This name can be referenced within the <audio> element to play back the recorded media.

Storage of Recorded Files

The recorded file may be stored internally on the media server or streamed in real time to an external server. Audio files can be streamed to either an HTTP or an NFS server; multimedia streaming is supported only for NFS servers.

Internal recordings are transient and are automatically deleted from the media server when the VoiceXML session terminates. If a recording is to be saved, it can be posted to an HTTP server following completion of an internal recording using the <submit> element, which uses the HTTP POST method. Note that while multimedia streaming is only supported for NFS servers, multimedia files recorded internally can be posted to an HTTP server.
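The following fragment is a minimal sketch of how the shadow variables in Table 4-5 might be read after an internal recording, and how that recording might then be posted with <submit> before the session ends. The form and variable names and the upload URI are hypothetical, and the enctype value is the one VoiceXML 2.0 defines for posting audio; confirm the <submit> attributes supported on your release.

```xml
<form id="leave_message">
  <var name="length_ms"/>
  <var name="hit_limit"/>

  <record name="msg" beep="true" maxtime="60s" finalsilence="3s">
    <prompt>Please leave your message after the beep.</prompt>
    <filled>
      <!-- Shadow variables are read as msg$.<field> once the recording ends -->
      <assign name="length_ms" expr="msg$.duration"/>
      <assign name="hit_limit" expr="msg$.maxtime"/>
    </filled>
  </record>

  <block>
    <!-- Post the transient internal recording; the URI is a placeholder -->
    <submit next="http://appserver.example.com/upload" method="post"
            enctype="multipart/form-data"
            namelist="msg length_ms hit_limit"/>
  </block>
</form>
```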



Memory for internal recordings is limited, and it is recommended that longer recordings be streamed to an external server. External recordings use the HTTP PUT method, which permits real-time transfer while the recording session is in progress. The destination is specified using either the cvd:dest or the cvd:destexpr attributes.

Size of Streamed Files

Because the recording is made in real time, the media server cannot know in advance the size of the file. Therefore, in the WAV header, the media server sets the file size to the maximum (FFFF). In addition, recordings streamed using this method will not have post-speech silence trimmed once on the external server. It is the responsibility of the server resource handling the request to trim post-speech silence (if desired) and adjust the file size in the WAV header. This can be done using (for example) a CGI script on the HTTP server.

Note also that, for the same reason, unless an appropriate CGI script is provisioned on the HTTP server, the media server will not be able to detect failed recordings.

For more information on the type of CGI script that should reside on HTTP servers, please see the material for setting up HTTP servers to interoperate with the media server, in the Convedia Media Server Guide to Working with External Servers and Peripherals.

Encoding of Recordings

Recordings are encoded as either G.711 or G.729 for audio files, and as QuickTime or 3GP format for video files. The format of the recording can be specified using the format attribute. If not specified, the format of the recording will be that configured as default, which is set using the media server’s management interface.

Table 4-6 shows the encoding formats supported for recordings. The format must be entered exactly as shown; in particular, no spaces or other characters are permitted other than those shown. The codecs parameter used by 3GPP MIME types is defined in RFC 4281 [13].

Table 4-6 Supported Encoding Formats for Recordings

| Format | Description |
|---|---|
| audio/wav | PCMU-encoded WAV file. (Audio-only.) |
| audio/x-wav | PCMU-encoded WAV file. (Audio-only.) |
| audio/vnd.wave; codec=1 | PCM. (Audio-only.) |
| audio/vnd.wave; codec=6 | G.711 a-law–encoded WAV file. (Audio-only.) |
| audio/vnd.wave; codec=7 | G.711 u-law–encoded WAV file. (Audio-only.) |
| audio/vnd.wave; codec=83 | G.729 Annex A–encoded WAV file. (Audio-only.) |
| video/quicktime; codecs=h263 | QuickTime file with H.263-encoded video. (Video-only.) |
| video/quicktime; codecs=h263, alaw | QuickTime file with H.263-encoded video and G.711 a-law–encoded audio. (Multimedia.) |
| video/quicktime; codecs=h263, ulaw | QuickTime file with H.263-encoded video and G.711 u-law–encoded audio. (Multimedia.) |
| audio/quicktime; codecs=alaw | QuickTime file with G.711 a-law–encoded audio. (Audio-only.) |
| audio/quicktime; codecs=ulaw | QuickTime file with G.711 u-law–encoded audio. (Audio-only.) |
| video/3gpp;codecs=s263,samr | 3GPP file with H.263-encoded video and AMR-encoded audio. (Multimedia.) Note that order matters and extra spaces are not allowed. |
| video/3gpp;codecs=s263 | 3GPP file with H.263-encoded video. (Video-only.) |
| audio/3gpp;codecs=samr | 3GPP file with AMR-encoded audio. (Audio-only.) |

Stopping Recordings with DTMF

By default, any DTMF key press will stop the recording. To keep DTMF from terminating a recording, set the dtmfterm attribute to false (this is relevant where the inputmodes property is set to dtmf or dtmf voice). In this implementation, DTMF then never terminates the recording. If the inputmodes property is set to voice, DTMF is ignored and will not terminate the recording.

Currently, recordings cannot be terminated by barging with speech. Also, note that since active grammars are not supported during recording, adding a voice-based grammar does not stop recording.

All record errors are fatal for the particular session. This includes all attribute specification errors, errors that are reported in the process of performing the actual recording, and errors posting the recordings to an external server for the case of internal recordings.

Best practice: The application should provide for a time when support for active grammars during recording is added. At that time, if dtmfterm is set to false, DTMF input will still terminate the recording if the DTMF input matches an active grammar. To ensure that DTMF never ends a recording, set dtmfterm to false AND ensure that there is no local active grammar. Following this practice ensures that applications are not affected should active grammars become supported.

Setting a Pre-Speech Timer

A pre-speech timer is a timer, associated with a recording, that represents the amount of time the media server should wait before assuming the recording will not start. A pre-speech timer cannot be explicitly set through a <record> element attribute; however, it can be set using the timeout property. The value used for a pre-speech timeout is the value of the property in the current scope at the time the recording is made.

• For audio-only recordings, the timeout property represents the time to wait for speech.

• For recordings containing video, including multimedia recordings, the timeout property represents the time to wait for the first video I-frame. A noinput event thrown for a multimedia recording always means that the I-frame was not received in the time specified by the timeout property at the time the recording was made.
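The DTMF termination and pre-speech timer guidelines above can be combined in one small sketch. The form name, variable name, prompt text, and time values are illustrative assumptions; the attributes, the timeout property, and the shadow variables are those described in this section.

```xml
<form id="record_greeting">
  <!-- Pre-speech timer: wait up to 8 seconds for speech to begin -->
  <property name="timeout" value="8s"/>

  <!-- dtmfterm="true" (the default): any key press ends the recording -->
  <record name="greeting" beep="true" maxtime="30s" dtmfterm="true">
    <prompt>Record your greeting after the beep, then press any key.</prompt>

    <noinput>
      <!-- Thrown when the pre-speech timer expires before speech starts -->
      <prompt>Sorry, nothing was recorded.</prompt>
    </noinput>

    <filled>
      <if cond="greeting$.maxtime">
        <prompt>You reached the maximum greeting length.</prompt>
      <else/>
        <prompt>Thank you.</prompt>
      </if>
    </filled>
  </record>
</form>
```

To prevent key presses from ending the recording, the same sketch would use dtmfterm="false" and define no local grammar, in line with the best practice above.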

Trimming Post-Speech Silence

The CMS automatically removes any post-speech silence for internally recorded files based on the value of the finalsilence attribute. For HTTP externally recorded files there is no automatic trimming of post-speech silence. To remove post-speech silence from streamed recordings, you must define a CGI script on the HTTP server. For details on setting up an HTTP server to interoperate with the media server, please see the User Guide for your media server.

Appending to a Recording

The media server supports appending to an existing recording, for internal files or files stored on NFS servers. This mechanism is not supported for recordings on HTTP servers and it is not supported for files containing video; if attempted for either of these, an error.badfetch is thrown. The append function is enabled by setting the cvd:append attribute to true.

When you append to an existing recording, you essentially make a request to create a new recording, which consists of the original recording plus the appended audio. Because recording names must be unique within a session, the name of the original recording cannot be reused in the request to append; the appended recording must be given a new, unique name.

For example, suppose the original recording is given the name record1, using the following request to record.

<record name="record1" maxtime="10s"/>

The request to append must use a new identifier for the file that will result after appending: this is record2 in the example. The file to append to (that is, the original recording record1) is specified using the cvd:dest attribute, as follows:

<record name="record2" maxtime="10s" cvd:append="true" cvd:dest="record1"/>

The cvd:dest value in conjunction with the cvd:append=true expression notifies the VXML interpreter to record to an existing file and not to a new file.

In this example, the shadow variables associated with record1 will reflect values associated with the original recording, while shadow values associated with the appended recording will be referenced using the name record2.



Also, note that in this example, the code would need to be executed in the same VXML script to ensure that the record1 variable does not go out of scope. If these operations are to span multiple documents, the value of this variable must be assigned to an application scope variable.
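Putting the two requests together, a minimal sketch of an append flow within a single document might look as follows. It is a fragment: it assumes the cvd namespace prefix is declared on the enclosing <vxml> element, and that playback by variable uses the expr attribute of <audio> as in VoiceXML 2.0. The form name and prompt text are illustrative.

```xml
<form id="two_part_message">
  <!-- First pass: a normal internal recording named record1 -->
  <record name="record1" beep="true" maxtime="10s">
    <prompt>Record the first part of your message.</prompt>
  </record>

  <!-- Second pass: record2 is a new, unique name; cvd:dest points at record1 -->
  <record name="record2" beep="true" maxtime="10s"
          cvd:append="true" cvd:dest="record1">
    <prompt>Record the second part of your message.</prompt>
  </record>

  <block>
    <!-- record2 refers to the original audio plus the appended audio -->
    <prompt><audio expr="record2"/></prompt>
  </block>
</form>
```

Both <record> elements sit in the same form, so record1 is still in scope when the append request is made.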

Table 4-7 shows the recording behavior depending on the various values for cvd:append and cvd:dest (or cvd:destexpr). Note that all these cases assume that the value set for the name attribute is unique (that is, unfilled) for this session. If the name attribute is defined (filled) then the recording does not occur, as specified in [13].

Table 4-7 Summary of “append” Behavior

| cvd:append | cvd:dest and cvd:destexpr | Behavior |
|---|---|---|
| False | Undefined | The recording is treated as a normal internal recording. |
| False | Internal recording | An error.badfetch is thrown. The cvd:dest attribute must specify an external recording unless cvd:append=true. |
| False | External recording | The recording is treated as a normal external recording. If the external recording already exists, it is overwritten. |
| True | Undefined | Creates a new recording. This is equivalent to cvd:append being missing or false. |
| True | Internal recording | File exists: The current recording is appended to the existing file, provided that the internal recording variable evaluates to valid recording content. File does not exist: A new file is created and the recording occurs on that new file, under the same condition. In both cases, if the variable used to represent the existing internal recording is not defined or is incorrectly formatted, the call is rejected with an error.semantic. |
| True | External recording | File exists: The current recording is appended to the existing file. File does not exist: A new file is created and the recording occurs on that new file. |



<reprompt>

Repeats a prompt for user input.

Usage Guidelines

The <reprompt> element allows the application to revisit an originating prompt from an event handler, such as the <catch> element. This mechanism, along with incrementing prompt counters, can be used to vary prompts to the user when user input does not match expected results.
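A minimal sketch of this pattern is shown below. The field name, prompt wording, and the inline DTMF grammar are illustrative assumptions; the count attribute of <prompt> is the standard VoiceXML 2.0 prompt counter, and support for <one-of> in inline grammars should be confirmed for your release.

```xml
<field name="department">
  <grammar mode="dtmf" version="1.0" root="dept">
    <rule id="dept" scope="public">
      <one-of>
        <item>1</item>
        <item>2</item>
      </one-of>
    </rule>
  </grammar>

  <!-- Played on the first visit to the field -->
  <prompt count="1">Press 1 for sales or 2 for support.</prompt>
  <!-- Played on the second and later visits -->
  <prompt count="2">Sorry. For sales, press 1. For support, press 2.</prompt>

  <nomatch>
    <!-- Return to the field and queue the prompt for the current count -->
    <reprompt/>
  </nomatch>
  <noinput>
    <reprompt/>
  </noinput>
</field>
```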





<return>

Returns from a subdialog to the calling dialog.

Attributes



event Optional. Throws the specified event in the calling dialog after the return from the subdialog. For a list of supported events, please see the section “Events” on page 24. There is no default.

Only one of event, eventexpr, and namelist may be specified. Otherwise, an error.badfetch is thrown.

eventexpr Optional. Throws the event resulting from evaluation of the specified ECMAScript expression in the calling dialog after the return from the subdialog. For a list of supported events, please see the section “Events” on page 24. There is no default.

Only one of event, eventexpr, and namelist may be specified. Otherwise, an error.badfetch is thrown.

namelist Optional. Returns the specified list of variable names to the calling dialog. Format is a space-separated list of variable names. By default, the calling context receives an empty ECMAScript object back.

Note that specifying a namelist does not cause an event to be thrown.

message Optional. Returns the specified message string, along with the event name, to the calling dialog when an event is thrown. There is no default.

The message string can be accessed within the <catch> element of the calling dialog using the _message implicit variable.


messageexpr Optional. Returns the message string resulting from evaluation of the specified ECMAScript expression to the event handler when an event is thrown, along with the event name. There is no default.

The message string can be accessed within the <catch> element of the calling dialog using the _message implicit variable.




Usage Guidelines

The <return> element terminates the execution of a subdialog and returns control, and optionally data, to the calling dialog.

The <return> element can also be used to throw an event in the calling dialog, such as a nomatch event. For example, <return event="nomatch"/> will trigger the nomatch event handler in the calling dialog.

In addition, the <return> element can be used to return results to the calling dialog. For example, suppose the variable cardnumber is defined within a subdialog and populated by user input. Then <return namelist="cardnumber"/> returns the cardnumber to the calling dialog, which can access its value using subdialog-name.cardnumber, where subdialog-name is the name specified for the subdialog.
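Both uses of <return> can be seen in a minimal sketch with the calling dialog and the subdialog in the same document. The form names, the fallback parameter, and the hard-coded value standing in for user input are hypothetical; the sketch follows VoiceXML 2.0 in declaring a <var> in the subdialog for each <param> passed to it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">

  <!-- Calling dialog -->
  <form id="payment">
    <var name="number"/>
    <subdialog name="card" src="#get_card">
      <!-- Passed in; the subdialog declares a matching <var> -->
      <param name="fallback" expr="'1234567890'"/>
      <filled>
        <!-- Returned namelist values are read as card.<variable> -->
        <assign name="number" expr="card.cardnumber"/>
      </filled>
      <nomatch>
        <!-- Runs if the subdialog executes <return event="nomatch"/> -->
        <exit/>
      </nomatch>
    </subdialog>
  </form>

  <!-- Subdialog -->
  <form id="get_card">
    <var name="fallback"/>
    <var name="cardnumber"/>
    <block>
      <!-- Stand-in for a value collected from the caller -->
      <assign name="cardnumber" expr="fallback"/>
      <if cond="cardnumber == undefined">
        <return event="nomatch"/>
      <else/>
        <return namelist="cardnumber"/>
      </if>
    </block>
  </form>
</vxml>
```

Only one of event and namelist appears on each <return>, as required by the attribute descriptions above.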



<rule>

[SRGS] Defines a grammar rule for an inline DTMF or voice grammar.

Attributes

id Mandatory. An identifier for the rule. The identifier must be unique within the grammar.

scope The scope of this rule’s grammar. The supported value is as follows:

public: This rule may be referenced by other rules within the current grammar, and by rules in other grammars.

private: Not supported.

Strictly speaking, this attribute is optional. However, the default defined in the VoiceXML 2.0 specification is private, which is not supported by the media server. Therefore, the application should explicitly include the scope attribute with a value of public (scope="public"). This will ensure correct interworking with the media server if full grammar scoping capabilities are implemented.

Usage Guidelines

The <rule> element defines an inline XML grammar rule for DTMF or voice.

All SRGS grammars must have a valid set of rules or items to be considered a valid grammar. Grammars that evaluate to empty (that is, that have no defined items within the grammar) are rejected with an error.grammar, and the session is terminated.

Grammars that contain tokens not enclosed in <item> elements are ignored.

Only one rule may be active at any time. Thus, for inline grammars that could be active concurrently, only one grammar will actually be active. A second grammar that defines its own rule or omits the rule is ignored.

To enable concurrent DTMF and voice grammars, two grammars must be defined at the same level of scope within a VoiceXML script.

Parent element: <grammar>
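The guidelines for rules can be illustrated with a minimal inline DTMF grammar. The rule identifier and the digits are illustrative, and the use of <one-of> is an assumption to confirm against the <grammar> and <item> descriptions for your release.

```xml
<grammar mode="dtmf" version="1.0" root="yesno">
  <!-- scope is set explicitly, because the specification default (private) is not supported -->
  <rule id="yesno" scope="public">
    <one-of>
      <!-- Every token is enclosed in an <item>; bare tokens would cause the grammar to be ignored -->
      <item>1</item>
      <item>2</item>
    </one-of>
  </rule>
</grammar>
```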



<ruleref>

[SRGS] Allows another voice grammar rule to be included.

Attributes

id Mandatory. An identifier for the voice grammar rule. The identifier must be unique within the grammar.

scope The scope of this rule’s grammar. The supported value is as follows:

public: This rule may be referenced by other rules within the current grammar, and by rules in other grammars.

private: Not supported.

Strictly speaking, this attribute is optional. However, the default defined in the VoiceXML 2.0 specification is private, which is not supported by the media server. Therefore, the application should explicitly include the scope attribute with a value of public (scope="public"). This will ensure correct interworking with the media server if full grammar scoping capabilities are implemented.

Usage Guidelines

The <ruleref> element defines an inline XML grammar rule. Currently, only voice grammar rules are supported; DTMF grammar rules are not supported.

All SRGS grammars must have a valid set of rules or items to be considered a valid grammar. Grammars that evaluate to empty (that is, that have no defined items within the grammar) are rejected with an error.grammar, and the session is terminated.

Grammars that contain tokens not enclosed in <item> elements are ignored.

Only one rule may be active at any time. Thus, for inline grammars that could be active concurrently, only one grammar will actually be active. A second grammar that defines its own rule or omits the rule is ignored.

To enable concurrent DTMF and voice grammars, two grammars must be defined at the same level of scope within a VoiceXML script.

Parent element: <grammar>



<s>

[SSML] Represents a sentence.

Attributes

xml:lang Mandatory. Specifies the language of the sentence.

Usage Guidelines

The use of the <s> element is optional. Where text occurs without an enclosing <p> or <s> element, the synthesis processor attempts to determine the structure using language-specific knowledge of the format of plain text.

Child elements: <audio>, <break>, <emphasis>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <voice>



<say-as>

[SSML] Defines a text string to be rendered as an audio clip.

Attributes

Usage Guidelines

The media server uses the SSML <say-as> element to allow the control agent to use a subset of the media server’s sets and variables processing subsystem. For general information on the media server’s sets and variables feature, please see the Convedia Media Server Sets and Variables Interface Reference Guide.

The <say-as> element is used as a child of the <prompt> element to specify the variable to be rendered in the prompt. The media server uses its built-in sets and variables processing subsystem; no TTS server is required. However, all the clips to be played must be internally provisioned on the media server and an audio segment configuration file (the “sets and variables configuration file”) must also be provisioned on the media server. See the Convedia Media Server Sets and Variables Interface Reference Guide for this information.

The <say-as> element can contain either a child <value> element specifying the variable to be rendered or a plain text string specifying the variable. The variable type is indicated by the interpret-as attribute and the variable subtype is indicated by the format attribute. Supported variable types and subtypes are described in the Convedia Media Server Sets and Variables Interface Reference Guide.

If the value of the variable is out of the supported range, the media server terminates the call, without throwing an error.semantic event.

In general, the language in which the variable is to be rendered is specified by the xml:lang attribute at either the document level (that is, within the <vxml> element) or within the <prompt> element. Currently, the only supported value is en (English).

Parent element: <prompt>

Child elements: <value>

interpret-as Mandatory. Used for VoiceXML variables to indicate the type of the variable. Supported values are date, time and digits. Supported variable types for VoiceXML are described in the Convedia Media Server Sets and Variables Interface Reference Guide.

format Optional or mandatory, depending on the variable type specified by the interpret-as attribute. (Currently, all supported variable types have mandatory subtypes.) Used for VoiceXML variables to indicate the subtype of the variable. Supported variable subtypes for VoiceXML are described in the Convedia Media Server Sets and Variables Interface Reference Guide.

detail Ignored.



<script>

Executes ECMAScript (JavaScript) code.

Attributes

Parent element: <block>, <catch>, <error>, <filled>, <form>, <help>, <if>, <menu>, <noinput>, <nomatch>, <vxml>


src Optional. Specifies the URI to the script, if the script is external. If not specified, the media server expects the script to be defined inline.

charset Ignored.

fetchhint Ignored.





maxage Ignored.

maxstale Ignored.



Usage Guidelines

The <script> element specifies ECMAScript client-side logic.

The results of the computation performed by the script can be returned to the caller and stored in a variable. The contents of the variable can be used later by the VoiceXML application for general use, such as conditional logic or dialogs utilizing the variable.

The script can be fetched externally or it can be specified in-line.
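As a minimal sketch of an inline script whose result is stored in a variable and then used in conditional logic, consider the document below. The function name isValidPin, the form names, and the hard-coded test value are hypothetical; <goto> and <exit> are used as defined in VoiceXML 2.0.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <script>
    <![CDATA[
      // Returns true when the string is exactly four decimal digits.
      function isValidPin(pin) {
        return /^[0-9]{4}$/.test(pin);
      }
    ]]>
  </script>

  <form id="check">
    <!-- Stand-in for a value collected from the caller -->
    <var name="pin" expr="'1234'"/>
    <var name="ok"/>
    <block>
      <!-- Store the script result in a variable, then branch on it -->
      <assign name="ok" expr="isValidPin(pin)"/>
      <if cond="ok">
        <goto next="#accepted"/>
      <else/>
        <goto next="#rejected"/>
      </if>
    </block>
  </form>

  <form id="accepted"><block><exit/></block></form>
  <form id="rejected"><block><exit/></block></form>
</vxml>
```

The same function could instead be placed in an external file and loaded with the src attribute.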



<speak>

[SSML] The root element of SSML.

Attributes

xml:lang Mandatory. Specifies the language of the root document.

xml:base Optional. Specifies the base URI of the root document.

version Mandatory. The SSML version. The only supported value is 1.0.

Usage Guidelines

The <speak> element is the root element of the Speech Synthesis Markup Language (SSML), which is an XML application for speech synthesis.

The <speak> element is not supported directly in VoiceXML scripts. Rather, all TTS scripts are rendered into <speak> SSML XML scripts, which are then passed to an external server for playing. Including a <speak> element with TTS text in a VoiceXML document will cause a parse error.

Parent element: <?xml>

Child elements: <audio>, <break>, <emphasis>, <mark>, <meta>, <metadata>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <voice>



<sub>

[SSML] Replaces the contained text with a substitute.

Attributes

alias Mandatory. Provides the text to be substituted for the enclosed string.

Usage Guidelines

The <sub> element indicates that the text in the alias attribute value replaces the contained text for pronunciation. This allows a document to contain both a spoken and a written form. The required alias attribute specifies the string to be spoken instead of the enclosed string. The processor should apply text normalization to the alias value.

The <sub> element can only contain text to be rendered.

The specification states that the alias attribute of the <sub> element is mandatory; however, this is not enforced by the speech server.

Child elements: None.
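A minimal sketch of the substitution is shown below, assuming a TTS server is provisioned and that <sub> is accepted inside a <prompt> on your release. The written form W3C stays in the document while the alias text is spoken.

```xml
<prompt>
  <!-- The alias value is spoken in place of the enclosed text -->
  This standard is published by the
  <sub alias="World Wide Web Consortium">W3C</sub>.
</prompt>
```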



<subdialog>

Invokes another dialog, from which control will eventually return.

Attributes

Parent element: <block>, <catch>, <error>, <filled>, <form>, <help>, <noinput>, <nomatch>

Child elements: <audio>, <catch>, <error>, <filled>, <help>, <noinput>, <nomatch>, <param>, <prompt>, <property>

name Optional. Defines a variable with the specified name, which will hold the return values returned by the <subdialog> element.

The scope of the returned value is limited to the form. The values are returned from the subdialog in the namelist specified in the <return> element. The return values can be accessed using the shadow variable name$.ReturnedVariableName.


There is no default.

expr Optional. An ECMAScript expression assigning the initial value of the form item variable defined by name. If the initial value is set using this attribute, the form item will not be executed until the variable is cleared (for example, by using the <clear> element). The default is the ECMAScript value undefined.

cond Optional. A Boolean ECMAScript expression. The subdialog is executed if and only if the expression evaluates to true. There is no default for cond, but if cond is not specified, the behavior is as if cond is set to true.

namelist Optional. Specifies a set or list of variables to submit to the subdialog. Any declared VoiceXML or ECMAScript variable, including shadow variables, can be included in the list. By default, no variables are submitted.



src The URI of the subdialog. The URI must comply with the XML anyURI format.

If the subdialog is contained within the current document, the format is #dialog-name; for example #SubdialogX.

Exactly one of src and srcexpr must be specified. Otherwise, an error.badfetch is thrown.

srcexpr An ECMAScript expression evaluating to the URI of the subdialog. The URI resulting from the expression must comply with the XML anyURI format.

Exactly one of src and srcexpr must be specified. Otherwise, an error.badfetch is thrown.

method Optional. Specifies the HTTP method to be used in submitting. Supported values are as follows:

get: An HTTP GET method will be used.

post: An HTTP POST method will be used.

The default is get.

enctype Optional. The MIME encoding method to be used in submitting. The only supported value is application/x-www-form-urlencoded. This is the default.

fetchaudio Ignored.

fetchhint Ignored.



Usage Guidelines

The <subdialog> element provides the ability to transition to a new interaction, much like a function call. Subdialogs are useful for creating and organizing commonly used dialog functions as libraries, which can be reused by many applications.

When the subdialog is complete, control is returned to the calling dialog. The state of the calling dialog (active grammars, variables, event handlers, and so on) is preserved when the called dialog is invoked, and restored when the called dialog returns control to the calling dialog.

The calling dialog can pass variables to the called dialog using the namelist attribute of the <subdialog> element. The called dialog returns control back to the calling dialog by executing the <return> element, and the <return> element can also return variables from the subdialog to the calling dialog.

Unlike a subroutine, the called dialog does not have access to any information from the context of the calling dialog. This is because the calling and the called dialogs execute in two separate and independent execution contexts. Thus, for example, events thrown in the called dialog must be handled in that dialog; they cannot invoke event handlers in the calling dialog. In addition, variables scoped by the calling dialog are not accessible by the called dialog, and any variables scoped by the called dialog are not accessible when control returns back to the calling dialog.
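The isolation between the calling and the called dialog can be modeled in ECMAScript. The following is a minimal sketch under stated assumptions, not the media server's implementation: callSubdialog, the scope objects, and the variable names are hypothetical, and the called dialog is represented by an ordinary function whose return value stands in for the namelist of the <return> element.

```javascript
// Hypothetical model of <subdialog> variable passing (illustration only).
function callSubdialog(callerScope, namelist, dialog) {
  // Only the variables named in namelist are copied into the new context.
  var calleeScope = {};
  namelist.split(/\s+/).forEach(function (name) {
    if (name) {
      calleeScope[name] = callerScope[name];
    }
  });
  // The called dialog runs against its own scope and hands back the
  // values it chooses to return.
  return dialog(calleeScope);
}

var caller = { accountId: "12345", pin: "0000" };

var result = callSubdialog(caller, "accountId", function (scope) {
  // scope.pin is undefined here because pin was not listed in namelist.
  scope.scratch = "local only"; // never visible to the caller
  return { balance: 42, sawPin: scope.pin !== undefined };
});

console.log(result.balance); // 42
console.log(result.sawPin);  // false
console.log(caller.scratch); // undefined
```

In this model, the only data that crosses the boundary is what is named in namelist on the way in and what the called dialog returns on the way out.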





maxage Ignored.

maxstale Ignored.



<submit>

Submits application values and fetches a new document, transitioning to a new dialog.

Attributes



next Submits to the specified URI. The URI must comply with the XML anyURI format.

Exactly one of next and expr must be specified. Otherwise, an error.badfetch is thrown.

expr Submits to the URI resulting from evaluation of the specified ECMAScript expression. The URI must comply with the XML anyURI format.

Exactly one of next and expr must be specified. Otherwise, an error.badfetch is thrown.

namelist Optional. The variables to submit as data. Format is a space-separated list of variable names. Both VoiceXML and ECMAScript variables can be included. By default, all named input item variables are submitted.

method Optional. Specifies the HTTP method to be used in submitting. Supported values are as follows:

get: An HTTP GET method will be used.

post: An HTTP POST method will be used.

The default is get.

enctype Optional. The MIME encoding method to be used in submitting. The only supported value is application/x-www-form-urlencoded. This is the default.

fetchaudio Ignored.

fetchhint Ignored.



Usage Guidelines

The <submit> element allows the application to submit variables to an external HTTP server and transition control to a new VoiceXML document.

The variables to be sent are listed in the namelist attribute. This data is sent as URI-encoded parameters to the HTTP server. Data can be sent using either the HTTP GET or the HTTP POST method. The values submitted can be fixed strings, internal variables (for example, field items or property variables), or ECMAScript expressions. Expressions are evaluated first and then converted to strings before submitting.

The execution of a <submit> element will always result in a document fetch. The document specified by the next or the expr attribute is returned by the HTTP server, and application control transitions to this document.
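The way namelist variables become URI-encoded parameters can be illustrated in ECMAScript. This is a sketch only: buildSubmitUrl, the scope object, and the example URI are hypothetical, and the standard URLSearchParams class is used here as a stand-in for the application/x-www-form-urlencoded encoding that the media server performs internally.

```javascript
// Hypothetical helper: builds the URI that a <submit> with method="get"
// might request for a given namelist (illustration only).
function buildSubmitUrl(next, namelist, scope) {
  var params = new URLSearchParams();
  namelist.split(/\s+/).forEach(function (name) {
    if (!name) {
      return;
    }
    // Values are converted to strings before they are submitted.
    params.append(name, String(scope[name]));
  });
  var query = params.toString();
  return query ? next + "?" + query : next;
}

// The expression 2 + 3 is evaluated first; only the result is submitted.
var scope = { city: "New York", count: 2 + 3 };

console.log(buildSubmitUrl("http://example.com/next.vxml", "city count", scope));
// http://example.com/next.vxml?city=New+York&count=5
```

With method="post", the same encoded string would travel in the request body rather than in the URI.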





maxage Ignored.

maxstale Ignored.



<throw>

Generates an event to be handled by <catch>.

Attributes

Usage Guidelines

The <throw> element throws the specified event to be caught by the <catch> element.

The event can be pre-defined (for example, a nomatch event), or it may be application-specific.



event Throws the specified event. The event may be predefined, or application-specific. For a list of supported events, please see the section “Events” on page 24.

Exactly one of event and eventexpr must be specified. Otherwise, an error.badfetch is thrown.

eventexpr Throws the event resulting from evaluation of the specified ECMAScript expression. The event may be predefined, or application-specific. For a list of supported events, please see the section “Events” on page 24.

Exactly one of event and eventexpr must be specified. Otherwise, an error.badfetch is thrown.


The message string can be accessed using the _message implicit variable.







<value>

Inserts the value of an expression into a log message or prompt.

Attributes

Usage Guidelines

Within a <log> element, the <value> element inserts the value of an ECMAScript expression into the text of the log message. Note that all <log> messages are written to syslog at a severity level of ERROR.

The <value> element is used in the <say-as> element to insert the value of an expression into a prompt.

Parent element: <log>, <say-as>


expr Mandatory. An ECMAScript expression, the value of which will be inserted into the log message or prompt.



<var>

Declares a variable and assigns it a value.

Attributes

Usage Guidelines

The <var> element declares a variable and assigns it a value.

Proper scoping rules are observed as defined in [13]. The naming of user-defined variables adheres to the naming convention specified in Section 5.1 of [13]. The maximum length of a variable is 256 characters.

In general, naming errors result in an error.semantic being thrown. The exception is a variable name that ends in a dollar sign (“$”); this error results in an error.badfetch.
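The difference between the two naming errors can be sketched in ECMAScript. The function below is hypothetical and deliberately simplified: it assumes that a valid name is an ordinary ECMAScript identifier, it does not model the full convention of Section 5.1 of [13], and it does not check the length limit.

```javascript
// Hypothetical classifier for <var> names (illustration only).
// Returns the event that would be thrown, or null if the name is accepted.
function classifyVarName(name) {
  // A trailing dollar sign is the one case reported as error.badfetch.
  if (/\$$/.test(name)) {
    return "error.badfetch";
  }
  // Assumption: any other name that is not a plain identifier is a
  // naming error and is reported as error.semantic.
  if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(name)) {
    return "error.semantic";
  }
  return null;
}

console.log(classifyVarName("total"));  // null
console.log(classifyVarName("total$")); // error.badfetch
console.log(classifyVarName("9lives")); // error.semantic
```

The trailing-dollar check comes first because a dollar sign is otherwise legal in an ECMAScript identifier.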



name Mandatory. The name of the variable.


expr Optional. An ECMAScript expression representing the value of the variable. If not specified, then if the variable was previously declared, it retains its original value. Otherwise, the ECMAScript value undefined is assigned to the variable.



<voice>

[SSML] Requests a change in speaking voice.

Attributes

Usage Guidelines

The <voice> element is a production element that requests a change in speaking voice.

Although each attribute individually is optional, it is an error if no attributes are specified when the <voice> element is used.

The <voice> element is commonly used to change the language. When there is not a voice available that exactly matches the attributes specified in the document, or there are multiple voices that match the criteria, a voice selection algorithm must be used. Approximately speaking, the xml:lang attribute has the highest priority and all other attributes are equal in priority but below xml:lang.


Child elements: <audio>, <break>, <emphasis>, <mark>, <p>, <phoneme>, <prosody>, <s>, <say-as>, <sub>, <voice>

xml:lang Optional. Specifies the language of the paragraph.

gender Optional. Indicates the preferred gender of the voice to speak the contained text. Supported values are as follows:

male: Use a male voice.

female: Use a female voice.

neutral: Use a neutral voice.

age Optional. Indicates the preferred age, in years since birth, of the voice to speak the contained text. The value is a non-negative integer.

variant Optional. Indicates a preferred variant of the other voice characteristics to speak the contained text (for example, the second male child voice). Valid values are positive integers.

name Optional. Indicates a processor-specific voice name to speak the contained text. The value may be a space-separated list of names ordered from most-preferred to least-preferred. Consequently, a name may not contain any white space.





All attributes of the <voice> element are ignored.



<vxml>

The root element for VoiceXML. Defines the set of actions that form a VoiceXML dialog.

Attributes

Usage Guidelines

<vxml> is the root element for VoiceXML.

It contains a VoiceXML document, which can be an entire application or a portion of an application.

Parent element: None. The root element for VoiceXML.

Child elements: <catch>, <error>, <form>, <help>, <link>, <menu>, <noinput>, <nomatch>, <promptcontrol>, <property>, <script>, <var>

version Mandatory. The version of the W3C VoiceXML specification to which the enclosed document conforms. Supported values are 2.0 and 2.1.

xmlns Mandatory. The namespace of the VoiceXML document. The only supported value is http://www.w3.org/2001/vxml.

xml:base Optional. Allows a base URI to be defined. If set, any relative URIs within the document are resolved using this base URI.

xml:lang Optional. Specifies the language identifier for this document. If not specified, the default is en (English). If specified, the language identifier is inherited by all elements in the document that use the xml:lang attribute. Note that a value specified for xml:lang within an element overrides that specified at the document level. Specifying an unsupported language results in an error.unsupported.language event.

application Optional. The URI of this document’s root document, if any. If specified, the implication is that this document is a leaf document.

xmlns:cvd Optional. The namespace of the XML schema defined for the cvd prefix, which indicates a RadiSys extension. This is optional for any VoiceXML script that uses a cvd prefix, such as cvd:append, cvd:dest, or cvd:destexpr.



The media server accepts 2.1 as the value of the version attribute of the <vxml> element. Where VoiceXML version 2.1 differs from version 2.0, the media server complies with version 2.0, with the following exceptions:

• The media server supports the elements described in “Chapter 5: VoiceXML 2.1 Elements,” as defined in [14].

• The media server supports the ECMAScript binding for the Level 2 subset of the Document Object Model (DOM), as described in “Chapter 6: ECMAScript Language Binding for the DOM.”












Chapter 5: VOICEXML 2.1 ELEMENTS

This chapter describes the VoiceXML 2.1 elements currently supported by the Convedia Media Server.

The VoiceXML 2.1 language is defined by the W3C Recommendation specifying the language [14]. Any features of VoiceXML specified in the Recommendation but not in this guide are not supported in this release of the Convedia Media Server. Any features of VoiceXML specified in this guide but not in the Recommendation are extensions to the specification.



<data>

Fetches XML data from a document server without transitioning to a new VoiceXML document.

Attributes

Parent element: <block>, <catch>, <error>, <filled>, <foreach>, <help>, <if>, <noinput>, <nomatch>, <vxml>


src The URI representing the location of the XML data to retrieve. Only HTTP URIs are supported. The URI must comply with the XML anyURI format. If a relative URI is specified, it is qualified using the base URI.

Exactly one of src and srcexpr must be specified; otherwise, an error.badfetch is thrown.

name Optional. The name of a variable exposing the Document Object Model (DOM). If this attribute is not specified, the retrieved content is ignored.

srcexpr An ECMAScript expression that evaluates to the URI of the XML data. The expression is evaluated at the time the data needs to be fetched, so the URI can be determined dynamically. The URI resulting from the expression must comply with the XML anyURI format.

Exactly one of src and srcexpr must be specified; otherwise, an error.badfetch is thrown.

method Optional. The request method. Supported values are get and post. The default value is get.



namelist Optional. The list of variables to submit. The following rules apply:

• Individual variable references are submitted with the same qualification used in the namelist.

• Declared VoiceXML and ECMAScript variables can be referenced.

• The media server supports ECMAScript objects in namelist with the following restrictions:

    • The value of the method attribute must be post. If the value of the method attribute is get, the media server raises an error.badfetch exception.

    • The maximum nesting level is four if the ECMAScript object contains other objects.

    • The body of the post request contains the ECMAScript object as an XML file.

    • The XML file contains all nested objects, each contained within an XML element.

    • Properties of all objects are each represented as an XML element, for which the property name is the element name and the property value is the content.

    • When the enctype is “application/x-www-form-urlencoded”, the XML is sent in the post body as a single line using standard escaping rules and without whitespace.

    • When the enctype is “text/xml”, the XML is sent in the post body in standard XML format.

By default, no variables are submitted.

enctype Optional. The media encoding type of the submitted document. Supported values are as follows:

• “application/x-www-form-urlencoded”

• “text/xml” (only when the namelist is an ECMAScript object)

The media server raises an error.badfetch exception if an unsupported value is specified (for example, “multipart/form-data”) or if “text/xml” is specified when the namelist is not an ECMAScript object.

The default value is “application/x-www-form-urlencoded”.



Usage Guidelines

The <data> element fetches XML data without transitioning to a new VoiceXML document. The XML data fetched by the <data> element is bound to ECMAScript through the variable named by the name attribute; this variable exposes a read-only subset of the W3C Document Object Model (DOM).

If the content cannot be retrieved, the media server raises an error.badfetch exception. If the retrieved content is not well-formed XML, the media server raises an error.semantic exception.

fetchaudio The maximum length of the URI string is 255 characters. The supported fetchaudio sources are internally provisioned clips and external NFS or HTTP clips. The clip type must be audio-only, video-only, or multimedia. TTS, RTSP media, sets, and variables are not supported.

The playing of the audio clip is governed by the fetchaudiodelay and fetchaudiominimum properties in effect at the time of the fetch.

fetchhint Optional. Ignored.

fetchtimeout Optional. The interval after which, if the document cannot be fetched from the destination URI, the fetch times out.




maxage Optional. Ignored.

maxstale Optional. Ignored.



The media server supports only US-ASCII characters in UTF-8 encoding format in XML documents retrieved with the <data> element.

The media server does not support the “access-control” feature of the <data> element.
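The mapping of an ECMAScript object in namelist to XML, as described for the namelist attribute, can be sketched as follows. This is an illustration under stated assumptions rather than the media server's serializer: the function names, the use of the variable name as the root element name, the escaping rules, and the way nesting levels are counted (the top-level object is level one) are all hypothetical.

```javascript
var MAX_NESTING = 4; // nesting limit stated for ECMAScript objects in namelist

// Escapes the characters that cannot appear literally in element content.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Serializes one property as an XML element: the property name becomes the
// element name and the property value becomes the content. Nested objects
// recurse, each contained within its own element.
function toXml(name, value, depth) {
  if (value === null || typeof value !== "object") {
    return "<" + name + ">" + escapeXml(value) + "</" + name + ">";
  }
  if (depth > MAX_NESTING) {
    throw new Error("object nesting exceeds " + MAX_NESTING + " levels");
  }
  var inner = Object.keys(value).map(function (key) {
    return toXml(key, value[key], depth + 1);
  }).join("");
  return "<" + name + ">" + inner + "</" + name + ">";
}

var order = { id: 7, customer: { name: "Ann", city: "Oslo" } };

console.log(toXml("order", order, 1));
// <order><id>7</id><customer><name>Ann</name><city>Oslo</city></customer></order>
```

The output is a single line without whitespace, which corresponds to the form described for “application/x-www-form-urlencoded” before any further escaping is applied to the post body.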



<foreach>

Allows a VoiceXML application to iterate through an ECMAScript array, executing the contained content for each array item.

Attributes

Usage Guidelines

The <foreach> element allows a VoiceXML application to execute content once for each item of an ECMAScript array.

Both the array and item attributes must be specified; otherwise, the media server raises an error.badfetch exception. If the array expression does not evaluate to an ECMAScript array (that is, the result does not satisfy instanceof Array), the media server raises an error.semantic exception.

The <foreach> element operates on a shallow copy of the array specified by the array attribute; this means that only the references are copied. For example, a shallow copy of an array of pointers to strings copies only the pointers, leaving the underlying character strings as the actual data (not copies).

The <foreach> element may appear within executable content and as a child element of the <prompt> element. When the <foreach> element is within executable content, it may itself contain elements of executable content. When the <foreach> element is within a <prompt> element, it can contain only elements that are valid in the <enumerate> element; that is, <audio>, <break>, and <foreach>.

Parent and child elements for a <foreach> element used within executable content:

Parent element: <block>, <catch>, <error>, <filled>, <foreach>, <help>, <if>, <noinput>, <nomatch>

Child elements: <audio>, <assign>, <clear>, <data>, <disconnect>, <exit>, <foreach>, <goto>, <if>, <log>, <prompt>, <reprompt>, <return>, <script>, <submit>, <throw>, <var>

Parent and child elements for a <foreach> element used within a <prompt> element:

Parent element: <foreach>, <prompt>

Child elements: <audio>, <break>, <foreach>

array Mandatory. An ECMAScript expression that must evaluate to an ECMAScript array.

item Mandatory. The variable that stores each array item upon each iteration of the loop. If the variable is not already defined within the parent’s scope, a new variable is declared.



The media server supports up to two levels of nesting with a <foreach> element. If the level of nesting is greater than two, the media server raises an error.semantic exception.
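The effect of iterating over a shallow copy can be shown in plain ECMAScript. The sketch below is a model only: forEachItem and the prompts array are hypothetical, and the body function stands in for the content of the <foreach> element.

```javascript
// Hypothetical model of <foreach> iteration (illustration only).
function forEachItem(array, body) {
  if (!(array instanceof Array)) {
    throw new Error("error.semantic: array attribute is not an Array");
  }
  // Shallow copy: the references are copied, the items themselves are not.
  var snapshot = array.slice();
  for (var i = 0; i < snapshot.length; i++) {
    body(snapshot[i]);
  }
}

var prompts = [{ text: "one" }, { text: "two" }];
var spoken = [];

forEachItem(prompts, function (item) {
  spoken.push(item.text);
  prompts.push({ text: "extra" }); // growing the original does not extend the loop
  prompts[1].text = "changed";     // but the items themselves are shared
});

console.log(spoken);         // [ 'one', 'changed' ]
console.log(prompts.length); // 4
```

The loop runs exactly twice even though the original array grows, while the change made to the second item is visible because both arrays refer to the same object.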




Chapter 6: ECMASCRIPT LANGUAGE BINDING FOR THE DOM

This chapter describes the ECMAScript binding for the subset of Level 2 of the DOM.

The ECMAScript binding for the subset of Level 2 of the Document Object Model (DOM) exposed by the <data> element is specified in Appendix D of the W3C Recommendation Voice Extensible Markup Language (VoiceXML) 2.1 [14]. The media server supports the following objects from this specification; specific support is described in this chapter.

• Attr Object

• CDATASection Object

• CharacterData Object

• Comment Object

• Document Object

• DOMException Prototype Object

• Element Object

• EntityReference Object

• NamedNodeMap Object

• Node Prototype Object

• NodeList Object

• ProcessingInstruction Object

• Text Object



Attr Object

For the Attr object, the media server supports the following constants, properties, and methods.

Constants

None.

Properties

Property          Type            Read-Only
name              String          Yes
specified         Boolean         Yes
value             String          No
ownerElement      Element         Yes
nodeName          String          Yes
nodeValue         String          No
nodeType          Number          Yes
parentNode        Node            Yes
childNodes        NodeList        Yes
firstChild        Node            Yes
lastChild         Node            Yes
previousSibling   Node            Yes
nextSibling       Node            Yes
attributes        NamedNodeMap    Yes
ownerDocument     Document        Yes
namespaceURI      String          Yes
prefix            String          No
localName         String          Yes

Methods

Method                    Returns    Parameter Name             Parameter Type
hasChildNodes             Boolean    —                          —
hasAttributes             Boolean    —                          —


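CDATASection Object

For the CDATASection object, the media server supports the following constants, properties, and methods.

Constants

None.

Properties

Property          Type            Read-Only
data              String          No
length            Number          Yes
nodeName          String          Yes
nodeValue         String          No
nodeType          Number          Yes
parentNode        Node            Yes
childNodes        NodeList        Yes
firstChild        Node            Yes
lastChild         Node            Yes
previousSibling   Node            Yes
nextSibling       Node            Yes
attributes        NamedNodeMap    Yes
ownerDocument     Document        Yes
namespaceURI      String          Yes
prefix            String          No
localName         String          Yes

Methods

Method                    Returns    Parameter Name             Parameter Type
substringData             String     offset, count              Number, Number
hasChildNodes             Boolean    —                          —
hasAttributes             Boolean    —                          —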





CharacterData Object

For the CharacterData object, the media server supports the following constants, properties, and methods.

Constants

None.

Properties

Property          Type            Read-Only
data              String          No
length            Number          Yes
nodeName          String          Yes
nodeValue         String          No
nodeType          Number          Yes
parentNode        Node            Yes
childNodes        NodeList        Yes
firstChild        Node            Yes
lastChild         Node            Yes
previousSibling   Node            Yes
nextSibling       Node            Yes
attributes        NamedNodeMap    Yes
ownerDocument     Document        Yes
namespaceURI      String          Yes
prefix            String          No
localName         String          Yes

Methods

Method                    Returns    Parameter Name             Parameter Type
substringData             String     offset, count              Number, Number
hasChildNodes             Boolean    —                          —
hasAttributes             Boolean    —                          —

Usage Guidelines

If a DOMException object is raised on retrieval of the CharacterData.data property or the CharacterData.substringData method and not caught by an ECMAScript execution handler, the media server raises an error.semantic exception.



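Comment Object

For the Comment object, the media server supports the following constants, properties, and methods.

Constants

None.

Properties

Property          Type            Read-Only
data              String          No
length            Number          Yes
nodeName          String          Yes
nodeValue         String          No
nodeType          Number          Yes
parentNode        Node            Yes
childNodes        NodeList        Yes
firstChild        Node            Yes
lastChild         Node            Yes
previousSibling   Node            Yes
nextSibling       Node            Yes
attributes        NamedNodeMap    Yes
ownerDocument     Document        Yes
namespaceURI      String          Yes
prefix            String          No
localName         String          Yes

Methods

Method                    Returns    Parameter Name             Parameter Type
substringData             String     offset, count              Number, Number
hasChildNodes             Boolean    —                          —
hasAttributes             Boolean    —                          —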




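Document Object

For the Document object, the media server supports the following constants, properties, and methods.

Constants

None.

Properties

Property          Type            Read-Only
documentElement   Element         Yes
nodeName          String          Yes
nodeValue         String          No
nodeType          Number          Yes
parentNode        Node            Yes
childNodes        NodeList        Yes
firstChild        Node            Yes
lastChild         Node            Yes
previousSibling   Node            Yes
nextSibling       Node            Yes
attributes        NamedNodeMap    Yes
ownerDocument     Document        Yes
namespaceURI      String          Yes
prefix            String          No
localName         String          Yes

Methods

Method                    Returns    Parameter Name             Parameter Type
getElementsByTagName      NodeList   tagname                    String
getElementsByTagNameNS    NodeList   namespaceURI, localName    String, String
getElementById            Element    elementId                  String
hasChildNodes             Boolean    —                          —
hasAttributes             Boolean    —                          —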





DOMException Prototype Object

For the DOMException Prototype object, the media server supports the following constants, properties, and methods.

Constants

Constant                      Type      Value
INDEX_SIZE_ERR                Number    1
DOMSTRING_SIZE_ERR            Number    2
NO_MODIFICATION_ALLOWED_ERR   Number    7
NOT_FOUND_ERR                 Number    8
NOT_SUPPORTED_ERR             Number    9
INVALID_STATE_ERR             Number    11

Properties

Property          Type            Read-Only
code              Number          No

Methods

None.

Usage Guidelines

If a DOMException object is raised and not caught by an ECMAScript execution handler on retrieval of the Node.nodeValue property, the CharacterData.data property, or the CharacterData.substringData method, the media server raises an error.semantic exception.
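An application can avoid the error.semantic exception by catching the DOMException in ECMAScript and inspecting its code property. The following sketch assumes a stand-in node: makeCharacterData and safeSubstring are hypothetical, and the conditions under which the stand-in raises INDEX_SIZE_ERR are an assumption made for the example; a real node would come from the variable named by the <data> element.

```javascript
var INDEX_SIZE_ERR = 1; // value from the DOMException constants table

// Hypothetical stand-in for a CharacterData node (illustration only).
function makeCharacterData(text) {
  return {
    data: text,
    length: text.length,
    substringData: function (offset, count) {
      if (offset < 0 || offset > text.length || count < 0) {
        throw { code: INDEX_SIZE_ERR };
      }
      return text.substring(offset, offset + count);
    }
  };
}

// Returns the requested substring, or an empty string if the range is invalid.
function safeSubstring(node, offset, count) {
  try {
    return node.substringData(offset, count);
  } catch (e) {
    if (e.code === INDEX_SIZE_ERR) {
      return "";
    }
    throw e; // any other exception is passed on unchanged
  }
}

var node = makeCharacterData("Hello, world");

console.log(safeSubstring(node, 7, 5));  // world
console.log(safeSubstring(node, 99, 5)); // (empty string)
```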


Element Object

For the Element object, the media server supports the following constants, properties, and methods.

Constants

None.

Properties

Property          Type            Read-Only
tagName           String          Yes
nodeName          String          Yes
nodeValue         String          No
nodeType          Number          Yes
parentNode        Node            Yes
childNodes        NodeList        Yes
firstChild        Node            Yes
lastChild         Node            Yes
previousSibling   Node            Yes
nextSibling       Node            Yes
attributes        NamedNodeMap    Yes
ownerDocument     Document        Yes
namespaceURI      String          Yes
prefix            String          No
localName         String          Yes

Methods

Method                    Returns    Parameter Name             Parameter Type
getAttribute              String     name                       String
getAttributeNode          Attr       name                       String
getElementsByTagName      NodeList   name                       String
getAttributeNS            String     namespaceURI, localName    String, String
getAttributeNodeNS        Attr       namespaceURI, localName    String, String
getElementsByTagNameNS    NodeList   namespaceURI, localName    String, String
hasAttribute              Boolean    name                       String
hasAttributeNS            Boolean    namespaceURI, localName    String, String
hasChildNodes             Boolean    —                          —
hasAttributes             Boolean    —                          —


EntityReference Object

For the EntityReference object, the media server supports the following constants, properties, and methods.

Constants

Constant                      Type      Value
ELEMENT_NODE                  Number    1
ATTRIBUTE_NODE                Number    2
TEXT_NODE                     Number    3
CDATA_SECTION_NODE            Number    4
ENTITY_REFERENCE_NODE         Number    5
PROCESSING_INSTRUCTION_NODE   Number    7
COMMENT_NODE                  Number    8
DOCUMENT_NODE                 Number    9

Properties

Property          Type            Read-Only
nodeName          String          Yes
nodeValue         String          No
nodeType          Number          Yes
parentNode        Node            Yes
childNodes        NodeList        Yes
firstChild        Node            Yes
lastChild         Node            Yes
previousSibling   Node            Yes
nextSibling       Node            Yes
attributes        NamedNodeMap    Yes
ownerDocument     Document        Yes
namespaceURI      String          Yes
prefix            String          No
localName         String          Yes

Methods

Method                    Returns    Parameter Name             Parameter Type
hasChildNodes             Boolean    —                          —
hasAttributes             Boolean    —                          —




NamedNodeMap Object

For the NamedNodeMap object, the media server supports the following constants, properties, and methods.

Constants

None.

Properties

Property          Type            Read-Only
length            Number          Yes

Methods

Method                    Returns    Parameter Name             Parameter Type
getNamedItem              Node       name                       String
item                      Node       index                      Number
getNamedItemNS            Node       namespaceURI, localName    String, String



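Node Prototype Object

For the Node Prototype object, the media server supports the following constants, properties, and methods.

Constants

Constant                      Type      Value
ELEMENT_NODE                  Number    1
ATTRIBUTE_NODE                Number    2
TEXT_NODE                     Number    3
CDATA_SECTION_NODE            Number    4
ENTITY_REFERENCE_NODE         Number    5
PROCESSING_INSTRUCTION_NODE   Number    7
COMMENT_NODE                  Number    8
DOCUMENT_NODE                 Number    9

Properties

Property          Type            Read-Only
nodeName          String          Yes
nodeValue         String          No
nodeType          Number          Yes
parentNode        Node            Yes
childNodes        NodeList        Yes
firstChild        Node            Yes
lastChild         Node            Yes
previousSibling   Node            Yes
nextSibling       Node            Yes
attributes        NamedNodeMap    Yes
ownerDocument     Document        Yes
namespaceURI      String          Yes
prefix            String          No
localName         String          Yes

Methods

Method                    Returns    Parameter Name             Parameter Type
hasChildNodes             Boolean    —                          —
hasAttributes             Boolean    —                          —

Usage Guidelines

If a DOMException object is raised on retrieval of the Node.nodeValue property and not caught by an ECMAScript execution handler, the media server raises an error.semantic exception.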





NodeList Object

For the NodeList object, the media server supports the following constants, properties, and methods.

Constants

None.

Properties

Property          Type            Read-Only
length            Number          Yes

Methods

Method                    Returns    Parameter Name             Parameter Type
item                      Node       index                      Number
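The nodeType constants, the childNodes property, and the NodeList length property and item method are sufficient to walk a retrieved document. The sketch below is illustrative only: nodeList, textNode, elementNode, and collectText are hypothetical helpers that build and traverse stand-in objects, because the real nodes are available only through the variable named by the <data> element.

```javascript
// nodeType values taken from the constants tables in this chapter.
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;
var CDATA_SECTION_NODE = 4;
var COMMENT_NODE = 8;
var DOCUMENT_NODE = 9;

// Hypothetical stand-ins for the nodes that a <data> variable would expose.
function nodeList(nodes) {
  return { length: nodes.length, item: function (i) { return nodes[i]; } };
}
function textNode(value) {
  return { nodeType: TEXT_NODE, nodeValue: value, childNodes: nodeList([]) };
}
function elementNode(name, children) {
  return { nodeType: ELEMENT_NODE, nodeName: name, nodeValue: null,
           childNodes: nodeList(children) };
}

// Concatenates the text found beneath a node, using read access only.
function collectText(node) {
  if (node.nodeType === TEXT_NODE || node.nodeType === CDATA_SECTION_NODE) {
    return node.nodeValue;
  }
  if (node.nodeType !== ELEMENT_NODE && node.nodeType !== DOCUMENT_NODE) {
    return ""; // comments, processing instructions, and so on
  }
  var text = "";
  for (var i = 0; i < node.childNodes.length; i++) {
    text += collectText(node.childNodes.item(i));
  }
  return text;
}

var quote = elementNode("quote", [
  elementNode("symbol", [textNode("RSYS")]),
  { nodeType: COMMENT_NODE, nodeValue: "ignored", childNodes: nodeList([]) },
  elementNode("price", [textNode("12.50")])
]);

console.log(collectText(quote)); // RSYS12.50
```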


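ProcessingInstruction Object

For the ProcessingInstruction object, the media server supports the following constants, properties, and methods.

Constants

Constant                      Type      Value
ELEMENT_NODE                  Number    1
ATTRIBUTE_NODE                Number    2
TEXT_NODE                     Number    3
CDATA_SECTION_NODE            Number    4
ENTITY_REFERENCE_NODE         Number    5
PROCESSING_INSTRUCTION_NODE   Number    7
COMMENT_NODE                  Number    8
DOCUMENT_NODE                 Number    9

Properties

Property          Type            Read-Only
target            String          Yes
data              String          No
nodeName          String          Yes
nodeValue         String          No
nodeType          Number          Yes
parentNode        Node            Yes
childNodes        NodeList        Yes
firstChild        Node            Yes
lastChild         Node            Yes
previousSibling   Node            Yes
nextSibling       Node            Yes
attributes        NamedNodeMap    Yes
ownerDocument     Document        Yes
namespaceURI      String          Yes
prefix            String          No
localName         String          Yes

Methods

Method                    Returns    Parameter Name             Parameter Type
hasChildNodes             Boolean    —                          —
hasAttributes             Boolean    —                          —

Usage Guidelines

The media server parses and interprets the “xml” processing instruction. The media server does not support any other processing instruction. Unsupported processing instructions cause the media server to raise an error.semantic exception.

The media server does not generate any processing instruction objects.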




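Text Object

For the Text object, the media server supports the following constants, properties, and methods.

Constants

None.

Properties

Property          Type            Read-Only
data              String          No
length            Number          Yes
nodeName          String          Yes
nodeValue         String          No
nodeType          Number          Yes
parentNode        Node            Yes
childNodes        NodeList        Yes
firstChild        Node            Yes
lastChild         Node            Yes
previousSibling   Node            Yes
nextSibling       Node            Yes
attributes        NamedNodeMap    Yes
ownerDocument     Document        Yes
namespaceURI      String          Yes
prefix            String          No
localName         String          Yes

Methods

Method                    Returns    Parameter Name             Parameter Type
substringData             String     offset, count              Number, Number
hasChildNodes             Boolean    —                          —
hasAttributes             Boolean    —                          —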






Appendix A: BEST PRACTICES FOR VOICEXML DEVELOPMENT

This appendix describes some development practices that can help you maximize performance and capacity of your VoiceXML applications.

The coding practices recommended in this appendix guide developers who write code for the RadiSys Convedia Media Server’s VoiceXML interpreter, and help development partners achieve optimal performance on the media server’s VoiceXML interface.

1 Store permanent audio clips on the media server.

Provisioning permanent audio clips internally on the media server, rather than on an external NFS or HTTP server, allows more efficient clip retrieval. In addition, storing clips internally removes any issues relating to interconnectivity with the NFS or HTTP server that could occur, reducing debugging time.

2 If you store permanent clips externally, use NFS.

If you must store provisioned audio clips on an external server, RadiSys recommends using an NFS server. RadiSys currently does not recommend using HTTP for recording and playing back permanent audio clips.

If you must record to an external HTTP server, use the <submit> element. This element records the file internally until it completes, and then uses the HTTP POST method to post the file to the HTTP server.

3 Record temporary audio clips on the media server.

If the application records audio clips for temporary use, it is most efficient to store the temporary clips internally on the media server. Clips that are recorded on the media server are transient: they are deleted when the connection with which they are associated is closed. They are also volatile: they will not survive a reset cycle.

4 Consolidate VoiceXML documents.

The number of document transitions, which have a high CPU overhead, can vary per application. In order to achieve higher capacity, consolidate the VoiceXML logic or flow to minimize the number of document transitions.

In calculating performance characteristics, RadiSys assumes that the average number of transitions in a voicemail-type application is 2 to 3.

5 Reduce application root document size.

The application root document size can grow large if several variables and several catch handlers are defined. Since root documents may be called with every document fetch, having a large root document can cause high CPU consumption, impacting performance. Remove any unused or unnecessary variables and catch handlers from the application root document, and define them within the VoiceXML leaf document where they are required.

This guideline interacts with the previous guideline. Since root documents are called with every document fetch, a large number of VoiceXML documents calling a large root document can exacerbate CPU consumption.

6 Reduce the number of subdialogs.


[1] 3GPP TS 26.244. 3GPP File Format (2GP) Specification. V7.1.0.

[2] Audio-Video Transport Working Group, Casner, S., and P. Hoschka. MIME Type Registration of RTP Payload Formats. Internet Draft, Internet Engineering Task Force, November 2001.

[3] Bos, B., et al. (eds). Cascading Style Sheets, Level 2 (CSS2) Specification. W3C Candidate Recommendation, World Wide Web Consortium, May 1998.

[4] Bray, T., et al. (eds). Extensible Markup Language (XML) 1.0 (Third Edition). W3C Recommendation 04, World Wide Web Consortium, February 2004.

[5] Burnett, D., et al. (eds). Speech Synthesis Markup Language Specification. W3C Working Draft, World Wide Web Consortium, April 2002.

[6] Burnett, D., et al. (eds). SSML 1.0 say-as attribute values. W3C Working Note 26, World Wide Web Consortium, May 2005.

[7] Cable Television Laboratories. PacketCable™ Audio Server Protocol Specification, PKT-SP-ASP-I02-010620. June 2001.

[8] Dahl, D. (ed). Natural Language Semantics Markup Language for the Speech Interface Framework. W3C Working Recommendation, World Wide Web Consortium, November 2000.

[9] Freed, N., and Borenstein, N. Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. RFC 2046, Internet Engineering Task Force, November 1998.

[10] Gellens, R., Singer, D., and P. Frodjh. The Codecs Parameter for "Bucket" Media Types. RFC 1738, Internet Engineering Task Force, November 2005.

[11] Hunt, A., and S. McGlashan. Speech Recognition Grammar Specification Version 1.0. W3C Candidate Recommendation, World Wide Web Consortium, June 2002.

[12] International Organization for Standardization. Codes for the representation of names and languages -- Part 2:Alpha-3 code. ISO 639-2:1998, October 1998.

[13] McGlashan, S. et al. (eds.). Voice Extensible Markup Language: VoiceXML, Version 2.0. W3C Candidate Recommendation, World Wide Web Consortium, March 2004.

[14] Oshry, Matt et al. (eds.) Voice Extensible Markup Language: VoiceXML, Version 2.1. W3C Recommendation 19, World Wide Web Consortium, June 2007.

[15] Rosenberg, J., Schulzrinne, H., Camarillo, G., Johnston, A., Peterson, J., Sparks, R., Handley, M., and E. Schooler. SIP: Session Initiation Protocol. RFC 3261, Internet Engineering Task Force, June 2002.

[16] Schulzrinne, H., and S. Petrack. RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals.

REFERENCES


References

RFC 2833, Internet Engineering Task Force, May 2000.

[17] Shanamugham, S. and D. Burnett. Media Resource Control Protocol Version 2 (MRCPv2). Internet Draft, Internet Engineering Task Force, November 2008.

[18] Shanmugham, S., Monaco, P., and B. Eberman. A Media Resource Control Protocol (MRCP). RFC 4463, Internet Engineering Task Force, April 2006.

[19] Sjoberg, J., Westerlund, M., and Q. Xie. Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs. Internet Draft, Internet Engineering Task Force, January 2005. Work in progress.





GLOSSARY OF ACRONYMS

3G Third-Generation Wireless

3GP A file format standardized by the 3GPP.

3GPP Third-Generation Partnership Project

3PCC Third Party Call Control

°C Degrees Celsius

°F Degrees Fahrenheit

AEC Acoustic Echo Cancellation

ARP Address Resolution Protocol

ASR Automatic Speech Recognition

BITS Building Integrated Timing Source

BTU British Thermal Unit. A measure of heat energy: the amount of heat required to raise the temperature of 1 pound of water by 1 degree Fahrenheit.

CA Control Agent

CALEA Communications Assistance for Law Enforcement Act

CAN/CSA Canadian Standards Association

CD Compact Disc

CE Conformité Européenne

CED Fax called station identification tone

CISPR International Special Committee on Radio Interference

CMS Convedia Media Server

CNG Fax calling tone

CO Central Office



CPA Call progress analysis

CPAMD Call progress answering machine detection

CPTD Call progress tone detection

CPVAD Call progress voice activity detection

DC Direct Current

DNS Domain Name System

DSP Digital Signal Processor

DTMF Dual Tone Multi Frequency

EMI Electromagnetic Interference

FCC Federal Communications Commission

FRU Field-Replaceable Unit

FQDN Fully Qualified Domain Name

FTP File Transfer Protocol

GUI Graphical User Interface

HTTP Hypertext Transfer Protocol

ID Identifier

IMMS Integrated Mobile Media Server

IMS IP Multimedia Subsystem

I/O Input/Output

IP Internet Protocol

IPBCP IP Bearer Control Protocol

IuFP Iu Framing Protocol

IuUP Iu Interface User Plane

IPCC IP Call Center

ITU International Telecommunication Union

IVR Interactive Voice Response

kbps Kilobits per second

kg Kilogram(s)


LAN Local Area Network

lb Pound(s) (weight)

LEA Law Enforcement Agency

LED Light Emitting Diode

Mbps Megabits per second

MIB Management Information Base

MGCP Media Gateway Control Protocol

MOML Media Objects Markup Language

MPC Media Processor Card

MPI Minimum Picture Interval. The minimum time that can occur between pictures selected for encoding.

MRFP Multimedia Resource Function Processor

MRCP Media Resource Control Protocol

MS Media Server [except when used in conjunction with a Microsoft product, where it represents “Microsoft”]

MSC Mobile Switching Center

MSML Media Server Markup Language

MTBF Mean Time Between Failures

MTTR Mean Time to Restore

NbUP Nb Interface User Plane

NEBS Network Equipment-Building System

NFS Network File System

NR Noise Reduction

OAMP Operations, Administration, Maintenance, and Provisioning

OO Object-Oriented

PDF Portable Document Format

PLMN Public Land Mobile Network

POTS Plain Old Telephone Service



PSTN Public Switched Telephone Network

QoS Quality of Service

RF Radio Frequency

RFC Request for Comments

RFI Radio Frequency Interference

RJ-45 Registered Jack 45

RPC Remote Procedure Call

RS-232 Recommended Standard 232

RTCP RTP Control Protocol

RTP Real-time Transport Protocol

RU Rack Unit. 1.75 in (4.4 cm) in height.

SCC Shelf Control Card

SDP Session Description Protocol

SIP Session Initiation Protocol

SIT Special Information Tone

SNMP Simple Network Management Protocol

SRGS Speech Recognition Grammar Specification

SSRC Synchronization source

TAC Technical Assistance Center

TCP Transmission Control Protocol

TCP/IP Transmission Control Protocol/Internet Protocol

TFTP Trivial File Transfer Protocol

ToS Type of Service

TTS Text to Speech

UAC User Agent Client

UDP User Datagram Protocol

UL Underwriters Laboratories

URL Uniform Resource Locator


UTC Coordinated Universal Time [formerly GMT]

VAC Volts, Alternating Current

VAD Voice Activity Detector

VDC Volts, Direct Current

VoiceXML Voice eXtensible Markup Language: An XML language designed for defining voice segments and enabling access to the Internet via telephones and other voice-activated devices

VoIP Voice over Internet Protocol

VRU Voice Response Unit

W Watts

XML eXtensible Markup Language


